WEARABLE SENSOR TECHNOLOGY FOR MONITORING TRAINING LOAD AND HEALTH IN THE ATHLETIC POPULATION

EDITED BY: Billy Sperlich, Hans-Christer Holmberg and Kamiar Aminian

PUBLISHED IN: Frontiers in Physiology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-462-0 DOI 10.3389/978-2-88963-462-0


# WEARABLE SENSOR TECHNOLOGY FOR MONITORING TRAINING LOAD AND HEALTH IN THE ATHLETIC POPULATION

Topic Editors:

Billy Sperlich, Julius Maximilian University of Würzburg, Germany

Hans-Christer Holmberg, Mid Sweden University, Sweden

Kamiar Aminian, École Polytechnique Fédérale de Lausanne, Switzerland

Several internal and external factors have been identified for estimating and controlling the psycho-biological stress of training, in order to optimize training responses and to avoid fatigue, overtraining and other undesirable health effects in athletes.

An increasing number of lightweight sensor-based wearable technologies ("wearables") have entered the sports technology market. Non-invasive sensor-based wearable technologies can transmit physical, physiological and biological data to a computing platform and, through human-machine interaction (smart watch, smartphone, tablet), may provide biofeedback on various parameters for managing training load and health.

In theory, several wearable technologies may help to control training load. However, assessing the accuracy, reliability, validity, usability and practical relevance of new technologies for managing training load is paramount for optimal adaptation and health.

Citation: Sperlich, B., Holmberg, H.-C., Aminian, K., eds. (2020). Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-462-0

# Table of Contents

*Editorial: Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population*

Billy Sperlich, Kamiar Aminian, Peter Düking and Hans-Christer Holmberg

*When Is a Sprint a Sprint? A Review of the Analysis of Team-Sport Athlete Activity Profile*

Alice J. Sweeting, Stuart J. Cormack, Stuart Morgan and Robert J. Aughey

*The Use of Body Worn Sensors for Detecting the Vibrations Acting on the Lower Back in Alpine Ski Racing*

Jörg Spörri, Josef Kröll, Benedikt Fasel, Kamiar Aminian and Erich Müller

*Criterion-Validity of Commercially Available Physical Activity Tracker to Estimate Step Count, Covered Distance and Energy Expenditure During Sports Conditions*

Yvonne Wahl, Peter Düking, Anna Droszez, Patrick Wahl and Joachim Mester

*Weak Relationships Between Stint Duration, Physical and Skilled Match Performance in Australian Football*

David M. Corbett, Alice J. Sweeting and Sam Robertson

Andrea Nicolò, Carlo Massaroni and Louis Passfield

*The Case for Adopting a Multivariate Approach to Optimize Training Load Quantification in Team Sports*

Dan Weaving, Ben Jones, Kevin Till, Grant Abt and Clive Beggs

*Discovery of a Sweet Spot on the Foot With a Smart Wearable Soccer Boot Sensor That Maximizes the Chances of Scoring a Curved Kick in Soccer*

Franz Konstantin Fuss, Peter Düking and Yehuda Weizman

Peter Düking, Hans-Christer Holmberg and Billy Sperlich

*The Impact of Web-Based Feedback on Physical Activity and Cardiovascular Health of Nurses Working in a Cardiovascular Setting: A Randomized Trial*

Jennifer L. Reed, Christie A. Cole, Madeleine C. Ziss, Heather E. Tulloch, Jennifer Brunet, Heather Sherrard, Robert D. Reid and Andrew L. Pipe

*Application of dGNSS in Alpine Ski Racing: Basis for Evaluating Physical Demands and Safety*

Matthias Gilgien, Josef Kröll, Jörg Spörri, Philip Crivelli and Erich Müller

*Whole-Body Vibrations Associated With Alpine Skiing: A Risk Factor for Low Back Pain?*

Matej Supej, Jan Ogrin and Hans-Christer Holmberg

*Validity and Reliability of 10-Hz Global Positioning System to Assess In-line Movement and Change of Direction*

Pantelis T. Nikolaidis, Filipe M. Clemente, Cornelis M. I. van der Linden, Thomas Rosemann and Beat Knechtle

*Estimation of Vertical Ground Reaction Forces and Sagittal Knee Kinematics During Running Using Three Inertial Sensors*

Frank J. Wouda, Matteo Giuberti, Giovanni Bellusci, Erik Maartens, Jasper Reenalda, Bert-Jan F. van Beijnum and Peter H. Veltink

*Intra-session and Inter-day Reliability of the Myon 320 Electromyography System During Sub-maximal Contractions*

Graeme G. Sorbie, Michael J. Williams, David W. Boyle, Alexander Gray, James Brouner, Neil Gibson, Julien S. Baker, Chris Easton and Ukadike C. Ugbolue

*Validity of the Catapult ClearSky T6 Local Positioning System for Team Sports Specific Drills, in Indoor Conditions*

Live S. Luteberget, Matt Spencer and Matthias Gilgien

*Muscle Performance Investigated With a Novel Smart Compression Garment Based on Pressure Sensor Force Myography and its Validation Against EMG*

Aaron Belbasis and Franz Konstantin Fuss

*Heart Rate Monitoring in Team Sports—A Conceptual Framework for Contextualizing Heart Rate Measures for Training and Recovery Prescription*

Christoph Schneider, Florian Hanakam, Thimo Wiewelhove, Alexander Döweling, Michael Kellmann, Tim Meyer, Mark Pfeiffer and Alexander Ferrauti

*Accurate Estimation of Running Temporal Parameters Using Foot-Worn Inertial Sensors*

Mathieu Falbriard, Frédéric Meyer, Benoit Mariani, Grégoire P. Millet and Kamiar Aminian

*Measurement, Prediction, and Control of Individual Heart Rate Responses to Exercise—Basics and Options for Wearable Devices*

Melanie Ludwig, Katrin Hoffmann, Stefan Endler, Alexander Asteroth and Josef Wiemeyer

*A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations*

Jonathan M. Peake, Graham Kerr and John P. Sullivan

*Quantified Soccer Using Positional Data: A Case Study*

Svein A. Pettersen, Håvard D. Johansen, Ivan A. M. Baptista, Pål Halvorsen and Dag Johansen

*Exercise Intensity During Cross-Country Skiing Described by Oxygen Demands in Flat and Uphill Terrain*

Øyvind Karlsson, Matthias Gilgien, Øyvind N. Gløersen, Bjarne Rud and Thomas Losnegard

*Dynamics of Recovery of Physiological Parameters After a Small-Sided Game in Women Soccer Players*

Rafaela B. Mascarin, Vitor L. De Andrade, Ricardo A. Barbieri, João P. Loures, Carlos A. Kalva-Filho and Marcelo Papoti

*Tracking Performance in Endurance Racing Sports: Evaluation of the Accuracy Offered by Three Commercial GNSS Receivers Aimed at the Sports Market*

Øyvind Gløersen, Jan Kocbach and Matthias Gilgien

# Editorial: Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population

#### Billy Sperlich<sup>1</sup>\*, Kamiar Aminian<sup>2</sup>, Peter Düking<sup>1</sup> and Hans-Christer Holmberg<sup>3,4,5</sup>

<sup>1</sup> Integrative and Experimental Exercise Science & Training, University of Würzburg, Würzburg, Germany, <sup>2</sup> Laboratory of Movement Analysis and Measurement, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, <sup>3</sup> Department of Health Sciences, Mid Sweden University, Östersund, Sweden, <sup>4</sup> School of Kinesiology, University of British Columbia, Vancouver, BC, Canada, <sup>5</sup> Department of Physiology and Pharmacology, Biomedicum C5, Karolinska Institutet, Stockholm, Sweden

Keywords: wearables, data analysis, personalized medicine, monitoring, sensor, biofeedback, innovation, digital health

#### **Editorial on the Research Topic**

#### Edited by:

Matt Brughelli, Auckland University of Technology, New Zealand

#### Reviewed by:

Grant Abt, University of Hull, United Kingdom Monoem Haddad, Qatar University, Qatar Pantelis Theodoros Nikolaidis, University of West Attica, Greece

#### \*Correspondence:

Billy Sperlich billy.sperlich@uni-wuerzburg.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 23 May 2019 Accepted: 03 December 2019 Published: 08 January 2020

#### Citation:

Sperlich B, Aminian K, Düking P and Holmberg H-C (2020) Editorial: Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population. Front. Physiol. 10:1520. doi: 10.3389/fphys.2019.01520

#### **Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population**

Various measures of the internal and external loads on athletes, as well as parameters related to their health are now being provided to a greater and greater extent by wearable sensors (wearables) (Düking et al., 2018a,b,c). These devices, including sensors and software embedded in e.g., textiles, watches and patches located on or in proximity to the body, collect, transmit, and analyse a range of physiological and biomechanical data designed to improve performance, recovery, and/or other aspects of health (Düking et al., 2018a). However, it is still unclear to what extent wearables are actually useful for monitoring load in connection with different sports and settings.

In 2017, we launched a special coverage of the Research Topic "Wearable Sensor Technology for Monitoring Training Load and Health in the Athletic Population" in Frontiers in Physiology with the following aims:


One hundred thirteen authors have now published 28 articles in Frontiers in Physiology on this Research Topic, including 18 original articles based on field and laboratory data, four (mini) reviews, three opinion papers, one perspective and one technology report. **Table 1** summarizes the main features of all of these studies. With more than 148,000 views (as of November 2019), this Research Topic is among those published in the Physiology section of Frontiers in Physiology that have received most interest. To achieve the aims described above, we have grouped these articles in the table on the basis of the specific sport involved or evaluation of new technologies without consideration of any specific population, describing only those articles we consider to be of primary importance in the field.


#### TABLE 1 | (table body not recovered)

n.i., not indicated.

## EVALUATION OF THE QUALITY OF NEW TECHNOLOGY

Peake et al. critically reviewed consumer-grade wearables, mobile applications, and equipment designed to provide biofeedback to physically active individuals. While acknowledging that wearable technology has much to offer, these investigators concluded that only 5% of the technologies they reviewed have been formally validated and that manufacturers should invest in studies on the effectiveness of their products.

Wahl et al. showed that under different sporting conditions, the majority of 11 wrist-worn wearables demonstrated acceptable validity with respect to counting steps, whereas the distance covered and energy expenditure could not be assessed validly.

Reviewing the relevant literature, Koehler and Drenowatz concluded that while the SenseWear armband can estimate energy expenditure validly in the general population, it tends to underestimate this parameter during high-intensity exercise (>10 METs).

### WEARABLES IN CONNECTION WITH WINTER SPORTS

Wearables are often utilized to assess parameters associated with different skiing disciplines. Employing a global navigation satellite system, Karlsson et al. found that cross-country skiers repeatedly perform at intensities that exceed their maximal aerobic power, with more pronounced oxygen deficits during uphill skiing than on flat terrain.

Gilgien et al. applied a differential global navigation satellite system (dGNSS) to evaluate the physical demands and safety associated with different skiing disciplines. The physical demands made by giant slalom, super-G and downhill skiing differ substantially. Furthermore, these researchers concluded that, to increase safety, skiing speed can best be reduced by enhancing the friction between the skis and snow in the case of giant slalom and super-G, whereas for downhill skiing an elevation in air drag force might be equally effective.

Using five accelerometers and a global navigation satellite system, Supej et al. found that low-frequency whole-body vibrations during alpine skiing enhance the risk for pain in the lower back, particularly in combination with large ground reaction forces. They concluded that the number of runs involving such vibrations (e.g., during side-skidding) should be reduced, especially in the case of younger skiers.

Spörri et al. evaluated vibrations acting on different body segments during giant slalom and slalom skiing with six wearable inertial measurement units. The power spectral density (PSD), i.e., the distribution of signal power over frequency, was largest at frequencies of <30 Hz in the case of the shank, with vibrations being attenuated by the knee and hip joints. PSD values were pronounced at frequencies between 4 and 10 Hz, increasing the risk of overuse back injuries in alpine skiers.
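To illustrate what an analysis of power over frequency involves, the following minimal Python sketch estimates a one-sided PSD with a plain periodogram and reports the share of signal power in the 4 to 10 Hz band. It is not the procedure used by Spörri et al.; the synthetic acceleration signal, the 100 Hz sampling rate, and the function names `periodogram` and `band_power` are assumptions made purely for this example.

```python
import cmath
import math

def periodogram(signal, fs):
    """Return (frequencies in Hz, one-sided PSD) of a real signal sampled at fs Hz."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the constant offset (e.g., gravity)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; O(n^2) overall, acceptable for a short illustration
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(acc) ** 2 / (fs * n)
        if 0 < k < n / 2:  # fold negative frequencies into the positive side
            power *= 2
        freqs.append(k * fs / n)
        psd.append(power)
    return freqs, psd

def band_power(freqs, psd, f_lo, f_hi):
    """Integrate the PSD between f_lo and f_hi (inclusive) with a rectangle rule."""
    df = freqs[1] - freqs[0]
    return sum(p * df for f, p in zip(freqs, psd) if f_lo <= f <= f_hi)

# Hypothetical signal: a 6 Hz vibration plus a weaker 40 Hz component
fs = 100.0
n = 200
accel = [math.sin(2 * math.pi * 6 * t / fs) + 0.3 * math.sin(2 * math.pi * 40 * t / fs)
         for t in range(n)]

freqs, psd = periodogram(accel, fs)
total = band_power(freqs, psd, 0.0, fs / 2)
share_4_10 = band_power(freqs, psd, 4.0, 10.0) / total
peak_freq = freqs[max(range(len(psd)), key=psd.__getitem__)]
print(f"peak at {peak_freq:.1f} Hz, 4-10 Hz share of power: {share_4_10:.2f}")
```

In practice, field recordings are usually split into windows and averaged (for example with Welch's method) to reduce the variance of the estimate; the single periodogram here is only meant to show the principle.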

Applying 11 inertial measurement units, Fasel et al. could assess the kinematics of the relative center of mass and positions of joint centers of alpine skiers with sufficient accuracy and precision, while the ankle joints were only just within the acceptable range of accuracy and precision.

### WEARABLES IN CONNECTION WITH TEAM SPORTS

In their original article, Fuss et al. employed a pressure-sensitive sensor matrix incorporated into a soccer shoe to identify a "sweet spot" on the foot that maximizes the chances of hitting the goal with a direct curved free kick of 58–86°. This sensor may allow soccer players to analyse their foot-to-ball impact and improve their technique.

In connection with team sports, tracking technologies such as global positioning (GPS), local positioning (LPS), and vision-based (VBS) systems allow activity profiles to be monitored. Analysis of these profiles may be influenced by the relative amount of time spent in different velocity or acceleration zones. In their review article, Sweeting et al. emphasize that there is presently no generally accepted definition of a sprint or acceleration, not even within a given team sport, which complicates comparison of different studies.

With respect to training load, Weaving et al. argue that no single parameter is likely to capture its complexity and that, moreover, practitioners can be overwhelmed by the amount of data they receive. A multivariate approach employing selected orthogonal composite variables may be helpful in providing sufficient data without "flooding."
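One common way to obtain orthogonal composite variables is principal component analysis (PCA). The sketch below is an illustration under stated assumptions, not necessarily the exact procedure Weaving et al. recommend: the six sessions and their four load variables are invented, and power iteration is used to extract only the first component so that the example needs nothing beyond the Python standard library.

```python
import math

def standardize(rows):
    """Z-score each column so that variables with different units are comparable."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]

def correlation_matrix(z):
    """Covariance of z-scored columns, which equals the correlation matrix."""
    n, m = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(m)] for a in range(m)]

def leading_component(c, iterations=500):
    """Power iteration: largest eigenvalue of c and its unit-length eigenvector."""
    m = len(c)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(c[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(c[a][b] * v[b] for b in range(m)) for a in range(m))
    return eigenvalue, v

# Hypothetical sessions: total distance (m), high-speed distance (m),
# session-RPE load (AU), heart-rate-based load (AU)
sessions = [
    [5200, 310, 420, 95],
    [6100, 450, 510, 120],
    [4300, 220, 330, 80],
    [7000, 600, 640, 150],
    [5600, 380, 450, 100],
    [4800, 250, 400, 90],
]

z = standardize(sessions)
corr = correlation_matrix(z)
eigenvalue, loadings = leading_component(corr)
explained = eigenvalue / len(corr)  # the trace of a correlation matrix equals the variable count
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
print(f"first component explains {explained:.0%} of the variance")
```

Each session is then summarized by a single score on the first component instead of four partly redundant numbers; further components, each orthogonal to the first, would capture whatever variation that score misses.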

For quantifying aspects of external loading in connection with indoor team sports, Roell et al. found a wearable inertial unit designed to measure average and peak acceleration to be acceptably valid in all three orthogonal axes.

In their case study, Pettersen et al. demonstrate that wearable radio-based positioning systems can provide insights into the performance of individual soccer players and their teams.

### WEARABLES IN CONNECTION WITH RUNNING AND CYCLING

Belbasis and Fuss found that a pressure-sensitive sensor located inside compression garments provided data on the activity of five thigh muscles during cycling comparable to that obtained by electromyography (EMG). Arguably, this smart compression garment monitors mechanical muscle activity (i.e., the pressure exerted by the contracting muscle on the sensor), whereas EMG measures neural activity and may therefore be more suitable for biomechanical modeling.

In the case of runners, Falbriard et al. showed that temporal parameters, including ground contact, flight, step, and swing times, can be estimated accurately, but that the results obtained are dependent on the speed.
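For readers unfamiliar with these parameters, the short Python sketch below derives them from a sequence of detected gait events. It assumes the events (initial contact, "IC", and toe-off, "TO", for each foot) have already been detected from the sensor signals, which is the difficult part addressed by Falbriard et al.; the event times, the millisecond units, and the function name `temporal_parameters` are hypothetical.

```python
def temporal_parameters(events):
    """Compute per-step timing from chronologically ordered gait events.

    events: list of (time_ms, foot, kind) with foot "L" or "R" and kind
    "IC" (initial contact) or "TO" (toe-off).
    """
    steps = []
    for i, (t_ic, foot, kind) in enumerate(events):
        if kind != "IC":
            continue
        later = events[i + 1:]
        to_same = next((t for t, f, k in later if f == foot and k == "TO"), None)
        ic_other = next((t for t, f, k in later if f != foot and k == "IC"), None)
        ic_same = next((t for t, f, k in later if f == foot and k == "IC"), None)
        if to_same is None or ic_other is None or ic_same is None:
            continue  # incomplete step at the end of the recording
        steps.append({
            "foot": foot,
            "contact_ms": to_same - t_ic,     # foot on the ground
            "flight_ms": ic_other - to_same,  # neither foot on the ground
            "step_ms": ic_other - t_ic,       # contact to contralateral contact
            "swing_ms": ic_same - to_same,    # toe-off to next contact of the same foot
        })
    return steps

# Hypothetical event stream for a runner (times in milliseconds)
events = [
    (0, "L", "IC"), (250, "L", "TO"),
    (350, "R", "IC"), (600, "R", "TO"),
    (700, "L", "IC"), (950, "L", "TO"),
    (1050, "R", "IC"),
]

steps = temporal_parameters(events)
for s in steps:
    print(s)
```

Because contact time shortens and flight time changes as running speed increases, small errors in event detection weigh differently at different speeds, which is one plausible reason for the speed dependence noted above.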

Wouda et al. found that estimation of the peak vertical ground reaction force, as well as maximal knee flexion-extension angles during stance in runners by three inertial measurement systems in combination with artificial neural networks did not differ significantly from the reference values.

### CONCLUDING REMARKS

The 28 articles on this Research Topic have clearly improved our knowledge concerning the use of wearables for monitoring training load and health in athletes involved in different sports. Novel technologies have been introduced and existing technologies evaluated. New approaches to monitoring and analyzing (training) load in connection with different sports have been described. Nonetheless, much remains to be determined concerning the usage of wearables by athletic populations.

While some findings involve physiological parameters, e.g., those of Nicolò et al., most of the wearable technology investigated provides biomechanical data. Therefore, we encourage future studies on physiological parameters in this area of research. Since future monitoring frameworks (Düking et al., 2018a) may provide instant feedback concerning internal load to coaches and athletes, such research is certainly warranted. Appropriate combination of physiological and psychological data with biomechanical data will be a future challenge in connection with providing relevant and seamless feedback to the athlete.

Currently, on the basis of the articles included here, it remains unclear whether monitoring with wearables is actually beneficial for controlling the load and improving the health of athletes. To date, no publication has addressed these questions directly.

Future advancements in smart technology will involve devices designed to share and interact with their users, as well as with other smart devices. Wearables should, however, be convenient and usable without hindering the athlete with cumbersome sensors. Optimal integration of sensors into equipment (e.g., ski boots, garments) will require the involvement of manufacturers of sporting equipment.

Today, 3 years after we launched "Wearable Sensor Technology for Monitoring Training Load and Health in Athletes" as a Research Topic, interest remains quite high, as indicated, among other things, by global fitness trends (Thompson, 2019). We look forward to the novel insights arising from future research in this growing field.

## AUTHOR CONTRIBUTIONS

BS, PD, KA, and H-CH wrote and edited the manuscript.

### REFERENCES

Thompson, W. R. (2019). Worldwide survey of fitness trends for 2020. ACSM's Health Fitness J. 23, 10–18. doi: 10.1249/FIT.0000000000000526

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Sperlich, Aminian, Düking and Holmberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Instant Biofeedback Provided by Wearable Sensor Technology Can Help to Optimize Exercise and Prevent Injury and Overuse

Peter Düking<sup>1</sup>\*, Hans-Christer Holmberg<sup>2,3,4</sup> and Billy Sperlich<sup>1</sup>

<sup>1</sup> Integrative and Experimental Exercise Science, Institute for Sport Sciences, Julius-Maximilians University, Würzburg, Germany, <sup>2</sup> Swedish Winter Sports Research Centre, Mid Sweden University, Östersund, Sweden, <sup>3</sup> Department of Physiology and Pharmacology, Karolinska Institute, Stockholm, Sweden, <sup>4</sup> School of Sport Sciences, UiT Arctic University of Norway, Tromsø, Norway

Keywords: performance monitoring, health monitoring, sports technology, coaching, training optimization

#### Edited by:

Luca Paolo Ardigò, University of Verona, Italy

#### Reviewed by:

Leonardo Alexandre Peyré-Tartaruga, Universidade Federal do Rio Grande do Sul, Brazil

#### \*Correspondence:

Peter Düking peterdueking@gmx.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 24 January 2017 Accepted: 07 March 2017 Published: 03 April 2017

#### Citation:

Düking P, Holmberg H-C and Sperlich B (2017) Instant Biofeedback Provided by Wearable Sensor Technology Can Help to Optimize Exercise and Prevent Injury and Overuse. Front. Physiol. 8:167. doi: 10.3389/fphys.2017.00167

With great interest, we have been following the growing variety and popularity of commercially available wearable sensor technologies, as well as the discussion concerning their usefulness for improving fitness and health (Duking et al., 2016; Halson et al., 2016; Sperlich and Holmberg, 2016). Although many of these devices may not necessarily fulfill scientific criteria for quality (Sperlich and Holmberg, 2016) or may pose a threat to the security of personal data (Austen, 2015), we would like to emphasize here that many individuals who seek to improve their health or physical performance do so on their own, without the guidance of professionals to design their fitness training. Although professional guidance is, of course, important, such individuals, and especially beginners, would find instantaneous (bio)feedback beneficial for optimal adaptation and prevention of overuse or injury. We believe wearable sensor technologies, in conjunction with appropriate (mobile) applications, data mining and machine learning algorithms, can provide biofeedback that is useful in many ways.

In this context, biofeedback is considered to be individual data related to the body (e.g., heart rate and motion, including acceleration of body segments and much more). Such biofeedback, provided either haptically, audibly and/or visually, can augment or even replace a sensory organ, allowing the individual to react appropriately (Fuss, 2014). For example, visual biofeedback provided by wearable sensors can help modulate gait in a manner that reduces loading of the legs while running, thereby lowering the risk for stress fracture of the tibia (Crowell and Davis, 2011).

Current and ongoing improvements in wearable sensor technologies and their applications provide vibrotactile biofeedback (Afzal et al., 2016) and/or auditory signals through so-called "(h)earables" or other types of receivers. Visual biofeedback may be given by smartwatches and/or smartphones and, in the near future, by smart glasses or contact lenses (Hosseini et al., 2014). We believe that such easily accessible biofeedback from wearable sensors that are (i) unobtrusive and do no harm, (ii) reliable and valid, and (iii) provide relevant information can help individuals make their training more effective.

Clearly, objective biofeedback provided by wearable sensors can reveal aspects of an individual's health and training, which simply cannot be otherwise accessed. Examples include neuromuscular fatigue and forces acting upon the cruciate ligaments (Belbasis et al., 2015), certain aspects of a soccer player's kicking technique (Weizman and Fuss, 2015), metabolites and electrolytes in sweat (Anastasova et al., 2017), and hydration status and shifts of fluid in the body (Villa et al., 2016). In addition, many other types of monitoring are presently under development.

To summarize, we believe that the provision of haptic, audible and/or visual biofeedback by high-quality wearable sensors in connection with data mining and machine learning algorithms will assist athletes, especially beginners, in optimizing their training and health by helping to prevent overuse and injury.


## AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contribution to this work and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Düking, Holmberg and Sperlich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# When Is a Sprint a Sprint? A Review of the Analysis of Team-Sport Athlete Activity Profile

Alice J. Sweeting<sup>1,2,3</sup>, Stuart J. Cormack<sup>4</sup>, Stuart Morgan<sup>5</sup> and Robert J. Aughey<sup>1</sup>\*

<sup>1</sup> Institute of Sport, Exercise and Active Living (ISEAL), Victoria University, Footscray, VIC, Australia, <sup>2</sup> Netball Australia, Fitzroy, VIC, Australia, <sup>3</sup> Performance Research, Australian Institute of Sport, Bruce, ACT, Australia, <sup>4</sup> School of Exercise Science, Australian Catholic University, Fitzroy, VIC, Australia, <sup>5</sup> Department of Rehabilitation, Nutrition and Sport, School of Allied Health, La Trobe University, Bundoora, VIC, Australia

The external load of a team-sport athlete can be measured by tracking technologies, including global positioning systems (GPS), local positioning systems (LPS), and vision-based systems. These technologies allow for the calculation of displacement, velocity and acceleration during a match or training session. The accurate quantification of these variables is critical so that meaningful changes in team-sport athlete external load can be detected. High-velocity running, including sprinting, may be important for specific team-sport match activities, including evading an opponent or creating a shot on goal. Maximal accelerations are energetically demanding and frequently occur from a low velocity during team-sport matches. Despite extensive research, conjecture exists regarding the thresholds by which to classify the high velocity and acceleration activity of a team-sport athlete. There is currently no consensus on the definition of a sprint or acceleration effort, even within a single sport. The aim of this narrative review was to examine the varying velocity and acceleration thresholds reported in athlete activity profiling. The purposes of this review were therefore to (1) identify the various thresholds used to classify high-velocity or -intensity running plus accelerations; (2) examine the impact of individualized thresholds on reported team-sport activity profile; (3) evaluate the use of thresholds for court-based team-sports; and (4) discuss potential areas for future research. The presentation of velocity thresholds as a single value, with equivocal qualitative descriptors, is confusing when data lies between two thresholds. In Australian football, sprint efforts have been defined as activity >4.00 or >4.17 m·s<sup>−1</sup>. Acceleration thresholds differ across the literature, with >1.11, 2.78, 3.00, and 4.00 m·s<sup>−2</sup> utilized across a number of sports. It is difficult to compare literature on field-based sports due to inconsistencies in velocity and acceleration thresholds, even within a single sport. Velocity and acceleration thresholds have been determined from physical capacity tests. Limited research exists on the classification of velocity and acceleration data by female team-sport athletes. Alternatively, data mining techniques may be used to report team-sport athlete external load, without the requirement of arbitrary or physiologically defined thresholds.

#### Keywords: velocity thresholds, acceleration, data mining, player tracking, match analysis

#### Edited by:

Billy Sperlich, University of Würzburg, Germany

#### Reviewed by:

Pascal Edouard, University Hospital of Saint-Etienne, France Beat Knechtle, University of Zurich, Switzerland

> \*Correspondence: Robert J. Aughey robert.aughey@vu.edu.au

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 23 January 2017 Accepted: 06 June 2017 Published: 20 June 2017

#### Citation:

Sweeting AJ, Cormack SJ, Morgan S and Aughey RJ (2017) When Is a Sprint a Sprint? A Review of the Analysis of Team-Sport Athlete Activity Profile. Front. Physiol. 8:432. doi: 10.3389/fphys.2017.00432


## INTRODUCTION

The quantification of athlete external load is of interest to scientists and practitioners, for the planning and monitoring of training or competition. Team-sport athlete external load can be quantified using accelerometers, global positioning systems (GPS), local positioning systems (LPS), and optical tracking systems. Except for accelerometers, these systems calculate displacement, velocity and acceleration over time. The analysis of external load over a match or training session is termed activity profile (Aughey, 2011a). Information from the activity profile is used to monitor change across a competitive season or tournament (Bradley et al., 2009; Jennings, D. et al., 2012) and allow for the design of specific training drills (Boyd et al., 2013).

The activity profile of field-based team-sport athletes is well-documented (Aughey, 2011a; Mooney et al., 2011; Jennings, D. H. et al., 2012; Bradley et al., 2013). Activity profile analysis typically includes time spent in velocity or acceleration zones. These zones are defined according to threshold values that are determined arbitrarily, set by the proprietary software of tracking systems, or expressed relative to a physiological test. Currently, there is no consensus on how to determine a velocity or acceleration threshold. Large discrepancies exist in the classification of a sprint effort. The comparison of activity profiles across and within team-sports is consequently difficult.

The aim of this narrative review is to examine the varying velocity and acceleration thresholds used to analyze team-sport athlete external load. Applying a global velocity or acceleration threshold does not account for individual differences. Whilst thresholds can be individualized, physiological tests comprising continuous or linear movement do not reflect changes of direction and acceleration. The current techniques used to analyze external load are therefore inappropriate. Alternate methods, including unsupervised data mining techniques, are considered. These techniques find trends within external data and may be useful in informing thresholds.

### ATHLETE TRACKING TECHNOLOGIES

Team-sport athlete external load is collected by tracking technologies. Manual video analysis is an inexpensive method to estimate external load. Athletes are filmed by cameras positioned around a playing area, with footage subjectively coded into locomotor categories (Spencer et al., 2004). Manual video analysis is time-consuming, and its validity has not been established, due to the subjective estimation of athlete movement. A tracking system must be valid so meaningful changes in athlete activity profile can be detected. The limited capacity of a human to consistently reproduce results is also a major limitation of manual video analysis. Semi-automated tracking systems were designed to remove the laborious and subjective classification of athlete activity. Commercial systems, including ProZone (Di Salvo et al., 2006) and Amisco (Castellano et al., 2014), can detect the position of multiple team-sport athletes. However, the required equipment is expensive and non-portable. Activity profiles therefore cannot be collected without the elaborate infrastructure. Athlete movement is also collected in a two-dimensional plane, with changes in position due to vertical movement going undetected (Barris and Button, 2008).

Accelerometers are wearable sensors that directly quantify athlete load in three-dimensional planes. Accelerometers have been utilized in field-based (Mooney et al., 2013) and court-based (Cormack et al., 2014) team-sports; however, they cannot calculate an athlete's position relative to a playing area. Consequently, the time and distance covered by an athlete at varying velocities are unable to be quantified. The use of GPS to collect the distance and velocities of field-based team-sport athletes is well-documented (Buchheit et al., 2010b; Jennings, D. H. et al., 2012; Varley et al., 2013b). A recent review has examined factors influencing the setup, analysis and reporting of GPS data, for use in team-sports (Malone et al., 2016).

Large variations exist in GPS estimates of changes in velocity, between models and units from the same manufacturer (Buchheit et al., 2014). During simultaneous capture of a sled dragging exercise, small to very large between-model and unit differences were observed in 15 Hz GPS units (Buchheit et al., 2014). These units were manufactured with a 10 Hz GPS but upsampled to 15 Hz (Aughey, 2011a). In 10 Hz GPS, acceleration and deceleration movements have a large between-unit coefficient of variation (CV) of 31–56% (Varley et al., 2012). A variety of factors may influence GPS measures of acceleration and velocity. The accuracy of GPS to measure instantaneous velocity is limited by unit processing speed, location, antenna volume, and chipset capacity. Quantification of instantaneous velocity is up to three times more accurate in 10 Hz GPS units compared to 5 Hz (Varley et al., 2012). When measuring acceleration and deceleration, 10 Hz units still differ by ∼10% when compared to a laser device (Varley et al., 2012).

Whilst GPS quantifies the position and velocities of field-based team-sport athletes (Aughey, 2011a), GPS cannot be used with court-based sports held indoors, due to the lack of satellite reception. The development of radio-frequency (RF) based LPS, including the Wireless ad hoc System for Positioning (WASP), allows athlete movement to be captured indoors (Hedley et al., 2010). LPS sample at up to 1000 Hz with generally superior accuracy compared to GPS (Stevens et al., 2014). During varying speed and change of direction movement, the average acceleration and deceleration derived from LPS was within 2% of Vicon (Stevens et al., 2014). Although accuracy for peak acceleration and deceleration is limited, LPS can measure average change in velocity or time spent in various acceleration thresholds.

## DISTANCE COVERED

A common athlete activity profile measure is the total distance covered. English Premier League athletes cover an average of 10,714 m during matches (Bradley et al., 2009), less than One Day International (ODI) cricketers at 15,903 m per match (Petersen et al., 2009). Elite Australian footballers may record total distances of up to 12,939 m (Coutts et al., 2010). The total distance covered during matches varies across athlete age (Buchheit et al., 2010a), position and competition level (Jennings, D. H. et al., 2012). When total distance covered is expressed per minute of match duration, soccer athletes cover 104 m·min<sup>−1</sup> (Varley et al., 2013b). Australian footballers may average 157 m·min<sup>−1</sup> (Aughey, 2011b) whilst elite rugby league players cover up to 97 m·min<sup>−1</sup> (Varley et al., 2013b). Sport-specific constraints, including positional or tactical roles, may contribute to these differences. The higher total distance in Australian football may be attributed to the unlimited interchange policy (removed in 2015), and the smaller field size available to soccer and rugby league athletes (Varley et al., 2013b). The total distance covered should be presented per minute of match duration or time spent on field or in a training drill (Aughey, 2011a).

Court-based athletes have a smaller playing area compared to their field-based counterparts, yet cover similar meters per minute. There is limited activity profile research on court-based athletes. State-level female basketballers cover 127–136 m·min<sup>−1</sup> during matches (Scanlan et al., 2012), higher than junior males (115 m·min<sup>−1</sup>) and similar to state- (126–132 m·min<sup>−1</sup>) and national (130–133 m·min<sup>−1</sup>) male basketballers (Scanlan et al., 2011). In semi-elite netball, center (C) athletes cover up to 133 m·min<sup>−1</sup> compared to goal keepers (GK) and goal shooters (GS), who average 71 and 70 m·min<sup>−1</sup>, respectively (Davidson and Trewartha, 2008). These differences could be due to the spatial restrictions imposed by each playing position, although manually estimating distance covered from video may also provide unreliable estimates (Barris and Button, 2008).

In court-based sports, the ball may frequently and chaotically change direction. Court-based athletes must be responsive to movement of the ball, their team-mates and opposition in a small area. Athletes may change direction and complete short, high-intensity movements to cover or create space. Although there are more spatial limitations compared to field-based sports, the high frequency of these actions performed by court-based athletes may result in a comparable meters per minute profile. Whilst reporting meters per minute gives an understanding of intensity, granular periods of activity at different velocities are lost by aggregating to the total distance covered. Quantifying the time spent and distance covered at varying velocities may be useful in programming training and monitoring load.

### VELOCITY THRESHOLDS

During matches or training, the instantaneous velocity of an athlete is binned into different zones via threshold values. Velocity thresholds are defined by proprietary software providers (Cunniffe et al., 2009), modified from published research (Jennings, D. H. et al., 2012) or determined arbitrarily (Mohr et al., 2003). There is no consensus on how to determine a velocity threshold and large discrepancies exist, even within a single team-sport (**Table 1**). The comparison of activity profile research is consequently difficult.

The inconsistency between velocity thresholds extends to qualitative descriptors. For example, activity may be labeled as low-velocity or low-intensity movement. Low-velocity movement, including walking and jogging, could be activity between 0 and up to 5.40 m·s<sup>−1</sup> (Varley et al., 2013b). Yet in the same sport, activity >4.00 m·s<sup>−1</sup> was classed as high-speed running (Sullivan et al., 2013). The classification of high-velocity or high-intensity movement is also without consistent definition. The varying definitions make for a difficult comparison between studies. In Australian football, sprint efforts have been defined as activity >4.00 m·s<sup>−1</sup> (Sullivan et al., 2013) while a threshold of >4.17 m·s<sup>−1</sup> has also been utilized (Aughey, 2010; Mooney et al., 2011). The presentation of thresholds as a single > or < value, with ambiguous descriptors, is confusing when velocity data falls between two thresholds. For example, running by professional soccer athletes is described as velocities between 4.00 and 5.47 m·s<sup>−1</sup> whilst activity >5.50 m·s<sup>−1</sup> was considered high-intensity movement (Carling et al., 2012). It is unclear if velocities within the 0.03 m·s<sup>−1</sup> upper and lower ranges of the two classifications were removed from analysis. Deletion of these values may influence the frequencies and durations reported. Research describing thresholds in this manner should detail how instantaneous velocities are binned into different zones. If researchers use discrete values, it is recommended that thresholds be presented as ≥ or ≤ values.
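The recommendation to present thresholds as ≥ or ≤ values can be illustrated with a minimal sketch. It assumes evenly sampled velocity data, and the zone edges and labels (`ZONE_EDGES`, `ZONE_LABELS`) are hypothetical values chosen for illustration only, not thresholds endorsed by any study cited here. Each zone is closed at its lower edge and open at its upper edge, so no sample can fall between two zones.

```python
from bisect import bisect_right

# Hypothetical zone edges (m/s) and labels, for illustration only.
ZONE_EDGES = [0.0, 2.0, 4.0, 5.5]
ZONE_LABELS = ["walk", "jog", "run", "high-intensity"]


def zone_of(velocity):
    """Return the zone label for one velocity sample (m/s).

    Zone i covers ZONE_EDGES[i] <= v < ZONE_EDGES[i + 1]; the last zone is
    open-ended, so every non-negative velocity lands in exactly one zone.
    """
    if velocity < ZONE_EDGES[0]:
        raise ValueError("velocity must be non-negative")
    return ZONE_LABELS[bisect_right(ZONE_EDGES, velocity) - 1]


def time_in_zones(velocities, sample_rate_hz):
    """Total time (s) spent in each zone for evenly sampled velocities."""
    dt = 1.0 / sample_rate_hz
    totals = {label: 0.0 for label in ZONE_LABELS}
    for v in velocities:
        totals[zone_of(v)] += dt
    return totals
```

With these edges a sample of 5.48 m·s<sup>−1</sup> is counted as "run" rather than being dropped, which is the ambiguity the paragraph above describes.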

The confusion in velocity thresholds also extends to the duration of a sprint. In elite female rugby union (Clarke et al., 2014), hockey (Vescovi, 2014), and professional male soccer (Carling et al., 2012) matches, sprinting must occur for a minimum of 1 s. However, in other studies (Buchheit et al., 2010a; Jennings, D. H. et al., 2012; Varley et al., 2013b; Kempton et al., 2015b), the minimum duration is not stated. It is unclear what effect these inconsistent minimum threshold durations have on the activity profile. Researchers should state the minimum duration required to record a sprint effort. The inconsistency of sprint thresholds in the literature is likely due to values being arbitrarily determined or taken from proprietary software.
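The effect of a minimum duration on the number of recorded sprints can be explored with a short sketch. It assumes evenly sampled velocity data; the function name `sprint_efforts` and the values used in the usage note are hypothetical, and the cited studies may have applied their duration criteria differently.

```python
def sprint_efforts(velocities, sample_rate_hz, threshold, min_duration_s):
    """Return (start_index, end_index_exclusive) for each sprint effort.

    An effort is a run of consecutive samples with velocity >= threshold
    that spans at least min_duration_s worth of samples.
    """
    min_samples = int(round(min_duration_s * sample_rate_hz))
    efforts = []
    start = None
    for i, v in enumerate(velocities):
        if v >= threshold:
            if start is None:
                start = i  # a new candidate effort begins here
        elif start is not None:
            if i - start >= min_samples:
                efforts.append((start, i))
            start = None
    # Close an effort that is still running at the end of the trace.
    if start is not None and len(velocities) - start >= min_samples:
        efforts.append((start, len(velocities)))
    return efforts
```

Running the same trace with `min_duration_s=0.0` and `min_duration_s=1.0` shows how many short efforts a 1 s criterion removes, which is why the minimum duration should be stated.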

### ACCELERATION THRESHOLDS

Acceleration is a metabolically demanding activity, requiring more energy than constant running (Osgnach et al., 2010). During team-sport matches, a large number of high intensity efforts are short in duration and commence from a low velocity. In elite soccer matches, more than 85% of maximal accelerations did not exceed the high-speed (4.17 m·s<sup>−1</sup>) threshold (Varley and Aughey, 2013). Maximal accelerations (>2.78 m·s<sup>−2</sup>) occurred eight times more than sprinting, classified as >6.94 m·s<sup>−1</sup> but <10.00 m·s<sup>−1</sup> (Varley and Aughey, 2013). The starting velocity is critical when measuring accelerations or decelerations, although quantification of these variables is dependent upon the validity and reliability of athlete tracking systems.
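The idea that many maximal accelerations never reach a high-speed threshold can be sketched as follows. This is a simplified illustration, not the method of Varley and Aughey (2013): acceleration is taken as the first difference of evenly sampled velocity, each sample is treated independently rather than grouped into efforts, and no filtering is applied. The function names are hypothetical; the default thresholds are the 2.78 m·s<sup>−2</sup> and 4.17 m·s<sup>−1</sup> values quoted above.

```python
def accelerations(velocities, sample_rate_hz):
    """First-difference acceleration (m/s^2) between consecutive velocity samples."""
    return [(v1 - v0) * sample_rate_hz for v0, v1 in zip(velocities, velocities[1:])]


def fraction_below_high_speed(velocities, sample_rate_hz,
                              accel_threshold=2.78, high_speed=4.17):
    """Fraction of above-threshold acceleration samples that finish below high_speed.

    Returns None when no acceleration sample exceeds accel_threshold.
    """
    accels = accelerations(velocities, sample_rate_hz)
    # Velocity reached at the end of each interval whose acceleration is flagged.
    end_velocities = [velocities[i + 1] for i, a in enumerate(accels)
                      if a > accel_threshold]
    if not end_velocities:
        return None
    below = sum(1 for v in end_velocities if v < high_speed)
    return below / len(end_velocities)
```

Raw first differences amplify measurement noise, so in practice the result depends on the tracking system and on any smoothing applied beforehand.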

There are large inconsistencies between acceleration thresholds used throughout the literature. In field-based team-sports, accelerations have been classified as >1.11 m·s<sup>−2</sup> (Wisbey et al., 2010), 2.78 m·s<sup>−2</sup> (Varley et al., 2013a), 3.00 m·s<sup>−2</sup> (Hodgson et al., 2014), and 4.00 m·s<sup>−2</sup> (Farrow et al., 2008). Accelerations have also been categorized into moderate (2.00–4.00 m·s<sup>−2</sup>) or high (>4.00 m·s<sup>−2</sup>) zones, with a minimum duration of 0.40 s (Higham et al., 2012). The rationale used to select these zones is unknown. The 2.78 m·s<sup>−2</sup> threshold used in soccer (Varley and Aughey, 2013) and Australian Football (Aughey, 2010) originated from a standing start maximal acceleration of between 2.50 and 2.70 m·s<sup>−2</sup>, performed by non-athletes (Varley et al., 2012). Since elite Australian Football athletes often maximally accelerate from a moving start during matches (Aughey and Falloon, 2008), a 4.00 m·s<sup>−2</sup> threshold was considered too high and 1.11 m·s<sup>−2</sup> too low (Aughey, 2010). It appears the threshold of 2.78 m·s<sup>−2</sup> was determined arbitrarily (Aughey, 2010). Acceleration thresholds of 1.50, 3.00, and 4.00 m·s<sup>−2</sup> have been used in a single study (Buchheit et al., 2014). Specifying thresholds in this manner has implications for quantifying activity profile and monitoring change over time, particularly when large variations in the measurement of acceleration are common between GPS models from the same manufacturer (Buchheit et al., 2014).

The velocity distribution of elite field-based team-sport athletes was used to create sport-specific threshold values (Dwyer and Gabbett, 2012). Match data from five elite female and male soccer, hockey and professional male Australian Football athletes were collected from GPS sampling at 1 Hz (Dwyer and Gabbett, 2012). A frequency distribution of speed (0–7 m·s<sup>−1</sup>) in 0.1 m·s<sup>−1</sup> increments was computed from the 25 data sets and an average distribution calculated (Dwyer and Gabbett, 2012). Four normally distributed Gaussian curves were then fitted to the averaged velocity distribution curves and the intersecting points used to determine thresholds for each sport (Dwyer and Gabbett, 2012). A frequency distribution of acceleration from each data set was calculated and a threshold was based on the highest 5% of accelerations performed (Dwyer and Gabbett, 2012). This threshold was then calculated for each pre-determined velocity range and used to identify sprints (Dwyer and Gabbett, 2012). The average velocity distribution for all field-based team-sports was similar. Differences between sexes from the same sport were larger than differences across sports (Dwyer and Gabbett, 2012). Six additional sprints, of a short duration, would not have been recorded using the traditional threshold (Dwyer and Gabbett, 2012). While the decision to include five movement categories comprising standing, walking, jogging, running, and sprinting appears to have been arbitrarily determined, this is a novel idea compared to the traditional analysis of athlete velocity. This approach was utilized to profile the activity of national level lacrosse (Polley et al., 2015) and youth female field hockey (Vescovi, 2014) athletes. However, the 1 Hz GPS units used have a very large (77.2%) CV when measuring short sprint efforts (Jennings et al., 2010). Consequently, data obtained from 1 Hz GPS during these movements, and the results presented, should be interpreted with extreme caution.
The small sample size is also limited in detecting meaningful change across and between sports. Decelerations or negative changes in velocity were also removed from the analysis, likely due to the poor capacity of GPS to accurately quantify these movements (Buchheit et al., 2014).
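Two steps of the distribution-based approach described above, the speed frequency distribution and the threshold marking the highest 5% of accelerations, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the Gaussian curve fitting used to locate velocity thresholds is deliberately omitted, and the exact procedure of Dwyer and Gabbett (2012) may differ.

```python
import math


def speed_histogram(speeds, bin_width=0.1, max_speed=7.0):
    """Relative frequency of speed samples in fixed-width bins from 0 to max_speed."""
    n_bins = int(round(max_speed / bin_width))
    counts = [0] * n_bins
    for s in speeds:
        # Samples at or above max_speed are counted in the final bin.
        idx = min(int(s / bin_width), n_bins - 1)
        counts[idx] += 1
    total = len(speeds)
    return [c / total for c in counts]


def top_fraction_threshold(values, fraction=0.05):
    """Smallest value that belongs to the highest `fraction` of the samples."""
    ordered = sorted(values)
    k = max(1, math.ceil(fraction * len(ordered)))
    return ordered[-k]
```

Because such a threshold is derived from the data themselves, it inherits any measurement error in the underlying tracking system, which is the concern raised about 1 Hz GPS.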

The ability to reduce velocity is termed deceleration. An athlete's capacity to efficiently decelerate is important for changing direction (Kovacs et al., 2008). The major components of deceleration include dynamic balance, power, reactive, and eccentric strength (Kovacs et al., 2008). In elite team-sport athletes, the substantial eccentric loading during repeated decelerations is likely to have a detrimental effect on subsequent 40 m sprint test performance (Lakomy and Haydon, 2004). In collegiate team-sport athletes, muscle damage was induced post 15 × 30 m repeated sprints with a rapid deceleration, interspersed with 60 s of passive recovery (Howatson and Milak, 2009). Increased muscle soreness, swelling, creatine kinase efflux and decreased maximum isometric contraction were also observed 48–72 h post exercise (Howatson and Milak, 2009). Collectively, these results demonstrate the magnitude of muscle and performance damage when team-sport athletes perform repeated deceleration efforts.

Investigation into the decelerations of team-sport athletes during matches is limited. In elite male rugby sevens matches, decelerations were classified as moderate (−4.00 to −2.00 m·s<sup>−2</sup>) or high (>4.00 m·s<sup>−2</sup>) and occurred for a minimum of 0.40 s (Higham et al., 2012). It is unclear why these zones were chosen. A 35 and 25% difference in moderate and high decelerations, respectively, existed between standards of play (Higham et al., 2012). The large error of 5 Hz GPS when quantifying these movements may account for the difference between playing levels. The deceleration of professional rugby league athletes was investigated during two competitive seasons (Delaney et al., 2015). Differences in the maximum value recorded over a rolling average, from 1 to 10 min in duration, were compared across playing positions (Delaney et al., 2015). Compared with a 10 min rolling average, a large effect was observed for accelerations and decelerations of 1–2 min. A moderate to small effect for 3–7 min duration was also recorded (Delaney et al., 2015). While this approach presents the maximum load of an athlete over varying durations, all acceleration and deceleration measures were modified to estimate the total number of accelerations performed (Delaney et al., 2015). This approach could be misleading as energetically, the ability to accelerate and decelerate is different. Using this approach, the specific training prescription of deceleration is consequently limited.
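The maximum value over a rolling average, and the case for reporting accelerations and decelerations separately, can be illustrated with a minimal sketch. The function names are hypothetical and the window is given in samples for simplicity; this is not the processing used by Delaney et al. (2015).

```python
def max_rolling_mean(values, window):
    """Largest mean of any `window` consecutive samples."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    best = running
    for i in range(window, len(values)):
        # Slide the window one sample forward.
        running += values[i] - values[i - window]
        best = max(best, running)
    return best / window


def peak_accel_and_decel(accels, window):
    """Peak rolling mean of accelerations and of deceleration magnitudes, kept apart."""
    positive = [max(a, 0.0) for a in accels]
    negative = [max(-a, 0.0) for a in accels]
    return max_rolling_mean(positive, window), max_rolling_mean(negative, window)
```

Keeping the two signs apart, rather than pooling absolute values, preserves the information needed to prescribe deceleration-specific training.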

The deceleration output of court-based team-sport athletes remains largely unknown. Decelerations account for up to 18% of total distance covered during professional football match play (Akenhead et al., 2013). Decelerations, and their distribution over varying epochs, should therefore be included in the activity profiles of court-based team-sport athletes, to ensure appropriate training design for competition. The inconsistency previously described in defining velocity thresholds is also evident in research on decelerations. There is currently no consensus on how to define acceleration or deceleration thresholds. While presenting the acceleration frequency of team-sport athletes provides a global representation of high-intensity movements, limited research exists on the individualization of acceleration thresholds. The classification of accelerations is also dependent upon the sampling epoch utilized, which may alter the magnitude of frequencies reported.

#### FILTERING OF DATA

Athlete tracking data may be filtered during the post-processing phase. Filtering involves the smoothing of position and reduction of noise using various mathematical algorithms (Carling et al., 2008). Noise can be removed by numerous techniques, each with different results. Curve fitting involves a low-order polynomial curve fitted to raw trajectory data. Although this technique is best for repetitive movements including jumping, error may be introduced through poor selection of specific points that the curve is fitted to (Winter, 2009). These points are determined from the raw data and consequently, are influenced by the very noise the filter is trying to eliminate (Winter, 2009). Bandpass filtering converts raw data from the time domain to the frequency domain, typically using a Fast Fourier Transform (FFT). High-frequency signal, uncharacteristic of normal human movement, is eliminated before data is converted back into the time domain through an inverse FFT (Wundersitz, D. et al., 2015). However, the threshold used as the optimal cut-off frequency is arbitrary and typically chosen via visual inspection (Wundersitz, D. et al., 2015). Digital filtering analyzes the frequency spectrum of both signal and noise. The signal typically occupies the lower end of a frequency spectrum and overlaps with the noise, which is typically observed at a higher frequency (Winter, 2009). A low-pass filter permits the lower frequency signals while consequently reducing the higher frequency noise. Low-pass filtering can be used when analyzing trajectory data (Winter, 2009).
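As an illustration of low-pass filtering, the sketch below implements a second-order Butterworth low-pass filter using the standard bilinear-transform coefficients. It is a generic textbook filter, not the filter of any particular tracking system; the function name is hypothetical, and any cut-off frequency passed to it is an assumption the analyst must justify.

```python
import math


def butterworth_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass filter, single forward pass from rest."""
    if not 0 < cutoff_hz < sample_rate_hz / 2:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    # Feed-forward (b) and feedback (a) coefficients of the difference equation.
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    out = []
    x1 = x2 = y1 = y2 = 0.0  # filter state starts at zero
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

Because the filter starts from rest and runs in one direction only, the first few outputs contain a start-up transient and the output lags the input. Different cut-off frequencies will change the time and distance reported in each velocity or acceleration zone, which is why the filter settings should be reported.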

The filtering of athlete external load data is dependent upon the tracking system utilized. Filtering may occur on raw positional data at the instruction of the tracking system manufacturer (Stevens et al., 2014). Derived measures, including metabolic power from GPS (Di Prampero et al., 2005; Osgnach et al., 2010) are also filtered at unspecified frequencies during the post-processing stage. Butterworth (Stevens et al., 2014) and Kalman (Sathyan et al., 2012) filters are typically used for LPS data. There is limited information on how filters are used in optical player tracking systems and GPS. Filtering may account for the 24% difference in sprint distance between real-time and post-match Australian football GPS data (Aughey and Falloon, 2010) although no detail was presented on how the manufacturer explains these discrepancies. It is important to know how the manufacturer of an athlete tracking system filters raw data, particularly when inferences from external load are used to make decisions on programming training (Borresen and Lambert, 2009; Rogalski et al., 2013). The filtering of accelerometer data has recently been examined (Boyd et al., 2011). Only one of the 13 filters was strongly related (mean bias; −0.01 ± 0.27 g; CV 5.5%) to the criterion measure, Vicon (Wundersitz, D. et al., 2015). Information on filtering is rarely presented from GPS or LPS data when time spent or distance covered in velocity bands are reported. The filtering of raw data from an athlete tracking system has a substantial impact on the frequencies and distances covered in velocity or acceleration zones (Wundersitz, D. et al., 2015). Prior to reporting team-sport athlete activity profiles, researchers should detail the type of filtering applied to raw data.

#### INDIVIDUALIZED THRESHOLDS

Activity profile data reported as an average across a team (Aughey, 2011b) or position (Mooney et al., 2011; Varley and Aughey, 2013) does not account for differences in individual physical capacity. The use of a single sprinting or high-velocity threshold, for all athletes within a team, also does not consider the differences between individual athletes. Although team-sport matches are contested at an absolute level, the same external load calculated by a high-velocity or sprinting threshold for two athletes could represent a different internal load based on individual characteristics (Impellizzeri et al., 2004). Athlete movement may be expressed relative to a physiologically defined variable. High-intensity activity can be classified as greater than the second ventilatory threshold (VT<sub>2</sub>), obtained during a maximal aerobic capacity (VO<sub>2</sub>max) test. The VT<sub>2</sub> is the point where CO<sub>2</sub> production exceeds O<sub>2</sub> consumption during exercise (Davis, 1985). It is assumed that activity beyond this point cannot be sustained for prolonged periods due to the athlete no longer being in a steady state (Davis, 1985). During team-sport matches, activity below the VT<sub>2</sub> can likely be continued for a prolonged duration. In male soccer athletes, distance covered at or greater than vVT<sub>2</sub> was 167% higher or a very large effect when compared to a threshold of 5.50 m·s<sup>−1</sup> (Abt and Lovell, 2009). A 44% variation in athlete rank, calculated by distance covered at high-speed, was observed between the two thresholds (Abt and Lovell, 2009). Individual VT<sub>2</sub> has also been measured in professional soccer athletes (Lovell and Abt, 2012). The resulting vVT<sub>2</sub> was compared to an arbitrary velocity (4.00 m·s<sup>−1</sup>) threshold (Lovell and Abt, 2012). High-speed running distance was overestimated by 9% when arbitrary thresholds were used (Lovell and Abt, 2012).
For individual athletes, this range could be between 22% lower and 33% higher (Lovell and Abt, 2012). In elite female rugby sevens athletes, a physiologically-defined threshold corresponding to treadmill speed at VT<sub>2</sub> was compared to a cohort average (3.50 m·s<sup>−1</sup>) value (Clarke et al., 2014). When individualized thresholds were used, high-intensity running was up to 14% over or under-estimated compared to the cohort mean VT<sub>2</sub> derived threshold (Clarke et al., 2014). Distance covered at high-speed may therefore be underestimated by traditional thresholds.
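The comparison between an absolute and an individualized threshold reduces to computing distance covered above each threshold for the same velocity trace. The sketch below assumes evenly sampled data; the function names are hypothetical, and the individual threshold in the usage note is an invented value, not a measured vVT<sub>2</sub>.

```python
def distance_above(velocities, sample_rate_hz, threshold):
    """Distance (m) covered while velocity >= threshold, for evenly sampled data."""
    dt = 1.0 / sample_rate_hz
    return sum(v * dt for v in velocities if v >= threshold)


def percent_difference(individual_m, absolute_m):
    """Percentage by which the absolute-threshold distance differs from the individual one."""
    return 100.0 * (absolute_m - individual_m) / individual_m
```

For example, `distance_above(trace, 10, 5.5)` and `distance_above(trace, 10, 4.4)` give the high-speed distance for the same athlete under a 5.50 m·s<sup>−1</sup> absolute threshold and a hypothetical individual threshold of 4.4 m·s<sup>−1</sup>; `percent_difference` then expresses the gap between them.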

While the individualization of velocity thresholds is a well-reasoned approach to assess external load, conjecture exists on the implementation of an incremental treadmill protocol, conducted within a laboratory, and its application to team-sports. The individualization of velocity thresholds, derived from a continuous running protocol, does not consider the change of direction and acceleration movements, frequent in team-sports (Lovell and Abt, 2012). Whilst speed thresholds have been individualized in field-based team-sports (Abt and Lovell, 2009; Lovell and Abt, 2012; Clarke et al., 2014), limited research exists on court-based team-sports.

Athlete thresholds for external load can be expressed relative to maximum speed attained during sprint testing. The external load of junior-elite male soccer athletes was compared using absolute (>5.27 m·s<sup>−1</sup>) or individual thresholds by obtaining the peak running velocity during the fastest 10 m split of a 40 m sprint (Buchheit et al., 2010b). Athletes in the highest playing standard (U18 years of age) performed more repeated-sprint efforts when activity was assessed using absolute thresholds (Buchheit et al., 2010b). Younger players (U13 and U14 years of age) recorded more sprinting activity with individualized thresholds (Buchheit et al., 2010b). In junior male rugby league athletes, when an individualized threshold of peak velocity obtained during the final 20 m of a 40 m sprint test was compared with absolute speed (>5.00 m·s<sup>−1</sup>) thresholds, younger athletes (U13) performed likely (effect size = 0.43–0.58) greater high-speed running compared to their older (U14 and U15 years of age) counterparts (Gabbett, 2015). The total high-intensity running performed by junior athletes may be altered when expressed relative to a movement threshold obtained during maximal sprinting (Buchheit et al., 2010b; Gabbett, 2015). Inconsistencies therefore exist in the recorded sprinting distance according to the velocity threshold used.

Expressing a team-sport athlete's data relative to a physiologically defined threshold is an individualized approach that may benefit the training prescription for players. Although this is an advancement on the use of arbitrarily derived velocity thresholds, limited research exists on how to individualize accelerations. Accelerations require more energy than constant velocity (Osgnach et al., 2010). Without information on how to classify accelerations, individualized thresholds are therefore limited in their use for team-sport athletes, including those who participate in court-based sports.

### RELATIONSHIP OF HIGH-INTENSITY ACTIVITY TO MATCH PERFORMANCE

The capacity to accelerate and sprint is important for team-sport match performance. In junior-elite Australian Football, athletes faster over a 5 and 20 m split acquired the most kicks and disposals during matches, compared with their slower counterparts (Young and Pryor, 2007). During elite matches, a relationship exists between athlete physical capacity and the number of disposals. This relationship is mediated by the amount of high-intensity running (HIR) m·min<sup>−1</sup> or distance traveled at >4.17 m·s<sup>−1</sup> (Mooney et al., 2011). Sophisticated modeling techniques may therefore be able to examine the effect of contextual and match-related factors on team-sport athlete running intensity.

The relationship between physical capacity and match performance in professional soccer was examined across three top English leagues (Bradley et al., 2013). Total distance covered and HIR >5.50 m·s<sup>−1</sup> was captured via semi-automatic tracking (Bradley et al., 2013). Less total and HIR distance was covered at the higher than at the lower playing standards. Physical capacity, defined as score on the Yo-Yo intermittent recovery two (IR2) test, was correlated with HIR distance (Bradley et al., 2013). In junior-elite male soccer athletes, the relationship between external load, defined as movement >4.47 m·s<sup>−1</sup>, and physical capacity, quantified as score on the Yo-Yo IR1, was position dependent. Poor correlations were observed between match running performance and athlete physical capacity in all positions except strikers. However, the 1 Hz GPS units used have poor validity (CV% of 11–30%) for assessing HIR (Coutts and Duffield, 2010). To truly quantify the relationship between athlete match external load and physical capacity, tracking technologies that are accurate at detecting movement within a range of intensities should also be used. Although the relationship between match outcomes, athlete performance, and external load has been examined, research has applied a mean velocity threshold to all athletes within a team (Mooney et al., 2011; Bradley et al., 2013). The justification for these thresholds is typically based on other literature or arbitrarily determined. Individualizing velocity thresholds may allow for a detailed analysis of the relationship between athlete external load and match outcome, although physiologically defined thresholds are limited in their application for defining accelerations (Varley and Aughey, 2013). The majority of research on the relationship between athlete performance and external load has focused on males competing in team-sports, with limited information on female athletes (Costello et al., 2014).

## THRESHOLDS FOR MALE AND FEMALE TEAM-SPORT ATHLETES

Men and women compete in team-sports at an elite level. Tracking technologies, including GPS, are used to collect the activity profiles of male and female team-sport athletes (Gabbett and Mulvey, 2008; Dwyer and Gabbett, 2012; Vescovi, 2014). There are differences in physiological capacities between sexes, including aerobic fitness and absolute sprinting ability (Mujika et al., 2009). Consequently, the physiological cost of high-speed running may be substantially different for male and female team-sport athletes. Although lower speed thresholds are suggested for female team-sport athletes (Dwyer and Gabbett, 2012), limited research exists on the application of these thresholds. An under- or over-estimation of external load may occur if thresholds initially developed for male athletes are applied to female athletes.

Thresholds developed for male team-sport athletes have been applied to female external load data. During international female hockey matches, the average number (17) of sprints completed was lower than the mean number (30) performed by male athletes (Macutkiewicz and Sunderland, 2011). However, a sprinting threshold of 5.2 m·s<sup>−1</sup>, adapted from research on male soccer athletes (Bangsbo, 1992), was applied to female match data. Since there are sex differences in sprinting speed (Mujika et al., 2009), the reduction in mean sprints observed during international female hockey could be due to the inappropriate use of a velocity threshold designed for males. In soccer, male velocity thresholds have also been applied to female external load data (Krustrup et al., 2005; Mohr et al., 2008). However, the sprinting speed of female soccer athletes varies across age (Vescovi et al., 2011) and differs compared to males (Mujika et al., 2009). To develop female-specific values, varying velocity thresholds have been used in soccer (Vescovi, 2012). During competitive matches, sprinting by professional female soccer athletes accounts for 5.3% of total distance covered when categorized as activity >5.0 m·s<sup>−1</sup> (Vescovi, 2012). However, if the threshold is increased to >6.9 m·s<sup>−1</sup>, similar to thresholds used for male team-sport athletes (Varley et al., 2013b), little to no sprinting is recorded (Vescovi, 2012). A ceiling effect may therefore be present when using thresholds originally developed for male team-sport athletes. Although the use of varying velocity thresholds is a guide in the development of sprinting values for female soccer, this approach does not consider the individual physiological differences between athletes.

The individualization of velocity thresholds for female athletes has recently been examined. In elite female rugby sevens athletes, a male velocity threshold (5.0 m·s<sup>−1</sup>) and the individual and cohort mean vVT<sub>2</sub> speeds were used to determine distance covered at high intensity (Clarke et al., 2014). The absolute amount of match high-intensity running was underestimated by up to 30% when using a velocity threshold designed for male athletes (Clarke et al., 2014). The individualized threshold under- or over-estimated high-intensity running by up to 14% when compared to the cohort mean vVT<sub>2</sub> speed threshold of 3.5 m·s<sup>−1</sup> (Clarke et al., 2014). Individualizing the high-intensity running threshold of female team-sport athletes, assessed via a linear physiological test, may allow for customized training prescription. However, individualization requires a time-consuming and expensive laboratory-based VO2max test, which can be difficult to implement with a large number of athletes in a team-sport setting. Alternatively, the maximal aerobic speed (MAS) of an athlete is highly correlated with maximal oxygen uptake (Léger and Boucher, 1980) and reflects running economy (Di Prampero et al., 1986). Assessment of MAS can occur on a large number of athletes during an incremental field running test (Buchheit et al., 2013). The relationship between MAS and high-intensity running has been assessed in youth male soccer athletes (Buchheit et al., 2013) although, to date, no research exists on individualizing the velocity thresholds of female team-sport athletes using MAS testing results. For female team-sport athletes who cannot complete individualized physiological or field testing, a threshold of 3.5 m·s<sup>−1</sup> could be used as a guide for high-intensity running, although differences between playing position and standard are not accounted for with this fixed threshold.

The development and implementation of female-specific thresholds, according to playing standard and position, should be investigated. Although thresholds have been developed for female athletes competing in field-based sports (Dwyer and Gabbett, 2012; Clarke et al., 2014), there are no thresholds specifically for court-based sports. Netball, for example, is a court-based team-sport played indoors by elite female athletes. Due to the lack of research on female court-based sports, there is limited information on how to quantify velocity and acceleration thresholds for netball athletes.

### ALTERNATE APPROACHES TO CLASSIFY ATHLETE ACTIVITY

Data mining is a research area that aims to discover regularity from within large datasets and yield insights that are not possible using conventional statistics (Chen et al., 1996). Large databases, such as the external load obtained from tracking technologies, can therefore be investigated. Knowledge may be extracted through data mining techniques including classification, where data are sorted into predefined classes based on some common features (Chen et al., 1996). These methods are alternative approaches to the individualization of team-sport athlete external load. For example, the latent properties of external load from a single athlete can be found using data mining approaches. Velocity or acceleration thresholds are therefore derived directly from the sampled data and can be examined across age, sex, playing standard, or position.

Relationships between latent properties in data that may impact athletic performance can be uncovered using data mining (Ofoghi et al., 2013). Machine learning, a data mining technique, has been used to discover the physiological capacities required to medal in sprint cycling (Ofoghi et al., 2010). A recent review (Ofoghi et al., 2013) highlighted the lack of a contemporary framework for analyzing the match performance data of elite athletes. For example, a traditional statistical analysis on the performance of a team-sport athlete during passing chains may consider a direct relationship with a dependent variable. However, this type of analysis ignores the context of data collection (Ofoghi et al., 2013). Using data mining techniques, the hidden features that may impact upon passing quality could be examined, going beyond a superficial analysis (Ofoghi et al., 2013).

An alternative approach is mediation analysis, a statistical technique that examines the relationship between the dependent variable and independent variables to identify and explain the underlying process. Mediation analysis has been applied in elite Australian Football to examine inter-relationships between athlete capacity, match intensity and performance (Mooney et al., 2011). Playing position and experience influence the relationship between an athlete's capacity, match activity profile and possession output (Mooney et al., 2011). Linear techniques including discriminant analysis (Castellano et al., 2012) and generalized linear modeling have also been used to examine team-sport performance. However, linear techniques may not be an optimum method to analyze the match performance of dynamic and chaotic team-sports.

In contrast, non-linear data mining techniques are not constrained to a single linear variable. Decision trees, a non-linear technique, have been used to explain match outcome in Australian football (Robertson et al., 2016), classify team-sport activities from a wearable sensor (Wundersitz et al., 2015) and explore the attacker and defender interaction during invasion sports (Morgan et al., 2013). Decision trees involve the repeated partitioning of data, based on input fields that create branches which can be further split to differentiate the dependent variable. Decision trees can handle missing data and provide an intuitive analysis of a dataset (Morgan et al., 2013). Unlike clustering, decision tree induction is not dependent on the selection of a prior distribution.

Clustering is a data mining technique that could be used to find unknown patterns in large datasets by classification, whereby data are grouped based on similarity (Chen et al., 1996). A large dataset can be meaningfully divided into smaller components or categories using clustering (Punj and Stewart, 1983). These categories may be mutually exclusive (Fayyad et al., 1996). Categories can also be sorted in a hierarchical or overlapping manner. Gaussian mixture models, a clustering method that contains a prior belief about group assignment, have been used to classify shot making in tennis (Wei et al., 2013). These clustering methods represent sub-populations within a dataset and express the uncertainty about cluster assignment. The k-means clustering algorithm divides a dataset into a user-specified number of k clusters (Wu et al., 2008). The k-means algorithm starts with k centroids, selected at random. Each data point within the wider dataset is assigned to its nearest centroid, based on similarity. The centroids are updated each time a data point is assigned (Wu et al., 2008). The centroid mean is then calculated from the data points allocated to that cluster (Wu et al., 2008). The size of the dataset determines the number of repetitions required for the k-means algorithm to reach completion (Wu et al., 2008). Clustering, via the k-means algorithm, could be used in a variety of sport settings, including grouping the external load of an athlete.
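The assignment and update steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited studies: it uses the common batch (Lloyd) variant, in which centroids are recomputed after all points have been assigned, and the function name `kmeans_1d`, the `init` parameter, and the sample speeds are hypothetical.

```python
import random


def kmeans_1d(values, k, init=None, max_iter=100, seed=0):
    """Batch (Lloyd) k-means on scalar data, e.g. instantaneous speeds in m/s.

    Returns (centroids, labels) with centroids sorted in ascending order.
    """
    if init is None:
        # Start from k distinct data points selected at random.
        centroids = random.Random(seed).sample(sorted(set(values)), k)
    else:
        centroids = list(init)

    def assign(cents):
        # Each point joins its nearest centroid.
        return [min(range(k), key=lambda j: abs(v - cents[j])) for v in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # centroids, and therefore assignments, no longer change
        centroids = updated

    labels = assign(centroids)
    # Relabel so that cluster 0 is the slowest group.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels]


# Hypothetical speed samples (m/s), not data from any cited study.
speeds = [0.5, 0.8, 1.1, 1.0, 3.2, 3.6, 3.4, 6.8, 7.1, 6.9]
centroids, labels = kmeans_1d(speeds, k=3, init=[0.0, 3.0, 7.0])
```

With random initialization the algorithm can settle on a local optimum, so in practice it is often run from several starting points and the best solution retained.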

Complex statistical or data mining techniques, including clustering, may uncover unknown patterns or counter prior beliefs. These approaches could be used to guide the development of athlete velocity and acceleration thresholds. Self-organizing maps (SOM) and clustering have been utilized in elite rugby union to uncover playing styles related to team success (Croft et al., 2015). The coordination patterns during three different basketball shots from varying distances have also been classified using SOM (Lamb et al., 2010). The lowest variability was recorded in the three-point and hook shots. The SOM displayed a movement output that differed unexpectedly from traditional analysis, including visual inspection and time series data (Lamb et al., 2010). A movement analyst with experience and prior knowledge or bias may have been distracted by other information compared to a SOM, which has a more objective methodology (Lamb et al., 2010). These approaches could also be used to group athlete velocity data, without the requirement of a human input threshold based on a physiologically defined or arbitrary value. These groups could be formed irrespective of an athlete's age, sex, position, or playing standard. Patterns within athlete movement, including velocities and accelerations performed, could be derived by applying clustering techniques to external load data.

The accelerometer-derived PlayerLoad™ data of elite female netball athletes were grouped by k-means clustering (Young et al., 2016). Optimal clustering was defined as the greatest Euclidean distance obtained from two to five clusters (Young et al., 2016). The seven netball playing positions were divided into two groups according to playing intensity and relative time spent in a low-intensity zone (Young et al., 2016). The PlayerLoad™ for the goal-based positions was lower than for the attacking and wing positions, likely due to the time spent performing low-intensity activity (Young et al., 2016). This study was the first to use data mining techniques, including k-means clustering, to examine athlete load data. However, only accelerometer data were investigated and not the position of an athlete, from GPS or LPS. Capturing the position of an athlete allows for the calculation of displacement, velocity and acceleration. With the large volume of data obtained from athlete tracking systems, data mining represents a technique to gain further insight into athlete activity profiles. Consequently, athlete external load could be analyzed without the requirement of an arbitrary or software-implemented threshold.

## RECOMMENDATIONS

A range of velocity thresholds are utilized to classify the sprint effort of a team-sport athlete. Although thresholds may be individualized (Abt and Lovell, 2009; Clarke et al., 2014), applying a global velocity or acceleration threshold may allow for examination of positional and individual differences over time. A practical issue for those monitoring activity profiles is determining velocity and acceleration thresholds for a cohort of athletes. Selection of these global thresholds is often arbitrary and dependent upon the cohort profiled. We recommend that practitioners choose thresholds of an equal bandwidth, for example, 0–5, 5–10, 10–15, 15–20, 20–25, and ≥25 km·h<sup>−1</sup>. The minimum duration required for a sprint effort to be recorded should also be stated.
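As a rough illustration of the two recommendations above, the following Python sketch assigns speeds to equal-width bands and counts efforts that meet a stated minimum duration. The function names, the 10 Hz sampling rate and the 1 s minimum duration are assumptions chosen for the example, not values recommended in this review.

```python
def speed_band(speed_kmh, width=5.0, top=25.0):
    """Index of the equal-width band containing a non-negative speed.

    With the defaults: 0 -> 0-5, 1 -> 5-10, ..., 4 -> 20-25, 5 -> >=25 km/h.
    """
    if speed_kmh >= top:
        return int(top // width)  # open-ended top band
    return int(speed_kmh // width)


def count_efforts(speeds_kmh, sample_rate_hz, threshold_kmh=25.0, min_duration_s=1.0):
    """Count runs of consecutive samples at or above the threshold that last
    at least min_duration_s (the minimum duration is an assumed example value)."""
    min_samples = int(round(min_duration_s * sample_rate_hz))
    efforts, run = 0, 0
    for s in speeds_kmh:
        if s >= threshold_kmh:
            run += 1
        else:
            if run >= min_samples:
                efforts += 1
            run = 0
    if run >= min_samples:  # an effort may still be in progress at the end
        efforts += 1
    return efforts


# Hypothetical 10 Hz speed trace (km/h): 1.2 s above threshold, 0.5 s above, 1.0 s above.
trace = [26.0] * 12 + [10.0] * 5 + [27.0] * 5 + [10.0] * 3 + [26.0] * 10
n_efforts = count_efforts(trace, sample_rate_hz=10)
```

Reporting both the band edges and the minimum duration alongside the results makes the counts reproducible by other practitioners.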

For elite female team-sport athletes competing in field-based sports, a fixed threshold of 3.5 m·s<sup>−1</sup> may be used to detect high-speed activity across a cohort of players (Clarke et al., 2014). Since a consensus is yet to be reached on the physiological tests to determine velocity or acceleration thresholds, we recommend that practitioners choose a test deemed most appropriate for their sport. Alternatively, data mining approaches could be used to examine the velocity and acceleration output of team-sport athletes. Recently, the velocity, acceleration and angular velocity output of court-based team-sport athletes was examined without arbitrary thresholds (Sweeting et al., 2017). Rather than comparing the velocity, acceleration and angular velocities performed by individuals as a function of time, the similarities between playing positions were examined according to the movement sequences performed. This approach may have application for coaching and conditioning. Knowledge of the movements performed, angle of attack and accelerations may assist with planning sport-specific training. Practitioners and scientists can subsequently focus on training the specific movement sequences frequently performed by athletes in each playing position. These sequences can also be examined across different playing standards, such as elite and junior-elite levels. Profiling activity across the athlete pathway may assist in preparing team-sport athletes during transition from lower to higher levels.

### CONCLUSION

Athlete position, velocity, and acceleration can be measured during matches or training via optical tracking, GPS and LPS. The analysis of distance, velocity, and acceleration over a specified time epoch is termed the athlete activity profile. It is difficult to compare literature on field-based sports due to inconsistencies in velocity and acceleration thresholds, even within a single sport. Velocity and acceleration thresholds have been determined from physiological and physical capacity tests. Limited research also exists on female team-sport athletes and how to classify their velocity and acceleration. Alternatively, data mining can derive patterns from large datasets. With the large volume of data obtained from athlete tracking systems and advancements in classifying movement patterns during skill or endurance performance, data mining is a technique to gain further insight into athlete activity profiles. Consequently, athlete external load could be analyzed without velocity or acceleration thresholds. Future work should focus on using data mining techniques to analyze the movement performed by team-sport athletes, particularly elite females and those participating in court-based sports.



### AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: AS, SC, SM, and RA. Drafted manuscript and prepared tables/figures: AS. Edited, critically revised paper, and approved final version of manuscript: AS, SC, SM, and RA.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Sweeting, Cormack, Morgan and Aughey. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Use of Body Worn Sensors for Detecting the Vibrations Acting on the Lower Back in Alpine Ski Racing

Jörg Spörri<sup>1,2</sup>\*, Josef Kröll<sup>1</sup>, Benedikt Fasel<sup>3</sup>, Kamiar Aminian<sup>3</sup> and Erich Müller<sup>1</sup>

<sup>1</sup> Department of Sport Science and Kinesiology, University of Salzburg, Hallein-Rif, Austria, <sup>2</sup> Department of Orthopedics, Balgrist University Hospital, Zurich, University of Zurich, Zurich, Switzerland, <sup>3</sup> Laboratory of Movement Analysis and Measurement, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

This study explored the use of body worn sensors to evaluate the vibrations that act on the human body in alpine ski racing from a general and a back overuse injury prevention perspective. In the course of a biomechanical field experiment, six male European Cup-level athletes each performed two runs on a typical giant slalom (GS) and slalom (SL) course, resulting in a total of 192 analyzed turns. Three-dimensional accelerations were measured by six inertial measurement units placed on the right and left shanks, right and left thighs, sacrum, and sternum. Based on these data, power spectral density (PSD; i.e., the signal's power distribution over frequency) was determined for all segments analyzed. Additionally, as a measure expressing the severity of vibration exposure, root-mean-square (RMS) acceleration acting on the lower back was calculated based on the inertial acceleration along the sacrum's longitudinal axis. In both GS and SL skiing, the PSD values of the vibrations acting at the shank were found to be largest for frequencies below 30 Hz. While being transmitted through the body, these vibrations were successively attenuated by the knee and hip joint. At the lower back (i.e., sacrum sensor), PSD values were especially pronounced for frequencies between 4 and 10 Hz, whereas a corresponding comparison between GS and SL revealed higher PSD values and larger RMS values for GS. Because vibrations in this particular range (i.e., 4 to 10 Hz) include the spine's resonant frequency and are known to increase the risk of structural deteriorations/abnormalities of the spine, they may be considered potential components of mechanisms leading to overuse injuries of the back in alpine ski racing. Accordingly, any measure to control and/or reduce such skiing-related vibrations to a minimum should be recognized and applied. 
In this connection, wearable sensor technologies might help to better monitor and manage the overall back overuse-relevant vibration exposure of athletes in regular training and/or competition settings in the near future.

Keywords: injury prevention, overuse injuries, wearable sensors, spine, back pain, athletes, alpine skiing, training load management

#### Edited by:

Luca Paolo Ardigò, University of Verona, Italy

#### Reviewed by:

Supej Matej, University of Ljubljana, Slovenia Yves Henchoz, Centre Hospitalier Universitaire Vaudois (CHUV), Switzerland H-C Holmberg, Mid Sweden University, Sweden

> \*Correspondence: Jörg Spörri joerg.spoerri@balgrist.ch

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

> Received: 06 April 2017 Accepted: 06 July 2017 Published: 20 July 2017

#### Citation:

Spörri J, Kröll J, Fasel B, Aminian K and Müller E (2017) The Use of Body Worn Sensors for Detecting the Vibrations Acting on the Lower Back in Alpine Ski Racing. Front. Physiol. 8:522. doi: 10.3389/fphys.2017.00522

# INTRODUCTION

On the topic of the relationship between training load and sports injuries, there is emerging evidence that poor load management (i.e., an insufficient balance between loading and recovery) is a major injury risk factor (Drew and Finch, 2016). Accordingly, monitoring the external loads that act on the human body is key to better understanding the occurrence of (and potentially to avoid) injuries in competitive sports (Soligard et al., 2016). In this context, body worn inertial measurement units (IMU) may offer a pervasive way to measure both load-related body postures, as well as vibrations acting on the human body during outdoor sports activities (Kim et al., 1993; Chardonnens et al., 2013; Seel et al., 2014; Fasel et al., 2017). Moreover, they may provide important information regarding training or competition time, movement repetitions and/or the accelerations acting on the different segments of the human body (Chardonnens et al., 2012, 2014; Rawashdeh et al., 2016; Yu et al., 2016; Whiteside et al., 2017). Thus, particularly for investigating the link between load and injury, as well as for monitoring and/or managing training and competition load, sensor-based wearable technologies might serve as an essential tool in the near future. In the current study, their practical usefulness will be demonstrated through the sport of alpine ski racing.

In alpine ski racing, the relatively high risk of injury is well documented and recognized (Pujol et al., 2007; Flørenes et al., 2009; Westin et al., 2012; Bere et al., 2013). In recent years, substantial research efforts concerning injury causes and prevention measures have been undertaken (Spörri et al., 2016b). However, most alpine ski racing-related research has focused on traumatic injuries, while overuse injuries have received little attention (Supej et al., 2017). Accordingly, exploring the potential causes of overuse injuries in order to provide evidence-based recommendations for their prevention has been suggested to be an important task for the future alpine ski racing-related research agenda (Supej et al., 2017).

Similar to other competitive sports, in alpine ski racing the athlete's back has been reported to be one of the body parts particularly prone to overuse injuries (Bergstrom et al., 2004; Hildebrandt and Raschner, 2013; Spörri et al., 2015a). As early as adolescence, competitive alpine skiers were found to have significantly more prevalent radiographic abnormalities than non-athletic age-matched controls (Rachbauer et al., 2001; Todd et al., 2015). Furthermore, several studies have documented such abnormalities as being associated with a higher risk of developing low-back pain later, either during or after the sports career (Luoma et al., 2000; Lundin et al., 2001; Ogon et al., 2001; Iwamoto et al., 2004). From a biomechanical perspective, several factors may contribute to the development of overuse injuries of the back in alpine ski racing.

First, similar to other competitive sports, an accumulation of heavy mechanical loads exceeding the athletes' capacities, particularly if the recovery time between the loadings is insufficient, may lead to tissue damage and overuse injuries (Soligard et al., 2016). This appears quite plausible, as an association between cumulative low back loads and low back pain has already been demonstrated for different athletic (i.e., other than alpine skiing) and occupational cohorts (Kujala et al., 1996; Heneweer et al., 2011; Coenen et al., 2013).

Second, with the use of body worn sensors, recent studies of alpine ski racing showed that typical loading patterns of the back include a combined occurrence of frontal bending, lateral bending and torsion in the trunk, as well as high peak loads (Spörri et al., 2015a,b, 2016a). Since a combination of these factors is known to be related to high spinal disc loading (Nachemson, 1981; Wilke et al., 1999; Haid and Fischler, 2013), and has been suggested to be attributable to different types of spine deteriorations (Rachbauer et al., 2001; Hangai et al., 2009), they may be considered important mechanisms leading to overuse injuries of the back in alpine ski racing (Spörri et al., 2015a,b, 2016a).

Third, there is strong scientific evidence that excessive exposure to whole-body vibrations, particularly at frequencies close to the resonant frequency of the spine [∼4–10 Hz according to Izambert et al. (2003), Guo et al. (2009), Guo et al. (2011), and Baig et al. (2014)], increases the risk of structural deteriorations/abnormalities of the spine and of developing low back pain (Hill et al., 2009; Burström et al., 2015). For that and other reasons, there are international standards such as ISO 2631 (ISO, 1997) or Directive 2002/44/EC of the European Union (EU, 2002) that define minimum health and safety requirements for the exposure of workers arising from whole-body vibrations (Griffin, 2004).

Regarding the vibrations that occur while skiing, earlier studies primarily focused on recreational skiing (Kugovnik et al., 2000; Federolf et al., 2009; Supej, 2013; Tarabini et al., 2015) and/or the ski-plate-binding-boot unit level (Kugovnik et al., 2000; Federolf et al., 2009; Tarabini et al., 2015). However, it is reasonable to assume that vibrations in alpine ski racing are markedly different from those occurring in recreational skiing. Based on the preliminary findings of two pilot studies, it is known that vibrations are damped when being transmitted through the skier's body (Supej, 2013; Fasel et al., 2016a). Thus, in the context of alpine ski racing, it is not a priori clear which frequencies and signal powers the occurring vibrations possess, and how much of them is actually transmitted to the lower back. Moreover, in alpine ski racing it is so far largely unexplored whether the vibrations acting on the lower back should be considered to be harmless, or whether they might act as potential contributors to the development of overuse injuries.

Therefore, the aims of the current study were: (1) to describe power spectral density (i.e., the signal's power distribution over frequency) of the vibrations acting on the different body segments in the competition disciplines giant slalom (GS) and slalom (SL); and (2) to quantify and compare the root-mean-square (RMS) accelerations acting on the lower back (i.e., the severity of vibration exposure) while skiing GS and SL turns.

### MATERIALS AND METHODS

### Measurement Protocol and Experimental Setup

Six male European Cup-level athletes (85.3 ± 4.9 kg) participated in the study. Within the framework of a biomechanical field experiment, the data of two GS runs and two SL runs were collected for each athlete. For each run performed, an eight-turn section in the middle of a 16-gate course was considered for further data analysis, resulting in a total of 192 included turns (**Figure 1**). The GS course was set with linear gate distances of 25 m and gate offsets of 6.5 m. The SL course had linear gate distances of 10 m and gate offsets of 3 m. Both courses were set on a constantly inclined slope (19°) with very compact artificial snow conditions, as are typically encountered in the sport of alpine ski racing. Accordingly, on both courses only minor ruts and grooves resulted from the 12 runs performed. The protocol was approved by the ethics committee of the Department of Sport Science and Kinesiology at the University of Salzburg and all subjects gave written informed consent.

#### Data Collection and Instruments

The three-dimensional (3D) accelerations acting on the skier's body segments while skiing were measured at a sampling rate of 500 Hz with six inertial measurement units (Physilog IV; Gait Up; CH) placed on the right and left shanks, right and left thighs, the sacrum and the sternum. The sensors' dimensions were 50 × 39 × 9.2 mm with a 19-gram weight. They were electronically synchronized by radio frequency pulses. In order to minimize the occurrence of any self-resonance and/or soft tissue artifacts, the sensors were fixed to the corresponding body segments at predefined anatomical locations using skintight, custom-made underwear suits. For the shank, this was on the medial surface of the tibia bone above the ski boot top and for the thigh, at the mid-distance between the knee and hip joint center (slightly on the lateral side). The sacrum and sternum sensors were fixed directly on the corresponding anatomical landmarks. Additional fixation of the sensors was provided by the athletes wearing their own very close-fitting racing suit. The accelerometers included in the inertial measurement units were set to capture a range of ±16 g and were calibrated following the procedure of Ferraris et al. (1995). To align the sensor frames with the anatomical frames of the body segments, before each analyzed run, a functional calibration procedure consisting of upright still standing, slow squats, vertical trunk rotation and hip abduction and adduction movements was performed. The anatomical frames were defined in accordance with the guidelines of the International Society of Biomechanics (Wu and Cavanagh, 1995). All data processing, parameter computation and statistical analysis steps were performed using the software MATLAB R2012b and/or IBM SPSS Statistics 22.

### Data Processing and Parameter Computation

During analog-to-digital conversion, all acceleration and angular velocity raw data were low-pass filtered at IMU manufacturer-predefined cut-off frequencies of 94 and 98 Hz, respectively. In order to automatically segment each run and to extract the relevant eight-turn section, 3D segment orientations and a 3D body segment model were calculated as described in detail in previous studies (Fasel et al., 2016b, 2017). For each time instance, the distances between the athlete's center of mass and the left and right ankle joint centers were computed. Turn switches were defined as the crossing points of these two distances, as suggested and validated by Fasel et al. (2016c). Inertial acceleration was computed by transforming the measured acceleration into the global frame, removing the gravity component, and transforming the resulting acceleration back into the anatomical frame.
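The three steps of the inertial acceleration computation can be sketched as follows. This is an illustration only, not the authors' MATLAB code: the function names, the 3 × 3 rotation-matrix representation of segment orientation, a global z-axis pointing upward, and the convention that a stationary accelerometer reads +g along that axis are all assumptions.

```python
import math

G = 9.81  # assumed gravitational acceleration, m/s^2


def mat_vec(R, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]


def inertial_acceleration(acc_segment, R_global_from_segment):
    """Remove gravity from an accelerometer reading given in the anatomical frame.

    acc_segment: measured acceleration in the anatomical frame (m/s^2).
    R_global_from_segment: rotation matrix taking anatomical-frame vectors
    to the global frame (assumed to come from the orientation estimate).
    """
    # 1) Express the measurement in the global frame.
    acc_global = mat_vec(R_global_from_segment, acc_segment)
    # 2) Remove gravity (assumed convention: sensor at rest reads +G on global z).
    acc_global[2] -= G
    # 3) Express the result in the anatomical frame again.
    return mat_vec(transpose(R_global_from_segment), acc_global)


# Hypothetical check: a segment tilted 30 degrees about the x-axis and at rest.
theta = math.radians(30.0)
c, s = math.cos(theta), math.sin(theta)
R_tilt = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
at_rest = mat_vec(transpose(R_tilt), [0.0, 0.0, G])  # what the sensor would read
residual = inertial_acceleration(at_rest, R_tilt)    # expected to be close to zero
```

Because the rotation matrix is orthonormal, its transpose serves as the inverse transformation in the final step.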

Power spectral density (PSD) was estimated with the single-sided amplitude spectrum (SSAS) of the inertial acceleration. First, the amplitude spectrum (AS) was computed as the square of the norm of the Fast Fourier Transform (FFT) coefficients of the inertial acceleration along the segment's longitudinal axis. Second, to obtain the SSAS, AS was normalized by the sampling frequency and total number of FFT coefficients and was multiplied by two. For illustration purposes, the final PSD was obtained by smoothing SSAS with a moving average of length 5 and interpolating it between 0.5 and 75 Hz in 0.1 Hz steps.
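The PSD estimate described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' MATLAB implementation: a naive discrete Fourier transform stands in for the FFT so that the example needs only the standard library, the handling of the window at the signal edges is an assumption, the final interpolation step is omitted, and all function names and the test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); an FFT yields the same coefficients."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def single_sided_psd(x, fs):
    """Squared DFT magnitude, normalised by fs and N, and multiplied by two."""
    n = len(x)
    half = n // 2 + 1  # bins from 0 Hz up to the Nyquist frequency
    coeffs = dft(x)[:half]
    # Following the text, every retained bin is doubled; a stricter estimate
    # would leave the 0 Hz and Nyquist bins undoubled.
    psd = [2.0 * abs(c) ** 2 / (fs * n) for c in coeffs]
    freqs = [k * fs / n for k in range(half)]
    return freqs, psd


def moving_average(y, length=5):
    """Centred moving average; the window shrinks at the edges (an assumption)."""
    half = length // 2
    out = []
    for i in range(len(y)):
        window = y[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical test signal: 2 s of an 8 Hz sine of amplitude 1, sampled at 100 Hz.
fs = 100.0
signal = [math.sin(2.0 * math.pi * 8.0 * t / fs) for t in range(200)]
freqs, psd = single_sided_psd(signal, fs)
smoothed = moving_average(psd, length=5)
peak_bin = max(range(len(psd)), key=psd.__getitem__)
```

For this test signal the PSD integrates to approximately 0.5, the mean square of a unit-amplitude sine, which is a convenient check on the normalization.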

Root-mean-square acceleration (RMS) acting on the lower back (i.e., sacrum sensor) during the analyzed eight-turn section was determined based on the inertial acceleration data along the sacrum's longitudinal axis. In accordance with the international standard ISO 2631 (ISO, 1997), the inertial acceleration data was filtered in the frequency domain prior to computing the RMS according to the ISO filter specifications [frequency weighting Wk (vertical direction) with k = 1]. This filter amplifies accelerations at frequencies close to the resonant frequency of the spine [∼4–10 Hz according to Izambert et al. (2003), Guo et al. (2009), Guo et al. (2011), and Baig et al. (2014); (**Figure 2**)]. RMS was then equal to the RMS value of this filtered acceleration.
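The idea of weighting the acceleration in the frequency domain before taking the RMS can be illustrated with the sketch below. It is not an implementation of the ISO 2631 W<sub>k</sub> filter: the weighting is passed in as a function of frequency, and the `band_weight` used in the example is a hypothetical stand-in that simply passes 4–10 Hz. A naive discrete Fourier transform is used so that only the standard library is needed, and the RMS is obtained from the weighted spectrum via Parseval's theorem.

```python
import cmath
import math


def weighted_rms(x, fs, weight=lambda f: 1.0):
    """RMS of a signal after applying a magnitude weighting in the frequency domain.

    weight: function mapping frequency (Hz) to a gain; the default leaves the
    signal unchanged. The real ISO weighting curve is NOT reproduced here.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        # Bins above n/2 represent negative frequencies; weight them symmetrically.
        f = min(k, n - k) * fs / n
        spectrum[k] *= weight(f)
    # Parseval: mean of x^2 over time equals sum(|X|^2) / n^2.
    mean_square = sum(abs(c) ** 2 for c in spectrum) / n ** 2
    return math.sqrt(mean_square)


def band_weight(f):
    """Hypothetical placeholder weighting: unity gain between 4 and 10 Hz."""
    return 1.0 if 4.0 <= f <= 10.0 else 0.0


# Hypothetical signals sampled at 100 Hz for 2 s (amplitude 2 m/s^2).
fs = 100.0
inside = [2.0 * math.sin(2.0 * math.pi * 5.0 * t / fs) for t in range(200)]
outside = [2.0 * math.sin(2.0 * math.pi * 20.0 * t / fs) for t in range(200)]
rms_inside = weighted_rms(inside, fs, band_weight)    # 5 Hz component is retained
rms_outside = weighted_rms(outside, fs, band_weight)  # 20 Hz component is removed
```

Since the RMS depends only on the magnitude of the weighting, the phase response of the filter does not need to be modeled for this purpose.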

Following this procedure, one PSD curve and one RMS value were obtained for each run and athlete. Finally, to provide more representative curves and values per subject and competition discipline, the PSD curves and RMS values of two eight-turn sections performed by the same athlete in the same competition discipline were averaged.

#### Statistical Analysis

The statistical analysis consisted of the following steps: (1) for each body segment and competition discipline, group average PSD curves were computed based on the aforementioned six representative subject average PSD curves; (2) these group average PSD curves were visualized as the areas of uncertainty around the estimate of the mean [i.e., ± the standard error (SE) boundaries]; (3) for each competition discipline, group average RMS accelerations acting on the lower back (i.e., sacrum sensor) were calculated based on the aforementioned six representative subject average RMS values and, subsequently, were reported as mean ± standard deviation (SD); and (4) potential differences in the lower back (i.e., sacrum sensor) RMS values between GS and SL were tested using a paired-samples t-test (level of significance: p < 0.05), and effect sizes (Cohen's d) were calculated.
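The quantities in steps (2) to (4) can be sketched with the standard library alone. The numbers below are made up for illustration and are not study data. The text does not state which variant of Cohen's d was used, so the sketch assumes the paired form (mean difference divided by the SD of the differences); the p-value lookup from the t-distribution is left out.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def paired_t_and_cohens_d(first, second):
    """Return (t statistic, Cohen's d for paired data, degrees of freedom).

    Assumption: d is computed as mean difference / SD of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, mean_diff / sd_diff, n - 1


# Usage with hypothetical per-athlete values for two conditions.
condition_a = [3.0, 5.0, 7.0, 9.0]
condition_b = [1.0, 2.0, 3.0, 4.0]
t_stat, cohens_d, dof = paired_t_and_cohens_d(condition_a, condition_b)
se_a = standard_error(condition_a)
```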

#### RESULTS

The group average PSD curves of all segments representing GS and SL skiing are depicted in **Figures 3**, **4**. Generally, in both GS and SL, the PSD values of the vibrations acting on the shank were largest for frequencies below 30 Hz. While being transmitted through the body, vibrations were found to be attenuated by each joint (i.e., vibrations at the shank sensor > thigh sensor > sacrum sensor > sternum sensor). Moreover, while PSD values at the shank and thigh sensors were especially pronounced for frequencies between 10 and 20 Hz, PSD values at the lower back (i.e., sacrum sensor) were particularly high between 4 and 10 Hz. Comparatively small PSD values were observed at the sternum sensor. At frequencies below 4 Hz, another peak was observed in the PSD curves of all segments.

The PSD curves that explicitly illustrate the vibrations acting on the lower back (i.e., sacrum sensor) in GS and SL are presented in **Figure 5**. At frequencies between 4 and 10 Hz, PSD values and, therefore, signal powers of the vibrations acting on the lower back were larger in GS than in SL. Lower back (i.e., sacrum sensor) RMS values were found to be 11.10 ± 1.20 m/s<sup>2</sup> in GS and 9.35 ± 0.77 m/s<sup>2</sup> in SL, a difference that was significant at p < 0.001 (**Table 1**).

#### DISCUSSION

### PSD of the Vibrations Acting on Different Body Segments in GS and SL

As observed previously for recreational skiing (Federolf et al., 2009; Supej, 2013), in both GS and SL skiing the PSD values of the vibrations acting at the level of the shank sensor were found to be largest for the frequency range below 30 Hz (**Figures 3**, **4**). In this context, it is worth discussing that PSD peaks within this particular range might have different origins. PSD peaks below 4 Hz can most likely be ascribed to the frequency of turns and/or the skier's basic movement patterns. For GS, previous studies revealed turn frequencies of 0.7 Hz and basic movement frequencies of 1.4 Hz, while for SL, turn frequencies of 1.1 Hz and basic movement frequencies of 2.2 Hz were observed (Reid, 2010; Spörri et al., 2012, 2016a). PSD peaks above 4 Hz are most likely a direct consequence of uneven or bumpy snow surfaces and the chattering of the skis when interacting with the snow surface while turning. In this context, it is already known that ski chattering and, therefore, vibrations around 15 Hz to 25 Hz are strongly dependent on the skier's turn technique (skidding vs. carving), the ski's sidecut, and the occurring snow conditions (Kugovnik et al., 2000; Federolf et al., 2009; Supej, 2013).

Starting from the aforementioned vibrations acting on the shank, in both GS and SL vibrations were found to be successively attenuated while being transmitted through the body (**Figures 3**, **4**). While the knee joint mainly attenuated the signal power of all occurring vibrations, the hip joint damped the vibrations particularly at frequencies >10 Hz, which is in line with previous findings of a pilot study in GS skiing (Fasel et al., 2016a) and fundamental studies under laboratory conditions (Rubin et al., 2003; Kiiski et al., 2008). A distinctive attenuation of ski racing-specific vibrations at frequencies between 4 and 10 Hz was performed by the spinal structures between the sacrum and sternum sensors. Thus, knowing that vibrations of those frequencies (i.e., close to the resonant frequency of the spine) are the most damaging vibrations for spinal structures and increase the risk of developing low back pain (Hill et al., 2009; Burström et al., 2015), they may be considered potential components of mechanisms leading to overuse injuries of the back in alpine ski racing. Accordingly, special emphasis should be placed on controlling and/or reducing them to a minimum (Griffin, 2004), and on protecting athletes by adequate prevention measures. This consideration especially applies to youth athletes whose bodies are still growing.

### Vibration Exposure of the Lower Back While Skiing GS and SL Turns

Comparing the competition disciplines GS and SL, distinct differences regarding the vibrations acting on the lower back (i.e., sacrum sensor) were identified: for the back overuse-relevant frequencies of 4 to 10 Hz, PSD values were apparently larger in GS than in SL (**Figure 5**). Moreover, lower back (i.e., sacrum sensor) RMS values, in whose calculation accelerations in the range of 4 to 10 Hz are weighted more heavily, were found to be significantly larger for GS than for SL (**Table 1**). This might be explained by the larger average angle between the ski axis and the instant direction of motion (i.e., higher amount of skidding) in GS than in SL (Reid, 2010; Spörri et al., 2012) and, therefore, the more intense vibrations that result when the skis slide more transversally (and less longitudinally) over damaged and/or bumpy snow surfaces. A skidding-induced increase of "usual" chattering of the skis when interacting with undamaged and/or smooth snow surfaces might not serve as an explanation, because this phenomenon is known to be typically related to frequencies around 15 to 25 Hz (Supej, 2013). However, whether the observed competition discipline-specific differences are of clinical relevance needs to be verified by future studies combining both health and load monitoring.

FIGURE 4 | Group average power spectral density (PSD) curves of all segments in SL skiing visualized as the area of uncertainty around the estimate of the mean (±SE). Red, right shank sensor; blue, right thigh sensor; gray, sacrum sensor; green, sternum sensor.

curves for frequencies below 30 Hz in GS and SL. Light gray, GS; dark gray, SL.

TABLE 1 | Descriptive and inferential statistics of the root-mean-square accelerations (RMS) that act on the lower back (i.e., sacrum sensor) in the competition disciplines giant slalom (GS) and slalom (SL).

Level of significance: \*\*\*p < 0.001.

#### Methodological Considerations

The current study provided valuable insights into the vibrations acting on the human body in GS and SL skiing from a general and a back overuse injury prevention perspective, though there is a potential limitation that needs to be considered when interpreting the study findings. Since the IMU sensors were fixed on the skin and not directly on the bones, relative movements between the IMU sensors and the underlying bones might have occurred, particularly for the thigh segment. These relative movements can mainly be ascribed to soft tissue artifacts, relative displacements of the fixation suit and the resonance of the attached sensors. As a consequence, peak accelerations may be overestimated by ∼12%, as estimated in a previous study comparing the accelerations measured by skin-fixed and bone-fixed sensors (Kim et al., 1993). However, in view of the major challenges of collecting kinematic data under field conditions and on an alpine ski racing course, bone fixation was not a feasible option for the current study.

#### PERSPECTIVES

### Load Monitoring in Alpine Ski Racing with Body Worn Sensor Technology

One approach for keeping the occurrence of lower back vibration exposure of athletes, and in particular that of youth athletes, within a minimal or healthy dose might be found in the systematic management of training load and recovery time. For that purpose, both continuous load monitoring and a profound injury monitoring are fundamental, implying an evident need for precise assessment tools (Soligard et al., 2016). In the near future, sensor-based wearable technologies might serve as an essential tool, especially for monitoring the cumulative exposure to external loads. In the context of overuse injuries of the back and alpine ski racing, the IMU sensor-based methodology used in this study objectively illustrates the great potential such technologies can have.

On the one hand, with the use of only one IMU sensor, it might be possible to quantify the overall severity of lower back vibration exposure during entire training sessions and/or to specifically monitor vibrations at dangerous frequencies. On the other hand, with the use of two IMU sensors and pressure insoles, it might be feasible to assess the overall trunk movement components and peak loads (enabling a rough estimate of the patterns of spinal disc loading) by long-term measurements during regular training. In the context of alpine ski racing, such an approach has already been applied to short experimental trials under field conditions (Spörri et al., 2015b, 2016a), indicating the small remaining gap toward direct real-time biofeedback during regular training sessions and/or competitions.

### Where to Go from Here?

Nevertheless, before such technologies can find broad application in practical sport settings, several steps need to be taken. From an engineering perspective, body worn sensor technologies still need to be optimized regarding their size, fixation and usability, as well as their real-time and embedded data processing. In addition, custom-made and application-specific algorithms that take advantage of the characteristics of the specific movement analyzed need to be developed. Finally, before wearable devices/algorithms are launched on the market, rigorous and independent validation and reliability studies are indispensable (Halson et al., 2016; Sperlich and Holmberg, 2016). From a scientific perspective, future research should primarily focus on investigating the relationship between sport-specific external loads and injury risks in order to identify the most relevant parameters for monitoring purposes, and to verify their predictive validity.

In a work-related context, the evaluation of exposures to whole-body vibration is based on the calculation of daily exposure expressed as either: (i) an equivalent continuous RMS acceleration over an 8 h period, or (ii) the vibration dose value (VDV) (Griffin, 2004). Such single measures with corresponding action/limit criteria might offer a more intuitive and perhaps "more coach friendly" approach than the PSD analyses presented in this study. Thus, such measures might also work in a sports-related context. The only missing steps are the definition of sport-related testing protocols and the exploration of appropriate action/limit criteria, which necessarily need to be associated with exposure time. However, as illustrated by Griffin (2004), there is a large internal inconsistency within the Directive 2002/44/EC of the European Union for short duration exposures to whole-body vibration, for instance. In this case, the aforementioned two alternative methods (RMS and VDV) may give very different action/limit values. Accordingly, it might appear more prudent to base actions on the qualitative guidance (i.e., reducing risk to a minimum) rather than only refer to the contradictory quantitative guidance values (Griffin, 2004). Following this line of argumentation, in a sports-related context it might also be a reasonable alternative to simply monitor the vibrations acting on the lower back and try (regardless of exposure time) to reduce them to a minimum.
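The two exposure measures can be sketched as follows, using their usual textbook definitions: the 8 h equivalent is the frequency-weighted RMS scaled by the square root of the exposure time relative to 8 h, and the VDV is the fourth root of the time integral of the fourth power of the frequency-weighted acceleration. The function names and input values are hypothetical, the sketch handles a single axis only, and any axis multipliers required by a particular standard or directive are omitted.

```python
import math


def equivalent_rms_8h(weighted_rms, exposure_hours, reference_hours=8.0):
    """8 h energy-equivalent RMS acceleration, often written A(8), in m/s^2."""
    return weighted_rms * math.sqrt(exposure_hours / reference_hours)


def vibration_dose_value(weighted_acceleration, dt):
    """VDV in m/s^1.75 from frequency-weighted samples spaced dt seconds apart.

    The integral of a(t)^4 is approximated by a rectangle-rule sum.
    """
    return (sum(a ** 4 for a in weighted_acceleration) * dt) ** 0.25


# Usage with illustrative numbers: 2 h of exposure at a weighted RMS of
# 1.0 m/s^2, and 10 s of a constant 2.0 m/s^2 signal sampled at 100 Hz.
a8 = equivalent_rms_8h(1.0, 2.0)
vdv = vibration_dose_value([2.0] * 1000, 0.01)
```

Because the VDV uses the fourth power, it is more sensitive to peaks than the RMS-based measure, which is one reason the two methods can lead to different conclusions for short exposures.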

### CONCLUSION

The findings of this study lead to the conclusion that in addition to the previously suggested combined occurrence of frontal bending, lateral bending and torsion in the highly loaded trunk, the vibrations acting on the lower back also may be considered potential components of mechanisms leading to overuse injuries of the back in alpine ski racing. Accordingly, prevention measures should also aim to control and/or reduce to a minimum the vibrations acting on the lower back while skiing. A particular focus should be placed on vibrations occurring with a frequency around 4 to 10 Hz because these are known to be the most damaging to the spine. In addition, the current study clearly illustrated the great potential of wearable sensor technologies to monitor and manage the external loads that act on alpine skiers during regular training and/or competitions.

### AUTHOR CONTRIBUTIONS

JS, BF, JK, KA, and EM conceptualized the study design. JS, JK, and BF conducted the data collection. JS and BF contributed to the analysis and interpretation of the data. JS drafted the manuscript; all other authors revised it critically. All authors approved the final version and agreed to be accountable for all aspects of this work.

### FUNDING

This study was financially supported by the International Ski Federation (FIS). The project was also partly supported by the Fondation de soutien à la recherche dans le domaine de l'orthopédie-traumatologie. The funding sources had no involvement in: (i) the study design; (ii) the collection, analysis or interpretation of data; (iii) writing the manuscript; and (iv) the decision to publish this work.

### ACKNOWLEDGMENTS

The authors would like to thank Ass.-Prof. Dr. Christian Haid of the Department of Orthopedic Surgery at the Innsbruck Medical University for his consultancy with regard to orthopedic aspects of the current study.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer HH declared that he is hosting a Research Topic with one of the authors KA, and the handling Editor states that the process met the standards of a fair and objective review.

Copyright © 2017 Spörri, Kröll, Fasel, Aminian and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Criterion-Validity of Commercially Available Physical Activity Tracker to Estimate Step Count, Covered Distance and Energy Expenditure during Sports Conditions

Yvonne Wahl<sup>1,2</sup>\*, Peter Düking<sup>3</sup>, Anna Droszez<sup>2</sup>, Patrick Wahl<sup>2,4</sup> and Joachim Mester<sup>2</sup>

<sup>1</sup> Institute of Biomechanics and Orthopedics, German Sport University Cologne, Cologne, Germany, <sup>2</sup> German Research Centre of Elite Sport, German Sport University Cologne, Cologne, Germany, <sup>3</sup> Integrative and Experimental Exercise Science, Department of Sport Science, University of Würzburg, Würzburg, Germany, <sup>4</sup> Department of Molecular and Cellular Sport Medicine, Institute of Cardiovascular Research and Sport Medicine, German Sport University Cologne, Cologne, Germany

#### Edited by:

Kamiar Aminian, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

#### Reviewed by:

Fabien Andre Basset, Memorial University of Newfoundland, Canada Louis Passfield, University of Kent, United Kingdom

> \*Correspondence: Yvonne Wahl y.wahl@dshs-koeln.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 May 2017 Accepted: 06 September 2017 Published: 22 September 2017

#### Citation:

Wahl Y, Düking P, Droszez A, Wahl P and Mester J (2017) Criterion-Validity of Commercially Available Physical Activity Tracker to Estimate Step Count, Covered Distance and Energy Expenditure during Sports Conditions. Front. Physiol. 8:725. doi: 10.3389/fphys.2017.00725

Background: In recent years, the development of physical activity trackers (Wearables) has increased. For recreational users, testing of these devices under walking or light jogging conditions might be sufficient. For (elite) athletes, however, scientific trustworthiness needs to be established for a broad spectrum of velocities or even fast changes in velocity reflecting the demands of the sport. Therefore, the aim was to evaluate the validity of eleven Wearables for monitoring step count, covered distance and energy expenditure (EE) under laboratory conditions with different constant and varying velocities.

Methods: Twenty healthy sport students (10 men, 10 women) performed a running protocol consisting of four 5 min stages of different constant velocities (4.3; 7.2; 10.1; 13.0 km·h<sup>−1</sup>), a 5 min period of intermittent velocity, and a 2.4 km outdoor run (10.1 km·h<sup>−1</sup>) while wearing eleven different Wearables (Bodymedia Sensewear, Beurer AS80, Polar Loop, Garmin Vivofit, Garmin Vivosmart, Garmin Vivoactive, Garmin Forerunner 920XT, Fitbit Charge, Fitbit Charge HR, Xiaomi MiBand, Withings Pulse Ox). Step count, covered distance, and EE were evaluated by comparing each Wearable with a criterion method (Optogait system and manual counting for step count, treadmill for covered distance and indirect calorimetry for EE).

Results: All Wearables, except Bodymedia Sensewear, Polar Loop, and Beurer AS80, showed good validity (small MAPE, good ICC) for monitoring step count at all constant and varying velocities. For covered distance, all Wearables showed a very low ICC (<0.1) and high MAPE (up to 50%), indicating poor validity. The measurement of EE was acceptable for the Garmin, Fitbit and Withings Wearables (small to moderate MAPE), while Bodymedia Sensewear, Polar Loop, and Beurer AS80 showed a high MAPE of up to 56% for all test conditions.

Conclusion: In our study, most Wearables provide an acceptable level of validity for step counts at different constant and intermittent running velocities reflecting sports conditions. However, the covered distance as well as the EE could not be assessed validly with the investigated Wearables. Consequently, covered distance and EE should not be monitored with the presented Wearables in sport-specific conditions.

Keywords: wearables, validity, monitoring, biofeedback, athletes

#### INTRODUCTION

In recent years, the development of physical activity trackers (Wearables) has increased, which earned them first place in the ACSM Worldwide Survey of Fitness Trends in 2016 and 2017, leaving popular topics like "High-intensity interval training" and "strength training" behind (Thompson, 2015, 2016).

Besides having applications for physical fitness and health in the general population by monitoring a plethora of different variables like step count, covered distance and energy expenditure (EE), Wearables may be useful for (elite) athletes as well. In these populations, Wearables might be used to monitor aspects of training load (Düking et al., 2016) as well as physical activity during leisure time and provide biofeedback to optimize exercises (Düking et al., 2017).

However, before Wearables can be used beneficially, the parameters they provide need to be scientifically trustworthy. This requires sufficient validity, which unfortunately is often an issue, especially with commercially available Wearables (Sperlich and Holmberg, 2016). Several studies, recently summarized by Evenson et al. (2015) and Düking et al. (2016), tackled this issue and investigated the scientific trustworthiness of different Wearables under a variety of conditions such as walking, jogging, cycling, or resistance exercise, under laboratory as well as free-living conditions. Yet, scientific evaluations are, strictly speaking, only meaningful for the specific conditions the device was tested in, and the results of these studies should be transferred with care (Bassett et al., 2012). For recreational users, testing under walking or light jogging conditions might be sufficient. For (elite) athletes, however, scientific trustworthiness needs to be established for a broad spectrum of velocities or even fast changes in velocity reflecting the demands of the sport. The literature on the validity of consumer-level Wearables under sport-specific conditions is scarce, even though some of the Wearables analyzed herein have been validated in the general population (El-Amrawy and Nounou, 2015; Alsubheen et al., 2016; An et al., 2017; Price et al., 2017).

Therefore, the aim of the present study was to investigate the (concurrent) criterion-validity of eleven consumer Wearables concerning step count, covered distance and EE during running at four different velocities, an intermittent profile reflecting conditions in a soccer match and a 15-min outdoor trial at a constant velocity.

#### MATERIALS AND METHODS

For the determination of the validity of step count, covered distance and EE, the criterion measures are described below. In order to test the validity of the eleven Wearables in a standardized situation under laboratory conditions, participants performed a running protocol of a total duration of 25 min, which consisted of four stages of different constant velocities lasting 5 min each, as well as a 5 min period of intermittent velocity. Validity for outdoor conditions was subsequently tested during a 15-min run at a constant velocity. The validity of the Wearables for step count, covered distance and EE was assessed during a single session of treadmill walking and running, using methods similar to previous validation studies (Takacs et al., 2014).

#### Subjects and Ethics Statement

A total of 20 healthy and active sport students (10 male and 10 female) volunteered to participate in this study. All subjects gave written informed consent to participate in the study. The study was performed in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the German Sport University Cologne.

#### Instruments

#### Criterion Measures

The Optogait system (OPTOGait, Microgate Srl, Bolzano, Italy) was used as the criterion measure for monitoring step count on the treadmill. The system is integrated within the sidebars of the treadmill (Pulsar, h/p/cosmos sports and medical GmbH, Traunstein, Germany) and uses a photoelectric cell system to precisely measure the number of steps, which is a reliable (ICC = 0.962) and valid (ICC = 0.997) method for measuring step counts during treadmill trials (Lee et al., 2014). Step count was additionally assessed by a manual counter, which was also used in the outdoor condition.

The covered distance measured by the treadmill was used as a criterion measure and was determined based on the calibrated treadmill output (displayed on the electronic output of the treadmill in meters, based on the speed of the treadmill belt and time for each revolution of the belt) according to Takacs et al. (2014). The slope of the treadmill was automatically set at 1%.

The Metamax 3B (Metamax 3B, CORTEX Biophysik GmbH, Leipzig, Germany) is a portable gas analyzer allowing measurements of oxygen uptake under laboratory and free-living conditions, which was used in this study to calculate EE via indirect calorimetry as the criterion measure for EE. For the calculation of EE, oxygen uptake (VO<sub>2</sub>) was measured continuously breath by breath during the whole exercise and calculated according to previous reports (Scott et al., 2006). Before each session, the Metamax 3B flowmeter and gas analyzers were calibrated using a 3-liter syringe and a known gas mixture (15% O<sub>2</sub> and 5% CO<sub>2</sub>). During calibration of the gas analyzer (O<sub>2</sub> and CO<sub>2</sub> sensors), the Metamax 3B alternates sampling of the known gas mixture and ambient air. The Metamax 3B is a valid and reliable system for measuring oxygen uptake (Vogler et al., 2010). Methods of indirect calorimetry are the most commonly used to quantify human EE in both laboratory and field settings, typically by measuring oxygen uptake (Hills et al., 2014).

#### Wearables

Eleven Wearables were tested, including: Bodymedia Sensewear MF (300 €; BodyMedia Inc, Pittsburgh, PA), Polar Loop (50 €; Polar Electro, Kempele, Finland), Beurer AS80 (30 €; Beurer GmbH, Ulm, Germany), Fitbit Charge and Fitbit Charge HR (80 €, 100 €; Fitbit Inc, San Francisco, CA), Garmin Vivofit (90 €), Garmin Vivosmart (100 €), Garmin Vivoactive (250 €), Garmin Forerunner 920XT (470 €) (Garmin, Olathe, Kansas), Withings Pulse Ox (100 €; Withings SA, Issy-les-Moulineaux, France), Xiaomi MiBand (15 €; Xiaomi Inc, Beijing, China). All devices use a triaxial accelerometer; Garmin Vivoactive and Garmin Forerunner 920XT also include a GPS sensor. The Fitbit Charge HR and all Garmin devices also use heart rate to calculate EE using photoplethysmography or chest belt sensors, respectively.

#### Exercise Study Protocol

After the participants arrived in the laboratory, their anthropometric (weight, height, body fat) and personal data (date of birth, sex, handedness) were collected and transferred to all devices. Afterward, the Wearables were fixed at the wrist in a randomized order. The Bodymedia Sensewear armband and one Withings Pulse Ox device were placed on the backside of the upper arm and the hip, respectively. For the measurement of heart rate with the Garmin Wearables, the participants were fitted with a heart rate chest belt.

First, the participants were asked to lie down for 20 min. After the first 10 min, the measurement of resting EE was started using indirect calorimetry. Second, the running protocol was started, consisting of four 5 min stages of different constant velocities (walking: 4.3; 7.2; running: 10.1; 13.0 km·h<sup>−1</sup>), each separated by 5 min of passive rest. After these constant-velocity stages, a 5 min period of intermittent velocity followed. This protocol was extracted from a smoothed running trial during a real soccer match (Amisco data from a match of the first German soccer league). The mean running velocity was 9.1 km·h<sup>−1</sup>, including twelve sprints with a maximal velocity of 22.4 km·h<sup>−1</sup>. Maximal acceleration and deceleration were 5.47 km·h<sup>−1</sup>·s<sup>−1</sup> (1.52 m·s<sup>−2</sup>) and −4.88 km·h<sup>−1</sup>·s<sup>−1</sup> (−1.36 m·s<sup>−2</sup>), respectively. The remaining time was covered by walking, defined by velocities below 7.33 km·h<sup>−1</sup>, which is considered the preferred transition speed between walking and running (Rotstein et al., 2005). Besides the tests under laboratory conditions, ten participants (5 men, 5 women) performed a run of 2.4 km at a constant velocity of 10.1 km·h<sup>−1</sup> under free-living conditions (**Figure 1**).

### Statistical Analysis

Descriptive statistics (mean ± SD) summarize the characteristics of the participants, including age, weight, height and percent of body fat. All data were tested for normality with no further transformation needed. The validity of the Wearables was determined, as previously performed by other validation studies (Kooiman et al., 2015; Bai et al., 2016; An et al., 2017), by several statistical tests:


All statistical analyses of the data were performed using the SPSS statistics software package (version 23.0, IBM SPSS Statistics).
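Two of the agreement measures reported in the results, the mean absolute percentage error (MAPE) and the limits of agreement (LoA), can be sketched with their commonly used definitions. These definitions are assumptions here, since the study's own list of statistical tests is not reproduced in this text: MAPE is taken as the mean of the absolute differences relative to the criterion, and LoA as the mean difference (criterion minus Wearable) ± 1.96 SD of the differences. The example values are hypothetical.

```python
import statistics


def mape(criterion, wearable):
    """Mean absolute percentage error of the wearable relative to the criterion."""
    errors = [abs(c - w) / c for c, w in zip(criterion, wearable)]
    return 100.0 * statistics.mean(errors)


def limits_of_agreement(criterion, wearable):
    """Bland-Altman style 95% limits of agreement for criterion minus wearable."""
    diffs = [c - w for c, w in zip(criterion, wearable)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias - spread, bias + spread


# Usage with hypothetical step counts from two participants.
criterion_steps = [100.0, 200.0]
wearable_steps = [90.0, 220.0]
error_percent = mape(criterion_steps, wearable_steps)
lower, upper = limits_of_agreement(criterion_steps, wearable_steps)
```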

## RESULTS

For the laboratory study, 20 participants were included (10 males, mean ± SD age: 26.1 ± 2.8 years; height: 182.3 ± 7.4 cm; weight: 81.1 ± 11.2 kg; body fat: 11.5 ± 2.6%; and 10 females, mean ± SD age: 24.2 ± 1.9 years; height: 168.2 ± 6.7 cm; weight: 60.2 ± 5.5 kg; body fat: 17.9 ± 4.9%). The outdoor condition and the Withings Pulse Ox (Hip) were tested with fewer participants (5 males and 5 females). Because of the large amount of missing data, we excluded the Xiaomi MiBand from any data analysis.

The mean differences (criterion–wearable), 95% CI for step count, distance, and EE for all velocities are shown in **Figures 2**–**4**. MAPE, ICC, TE, and LoA are shown in **Table 1** (step count), **Table 2** (distance), **Table 3** (EE).

### Step Count

The mean step count (± SD) measured by the criterion measure was: 538 ± 29 (4.3 km·h<sup>−1</sup>); 785 ± 38 (7.2 km·h<sup>−1</sup>); 822 ± 51 (10.1 km·h<sup>−1</sup>); 863 ± 56 (13.0 km·h<sup>−1</sup>); 1,231 ± 127 (intermittent); 2,456 ± 145 (outdoor) steps. Bodymedia Sensewear, Polar Loop, and Beurer AS80 showed a substantial MAPE of up to 16%, a low to moderate ICC, a large TE (up to 100 steps), and the broadest LoA. The other Wearables showed a small MAPE (<2%) for all test conditions as well as a good to excellent ICC. Garmin Vivosmart, Garmin Vivoactive, Fitbit Charge HR, and Withings Pulse Ox Hip showed a small TE and the narrowest LoA.

### Covered Distance

The mean covered distance (± SD) by the criterion measure was: 358 ± 4 (4.3 km·h<sup>−1</sup>); 601 ± 6 (7.2 km·h<sup>−1</sup>); 845 ± 12 (10.1 km·h<sup>−1</sup>); 1,088 ± 21 (13.0 km·h<sup>−1</sup>); 1,139 ± 45 (intermittent); 2,400 ± 0 (outdoor) m. Beurer AS80 showed a high MAPE (17.6 up to 51.9%) for all test conditions. Garmin Vivofit, Vivosmart, Vivoactive, Forerunner, Fitbit Charge, Charge HR and Withings showed a moderate MAPE (1.3–29.9%) for all test conditions except 7.2 km·h<sup>−1</sup>. The ICC for all Wearables was very low (<0.1). Garmin Vivosmart, Garmin Vivoactive, Fitbit Charge, and Fitbit Charge HR showed a small TE and the narrowest LoA.

### Energy Expenditure

The mean EE (± SD) by the criterion measure was: 24 ± 6 (4.3 km·h<sup>−1</sup>); 47 ± 10 (7.2 km·h<sup>−1</sup>); 61 ± 13 (10.1 km·h<sup>−1</sup>); 74 ± 17 (13.0 km·h<sup>−1</sup>); 96 ± 18 (intermittent); 210 ± 49 (outdoor) kcal.

Bodymedia Sensewear, Polar Loop, and Beurer AS80 showed a high MAPE of up to 56% for all test conditions. The Garmin, Fitbit and Withings Wearables showed a small to moderate MAPE (1.3–21.2%) for 10.1 km·h<sup>−1</sup>, 13.0 km·h<sup>−1</sup>, and the outdoor condition. Garmin Vivofit, Vivosmart, Vivoactive, Fitbit Charge and Charge HR showed a moderate to good ICC, whereas Bodymedia Sensewear, Polar Loop, Beurer AS80, Garmin Forerunner 920XT and Withings Pulse Ox showed a low ICC. Bodymedia Sensewear, Garmin Vivofit, Garmin Vivoactive, and Fitbit Charge showed a small TE and the narrowest LoA.

## DISCUSSION

The aim of the present study was to investigate the criterion-validity of eleven Wearables for step count, covered distance and EE over a large spectrum of constant and intermittent velocities reflecting sports conditions. The results indicate that most Wearables, except Beurer AS80, Polar Loop, and Bodymedia Sensewear, provide an acceptable level of validity concerning step count for all constant velocities, the intermittent protocol as well as the outdoor condition. The parameters covered distance and EE, however, exhibited low validity in all conditions for most of the Wearables. The Xiaomi MiBand lacked a large amount of data, and we therefore discourage using this Wearable to monitor step count, distance, and EE in sports conditions.

### Step Count

In line with the present study, other laboratory-based studies also showed generally high correlations for step count between the criterion measure and Wearables (Takacs et al., 2014; Diaz et al., 2015; Evenson et al., 2015). Tudor-Locke et al. (2006) stated that Wearables generally should not exceed a MAPE of 1% compared to the criterion measure during walking on a treadmill at a speed of 4.8 km·h<sup>−1</sup> in order to be considered accurate. Garmin Vivosmart, Garmin Vivoactive, Garmin Forerunner 920XT, Fitbit Charge HR, and Withings Pulse Ox (Hip) had a MAPE <1% over all test conditions. Fitbit Charge and Garmin Vivofit had a slightly higher MAPE of <3%, still representing good results. Bodymedia Sensewear, Polar Loop, and Beurer AS80 had a MAPE between 3.7 and 15.5%, whereby all devices underestimated the number of steps taken. When errors were higher, the direction tended to be an underestimation of step count by the tracker compared to the criterion. This may be particularly problematic at slow walking speeds (Evenson et al., 2015). Garmin Vivosmart, Garmin Vivoactive, Fitbit Charge HR, and Withings Pulse Ox showed the narrowest LoA (less than 50 steps for the constant velocities). This can be considered a relatively small range. The range between the upper and lower LoA of Bodymedia Sensewear, Polar Loop, and Beurer AS80 (up to 200 steps) is considered too large for these devices to be used interchangeably with the criterion measure. In a sport-specific condition like a marathon run at an average velocity of 10.1 km·h<sup>−1</sup>, with an average step count of 60,000 steps, this represents an error of +60 steps for Fitbit Charge HR or −7,500 steps for Bodymedia Sensewear.

For the intermittent velocities, which are typical for most sport disciplines, the discrepancy was high, revealing an underestimation for all Wearables ranging from −14 ± 40 steps (Garmin Vivosmart) to −198 ± 91 steps (Withings Pulse Ox Wrist). In intermittent sports, such as a 90 min competitive soccer game in which players take on average about 13,000 steps, this represents a small error of −143 steps for Fitbit Charge HR/Garmin Vivosmart up to a large underestimation of 2,106 steps for Beurer AS80.
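The marathon and soccer extrapolations amount to scaling a relative error to an event total. A minimal Python sketch, using only the step totals and errors quoted in the text (the function names are illustrative):

```python
def relative_error_pct(error_steps, total_steps):
    """Express an absolute step error as a percentage of the event total."""
    return 100.0 * error_steps / total_steps

def extrapolate(error_pct, total_steps):
    """Scale a percentage error back to an absolute number of steps."""
    return error_pct / 100.0 * total_steps

# Figures quoted in the text: marathon (60,000 steps) and soccer match (13,000 steps).
marathon_sensewear = relative_error_pct(-7500, 60000)
soccer_beurer = relative_error_pct(-2106, 13000)

print(f"Marathon, Bodymedia Sensewear: {marathon_sensewear:.1f}%")
print(f"Soccer match, Beurer AS80: {soccer_beurer:.1f}%")
```

Expressed this way, an error that looks moderate over a short test protocol grows in proportion to the number of steps taken over a full event.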

The outdoor condition, which used the same velocity as the third speed on the treadmill (10.1 km·h<sup>−1</sup>), showed results similar to those of the laboratory testing at constant velocities.

km·h<sup>−1</sup> = 863 ± 56; intermittent = 1,231 ± 127; outdoor = 2,456 ± 145 steps. SW, Bodymedia Sensewear; PL, Polar Loop; B80, Beurer AS80; GVF, Garmin Vivofit; GVS, Garmin Vivosmart; GVA, Garmin Vivoactive; GFR, Garmin Forerunner 920XT; FC, Fitbit Charge; FHR, Fitbit Charge HR; WPO H, Withings Pulse Ox Hip; WPO W, Withings Pulse Ox Wrist.

In summary, the step count for most of the Wearables, except Bodymedia Sensewear, Polar Loop, and Beurer AS80, proved to be valid. However, there is a general tendency to underestimate the number of steps. One might speculate that reduced arm movement while walking or running leads to an underestimation of the step count. Furthermore, the sensitivity settings of the accelerometers and the different algorithms might contribute. Manufacturers face the problem that Wearables should not count every single arm movement during daily life as a step. Therefore, the acceleration needs to exceed a certain threshold to be processed by the algorithm and counted as a step.
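The thresholding idea can be illustrated with a deliberately simple step counter. This Python sketch is not any manufacturer's algorithm: the threshold of 1.2 g, the refractory period of three samples and the synthetic signals are all assumptions chosen for illustration.

```python
import math

def count_steps(samples, threshold_g=1.2, refractory=3):
    """Count upward crossings of a threshold by the acceleration magnitude.

    samples: iterable of (x, y, z) accelerations in g.
    threshold_g and refractory are illustrative values, not device settings.
    """
    steps = 0
    cooldown = 0
    above = False
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        if cooldown > 0:
            cooldown -= 1
        crossing = magnitude > threshold_g and not above
        if crossing and cooldown == 0:
            steps += 1
            cooldown = refractory  # ignore further peaks for a few samples
        above = magnitude > threshold_g
    return steps

# Synthetic signals: four strides with a clear peak versus four with a damped peak.
vigorous = [(0.0, 0.0, m) for m in [1.0, 1.5, 1.0, 1.0, 1.0] * 4]
subdued = [(0.0, 0.0, m) for m in [1.0, 1.1, 1.0, 1.0, 1.0] * 4]

print(count_steps(vigorous), count_steps(subdued))
```

With the damped signal no peak exceeds the threshold, so no steps are registered. This is one way in which reduced arm movement could produce the underestimation speculated about above.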

#### Covered Distance

The measurement of covered distance showed no consistent discrepancy over the different velocities between the Wearables and the criterion measure. The Wearables mainly overestimated distance at the slower constant velocities (4.3 and 7.2 km·h<sup>−1</sup>) and underestimated distance at the highest velocity (13.0 km·h<sup>−1</sup>). This is in line with the study of Takacs et al. (2014), showing an overestimation for slower speeds (3.2–4.7 km·h<sup>−1</sup>) and an underestimation for faster speeds (6.4 km·h<sup>−1</sup>). In elite sport, fast running velocities often occur, and consequently the covered distance will be underestimated in these instances with the presented Wearables. The highest MAPE (−18.1 to 58.3%) of all Wearables was reached at the velocity of 7.2 km·h<sup>−1</sup>, whereas the lower walking velocity (4.3 km·h<sup>−1</sup>) showed a better MAPE (1.3 to 19%). The ICC ranged from 0.0 to 0.2 for all tested conditions, indicating poor agreement with the criterion measure. This is in line with the study of Takacs et al. (2014), showing small ICC between 0.0 and 0.05. Although Garmin Vivosmart, Garmin Vivoactive, Fitbit Charge, and Fitbit Charge HR showed the narrowest LoA, the range is still too wide. In sport-specific situations, like a marathon run at 10.1 km·h<sup>−1</sup>, covered distance will be overestimated by ∼2.94 km with Garmin Forerunner 920XT, or underestimated by ∼16.9 km with Beurer AS80.

km·h<sup>−1</sup> = 1,088 ± 21; intermittent = 1,139 ± 45; outdoor = 2,400 ± 0 meter. B80, Beurer AS80; GVF, Garmin Vivofit; GVS, Garmin Vivosmart; GVA, Garmin Vivoactive; GFR, Garmin Forerunner 920XT; FC, Fitbit Charge; FHR, Fitbit Charge HR; WPO H, Withings Pulse Ox Hip; WPO W, Withings Pulse Ox Wrist.

In the intermittent protocol, the covered distance derived from Wearables shows a high discrepancy compared to the criterion measure, with some Wearables overestimating (Withings Pulse Ox Hip, Garmin Forerunner 920XT, Garmin Vivoactive, Garmin Vivosmart) and others underestimating this parameter (Fitbit Charge HR, Fitbit Charge, Garmin Vivofit, Beurer AS80). For intermittent sports, like a 90 min soccer game (mean distance 12 km), the covered distance will be underestimated by ∼1,080 m using Withings Pulse Ox Hip up to ∼5,076 m using Beurer AS80 based on our findings.

95% CI. Mean EE (± SD) by the criterion method were: 4.3 km·h<sup>−1</sup> = 24 ± 6; 7.2 km·h<sup>−1</sup> = 47 ± 10; 10.1 km·h<sup>−1</sup> = 61 ± 13; 13.0 km·h<sup>−1</sup> = 74 ± 17; intermittent = 96 ± 18; outdoor = 210 ± 49 kcal. SW, Bodymedia Sensewear; PL, Polar Loop; B80, Beurer AS80; GVF, Garmin Vivofit; GVS, Garmin Vivosmart; GVA, Garmin Vivoactive; GFR, Garmin Forerunner 920XT; FC, Fitbit Charge; FHR, Fitbit Charge HR; WPO H, Withings Pulse Ox Hip; WPO W, Withings Pulse Ox Wrist.

The outdoor condition (10.1 km·h<sup>−1</sup>) showed similarly high MAPE compared to the laboratory condition, with the same Wearables overestimating (Withings Pulse Ox Wrist and Hip, Garmin Forerunner 920XT, Garmin Vivoactive, Garmin Vivosmart) or underestimating (Fitbit Charge HR, Fitbit Charge, Garmin Vivofit, Beurer AS80) the covered distance.

In summary, for monitoring the covered distance, no Wearable achieved good validity across the laboratory-based constant and intermittent velocities and the outdoor condition. We acknowledge that the covered distance can be assessed by other Wearables employing, for example, receivers for Global Navigation Satellite Systems such as the Global Positioning System (Cummins et al., 2013), and it seems that this technology is superior to accelerometry for deriving the covered distance in sports conditions.

#### Energy Expenditure

The measurement of EE showed no consistent discrepancy over the different velocities between the Wearables and the criterion measure. The Wearables mainly showed an overestimation of EE for the slower constant velocities (4.3, 7.2, and 10.1 km·h<sup>−1</sup>) and an underestimation of EE for the highest velocity (13.0 km·h<sup>−1</sup>). Overall, Bodymedia Sensewear, Polar Loop, and Beurer AS80 showed low validity for all test conditions. The Garmin, Fitbit and Withings Wearables showed better validity, with small to moderate MAPE (1.3–21.2%) for the faster velocities (10.1 and 13.0 km·h<sup>−1</sup>). The results are in line with a review by Evenson et al. (2015) showing low validity for EE in 10 adult studies. Although Bodymedia Sensewear, Garmin Vivofit, Garmin Vivoactive, and Fitbit Charge showed the narrowest LoA, the range is still too wide. The ICC ranged from moderate to substantial agreement, while larger biases tended toward an underestimation of EE. Extrapolated to a marathon run (∼3,000 kcal) for a runner of 70 kg with a finishing time of 4:13 h (McArdle et al., 2000), this equates to an error ranging from an overestimation of ∼86 kcal for Withings Pulse Ox Wrist up to ∼820 kcal for Polar Loop.

Fitbit Charge, Garmin Vivoactive, Garmin Vivosmart, and Polar Loop showed relatively small MAPE (<5.6%) for the intermittent protocol, whereas the other devices mainly underestimated EE (Withings Pulse Ox (Wrist or Hip), Garmin Forerunner 920XT, Garmin Vivofit, Beurer AS80, Bodymedia Sensewear). For intermittent sports, like a 90 min soccer game (mean EE ∼1,300 kcal), EE will be underestimated by ∼17 kcal using Garmin Vivoactive up to ∼630 kcal using Withings Pulse Ox Hip.

The outdoor condition showed a completely contrary pattern compared to the laboratory condition (10.1 km·h<sup>−1</sup>). While all devices underestimated EE in the outdoor condition, most of the devices overestimated EE in the comparable laboratory condition. This is surprising, but may be an issue of reliability, an aspect we intentionally did not target in our study. To clarify this, we encourage researchers to conduct reliability studies on the presented Wearables. In summary, the presented Wearables should be used very cautiously to assess EE.

### LIMITATIONS

Generally, we have to acknowledge some limitations of the present study. First, there might be some limitations arising from calculating EE via indirect calorimetry using the Metamax 3B device (Lighton, 2008). Although the experiments were conducted within 2 weeks, which might limit the degradation of the oxygen sensor, previous studies showed that the Metamax 3B produces acceptably stable and reliable results but is not adequately valid during moderate and vigorous exercise without some further correction of VO<sub>2</sub> and VCO<sub>2</sub> (Macfarlane and Wong, 2012). As in every validation study, we cannot be entirely sure whether some error arises from the criterion measure, and we encourage readers to view the results of this study in light of these limitations.

Second, the velocities on the treadmill were not randomized but increased gradually, as we expected that higher velocities would influence slower velocities more than the other way round. Additionally, during the 5 min rest periods, spirometric and heart rate values decreased to resting levels. Nevertheless, we cannot completely rule out a cardiovascular drift.

Third, we investigated a similar number of subjects to several previous validation studies (Kooiman et al., 2015; Bai et al., 2016; An et al., 2017). However, the relatively small sample size might limit the statistical power of the present results. There are several statistical approaches for validation studies, but possibly none will remain uncriticised, and every approach has its advantages and drawbacks. We therefore followed the statistical approach used in previously published validation studies (Kooiman et al., 2015; Bai et al., 2016; An et al., 2017).

### CONCLUSION

In our study, most Wearables provided an acceptable level of validity for step counts at different constant and intermittent running velocities reflecting sports conditions. The most valid Wearables for monitoring step count, as indicated by the smallest MAPE, were Garmin Vivosmart, Garmin Vivoactive, Garmin Forerunner 920XT, Fitbit Charge, Fitbit Charge HR and Withings Pulse Ox (Hip). Yet, the covered distance, as well as the EE, could not be assessed validly with the investigated Wearables. Especially in sport-specific conditions, like a marathon run or a 90 min soccer game, covered distance and EE showed high errors for nearly all Wearables. Consequently, covered distance and EE should not be monitored with the presented Wearables.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Wahl, Düking, Droszez, Wahl and Mester. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Weak Relationships between Stint Duration, Physical and Skilled Match Performance in Australian Football

David M. Corbett<sup>1,2</sup>, Alice J. Sweeting<sup>1,2</sup> and Sam Robertson<sup>1,2</sup>\*

<sup>1</sup> Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, VIC, Australia, <sup>2</sup> Western Bulldogs Football Club, Melbourne, VIC, Australia

Australian Rules football comprises physical and skilled performance for more than 90 min of play. The cognitive and physiological fatigue experienced by participants during a match may reduce performance. Consequently, the length of time an athlete is on the field before being interchanged (known as a stint) is a key tactic which could maximize the skill and physical output of the Australian Rules athlete. This study developed two methods to quantify the relationship between athlete time on field, skilled and physical output. Professional male athletes (n = 39) from a single elite Australian Rules football club participated, with physical output quantified via player tracking systems across 22 competitive matches. Skilled output was calculated as the sum of involvements performed by each athlete, collected from a commercial statistics company. A random intercept and slope model was built to identify how a team and individuals respond to physical outputs and stint lengths. Stint duration (mins), high intensity running (speeds >14.4 km·h<sup>−1</sup>) per minute, meterage per minute and very high intensity running (speeds >25 km·h<sup>−1</sup>) per minute had some relationship with skilled involvements. However, none of these relationships were strong, and the direction of influence for each player was varied. Three conditional inference trees were computed to identify the extent to which combinations of physical parameters altered the anticipated skilled output of players. Meterage per minute, player, round number and duration were all related to player involvement. All methods had an average error of 10 to 11 involvements, per player per match. Therefore, other factors aside from physical parameters extracted from wearable technologies may be needed to explain skilled output within Australian Rules football matches.

Keywords: performance analysis, sport statistics, classification tree, team sport, GPS

#### Edited by:

Billy Sperlich, Integrative & Experimentelle Trainingswissenschaft, Universität Würzburg, Germany

#### Reviewed by:

Giovanni Messina, University of Foggia, Italy; Xiao Li, Shantou University Medical College, China

> \*Correspondence: Sam Robertson sam.robertson@vu.edu.au

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 13 July 2017 Accepted: 05 October 2017 Published: 23 October 2017

#### Citation:

Corbett DM, Sweeting AJ and Robertson S (2017) Weak Relationships between Stint Duration, Physical and Skilled Match Performance in Australian Football. Front. Physiol. 8:820. doi: 10.3389/fphys.2017.00820

## INTRODUCTION

Australian Football (AF) involves a high physical and skilled output for more than 90 min of play to maximize team performance (Gray and Jenkins, 2010). Physical and skill output may decline, as a function of time, during AF matches (Coutts et al., 2010). Consequently, a key tactical consideration during AF matches relates to the length of an on-field stint (i.e., the consecutive amount of time spent on ground by a player) before a player's physical and/or skilled output is adversely affected (Montgomery and Wisbey, 2016). In elite AF, there is a limitation on the number of player substitutions a team can make within a match. In the 2017 Australian Football League season, this limit was 90 rotations per match. Consequently, it is crucial in AF that stints are not ended (or started) unnecessarily early, and are neither too short nor too long in duration.

During an AF match, various athlete performance data are collected. Physical output can be measured via Global Positioning System (GPS) or Radio Frequency Identification (RFID) (Wyld, 2008; Coutts and Duffield, 2010). These devices typically sample at 10 or 15 Hz, allowing for the calculation of total distance (m), distance within velocity bands (e.g., distance covered >14.4 km·h<sup>−1</sup>), and peak velocity (km·h<sup>−1</sup>). Match statistics are provided by commercial performance analysis companies (Sullivan et al., 2014b). However, there is less standardization in the measurement of skilled output than of physical output. Skilled output can be measured by quantifying the number of involvements or actions completed by each player. Involvements may include kicks, handballs and other actions considered important to match success by AF coaching staff. The amount of time each player spends on the field and on the bench is available as a measure of temporal output (Bradley and Noakes, 2013).

Potentially due to a combination of cognitive (Tenenbaum and Bar-Eli, 1993) and physiological fatigue (Aughey, 2010), it is unlikely that players can maintain an optimal level of physical and skilled output for an entire match (Thelen and Smith, 1994; Aughey, 2010). In AF, a decrement in physical output has been observed for each quarter completed (Coutts et al., 2010), with a 3% reduction in meterage per minute for every 2 min spent on field during rotations longer than 5 min (Montgomery and Wisbey, 2016). Similarly, the level of skilled involvements for players also likely declines as the duration of a match increases. Recent research has examined how work rate, time on field and situational factors, including the number of stoppages, interact to affect skilled involvement (Sullivan et al., 2014a,b).

Although factors influencing the skilled output of players have been identified to date (Sullivan et al., 2014a,b), how these factors may aid match-day stint/rotation strategies remains to be examined. Measures of skilled, physical and temporal output could be modeled to identify how the skilled output of a team and individual responds to change in temporal and physical output.
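The locational metrics named above (total distance, distance within a velocity band, and peak velocity) can be derived from a stream of speed samples. The Python sketch below assumes evenly spaced samples at 10 Hz with speeds already expressed in km·h<sup>−1</sup>; the function and argument names are hypothetical, and commercial tracking software may filter or smooth the signal differently.

```python
def locational_metrics(speeds_kmh, sample_rate_hz=10, band_kmh=14.4):
    """Return (total distance in m, distance above band_kmh in m, peak speed in km/h)."""
    dt = 1.0 / sample_rate_hz  # seconds represented by each sample
    total_m = 0.0
    band_m = 0.0
    for v in speeds_kmh:
        metres = v / 3.6 * dt  # km/h -> m/s, times the sample interval
        total_m += metres
        if v > band_kmh:
            band_m += metres
    peak = max(speeds_kmh) if speeds_kmh else 0.0
    return total_m, band_m, peak

# Hypothetical trace: 10 s of jogging at 10.8 km/h, then 10 s of running at 18 km/h.
trace = [10.8] * 100 + [18.0] * 100
total_m, high_intensity_m, peak_kmh = locational_metrics(trace)
print(f"{total_m:.1f} m total, {high_intensity_m:.1f} m above 14.4 km/h, peak {peak_kmh} km/h")
```

Dividing each distance by the stint duration in minutes gives the per-minute form of these parameters.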

For this purpose, generalized linear mixed models present a suitable analysis option, in that they allow for the quantification of independent and dependent variables with repeated measures (Gałecki and Burzykowski, 2013). Random intercept models allow for the quantification of pooled data, whereas random slope modeling outputs differing coefficients and equations for each individual entered into the model (Eyduran et al., 2016). Consequently, the relationship between time, physical and skilled outputs at a team and individual level can be quantified.
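The difference between a team-level and a player-level relationship can be illustrated without a full mixed model. The Python sketch below is a simplification and not the approach used in this study: it contrasts one least-squares line fitted to all stints (complete pooling) with a separate line per player (no pooling), whereas a mixed model sits between these extremes by shrinking player estimates toward the team average. The data and names are hypothetical.

```python
from statistics import mean

def ols(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical stints: player -> list of (duration in min, involvements per minute).
stints = {
    "A": [(4, 0.50), (8, 0.46), (12, 0.42)],
    "B": [(4, 0.30), (8, 0.32), (12, 0.34)],
}

# Complete pooling: one line for the whole team.
all_rows = [row for rows in stints.values() for row in rows]
pooled = ols([d for d, _ in all_rows], [y for _, y in all_rows])

# No pooling: one line per player.
per_player = {
    player: ols([d for d, _ in rows], [y for _, y in rows])
    for player, rows in stints.items()
}

print(f"team slope: {pooled[1]:+.4f}")
for player, (_, slope) in per_player.items():
    print(f"player {player} slope: {slope:+.4f}")
```

In this toy data set the team slope is close to zero while the two players respond in opposite directions, which is the kind of individual variation a random slope is intended to capture.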

Decision trees present an alternative, non-linear option to quantify the relationship between physical, skilled and temporal outputs. Conditional inference trees, for example, incorporate a series of significance tests to create thresholds for each independent variable (Sardá-Espinosa et al., 2017). These thresholds create branches in the tree, each consisting of differing combinations of independent variables, which then lead to a prediction of the dependent variable. It is possible to nest participants within these trees, thus accounting for how individuals respond to differing combinations of independent variables. This could allow examination of how physical and temporal parameters interact to influence skilled output.
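A single partitioning step of such a tree can be sketched as follows. This Python example is a simplified illustration, not the algorithm implemented in any particular package: it tests each input variable against the response with a permutation test on the absolute correlation, applies a Bonferroni adjustment, and, if the strongest variable is significant, picks the cut point that best separates the two resulting groups. All names and data are hypothetical.

```python
import random
from statistics import mean

def abs_corr(xs, ys):
    """Absolute Pearson correlation (0.0 if either variable is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def permutation_p(xs, ys, rng, n_perm=500):
    """P-value for the association between xs and ys under random relabelling."""
    observed = abs_corr(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def choose_split(inputs, response, alpha=0.05, seed=0):
    """Return (variable name, threshold), or None if nothing is significant."""
    rng = random.Random(seed)
    m = len(inputs)  # Bonferroni: multiply each p-value by the number of tests
    adjusted = {name: min(1.0, m * permutation_p(xs, response, rng))
                for name, xs in inputs.items()}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] >= alpha:
        return None
    xs = inputs[best]

    def separation(t):
        left = [y for x, y in zip(xs, response) if x <= t]
        right = [y for x, y in zip(xs, response) if x > t]
        return abs(mean(left) - mean(right)) * (len(left) * len(right)) ** 0.5

    return best, max(sorted(set(xs))[:-1], key=separation)

# Hypothetical stints: the response steps up once meterage per minute exceeds 138.
mpm = [100 + 2 * i for i in range(40)]
duration = [(7 * i) % 20 + 3 for i in range(40)]
involvements = [0.2 if v <= 138 else 0.4 for v in mpm]

split = choose_split({"mpm": mpm, "duration": duration}, involvements)
print(split)
```

A full tree repeats this step recursively within each branch until no input variable passes the significance test or a node becomes too small to split.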

Utilizing a mixed analysis approach comprising generalized linear mixed models and conditional inference trees, this study will: (i) identify how athlete skilled output changes as a function of time in an AF match, (ii) determine the extent to which these changes occur at the individual level, and (iii) reveal how different permutations of physical and skilled parameters might correspond to differences in skilled output.

## METHODS

### Participants

Professional male athletes (n = 39) from an elite Australian Football League (AFL) club provided written informed consent to participate in this study (age: 23 ± 4 years, height: 187 ± 8 cm, mass: 86 ± 9 kg). All participants completed at least one full match and at least one stint lasting >3 min in the 2016 AFL home and away season. Ethical approval was granted by the Victoria University Human Research Ethics Committee.

### Data Collection

Skilled output was defined as the sum of events completed by each player that are likely to contribute to team success, with each such event termed an "involvement." This was calculated as the total number of involvements completed by each player, aggregated from a timeline supplied by a commercial provider of sports statistics (Champion Data, Melbourne, Australia). Champion Data provide a timeline of key actions time-stamped to each player, which can broadly be categorized as: (i) disposals, (ii) other offensive actions, and (iii) defensive actions. An Excel spreadsheet was designed to aggregate the number of key involvements completed by each player within each stint. To develop the most meaningful measure of skilled output for the team included in this study, key involvements were chosen in consultation with the coaching group (Appendix 1). The sum of involvements for each player's stint was stored alongside the physical data and saved as a .csv file for analysis.

Data were collected from 14 indoor matches and 7 outdoor matches (n = 21) during the 2016 AFL home and away season. For all indoor matches, athlete physical output was collected via a Catapult T5 Local Positioning System (LPS) tag (Catapult Sports, Melbourne, Australia). During outdoor matches, all participants wore a Catapult S5 GPS (Jennings et al., 2010) device (Catapult Sports, Melbourne, Australia). Both devices were worn in a custom-sewn pouch within each player's jumper. All matches were monitored live using the proprietary software Openfield (Catapult Openfield v 1.11.2-1.13.1) to ensure an adequate signal quality of >8 packets/second and that stints were correctly recorded. At the conclusion of each match, files were synchronized to the Catapult Cloud storage system. Data for each stint were then exported into a .csv file for further analysis.

### Data Cleaning

This study aimed to provide methods that were generalizable to future data. As a result, several filters were applied to the data to remove outliers (Ofoghi et al., 2013). Only stints with maximum velocities in the bottom 98% of the data set (<32.2 km·h<sup>−1</sup>), durations in the top 95% (>3 min) and involvements in the bottom 98% (<2.2 involvements per minute) were included in the analysis. These cut-offs were selected heuristically based on the perceived practical application of the findings. All parameters were then expressed relative to stint time. Each player was assigned a random ID (1–45), whilst each stint was labeled in the format "Quarter.Stint" (e.g., the first stint of quarter 1 was labeled 1.1). Round number was labeled from 1 to 23.
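The filtering and relabeling steps described above can be sketched in Python. The cut-offs are those reported in the text; the record layout, field names and example stints are hypothetical, and the study's own processing pipeline is not reproduced here.

```python
def clean_stints(stints, max_velocity_kmh=32.2, min_duration_min=3.0, max_ipm=2.2):
    """Drop outlying stints and express outputs relative to stint time."""
    cleaned = []
    for s in stints:
        duration = s["duration_min"]
        if duration <= min_duration_min:
            continue  # stint too short
        if s["max_velocity_kmh"] >= max_velocity_kmh:
            continue  # implausibly high peak velocity
        ipm = s["involvements"] / duration
        if ipm >= max_ipm:
            continue  # implausibly high involvement rate
        cleaned.append({
            "player": s["player"],
            "label": f'{s["quarter"]}.{s["stint_in_quarter"]}',  # "Quarter.Stint"
            "duration_min": duration,
            "ipm": ipm,
            "mpm": s["distance_m"] / duration,
        })
    return cleaned

# Hypothetical stints: one valid, one with a velocity spike, one too short.
raw = [
    {"player": 7, "quarter": 1, "stint_in_quarter": 1, "duration_min": 10.0,
     "max_velocity_kmh": 29.0, "involvements": 4, "distance_m": 1500.0},
    {"player": 7, "quarter": 2, "stint_in_quarter": 1, "duration_min": 8.0,
     "max_velocity_kmh": 35.0, "involvements": 3, "distance_m": 1100.0},
    {"player": 9, "quarter": 1, "stint_in_quarter": 2, "duration_min": 2.0,
     "max_velocity_kmh": 27.0, "involvements": 1, "distance_m": 280.0},
]

kept = clean_stints(raw)
print(kept)
```

Checking the duration first also avoids dividing by a zero-length stint when the per-minute rates are computed.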

### Feature Selection

Parameters included in the analyses were selected based on validity, reliability and multicollinearity features. This process was informed via a literature review on common locational parameters (Cummins et al., 2013), and a correlation matrix and variance inflation matrix computed across all parameters. Consequently, meterage per minute (m·min<sup>−1</sup>), high intensity running (distance >14.4 km·h<sup>−1</sup>) per minute (m·min<sup>−1</sup>), very high intensity running (distance >25 km·h<sup>−1</sup>) per minute (VHIR·min<sup>−1</sup>), stint time (mins) and involvements per minute (IPM<sup>−1</sup>) were all selected for inclusion in the study.
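A multicollinearity check of this kind can be sketched for a pair of parameters. With exactly two predictors, the variance inflation factor reduces to 1/(1 − r²), where r is their Pearson correlation; with more predictors, r² is replaced by the R² from regressing each predictor on all the others. The Python example below uses hypothetical values and illustrative function names, and no acceptance threshold is implied.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(xs, ys):
    """Variance inflation factor when only two predictors are considered."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical per-stint values: meterage per minute and high intensity running per minute.
mpm = [110, 120, 130, 140, 150]
hir_per_min = [10, 14, 15, 21, 25]

r = pearson_r(mpm, hir_per_min)
vif = vif_two_predictors(mpm, hir_per_min)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A large value indicates that the two parameters carry largely overlapping information, which makes their separate coefficients harder to interpret.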

### Generalized Linear Mixed Models

Generalized linear mixed models were computed in R, using the package lme4 (R Foundation for Statistical Computing, Vienna, Austria). For all models, player ID, stint and round number were specified as random effects, with the restricted maximum likelihood approach adopted (Gałecki and Burzykowski, 2013). A random intercept model was built to identify how skilled output changes, as a function of the other parameters, across the team. Involvements per minute and duration were the dependent and independent variables, respectively. Bench time, meterage per minute, high intensity running per minute and very high intensity running per minute were added to the model sequentially, with the Akaike information criterion (AIC) computed after each model to assess variable importance (Akaike, 1981). Preliminary modeling revealed that bench time (the time an athlete spent off the field between stints) had minimal impact upon model performance and it was therefore not included in the final model. Finally, a random slope model was built for each player using the remaining parameters.
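The sequential comparison by AIC can be illustrated with the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. In the Python sketch below the parameter counts and log-likelihoods are invented purely for illustration; they are not values from this study.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L). Lower values are preferred."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical (label, number of parameters, log-likelihood) for candidate models.
candidates = [
    ("duration", 4, -1200.0),
    ("duration + bench time", 5, -1199.8),
    ("duration + meterage per minute", 5, -1185.0),
]

scores = {label: aic(log_lik, k) for label, k, log_lik in candidates}
baseline = scores["duration"]
for label, score in scores.items():
    print(f"{label}: AIC = {score:.1f} (change {score - baseline:+.1f})")
```

In this invented example the extra parameter for bench time is not repaid by a better fit, so the AIC rises slightly, whereas the second candidate lowers it. A pattern of this kind would be consistent with the decision to leave bench time out of the final model.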

### Conditional Inference Trees

Three conditional inference trees were constructed using the party package in R. This algorithm operates based on a predetermined level of statistical significance (p < 0.05), and conducts recursive partitioning based on factors most strongly linked with the response variable (Sardá-Espinosa et al., 2017). For the present study, the data were split into an 80% training set and a 20% testing set. Each tree was computed with a 95% confidence interval (CI) under a Bonferroni correction and a minimum terminal node size of 100 instances. The first tree in this study utilized the same parameters as the final generalized linear mixed model. Round and stint number were removed from the second tree, whilst player ID was removed from the final tree. Each tree was validated on the test data set, with model performance represented by the root mean squared error (RMSE) of involvements.
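The evaluation scheme described above (an 80/20 split followed by RMSE on the held-out data) can be sketched in Python. The text does not state how stints were allocated to the two sets, so the random shuffle with a fixed seed used here is an assumption, as are the function names and example values.

```python
import random
from statistics import mean

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (training rows, test rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def rmse(predicted, observed):
    """Root mean squared error between predictions and observations."""
    return mean((p - o) ** 2 for p, o in zip(predicted, observed)) ** 0.5

# Hypothetical stint identifiers standing in for full stint records.
stint_ids = list(range(100))
train_ids, test_ids = train_test_split(stint_ids)

# Hypothetical predictions and observations of involvements per minute.
example_error = rmse([0.3, 0.5], [0.4, 0.4])
print(len(train_ids), len(test_ids), round(example_error, 3))
```

Reporting the error on both sets, as done in the Results, indicates whether a tree fits the training data noticeably better than unseen data.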

## RESULTS

### Generalized Linear Mixed Models

Descriptive statistics of each parameter for stints (n = 2493) and matches (n = 21) are shown in **Table 1**. The coefficients for the random intercept model are presented in **Table 2** with a 95% CI. This model had an R<sup>2</sup> value of 0.01, and a conditional R<sup>2</sup> of 0.14 (**Figure 1**).

The coefficients for the random slope model are presented in **Figure 2**. This model had an R<sup>2</sup> of 0.013, and a conditional R<sup>2</sup> of 0.23 (**Figure 1**). The relationship with involvements per minute was negative for both duration (25/39 players) and high intensity running (39/39 players). Conversely, MPM had a positive relationship with involvements per minute for most players (36/39 players). The relationship between very high intensity running per minute and involvements per minute differed considerably depending on the player. Each of these parameters had only a minor relationship with involvements, with the final model having an R<sup>2</sup> of 0.012, and a conditional R<sup>2</sup> of 0.23.

TABLE 1 | Descriptive statistics (mean ± SD) for: Involvements (n), duration (mins), bench time (mins), distance (m), high intensity running (HIR, distance >14.4 km·h<sup>−1</sup>, m), very high intensity running (VHIR, distance >25 km·h<sup>−1</sup>, m).

TABLE 2 | Model 1 and 2: coefficients of fixed effects (95% confidence interval) for Intercept/Involvements per minute (IPM<sup>−1</sup>), Duration (mins), High intensity running per minute (HIRMPM, m·min<sup>−1</sup>), meterage per minute (MPM<sup>−1</sup>, m·min<sup>−1</sup>) and very high intensity running per minute (VHIRM, m·min<sup>−1</sup>).


### Conditional Inference Trees

Results from the first conditional inference classification tree revealed player ID, stint number, duration and round number as the strongest indicators of involvements per minute (**Figure 3**). An RMSE of 0.12 involvements per minute (approximately 10.1 involvements per match) was reported on both the test and training sets. This tree's first partition was based on player ID, with rotation, duration and round number forming the second to fourth partitions, respectively.

The second tree included player, stint duration and stint meterage per minute (**Figure 4**) as the strongest predictors. As per the first conditional inference tree, an RMSE of 0.12 involvements per minute (10.1 involvements per match) was observed on both the test and training sets. This tree had an initial partition based on player ID, with subsequent partitions based on duration (2nd), an additional division of player ID (3rd) and finally duration or MPM (4th).

The final tree, with ID removed as an input, used only meterage per minute and stint duration to predict involvements per minute (**Figure 5**). An increased RMSE (0.12–0.13 involvements per minute; 11.05 involvements per match) was observed on both sets of data. In this tree, both the first and second partitions were determined using MPM, with duration only forming a partition in instances where MPM exceeded 125.

## DISCUSSION

This study developed two methods to quantify the impact of physical outputs, on a team and individual level, on skilled output by elite AF players during matches. The first method comprised two generalized linear mixed models, resulting in broad equations for the team and individual players. Both models had low R<sup>2</sup> and conditional R<sup>2</sup> values, resulting in limited explanatory ability.

The second method, a series of conditional inference trees, identified how different circumstances and combinations of physical parameters may change an athlete's expected skilled output. Whilst partitions in the first tree were dominated by uncontrollable factors, such as round and stint number, the second tree achieved a similar accuracy using meterage per minute, player ID and duration. The final tree removed player ID as a parameter to identify a broad set of team rules, which only slightly reduced accuracy (0.13 compared to 0.12 involvements per minute).

The random intercept model broadly showed the strength and direction of influence for each parameter. In the observed team, meterage per minute had a negative relationship with involvements per minute. The only variable to have any positive relationship was high intensity running per minute. Practitioners could use this information as a general "rule of thumb" in match day decision making, whereby a player who is consistently running at a high meterage per minute for an extended duration, without completing high intensity running, is less likely to reach a maximal skilled output. A limitation of this modeling technique is that it does not necessarily apply to all players, and does not identify how players individually respond to different parameters.

The random slope model addresses the above issue by allowing for different coefficients of the physical parameters for each player. This allows for better profiling of each athlete and for the importance of each parameter to better reflect an individual's strengths and weaknesses. In the observed team, for example, each of the parameters had positive and negative relationships with skilled output, depending on the player. However, despite the strengths of this modeling approach, there are still limitations. A linear decline in involvements per minute in response to the temporal and physical inputs is assumed, when it is unlikely that the decline in skilled output would be so gradual. Rather, players likely need time and physical intensity on field before their skilled output reaches an optimal level. Finally, these models assume some level of independence between the physical and temporal parameters. As a result, they are unable to determine how parameters may interact to affect skilled output.

The first tree in this study used the same parameters entered into the random slope model, to identify how parameters interact to influence skilled output (**Figure 3**). However, the significance testing procedure selected uncontrollable factors, such as round and rotation numbers, as the key explainers of skilled output. The first tree provided a schematic of factors that may influence skilled output in AF. However, because none of the factors from this tree are controllable within a match, this tree would likely have limited uptake in an applied setting. The second tree removed round and rotation number and partitioned based on player, stint time and meterage per minute (**Figure 4**). In an applied setting, the schematic created by this tree could be used to identify the conditions that are likely to lead to maximal skilled output for each player. Additionally, it could be used in a real-time monitoring setting, to identify whether the current circumstances imposed upon a player are conducive to maximal skilled output.

The final conditional inference tree in this study removed player, in an attempt to generate a broad set of team rules. This could provide a cleaner schematic of influences upon skilled output across a team. Using only meterage per minute and stint time, this model set six major partitions for skilled involvement. These ranged from a high physical output with a mixed skilled output, to a low physical and low skilled output. In this playing group, a high intensity (>172 m·min<sup>−1</sup>), or a moderate intensity (125–172 m·min<sup>−1</sup>) combined with a moderate duration (<19.75 min), led to a higher skilled output. Consequently, match day prescription strategies for the observed team could use this information to limit the stint time of players.
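The two conditions named above can be written as a simple rule. The following is only an illustrative sketch: the function name and argument names are hypothetical, it encodes only the two conditions stated in the text rather than the full six-partition tree, and the thresholds are specific to the observed playing group.

```python
def favourable_for_skilled_output(metres_per_min: float, stint_min: float) -> bool:
    """Return True when the conditions reported for higher skilled output hold.

    Thresholds are taken from the text (172 and 125 m/min, 19.75 min);
    the function itself is a hypothetical illustration, not the fitted tree.
    """
    high_intensity = metres_per_min > 172
    moderate_intensity = 125 <= metres_per_min <= 172
    moderate_duration = stint_min < 19.75
    return high_intensity or (moderate_intensity and moderate_duration)
```

Such a rule could, for instance, be evaluated on live stint data to flag when a player moves outside the reported conditions.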

None of the models developed in this study had particularly strong accuracy. The average match duration for a player included in this study was 86 min, resulting in an average error of 0.12 IPM and equating to an average error of approximately 10.1 involvements per match. This is in agreement with other research examining the impact of contextual factors on both physical and skilled output in AF matches. In itself, physical output is influenced by factors such as the opposition and the location of a match (Ryan et al., 2017). Furthermore, trivial relationships between common locational parameters and Champion Data player ratings as a measure of skilled performance have been noted elsewhere (Dillon et al., 2017). These findings, collectively, highlight the importance of using skilled and technical data alongside locational parameters to inform match day decision-making, as opposed to the latter alone.

There are several factors which may explain the limited relationship between GPS parameters and skilled output in AF matches. Firstly, AF is a dynamic sport, and many circumstantial details are difficult to model. In particular, opposition playing styles and changes in positions (Robertson and Joyce, 2014) may have an impact on both the physical and skilled output of players (Sullivan et al., 2014a). Secondly, the aggregate data utilized in this study are limited in their ability to identify thresholds for reductions in both physical and skilled output. Other research has examined these outputs across quarters (Bradley and Noakes, 2013), and more recently within stints (Montgomery and Wisbey, 2016). Further work is needed to examine physical and skilled behavior as a time-series, to better describe the outputs completed by players. Finally, this was a methodological study, which aimed to identify trends across a single playing group. For this methodology to be applied to other teams and sports, the modeling approaches would need to be independently run. Therefore, the thresholds created here may not necessarily hold true outside of this playing group.

The models utilized in this study may still aid decision making in elite team sports. They use information that is controllable and readily available during matches, and therefore may assist in situations where objective information is desired to make quick, time-sensitive decisions.

### CONCLUSION

This study developed two methods to identify the relationship between physical, skilled and temporal outputs, on an individual and team level. The first method utilized random slope and intercept models to identify factors that may correlate with a decline in skilled output, and the direction of their relationship with skilled output. This could be used to develop a broad equation for the team and individuals, to identify how they would react to differing stint times and physical workloads. The second set of methods utilized conditional inference trees to identify how physical and temporal parameters may interact to influence skilled output. Together, these three trees describe: (i) the impact of uncontrollable factors, such as round and rotation number, (ii) how different individuals react to different outputs and (iii) a general set of thresholds for the data entered into the modeling process. These trees can provide a schematic to assist match day prescription in team sports. None of these models held an optimal predictive ability, suggesting that wearable technology data and notational analysis feeds could be analyzed differently to improve their use in team sports.

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the National Statement on Ethical Conduct in Human Research, VU Human Research Ethics Committee, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the VU Human Research Ethics Committee.


### AUTHOR CONTRIBUTIONS

Data collection: DC and SR; formulation of the study: DC, AS, and SR; statistical analysis and visualization: DC and AS; first draft: DC; subsequent drafts: DC, AS, and SR; final approval: DC, AS, and SR.

#### ACKNOWLEDGMENTS

The authors wish to thank the athletes and support staff of the Western Bulldogs for their participation in this study.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2017.00820/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Corbett, Sweeting and Robertson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Inertial Sensor-Based Method for Estimating the Athlete's Relative Joint Center Positions and Center of Mass Kinematics in Alpine Ski Racing

Benedikt Fasel<sup>1</sup>, Jörg Spörri<sup>2,3</sup>, Pascal Schütz<sup>4</sup>, Silvio Lorenzetti<sup>4</sup> and Kamiar Aminian<sup>1</sup>\*

<sup>1</sup> Laboratory of Movement Analysis and Measurement, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, <sup>2</sup> Department of Sport Science and Kinesiology, University of Salzburg, Hallein-Rif, Austria, <sup>3</sup> Department of Orthopedics, Balgrist University Hospital, University of Zurich, Zurich, Switzerland, <sup>4</sup> Department of Health Sciences and Technology, ETH Zürich, Zürich, Switzerland

#### Edited by:

Luca Paolo Ardigò, University of Verona, Italy

#### Reviewed by:

Thomas Leonhard Stöggl, University of Salzburg, Austria Giovanni Messina, University of Foggia, Italy

> \*Correspondence: Kamiar Aminian kamiar.aminian@epfl.ch

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 20 July 2017 Accepted: 12 October 2017 Published: 01 November 2017

#### Citation:

Fasel B, Spörri J, Schütz P, Lorenzetti S and Aminian K (2017) An Inertial Sensor-Based Method for Estimating the Athlete's Relative Joint Center Positions and Center of Mass Kinematics in Alpine Ski Racing. Front. Physiol. 8:850. doi: 10.3389/fphys.2017.00850

For the purpose of gaining a deeper understanding of the relationship between external training load and health in competitive alpine skiing, an accurate and precise estimation of the athlete's kinematics is an essential methodological prerequisite. This study proposes an inertial sensor-based method to estimate the athlete's relative joint center positions and center of mass (CoM) kinematics in alpine skiing. Eleven inertial sensors were fixed to the lower and upper limbs, trunk, and head. The relative positions of the ankle, knee, hip, shoulder, elbow, and wrist joint centers, as well as the athlete's CoM kinematics, were validated against a marker-based optoelectronic motion capture system during indoor carpet skiing. For all joint centers analyzed, position accuracy (mean error) was below 110 mm and precision (error standard deviation) was below 30 mm. CoM position accuracy and precision were 25.7 and 6.7 mm, respectively. Both the accuracy and precision of the system to estimate the distance between the ankle of the outside leg and CoM (a measure quantifying the skier's overall vertical motion) were found to be below 11 mm. Somewhat poorer accuracy and precision values (below 77 mm) were observed for the athlete's fore-aft position (i.e., the projection of the outer ankle-CoM vector onto the line corresponding to the projection of the ski's longitudinal axis on the snow surface). In addition, the system was found to be sensitive enough to distinguish between different types of turns (wide/narrow). Thus, the method proposed in this paper may also provide a useful, pervasive way to monitor and control adverse external loading patterns that occur during regular on-snow training. Moreover, as demonstrated earlier, such an approach might have a certain potential to quantify competition time, movement repetitions and/or the accelerations acting on the different segments of the human body. However, before the method becomes feasible for application in daily training, future studies should primarily focus on a simplification of the sensor setup, as well as a fusion with global navigation satellite systems (i.e., the estimation of the absolute joint and CoM positions).

Keywords: inertial sensors, center of mass, alpine skiing, movement analysis, body model, posture estimation, validation

## INTRODUCTION

For the purpose of gaining a deeper understanding of the relationship between training load and health in competitive sports, an accurate and precise estimation of the athlete's kinematics is an essential methodological prerequisite (Soligard et al., 2016). External load such as competition time, movement repetition counts, speed, acceleration, etc. (Soligard et al., 2016) could thus be quantified based on the estimated athlete's kinematics. In the context of alpine skiing, a major aim of coaching is to optimize the skier's posture and, thus, the relationship between his center of mass (CoM) and his left and right feet (Tjørhom et al., 2007; Kipp et al., 2008; Spörri et al., 2012b). In order to formalize this concept, a previous study focused on the parameter "vertical distance," the distance between the left or right foot and the skier's CoM, and the parameter "fore-aft position," the projection of the vector connecting the CoM with the left or right foot onto the snow surface (Spörri et al., 2012b). Earlier studies in alpine skiing primarily used video-based stereophotogrammetric systems to determine an athlete's kinematics on a ski track (Supej et al., 2003; Federolf, 2012; Spörri et al., 2012a,b, 2016b; Hébert-Losier et al., 2014). Under such in-field conditions, photogrammetric errors of <1.5 cm were reported (Klous et al., 2010; Spörri et al., 2016c). However, despite major advantages regarding accuracy, the corresponding measurement setups are complex, capture volumes are limited to a few turns only, and postprocessing is time consuming.

Driven by these limitations and recent advances in wearable measurement technology, differential global navigation satellite systems (GNSS) have in the last few years gained substantial attention as a valuable alternative for estimating absolute CoM kinematics in-field (Brodie et al., 2008; Lachapelle et al., 2009; Waegli and Skaloud, 2009; Supej, 2010; Gilgien et al., 2013, 2014a,b, 2015a,b, 2016; Supej et al., 2013; Fasel et al., 2016a; Kröll et al., 2016). A major challenge of this alternative approach is that the GNSS antenna cannot be placed on the CoM directly and, therefore, the relative position of the GNSS antenna with respect to the CoM needs to be estimated. Parallel to these developments, CoM kinematics were also approximated based on a single inertial sensor for both human (e.g., Esser et al., 2009; Peyrot et al., 2009; Myklebust et al., 2015) and animal (e.g., Pfau et al., 2005; Warner et al., 2010) motion analysis. The assumption underlying these studies was that the chosen sensor location would match the CoM location. While this assumption may hold for gait, it may be violated in certain sports where upper and lower limb movement may alter the CoM position relative to the chosen sensor location. For example, for cross-country skiing, Myklebust et al. (2015) reported average RMS differences between the true CoM position and a sensor located at the sacrum on S1 of up to 32 ± 4 mm.

In alpine ski racing, one approach to resolve the issue of the CoM moving relative to the sensor location is the use of a simple pendulum model as suggested by Gilgien et al. (2015b) and Supej et al. (2013). However, while providing reasonable estimates of the athlete's overall CoM kinematics, such a model cannot estimate the athlete's posture, which is key for the understanding of the relationship between specific loading patterns and health in competitive sports. Another option might be the fusion or combination of GNSS with body-worn inertial sensor systems (Brodie et al., 2008; Fasel et al., 2016a). In recent years, several experimental field studies considered these systems to estimate the athlete's relative joint center positions and CoM kinematics (Brodie et al., 2008; Supej, 2010; Fasel et al., 2016a). Currently, there exists no validated commercial product estimating the CoM kinematics based on inertial sensors. Moreover, in the context of alpine skiing, only the study of Fasel et al. (2016a) critically validated such a fusion under in-field conditions, implying a certain need for additional scientific evidence and further improvements of the underlying body model. Specifically, they used segment lengths obtained from the optical reference system, the upper trunk was divided into two segments not following literature recommendations (e.g., Dumas et al., 2007), and arm movement was not considered.

Thus, based on the aforementioned current state of knowledge, the first objective of this study was to expand the body model suggested by Fasel et al. (2016a) for the estimation of the CoM into a more comprehensive and scalable model that includes the upper limbs. The second objective was to validate the relative positions of the upper and lower limb joint centers and the athlete's CoM obtained from the inertial sensors against a video-based stereophotogrammetric reference system. The third objective was to evaluate the benefits of adding the upper limbs to the CoM estimation. The fourth objective was to assess the sensitivity of the wearable system to detect changes in the equipment used and turn types performed.

## METHODS

### Measurement Protocol

The measurements were conducted on an indoor skiing carpet (Maxxtracks Indoor Skislopes, The Netherlands) with belt dimensions 6 × 11 m and 12° inclination (**Figure 1**). Eleven male competitive alpine skiers (20.9 ± 5.2 years, 176.1 ± 6.7 cm, 74.0 ± 10.9 kg) participated in the study. Written informed consent was obtained from all athletes prior to the measurements and the study was approved by the ethics committee of École Polytechnique Fédérale de Lausanne (Study Number: HREC 006-2016). Each athlete skied two trials with 140 cm long skis and two trials with 110 cm long skis at the maximum belt speed of 21 km/h. Two types of skis were used to cover a broad range of different turn dynamics. Each trial lasted approximately 120 s. During the first half the athlete skied wide turns taking up the entire carpet width, while for the second half the athlete skied narrow turns taking up half the carpet width. Cones placed in front of the treadmill were used to indicate the turn width. To ensure that the athletes stayed in the measurement volume, a spring system attached to a custom-made belt pulled the athlete backwards (**Figure 1**).

### Reference System

Ten infrared cameras (T160, Vicon Peak, UK) sampling at 100 Hz surrounded the carpet and covered the entire volume spanned by the carpet. The IfB marker set with 71 markers (List et al., 2013) (**Figure 2**) was used to obtain functionally determined ankle, knee, and hip joint centers and the 3D orientation of the shanks, thighs, pelvis, and lumbar, thoracic, and cervical trunk segments. Basic motion tasks as described in List et al. (2013) were performed barefoot to define the functional joint centers. The foot markers were then moved from the feet to the ski boots and a static posture was used to register the ski boot markers with the previously determined foot anatomical frame. Trunk markers were used to determine the trunk segments, as described in List et al. (2013). Since the IfB marker set could not directly measure upper limb joint centers, additional markers were placed on the lateral humeral epicondyle, ulnar styloid, and radial styloid of both the left and right upper limbs. The shoulder joint center was defined to lie 3 cm below the acromion marker in the direction of the marker placed on the scapula inferior angle. The wrist joint center was defined to lie midway between the markers placed on the ulnar and radial styloids. The elbow joint center was defined to lie 3 cm in the medial direction with respect to the marker placed on the lateral humeral epicondyle. The medial direction was defined to be normal to the plane spanned by the shoulder, wrist and lateral humeral epicondyle. In order to allow a comparison with the wearable model, the cervical joint center (CJC) and lumbar joint center (LJC) were estimated based on the anatomical tables from Dumas et al. (2007) scaled to the athlete height. CJC was estimated with respect to the marker placed on C7. LJC was estimated based on the average estimated LJC position with respect to the left and right hip joint centers. Four markers were placed on the athlete's helmet. Their mean position was used to approximate the position of the head vertex. Two markers were placed on each ski's tip and tail and allowed the skis' longitudinal axes to be defined. In total, 81 markers were attached to each participant for the entire measurement. The segments' CoM were computed according to Dumas et al. (2007). The upper limb CoM was assumed to lie on the respective segment's longitudinal axis, where the hand's longitudinal axis was the same as the forearm's longitudinal axis. The head's CoM was assumed to lie at the mid-point between the marker placed on C7 and the average position of the two markers fixed at the front of the helmet.

FIGURE 1 | […] system with the athlete. The small white boxes are the inertial sensors and the gray dots the reflective markers.

FIGURE 2 | Sensor and marker setup from the front (A), back (B) and side view (C). The four markers fixed to the helmet are not shown here. The inertial sensors placed in the middle and upper back were not used for this study.

In order to allow a comparison to the inertial system, the joint and CoM positions were expressed relative to the LJC. The reference (global) coordinate system was defined as follows: the Y-axis was vertical, pointing upwards (i.e., vertical direction); the Z-axis was horizontal and parallel to the treadmill plane, pointing to the right (i.e., lateral direction); the X-axis was the cross-product of the Y- and Z-axes and was pointing forwards (i.e., forwards slope direction in the horizontal plane).

The coaching-relevant parameters vertical distance and fore-aft position were computed according to Spörri et al. (2012b). For each leg (left and right) the vector v<sub>CoM, ankle</sub>(t) connecting the CoM with the ankle joint center was computed. The vertical distance was the norm of v<sub>CoM, ankle</sub>(t). The fore-aft position was obtained by the projection of v<sub>CoM, ankle</sub>(t) onto the line corresponding to the projection of the ski's longitudinal axis on the snow surface. The snow surface was mathematically defined as the X-Z plane inclined by 12° around the Z-axis.
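The two parameters defined above can be illustrated with a short numerical sketch. It is not the authors' implementation: the function name `fore_aft_parameters` and its arguments are hypothetical, and it assumes that the slope descends in the +X direction, so that the snow surface normal is (sin 12°, cos 12°, 0) in the global frame.

```python
import numpy as np

def fore_aft_parameters(v_com_ankle, ski_axis, slope_deg=12.0):
    """Vertical distance and fore-aft position for N time samples.

    v_com_ankle : (N, 3) vectors connecting CoM and ankle, global frame
    ski_axis    : (N, 3) ski longitudinal axes, global frame
    slope_deg   : inclination of the snow surface around the Z-axis
    """
    v = np.asarray(v_com_ankle, dtype=float)
    ski = np.asarray(ski_axis, dtype=float)
    theta = np.deg2rad(slope_deg)
    # Assumed sign convention: slope descends towards +X.
    normal = np.array([np.sin(theta), np.cos(theta), 0.0])
    # Project the ski axis onto the snow surface and normalize it.
    ski_on_snow = ski - (ski @ normal)[:, None] * normal
    ski_on_snow /= np.linalg.norm(ski_on_snow, axis=1, keepdims=True)
    vertical_distance = np.linalg.norm(v, axis=1)
    fore_aft = np.sum(v * ski_on_snow, axis=1)
    return vertical_distance, fore_aft
```

The sign of the fore-aft value depends on the direction chosen for the CoM-ankle vector, which the text leaves open.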

### Wearable System

Eleven inertial sensors (Physilog 4, GaitUp, Switzerland) were attached with adhesive tape to the shanks, thighs, sacrum, sternum, head, arms and wrists (**Figure 2**). Acceleration and angular velocity were measured at 500 Hz. Offset and sensitivity of the accelerometers were corrected according to Ferraris et al. (1995). To this end, each accelerometer was held static for a few seconds in the six positions where each sensing axis was either parallel, anti-parallel or orthogonal to the Earth's gravity field. Then a least-squares fit was used to determine the sensors' offset and sensitivity such that the measured values would be 1, −1, and 0, respectively. Offset of the gyroscopes was estimated while the athlete stood still before each trial. The wearable system was synchronized with the reference system by an electronic trigger. The sensors' local frames were aligned with the segments' anatomical frames based on the functional calibration (squats, trunk rotation, hip abduction, and upright standing) described in Fasel et al. (2017b). In addition, the functional calibration of the arm sensors consisted of two movements, as illustrated on protocols.io (doi: 10.17504/protocols.io.jzncp5e): (1) A slow arm movement in the sagittal plane in which the hands held a pole horizontally with both thumbs pointing medially. The hands were spaced approximately shoulder-width apart and the elbows were kept straight during the entire movement. Three cycles of up/down arm movement in the sagittal plane were performed. (2) An upright posture in which the arms and wrists were kept vertical with straight elbows. The hands were oriented such that the palms were barely touching the lateral side of the thighs.
For the functional calibration, the following constraints were assumed: (i) the main rotation during the arm swing was assumed to occur about the medio-lateral axis of the arm and about the anterior-posterior axis of the wrist (i.e., forearm); (ii) the longitudinal axes of the arms and wrists were presumed to be parallel to gravity during the upright posture.
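The six-position accelerometer calibration described above can be sketched per sensing axis as a linear least-squares fit. This is a simplified illustration, not the full procedure of Ferraris et al. (1995): the function names are hypothetical, and each axis is assumed to follow raw = sensitivity × true + offset, with the true values expressed in units of g.

```python
import numpy as np

def calibrate_axis(raw_means, true_g):
    """Fit raw = sensitivity * true + offset for one sensing axis.

    raw_means : mean raw reading in each static position
    true_g    : expected value in each position (1, -1 or 0)
    """
    raw = np.asarray(raw_means, dtype=float)
    true = np.asarray(true_g, dtype=float)
    design = np.column_stack([true, np.ones_like(true)])
    (sensitivity, offset), *_ = np.linalg.lstsq(design, raw, rcond=None)
    return sensitivity, offset

def apply_calibration(raw, sensitivity, offset):
    """Convert raw readings to calibrated values."""
    return (np.asarray(raw, dtype=float) - offset) / sensitivity
```

For one axis, the six static positions give the expected values `[1, -1, 0, 0, 0, 0]`, which can be passed as `true_g`.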

### Estimating Segment Orientation

Segment orientation was obtained based on the strap-down integration and joint drift correction as described in Fasel et al. (2017a,b). For initializing segment orientation, the athletes stood straight, looking in the slope direction, for 5 s before the treadmill was switched on. The wearable system's global frame was identical to the reference system's global frame and defined as follows: the Y-axis (i.e., vertical axis) was aligned with gravity, pointing upwards. The X-axis (i.e., forwards axis) was perpendicular to gravity (i.e., horizontal) and pointing in the direction of the slope, facing downwards. The Z-axis (i.e., lateral axis) was the cross-product of the X- and Y-axes, pointing to the right. It was observed that, despite a standardized posture, the upper limbs' azimuths (i.e., direction of the segments' anterior-posterior axes) were not aligned. In order to find the segments' azimuths, the same principle as for the joint drift correction presented in Fasel et al. (2017a,b) was used: after initial strap-down integration, the segments' azimuths were assumed to be equal to the average joint acceleration orientation difference over the entire trial. Based on this principle, first the initial orientations of the arms were found with respect to the sternum. Second, the initial orientations of the wrists were found with respect to the arms. After this procedure, orientation drift was corrected as usual, following Fasel et al. (2017a,b). Example data and the MATLAB source code for the functional calibration, initial segment orientation estimation, and joint drift correction are available on Code Ocean (doi: 10.24433/CO.23792aee-07c5-4cdc-bfe9-9e85fa1bf5d5).
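For readers unfamiliar with strap-down integration, the following sketch shows the basic principle only. It is not the authors' code and omits the joint drift correction entirely; the function names are hypothetical, angular velocity is assumed to be given in rad/s in the sensor frame, and the orientation matrix is assumed to map sensor coordinates to global coordinates.

```python
import numpy as np

def rotation_from_rotvec(rotvec):
    """Rotation matrix for a rotation vector (Rodrigues' formula)."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def strapdown(gyro, fs, R0=None):
    """Integrate angular velocity (N, 3) sampled at fs Hz into orientations.

    Returns an (N, 3, 3) array; drift is NOT corrected in this sketch.
    """
    gyro = np.asarray(gyro, dtype=float)
    dt = 1.0 / fs
    R = np.eye(3) if R0 is None else np.array(R0, dtype=float)
    out = np.empty((len(gyro), 3, 3))
    for i, omega in enumerate(gyro):
        # Incremental rotation expressed in the sensor frame.
        R = R @ rotation_from_rotvec(omega * dt)
        out[i] = R
    return out
```

Without a correction step, gyroscope offset errors accumulate in such an integration, which is why the drift correction referenced above is needed in practice.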

As no inertial sensors were placed on the skis, for computing the fore-aft position the ski orientations were estimated based on the shank orientations. To this end, it was assumed that the ankle was held in a constant position by the ski boot with a flexion of 17° without ankle abduction or internal rotation. In other words, the rotation between the ski's longitudinal axis and the shank's anterior-posterior axis was 17° around the shank's medio-lateral axis. The fore-aft parameters were then computed in the same way as for the reference system, as described above.
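The fixed-ankle assumption can be sketched as follows. The axis convention is an assumption made here for illustration (shank anatomical frame with X anterior, Y longitudinal pointing upwards, Z medio-lateral), as is the sign of the rotation; `ski_axis_global` is a hypothetical helper name.

```python
import numpy as np

ANKLE_FLEXION_DEG = 17.0  # constant ankle flexion imposed by the ski boot

def ski_axis_global(R_shank, flexion_deg=ANKLE_FLEXION_DEG):
    """Ski longitudinal axis in the global frame from the shank orientation.

    R_shank maps shank anatomical coordinates to global coordinates.
    Assumed convention: with a forward-leaning shank, the ski axis is the
    shank's anterior axis rotated by +flexion_deg about the medio-lateral
    (Z) axis.
    """
    a = np.deg2rad(flexion_deg)
    ski_in_shank = np.array([np.cos(a), np.sin(a), 0.0])
    return np.asarray(R_shank, dtype=float) @ ski_in_shank
```

Under this convention, a shank leaning forwards by 17° over a horizontal ski yields a ski axis of (1, 0, 0) in the global frame.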

### Body Model

The body model was estimated based on a kinematic chain similar to that of Fasel et al. (2016a). However, since the main aim of the body model was to estimate the athlete's CoM, the origin of the kinematic chain was chosen as the LJC (**Figure 3A**). All segment dimensions were then defined according to Dumas et al. (2007), scaled for athlete height. It was assumed that the segment orientations obtained by the inertial sensors were identical to the anatomical frames of the corresponding segments. The trunk was modeled as two independent segments: pelvis and trunk. It was assumed that the pelvis orientation was equal to the sacrum orientation, and that the trunk orientation was equal to the sternum orientation. Thus, for example, the left hip joint position p<sub>left hip</sub>(t) was determined based on Equation (1) and the left knee position p<sub>left knee</sub>(t) based on Equation (2). All other joint positions were obtained in the same iterative way. Once the joint positions were known, the segment CoMs were estimated according to Dumas et al. (2007). In order to estimate the CoM of the hand, the hand was assumed to have the same orientation as the wrist. To estimate the foot CoM, it was assumed that the foot had the same orientation as the ski (i.e., 17° ankle flexion). A mass of 2 kg was added to each foot to take into account the ski boot. The skis were ignored for computing the CoM. The athlete's CoM was the weighted average of all segment CoMs. In a simplified model, without the arm and wrist sensors, the upper limbs' combined CoM was approximated at the relative position of (0.15, 0.10, 0.00 m) with respect to the LJC, expressed in the trunk's (i.e., sternum) anatomical frame (**Figure 3B**). The upper limbs' relative CoM position was determined from average values of the full model and was scaled for athlete height with the same scaling factor as for the other segments.

$$p_{\text{left hip}}(t) = {}^{\text{sacrum}}R(t) \ast v_{\text{left hip}} \tag{1}$$

$$p_{\text{left knee}}(t) = p_{\text{left hip}}(t) + {}^{\text{left thigh}}R(t) \ast v_{\text{left knee}} \tag{2}$$

where t is the time, <sup>sacrum</sup>R(t) the orientation matrix of the sacrum, <sup>left thigh</sup>R(t) the orientation matrix of the left thigh, v<sub>left hip</sub> the vector connecting the LJC to the left hip in the sacrum's anatomical frame, and v<sub>left knee</sub> the vector connecting the left hip to the left knee in the left thigh's anatomical frame.
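Equations (1) and (2) and the weighted CoM average can be illustrated for a single time sample as follows. This is a minimal sketch with hypothetical function names; the segment vectors, masses and CoM positions would in practice come from the scaled tables of Dumas et al. (2007), which are not reproduced here.

```python
import numpy as np

def hip_and_knee_positions(R_sacrum, R_thigh, v_hip, v_knee):
    """Joint positions relative to the LJC, following Equations (1) and (2).

    R_sacrum, R_thigh : (3, 3) orientation matrices (segment to global)
    v_hip             : LJC-to-hip vector in the sacrum's anatomical frame
    v_knee            : hip-to-knee vector in the thigh's anatomical frame
    """
    p_hip = np.asarray(R_sacrum, dtype=float) @ np.asarray(v_hip, dtype=float)
    p_knee = p_hip + np.asarray(R_thigh, dtype=float) @ np.asarray(v_knee, dtype=float)
    return p_hip, p_knee

def whole_body_com(segment_coms, segment_masses):
    """Mass-weighted average of the segment CoM positions, shape (S, 3)."""
    coms = np.asarray(segment_coms, dtype=float)
    masses = np.asarray(segment_masses, dtype=float)
    return (masses[:, None] * coms).sum(axis=0) / masses.sum()
```

The remaining joints follow by repeating the second step along the chain, each time adding the next segment vector rotated by that segment's orientation matrix.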

### Validation

A total of 44 trials (11 athletes, 4 trials per athlete) were analyzed. Error curves were computed by subtracting, for each time sample, the 3D position of the joint centers and CoM (expressed relative to the LJC) obtained with the reference system from that obtained with the wearable system. For each trial, the mean and standard deviation of the error were computed for each individual axis and for the total distance (i.e., the error norm). Accuracy was defined as the group average of all trial mean errors and precision was defined as the group average of all trial standard deviations of the error.
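The accuracy and precision definitions above translate into a few lines of code. The sketch below uses hypothetical function names and assumes the sample standard deviation (`ddof=1`), which the text does not specify.

```python
import numpy as np

def trial_error_stats(wearable, reference):
    """Mean and SD of the error per axis (X, Y, Z) and of the error norm.

    wearable, reference : (N, 3) positions relative to the LJC
    Returns two arrays of length 4: [X, Y, Z, total distance].
    """
    err = np.asarray(wearable, dtype=float) - np.asarray(reference, dtype=float)
    total = np.linalg.norm(err, axis=1)
    stacked = np.column_stack([err, total])
    # ddof=1 (sample SD) is an assumption; the source does not state it.
    return stacked.mean(axis=0), stacked.std(axis=0, ddof=1)

def accuracy_precision(trials):
    """Group averages over trials of the trial means and trial SDs.

    trials : iterable of (wearable, reference) pairs
    """
    means, sds = zip(*(trial_error_stats(w, r) for w, r in trials))
    return np.mean(means, axis=0), np.mean(sds, axis=0)
```

Note that with this definition the accuracy of the total distance is always non-negative, whereas the per-axis accuracies keep their sign.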

The same error analysis was performed for the fore-aft parameters and, in addition, Pearson's correlation coefficient was computed. For each trial, 14 wide and 14 narrow turns were automatically segmented based on the crossing points of the left and right vertical distance (i.e., norm of v<sub>CoM, ankle</sub>(t)) (Fasel et al., 2016b). For each turn, the range of motion (RoM) of the vertical distance and the fore-aft position was computed and compared to the reference system with a Bland-Altman plot (Bland and Altman, 2007). Since the data points for the same trial were correlated, the limits of agreement (LoA) were computed as described in Fasel et al. (2017b). To assess whether the wearable system was sensitive to changes, Cohen's d was computed separately for the RoM obtained with the reference and the wearable system between trials (140 vs. 110 cm skis) and turn types (wide vs. narrow).
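The two statistics can be sketched as follows, with important caveats: the function names are hypothetical, Cohen's d is computed here with a pooled standard deviation (the variant used in the study is not stated), and the limits of agreement are the basic Bland-Altman version for independent data points, not the version for correlated data from Fasel et al. (2017b) that was used in the study.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d between two samples using the pooled SD (assumed variant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def bland_altman_limits(wearable, reference):
    """Bias and basic 95% limits of agreement for independent data points."""
    diff = np.asarray(wearable, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width
```

Applied to the RoM values per turn, `cohens_d` would be evaluated once for the reference system and once for the wearable system, and the two effect sizes compared.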

## RESULTS

Errors for the left and right side were similar; thus, for the sake of clarity, only the results for the left side are presented in the following. Please refer to the appendix for the results of the right side.

Both accuracy and precision worsened for the more distal joint centers and were worst for the ankles (total distance accuracy and precision of 109 and 30 mm) and wrists (total distance accuracy and precision of 97 and 16 mm) (**Table 1**). The standard deviation of the joint center accuracy was between 6.3 and 57.6 mm. CoM accuracy and precision for the total distance were 25.7 and 6.7 mm, respectively.

The knee and ankle joint position errors in particular depended on the turn phase, i.e., they differed between the inside and the outside leg. **Figure 4** shows time-normalized errors for the knee and ankle joints for a typical athlete and nine wide left/right turns of the trial with 140 cm skis. While the hip's vertical position error (Y-axis) remained below 10 mm throughout the turn cycle, the knee joint position had large errors during left turns (i.e., for the inside leg).

Accuracy and precision for the CoM computed with the full model were found to be <8.6 mm and <11.2 mm for each axis. Simplifying the model did not impact the CoM precision, but added a bias in the vertical and forwards directions: the CoM was estimated 8.5 mm too low and 13.5 mm too posterior (**Table 2**).

For both the full and simplified models, correlation was >0.98 for the vertical distance and approximately 0.90 for the fore-aft position (**Table 3**). For the full model, fore-aft position was underestimated by 74 mm on average and its average precision was 34 mm. For the full model, vertical distance was on average overestimated by 3 mm with a precision of 11 mm (**Table 3**). Errors were only slightly different for the simplified model. **Figure 5** shows the average ± standard deviation curves for 14 wide double turns of two representative athletes. The full model was used to obtain the wearable curves.

LoA for the RoM of the vertical distance and fore-aft position were considerably lower for the outside leg than for the inside leg (**Table 4**, **Figure 6**). The reference average value (standard deviation) of the vertical distance RoM was 53.8 mm (23.5 mm) for the outside leg and 168.9 mm (45.0 mm) for the inside leg. The reference average value (standard deviation) of the fore-aft position RoM was 92.7 mm (40.1 mm) for the outside leg and 136.7 mm (47.2 mm) for the inside leg. Cohen's d values for the RoM computed with the reference system and the full model were similar: between wide and narrow turns, d was >1 for the fore-aft position and >2 for the vertical distance. Simplifying the model by removing the arms only slightly changed the fore-aft parameters' accuracy and precision. As for the full model, Cohen's d values were similar to those of the reference system.

TABLE 1 | Average (standard deviation) accuracy and precision of the relative joint center positions along the X-axis (forwards slope direction), Y-axis (vertical direction), Z-axis (lateral direction), and total distance (norm of 3D difference).

All units are mm.

FIGURE 4 | […] 9 left and right turns of a representative trial. The first 100% of the turn cycle is a left turn where the left leg is the inside leg and the second 100% is a right turn where the left leg is the outside leg.

TABLE 2 | Average (standard deviation) accuracy and precision of the relative CoM positions for the full model with arms and the simplified model without arms.

All units are mm.

TABLE 3 | Average (standard deviation) accuracy and precision of the fore-aft parameters and their correlation to the reference system for the full model with arms and the simplified model without arms.

Units for accuracy and precision are mm.

## DISCUSSION

In the current paper, an inertial sensor-based method to estimate the athlete's relative joint center positions and center of mass (CoM) kinematics during alpine skiing has been proposed. In addition to these estimates, the joint center- and CoM-related measures "vertical distance" and "fore-aft position" were computed. The new method's validity was assessed by comparing it to an optoelectronic stereophotogrammetric reference system (gold standard). Accuracy (precision) for the CoM, vertical distance and fore-aft position were 25.7 mm (6.7 mm), 3.3 mm (10.6 mm), and −73.9 mm (34.0 mm), respectively. Excluding the upper limbs from the body model decreased the accuracy and precision of all curves by less than 3 mm, except for the vertical distance where the accuracy changed from 3.3 to −5.5 mm. The proposed procedure for estimating relative segment azimuth during posture initialization seemed sufficiently accurate and precise. Interestingly, the elbow joint position was estimated with better accuracy than the shoulder and wrist joint positions. However, prior to analyzing specific movements for which arm motion is key, the proposed orientation initialization should be validated more specifically.

#### Joint Center Positions

As expected, errors of the relative joint positions increased along the kinematic chain. Two factors might have contributed to these errors: incorrect segment dimensions and inaccurate segment orientation estimations. Segment dimensions were taken from Dumas et al. (2007) and were scaled for athlete height only. Therefore, athlete-individual deviations from the model were not considered, which potentially biased the estimated segment lengths. As an example, our athletes had on average a 40 mm wider pelvis and 69 mm shorter trunk than predicted by the model. Subject-specific anthropometric measurements could reduce this error, however, at the cost of a more complicated measurement procedure. Furthermore, segment orientation estimation errors might have directly affected joint estimation errors. For example, knee joint position errors were higher than those of the hip joint by a factor of 3–4. The large precision decrease observed could be attributed to soft tissue artifacts of the thigh. Indeed, high muscle activation levels during the turns could have temporarily changed the sensor's alignment with respect to the underlying bone. In this context, it is known that during a turn the inside leg has higher hip and knee flexion angles but has to support less force (Klous et al., 2012; Kröll et al., 2015). Thus, it is reasonable that the muscle activation at the inside leg differs from that at the outside leg (Kröll et al., 2011), which, while turning, might have led to a different amount of soft tissue artifact and, therefore, different errors in the estimation of the thigh segment orientation (**Figure 4**). To overcome these limitations, soft tissue artifacts could be modeled, for example, with a double static calibration as proposed by Cappello et al. (1997), as well as by measuring different static postures with and without muscle pre-activation (e.g., upright standing or sitting on a chair).

#### CoM Position

Despite the limited performance of joint position estimation, CoM position was estimated with very good accuracy and precision. One explanation could be that errors from individual joint positions were averaged out when computing the athlete's CoM. Surprisingly, and in contrast to the findings from Eames et al. (1999) and Whittle (1997) for walking, removing the upper limbs from the model did not decrease CoM accuracy and precision significantly. One potential explanation for this observation might be that during alpine skiing arm movements are mostly symmetrical and that (at least for the current indoor carpet skiing setup) the arms were held in an almost constant position. Another explanation might be that the upper limbs contribute on average only 10% to total body mass (Dumas et al., 2007). Thus, even if arm movements may not have been estimated correctly, corresponding effects on CoM position are rather marginal.
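The mass argument can be made concrete with a small sketch that treats the CoM as the mass-weighted mean of segment positions. All segment groupings, mass fractions other than the 10% for the arms, and positions below are hypothetical placeholders rather than values from Dumas et al. (2007) or from the study.

```python
def center_of_mass(segments):
    """Mass-weighted mean position.

    segments: list of (mass_fraction, (x, y, z)) tuples, positions in mm.
    """
    total_mass = sum(mass for mass, _ in segments)
    return tuple(
        sum(mass * pos[axis] for mass, pos in segments) / total_mass
        for axis in range(3)
    )


# Hypothetical lumped segments (positions relative to an arbitrary origin, in mm)
body = [
    (0.55, (0.0, 0.0, 200.0)),    # head, trunk and pelvis
    (0.35, (50.0, 0.0, -400.0)),  # both legs
]
arms_true = (0.10, (150.0, 0.0, 100.0))       # both arms, assumed true position
arms_misplaced = (0.10, (250.0, 0.0, 100.0))  # same arms with a 100 mm fore-aft error

com_true = center_of_mass(body + [arms_true])
com_estimated = center_of_mass(body + [arms_misplaced])
com_error_x = com_estimated[0] - com_true[0]
print(f"CoM fore-aft error caused by a 100 mm arm error: {com_error_x:.1f} mm")
```

Under these assumptions, an arm position error is scaled down by the arms' share of total body mass before it reaches the CoM, which is consistent with the marginal effect described above.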

#### Vertical Distance and Fore-Aft Position

Both vertical distance and fore-aft position were estimated with higher precision than reported previously in Fasel et al. (2015), underlining the better suitability of the revised body model used in the current study. In particular, for the measure "vertical distance," accuracy was slightly improved, while for the fore-aft position accuracy was slightly reduced. Moreover, compared to vertical distance, fore-aft position was found to be more sensitive to ankle position errors (**Figure 7**). Under the hypothesis that the largest error source could be attributed to incorrectly estimated thigh orientation due to soft tissue artifacts, a change in thigh orientation would essentially affect the direction of the vector connecting the ankle to the CoM, but not its length. Accordingly, soft tissue artifacts may only marginally alter the vertical distance but may substantially influence fore-aft position (**Figure 7**), which is why, in the context of inertial-based measurements, this parameter should be used with caution. However, future improvements regarding a reduction of the soft tissue artifacts might help to overcome these fore-aft position-related limitations. In this study, the snow surface was defined mathematically for both the reference and wearable system. For on-snow measurements this surface has to be estimated first, for example by constructing a 3D terrain model with drones (e.g., Pix4Dmapper, Pix4D, Switzerland).
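The geometric argument can be illustrated with a simplified two-dimensional sketch: a vector in the sagittal plane is rotated by a small angle, and the resulting changes in its fore-aft component, vertical component, and length are compared. The 800 mm length and the 5° error are hypothetical, and the planar model is a simplification of the three-dimensional definitions used in the study.

```python
import math


def rotate_sagittal(vec, angle_deg):
    """Rotate a (fore_aft, vertical) vector in the sagittal plane by angle_deg."""
    angle = math.radians(angle_deg)
    x, z = vec
    return (
        x * math.cos(angle) + z * math.sin(angle),
        -x * math.sin(angle) + z * math.cos(angle),
    )


# Hypothetical ankle-to-CoM vector in mm: purely vertical, 800 mm long
ankle_to_com = (0.0, 800.0)
rotated = rotate_sagittal(ankle_to_com, 5.0)  # hypothetical 5 degree orientation error

fore_aft_change = rotated[0] - ankle_to_com[0]
vertical_change = rotated[1] - ankle_to_com[1]
length_change = math.hypot(*rotated) - math.hypot(*ankle_to_com)

print(f"fore-aft change: {fore_aft_change:.1f} mm")
print(f"vertical change: {vertical_change:.1f} mm")
print(f"length change:   {length_change:.1f} mm")
```

For small angles the fore-aft component changes with the sine of the error while the vertical component changes with one minus the cosine, so in this sketch the same orientation error shifts the fore-aft component by roughly 70 mm and the vertical component by about 3 mm.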

#### Methodological Limitations

Despite the carefully chosen reference system and setup, the study has some limitations that are worth discussing. First, the model was specifically designed for lower limb and trunk motion capture. Accordingly, upper limb joints (shoulders, elbows, wrists) and head vertex were only approximately tracked. Especially for the shoulder joint and head vertex, reference positions might have been estimated with errors of up to a few centimeters. This inaccuracy was judged to be acceptable, since a validation of the upper limb position and orientation was not the main aim of this study. Second, for the estimation of CoM, segment inertial parameters were taken from Dumas et al. (2007) and were only scaled to athlete height. However, the body model could be further individualized by taking into account the athlete's segment lengths and an estimation of their muscle masses. Third, as inertial sensors cannot provide absolute position measurements, only the relative joint and CoM positions were validated. For reasons of convenience, the lumbar joint center (LJC) has been defined as the origin for both systems, even though it could not be measured directly by the reference system. However, by averaging the LJC estimated from the left and right hip joint center, the aim was to minimize measurement errors. Fourth, the ecological validity of the study might be limited. Despite the fact that the movement patterns on the treadmill are known to correspond well to the real on-snow skiing situation (Spörri et al., 2016a), the reduced speed might have led to less dynamic movements and less arm motion. Moreover, vibrations from skidding on snow were absent. Therefore, it is expected that errors for on-snow skiing might be slightly larger than presented here.

FIGURE 6 | Bland-Altman plots for the range of motion of the vertical distance (left) and fore-aft position (right). The model without arms was used to generate the figures and compute the LoA (dashed lines). Mean error is shown with the solid lines. Blue marks the outside leg and yellow the inside leg. LoA for both models and outside and inside legs are reported in Table 4.
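The mean error and limits of agreement (LoA) shown in Bland-Altman plots such as Figure 6 are commonly computed as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below assumes this conventional definition; the function name and sample values are hypothetical, and the study may have used a different variant.

```python
from statistics import mean, stdev


def limits_of_agreement(estimate, reference):
    """Return (mean error, lower LoA, upper LoA) for paired measurements."""
    differences = [e - r for e, r in zip(estimate, reference)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Hypothetical range-of-motion values in mm (illustration only, not study data)
reference_rom = [120.0, 150.0, 135.0, 160.0, 145.0]
wearable_rom = [128.0, 146.0, 140.0, 170.0, 147.0]

bias, lower, upper = limits_of_agreement(wearable_rom, reference_rom)
print(f"mean error {bias:.1f} mm, LoA [{lower:.1f}, {upper:.1f}] mm")
```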

### Perspective

Overall, based on the system's accuracy and precision and, specifically, based on Cohen's d, the proposed method was found to be sensitive enough to distinguish between different types of turns (wide/narrow). Thus, the current method may also provide useful information for monitoring and controlling adverse external loading patterns that occur during regular on-snow training. Moreover, as demonstrated earlier and in other settings (Chardonnens et al., 2012, 2014; Rawashdeh et al., 2016; Yu et al., 2016; Whiteside et al., 2017), such an approach is also suitable for quantifying competition time, movement repetitions and/or the accelerations acting on the different segments of the human body. However, before the method becomes feasible for daily training settings, future studies should primarily focus on a simplification of the sensor setup, as well as a fusion with global navigation satellite systems (i.e., the estimation of the absolute joint and CoM positions). It has to be pointed out that, in order to fully quantify the total load, not only the external but also the internal load should be quantified (Soligard et al., 2016).

### CONCLUSION

The system allowed computing the athlete's relative joint center and CoM position with sufficient accuracy and precision for detecting meaningful differences in alpine skiing. Only the accuracy and precision of the most distal joints (e.g., ankle) are at the limit of an acceptable range. The accuracy and precision of the ankle positions can be considered acceptable for computing the vertical distance, but not for calculating the fore-aft position. Future developments should aim at reducing soft tissue artifacts such that knee and ankle positions could be estimated with better precision. To compute the absolute CoM position with respect to a fixed global reference frame, the obtained relative CoM position and body model could be combined with an absolute position of a body part (e.g., head), for example measured with differential GNSS. A future study should also address how to simplify the system so that it could be used for everyday external load monitoring, with fully automated calibration and data analysis.

#### AUTHOR CONTRIBUTIONS

BF, JS, PS, SL, and KA conceptualized the study design. BF, JS, PS, and SL conducted the data collection. BF, JS, and PS contributed to the analysis and interpretation of the data. BF drafted the manuscript; all other authors revised it critically. All authors approved the final version and agreed to be accountable for all aspects of this work.

#### FUNDING

The study was funded by the Swiss Federal Office of Sport (FOSPO), grant 15-01; VM10052.

#### ACKNOWLEDGMENTS

The authors would like to thank Swiss Indoor Skiing for providing us access to their ski treadmill.

#### REFERENCES

assessing the feasibility of bringing the biomechanics lab to the field. PLoS ONE 11:e0161757. doi: 10.1371/journal.pone.0161757


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer TS declared a shared affiliation, with no collaboration, with one of the authors JS to the handling Editor.

Copyright © 2017 Fasel, Spörri, Schütz, Lorenzetti and Aminian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Monitoring Energy Expenditure Using a Multi-Sensor Device—Applications and Limitations of the SenseWear Armband in Athletic Populations

Karsten Koehler <sup>1</sup> \* and Clemens Drenowatz <sup>2</sup>

<sup>1</sup> Department of Nutrition and Health Sciences, University of Nebraska-Lincoln, Lincoln, NE, United States, <sup>2</sup> Division of Physical Education, University of Education Upper Austria, Linz, Austria

In order to monitor their energy requirements, athletes may desire to assess energy expenditure (EE) during training and competition. Recent technological advances and increased customer interest have created a market for wearable devices that measure physiological variables and bodily movement over prolonged time periods and convert this information into EE data. This mini-review provides an overview of the applicability of the SenseWear armband (SWA), which combines accelerometry with measurements of heat production and skin conductivity, to measure total daily energy expenditure (TDEE) and its components such as exercise energy expenditure (ExEE) in athletic populations. While the SWA has been shown to provide valid estimates of EE in the general population, validation studies in athletic populations indicate a tendency toward underestimation of ExEE, particularly during high-intensity exercise (>10 METs), with an increasing underestimation as exercise intensity increases. Although limited information is available on the accuracy of the SWA during resistance exercise, high-intensity interval exercise, or mixed exercise forms, there seems to be a similar trend of underestimating high levels of ExEE. The SWA, however, is capable of detecting movement patterns and metabolic measurements even at high exercise intensities, suggesting that underestimation may result from limitations in the proprietary algorithms. In addition, the SWA has been used in the assessment of sleep quantity and quality as well as non-exercise activity thermogenesis. Overall, the SWA provides viable information and continues to be used in various clinical and athletic settings, despite the termination of its commercial sale.

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Salvatore Tedesco, University College Cork, Ireland

Beat Knechtle, University of Zurich, Switzerland

#### \*Correspondence:

Karsten Koehler kkoehler3@unl.edu

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 27 September 2017 Accepted: 17 November 2017 Published: 30 November 2017

#### Citation:

Koehler K and Drenowatz C (2017) Monitoring Energy Expenditure Using a Multi-Sensor Device—Applications and Limitations of the SenseWear Armband in Athletic Populations. Front. Physiol. 8:983. doi: 10.3389/fphys.2017.00983 Keywords: accelerometry, energy balance, high-intensity exercise, resistance exercise, measurement error

### INTRODUCTION: TRACKING ENERGY EXPENDITURE IN ATHLETES

One of the unique characteristics of athletes is that energy requirements of training and competition increase their total daily energy expenditure (TDEE) beyond those of the general population (Westerterp, 2013). Energy requirements can vary considerably depending on exercise type, intensity, and duration, but sustained levels of energy expenditure (EE) can be in the range of 5,000–8,000 kcal/day (Westerterp et al., 1986; Westerterp, 2001). This high energy turnover has implications not only for weight gain and weight loss practices, which are prominent in sports with weight classes, anti-gravitational sports, or aesthetic sports; it also necessitates a sufficient dietary energy intake, as sustained energy deficiency can result in long-term detriments including impaired bone health and infertility (Loucks et al., 2011). In addition, recent data suggest that athletic performance may also be impaired in energy-deprived athletes (Vanheest et al., 2013).

Because of the high energy demands and the consequences of energy deficiency, tracking EE is paramount for many athletes and their support staff. Considering that athletes expend up to 75% of their TDEE during exercise (Westerterp, 2013), quantifying energy needs during training and competition requires particular attention. The current gold-standard method for the assessment of TDEE in free-living situations is the doubly labeled water (DLW) method, which has been used in numerous athletic settings (Westerterp et al., 1986; Sjödin et al., 1994; Trappe et al., 1997; Hill and Davies, 2001, 2002; Ebine et al., 2002; Ekelund et al., 2002; Koehler et al., 2010). However, the time resolution is limited and the method does not differentiate between various components contributing to TDEE, such as exercise energy expenditure (ExEE) (Westerterp et al., 1986). Improved resolution is provided by indirect calorimetry (IC), the reference method for EE quantification in controlled laboratory settings (Haugen et al., 2007). However, despite recent methodological advances, the method remains mostly limited to research and exercise testing. Further, the requirement of a face mask hinders natural training behaviors such as fluid or food intake. Therefore, other approaches that do not interfere with training and competition practices are needed to reliably quantify EE, and particularly ExEE, in athletes.

Available methods include accelerometry, pedometry, heart-rate monitors, and self-report methods (Ndahimana and Kim, 2017). With the exception of self-report methods, which only provide subjective information and show low accuracy and reliability (Ndahimana and Kim, 2017), all of these approaches have been incorporated in activity monitors. These devices are less cost-prohibitive than DLW or IC, can be used during a wide range of activities and numerous settings, and allow for data collection over prolonged time intervals in large cohorts (Düking et al., 2016). Several such wearable devices, including the ActiGraph, Actical, RT3, ActivePAL, or GeneActiv, have been developed for research purposes, and various companies have introduced commercial physical activity trackers (e.g., Fitbit, Garmin, Jawbone, Nike). However, as these devices typically rely only on accelerometry, their accuracy in predicting EE or time spent in different activities is mixed (Welk et al., 2007), and their ability to detect when devices are worn may be limited (Jaeschke et al., 2017).

### TECHNOLOGY OF THE SENSEWEAR ARMBAND: FEATURES, FUNCTIONS, AND MODIFICATIONS

The SenseWear armband (SWA) developed by BodyMedia Inc. (Pittsburgh, PA, USA) combines accelerometry with additional biological variables, such as heat flux, skin temperature, near-body ambient temperature, and galvanic skin response. The device only collects data when it is in direct contact with the skin and its pattern-recognition algorithm has been shown to provide more accurate results for estimating EE and time spent in various activities when compared to the ActiGraph (Welk et al., 2007). Given these benefits, the SWA became a promising tool to objectively monitor EE in various exercise and non-exercise settings (Fruin and Rankin, 2004). Most basic principles and functions have remained the same since the initial introduction of the first prototypes in the late 1990s, but there have been several upgrades, the most notable modification being the addition of a third accelerometer axis (Riou et al., 2015) along with increased data transfer and storage capacity. Per manufacturer instructions, the SWA is worn on the upper left arm, and can be used to record data continuously for up to 3–4 weeks (Koehler et al., 2013). Data can be downloaded, viewed, and exported for subsequent data processing using manufacturer software (InnerView, BodyMedia, Pittsburgh, PA). A proprietary algorithm converts raw data into estimates of EE, which are expressed both in kcal/min and metabolic equivalents (METs). In efforts to improve the validity of the SWA, this algorithm has been modified several times (Jakicic et al., 2004; Van Hoye et al., 2015). Although the technology was purchased by a competitor in 2013 and has since been discontinued (Welk et al., 2017), the SWA continues to be used extensively in research and clinical settings (**Figure 1**). Considering the continued popularity and the current lack of alternatives on the market, it was our goal to provide a critical review of the applicability of the SWA to measure EE specifically in athletes.
As such, we provide a general overview of the strengths and limitations of the SWA in the general population (section Validity of the SenseWear Armband in the General Population: Energy Expenditure, Physical Activity, and Exercise), followed by a review of the validity of the SWA in athletes and during various types of high-intensity exercise (section Validity of the SenseWear Armband during High-Intensity Exercise). We further discuss possible reasons for limitations (section Limitations of the SenseWear Armband: Algorithm vs. Methodology) and non-traditional applications of the SWA in athletic settings (section Application of the SenseWear Armband in Athletic Populations). To identify appropriate literature, a quasi-systematic PUBMED search (https://www.ncbi.nlm.nih.gov/pubmed/) was conducted in June 2017 independently by both authors, using "SenseWear" in combination with "exercise," "activity," or "athletes" as search terms. In addition, we included literature cited therein. Final inclusion was decided jointly by both authors based on each paper's relevance to the review's target group.

### VALIDITY OF THE SENSEWEAR ARMBAND IN THE GENERAL POPULATION: ENERGY EXPENDITURE, PHYSICAL ACTIVITY, AND EXERCISE

In the general population, the SWA has been validated extensively and has been shown to provide accurate estimates of TDEE as well as EE at rest and during activities of light to moderate intensities when compared to DLW or IC (Cole et al., 2004; Fruin and Rankin, 2004; Jakicic et al., 2004; King et al., 2004; Mignault et al., 2005; Papazoglou et al., 2006; Malavolti et al., 2007; Patel et al., 2007; St-Onge et al., 2007; Johannsen et al., 2010; Casiraghi et al., 2013; Brazeau et al., 2016). When specific time periods of varying activity intensities were examined, however, the SWA generally overestimated EE at lower intensities, while EE was underestimated at higher intensities (Cole et al., 2004; Fruin and Rankin, 2004; Jakicic et al., 2004; Patel et al., 2007; Dwyer et al., 2009; Berntsen et al., 2010; Benito et al., 2012; Gastin et al., 2017). Accordingly, TDEE was overestimated in participants with low levels of TDEE and underestimated in participants with high TDEE (St-Onge et al., 2007; Johannsen et al., 2010).

It should further be considered that the accuracy of the SWA is impacted by external factors such as treadmill incline, exercise mode (e.g., running vs. bicycling), or the use of upper- vs. lower-body exercise (Fruin and Rankin, 2004; Jakicic et al., 2004; Berntsen et al., 2010; Vernillo et al., 2015; Brazeau et al., 2016; Gastin et al., 2017). Specifically, underestimation of EE during uphill walking has been reported in several studies, with increasing measurement errors at steeper inclines (Fruin and Rankin, 2004; Jakicic et al., 2004; Vernillo et al., 2015). Downhill walking, on the other hand, was associated with an overestimation of EE, and, although less pronounced, measurement errors increased as declines became steeper (Vernillo et al., 2015). During stationary cycling, total EE did not differ between the SWA and IC, but individual time point data were poorly correlated: at the beginning of the cycling trial, EE was underestimated, but EE estimates by the SWA increased gradually over time even though IC values remained stable (Fruin and Rankin, 2004; Brazeau et al., 2016). Further, Gastin et al. (2017) reported an underestimation of EE during resistance-type circuit exercise, most likely due to inaccuracies at higher intensities. In addition to problems related to activity type and intensity, body weight has been shown to affect measurement accuracy. Even though no particular bias toward over- or underestimation of EE was observed, measurement error increased with increasing BMI (Dwyer et al., 2009; Malavolti et al., 2012). Considering that athletes typically are on the extreme ends of the body composition spectrum (Meyer et al., 2013), it is unclear to what degree body weight or composition contributes to measurement errors in athletes.

Differences in body weight or composition may also contribute to the considerable variability of measurement accuracy at the individual level (Fruin and Rankin, 2004; Brazeau et al., 2016). Nevertheless, a recent study reported accurate measurements of TDEE with a mean difference of 2.8 kcal/day and narrow 95% confidence intervals (−34.8 to 40.3 kcal/day) and a correlation coefficient of r = 0.88 when comparing SWA values to DLW in 191 generally healthy adults with diverse body weight and physical activity levels (Drenowatz et al., 2017). Overall, the SWA provides valid estimates of TDEE and ExEE with a measurement error of typically <10% in a recreationally active population.

### VALIDITY OF THE SENSEWEAR ARMBAND DURING HIGH-INTENSITY EXERCISE

To our knowledge, only one study has assessed the validity of SWA-measured TDEE specifically in athletes. Koehler et al. (2011) reported an average difference of 65 kcal/day (<2% of TDEE) between TDEE measured by SWA and DLW in 14 endurance-trained athletes and a moderate to strong correlation (r = 0.73). However, higher levels of TDEE tended to be underestimated by the SWA, and the level of underestimation was related to the participant's exercise capacity, whereby EE was underestimated to a greater degree in better trained athletes (Koehler et al., 2011).

### Validity during High-Intensity Aerobic Exercise

Several studies have tested the validity of the SWA during high-intensity, continuous aerobic exercise. In two independent studies in trained male athletes, the SWA underestimated ExEE during treadmill running at speeds of ∼10.1 km/h (6.3 miles/h) and greater (Koehler et al., 2011, 2013). These findings were replicated by Drenowatz and Eisenmann (2011), who demonstrated that ExEE was consistently underestimated in endurance-trained athletes running at 65, 75, and 85% of their aerobic capacity, corresponding to a similar speed range (9.9–14.6 km/h; 6.2–9.1 miles/h). In another study, the SWA underestimated ExEE even at speeds from 6.0 to 7.2 km/h (3.7–4.5 miles/h) (van Hoye et al., 2014). Similar findings were also reported during stationary bicycling, whereby the SWA underestimated ExEE at workloads between 140 and 380 W (Koehler et al., 2011). In all cases, the level of underestimation increased with increasing exercise intensity (Drenowatz and Eisenmann, 2011; Koehler et al., 2011, 2013; van Hoye et al., 2014). However, visual inspection of the combined data from all five studies (**Figure 2**) suggests that differences between SWA and IC are rather modest at low-to-moderate exercise intensities. At exercise intensities above 35 mL/kg/min (10 METs), however, SWA-measured ExEE tends to plateau whereas IC-measured ExEE increases continuously, resulting in a stark increase in the level of underestimation. It is noteworthy that all studies employed an incremental exercise test to assess the validity of the SWA at multiple exercise intensities. To our knowledge, only one study separately used a 30 min exercise bout at a self-selected intensity, resulting in a similar level of underestimation of 27% (Drenowatz and Eisenmann, 2011).

FIGURE 2 | [...] comparison to the reference method (indirect calorimetry; open symbols) and the difference between SenseWear and indirect calorimetry (gray symbols). The dotted line depicts an exercise intensity of 35 mL/kg/min (10 METs). Data published by Drenowatz and Eisenmann (2011) stem from 20 male and female runners (VO2peak: 57 mL/kg/min); Data published by Koehler et al. (2011) stem from 14 triathletes (VO2peak: 58 mL/kg/min) who were assessed while running and biking; Data published by Koehler et al. (2013) stem from 19 endurance and strength trained men (VO2peak: 55 mL/kg/min) who were assessed while running; Data from van Hoye et al. (2014) stem from 23 male kinesiology students (VO2peak: 69 mL/kg/min) and 20 female kinesiology students (VO2peak: 53 mL/kg/min) who were assessed while walking and running; Data published by Van Hoye et al. (2015) stem from 39 male and female kinesiology students (VO2peak: 58 mL/kg/min) who were assessed while walking and running.
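The equivalence between 35 mL/kg/min and 10 METs follows from the common convention that 1 MET corresponds to an oxygen uptake of 3.5 mL/kg/min. The sketch below applies that convention, which is an assumption here rather than something stated by the authors, and relates the threshold to a VO2peak of 57 mL/kg/min, one of the group values given in the figure caption. Function and constant names are hypothetical.

```python
ML_PER_KG_MIN_PER_MET = 3.5  # conventional resting oxygen uptake (assumed)


def vo2_to_mets(vo2_ml_kg_min):
    """Convert relative oxygen uptake (mL/kg/min) to metabolic equivalents."""
    return vo2_ml_kg_min / ML_PER_KG_MIN_PER_MET


def percent_of_peak(vo2_ml_kg_min, vo2peak_ml_kg_min):
    """Express an oxygen uptake as a percentage of VO2peak."""
    return 100.0 * vo2_ml_kg_min / vo2peak_ml_kg_min


threshold_mets = vo2_to_mets(35.0)
peak_mets = vo2_to_mets(57.0)
threshold_pct_peak = percent_of_peak(35.0, 57.0)

print(f"threshold: {threshold_mets:.1f} METs")
print(f"VO2peak of 57 mL/kg/min: {peak_mets:.1f} METs")
print(f"threshold as share of that VO2peak: {threshold_pct_peak:.0f}%")
```

For a group with this VO2peak, the intensity at which the SWA estimate appears to plateau would lie at roughly 60% of aerobic capacity, which is within the range of typical endurance training intensities.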

### Validity during Resistance Exercise

Only a few studies have examined the accuracy of the SWA during resistance-type exercise. Benito et al. (2012) reported an underestimation of ExEE during circuit-type resistance training at 30, 50, and 70% of the 15RMmax in a mixed sample of 29 recreationally active participants. Compared to IC, SWA-estimated ExEE was 32% lower in men, corresponding to a difference of 2.3 METs, and 21% lower in women (1.1 METs). Furthermore, the degree of underestimation increased with increasing exercise intensity, although this effect was only significant in men (Benito et al., 2012). On the other hand, the SWA slightly overestimated exercise EE by an average of 35 kcal per session during self-selected resistance exercise in a mixed sample of 52 participants of varying age and fitness level (Bai et al., 2016). The measurement error at the individual level was reported at 15%. However, the average exercise intensity was rather low during these sessions (3.2 METs) and may not resemble a typical resistance exercise session in athletic populations. Using a more traditional resistance training protocol of 9 exercises covering all major muscle groups with 3 sets of 10 repetitions at 70% of the 1-repetition maximum, the SWA provided accurate estimates of ExEE with an error of less than 5% and a strong correlation for ExEE (r = 0.77) and TDEE (r = 0.97) (Reeve et al., 2014). Measurement errors also remained constant across the ExEE spectrum with an almost perfect reliability of the SWA (test-retest r = 0.96). It should, however, be considered that ExEE was integrated over the course of the exercise bout; no information was provided on the measurement accuracy for specific exercise types (Reeve et al., 2014).
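Relative and absolute errors of this kind are linked by a simple ratio. The sketch below assumes that the percentage is expressed relative to the IC reference value, which this review does not state explicitly. The two MET values are hypothetical and were chosen only so that their difference matches the 2.3 METs quoted for men.

```python
def percent_underestimation(reference, estimate):
    """Underestimation of `estimate` relative to `reference`, in percent."""
    return 100.0 * (reference - estimate) / reference


# Hypothetical mean intensities in METs (not reported values)
ic_mets = 7.2
swa_mets = 4.9

absolute_difference = ic_mets - swa_mets
relative_difference = percent_underestimation(ic_mets, swa_mets)

print(f"absolute difference: {absolute_difference:.1f} METs")
print(f"relative underestimation: {relative_difference:.1f}%")
```

Under this assumption, a 2.3 MET difference amounts to about 32% only when the reference intensity is close to 7 METs, so the same absolute error would represent a smaller percentage at higher intensities.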

### Validity during Mixed Exercise Forms

Similar to studies addressing resistance-type exercise, there has been only limited research examining the accuracy of the SWA during mixed exercise forms, particularly in athletic populations. Zanetti et al. (2014) assessed the accuracy of the SWA during a 42-min sport-specific intermittent exercise trial in 14 male rugby players. While there was no clear trend toward over- or underestimation of ExEE with a mean bias of −0.2 kcal/min (−1.9%), results revealed only a moderate correlation between the SWA and IC (r = 0.55). During a 30-min basketball-specific skill session, however, the SWA was shown to underestimate ExEE by 1.1 kcal/min (15%) (Taylor, 2012). EE during the recovery period following intermittent exercise training, on the other hand, was overestimated by 17% by the SWA when compared to IC (Zanetti et al., 2014).

### LIMITATIONS OF THE SENSEWEAR ARMBAND: ALGORITHM VS. METHODOLOGY

Despite the tendency to underestimate ExEE during high-intensity exercise, available data suggest that the SWA can reliably detect activity patterns, rest periods, and varying levels of exercise intensity within individuals. For example, significant intra-individual correlations between IC and SWA were reported in 90% of endurance athletes who ran at exercise intensities between 65 and 85% VO2max (Drenowatz and Eisenmann, 2011). In another study involving incremental treadmill running at speeds between 10.8 and 17.3 km/h, raw data including acceleration counts, and particularly counts in the longitudinal plane, increased continuously as workload increased (Koehler et al., 2013), demonstrating that the technology is suited to detect movement patterns even at higher exercise intensities. Consequently, limitations of the proprietary algorithm are a candidate source for the underestimation of ExEE during high-intensity exercise. Several studies have tested whether algorithm adjustments could improve the validity of the SWA during exercise. In one of the first published validation studies, Jakicic et al. (2004) reported that the accuracy of the SWA improved after algorithm revisions. After the initial algorithm underestimated ExEE during walking, stepping, and cycling by 7–29% and overestimated ExEE during arm ergometry by 29%, the researchers provided a subset of their data to develop exercise-specific proprietary equations, which reduced errors in ExEE measured by the SWA to a non-significant level. However, ExEE values, which peaked during stair stepping at 5.3–9.2 kcal/min, did not exceed the 10 MET threshold. More recently, Van Hoye et al. (2015) compared two different algorithms during low- and moderate-intensity treadmill running in well-trained students, reasoning that a newer algorithm would provide more accurate estimates of EE as the manufacturer updates proprietary algorithms on a regular basis. When compared to the initially used algorithm (version 2.2), data processed using a newer algorithm (version 5.2) reduced the measurement error from 18–24 to 5–17%, although ExEE remained underestimated.
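Whether an expenditure in kcal/min exceeds 10 METs depends on body mass. The sketch below uses the common approximation that 1 MET corresponds to about 1 kcal per kg body mass per hour; this approximation and the 70 kg body mass are assumptions for illustration and are not taken from Jakicic et al. (2004).

```python
KCAL_PER_KG_HOUR_PER_MET = 1.0  # common approximation (assumed)


def kcal_per_min_to_mets(kcal_per_min, body_mass_kg):
    """Convert an energy expenditure rate to METs for a given body mass."""
    kcal_per_kg_hour = kcal_per_min * 60.0 / body_mass_kg
    return kcal_per_kg_hour / KCAL_PER_KG_HOUR_PER_MET


body_mass_kg = 70.0  # hypothetical participant

low_mets = kcal_per_min_to_mets(5.3, body_mass_kg)
high_mets = kcal_per_min_to_mets(9.2, body_mass_kg)
kcal_per_min_at_10_mets = 10.0 * KCAL_PER_KG_HOUR_PER_MET * body_mass_kg / 60.0

print(f"5.3 kcal/min is about {low_mets:.1f} METs")
print(f"9.2 kcal/min is about {high_mets:.1f} METs")
print(f"10 METs is about {kcal_per_min_at_10_mets:.1f} kcal/min")
```

For a 70 kg person, the quoted peak values would fall below 8 METs, in line with the statement that the improved equations were not tested above the 10 MET threshold.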

### APPLICATION OF THE SENSEWEAR ARMBAND IN ATHLETIC POPULATIONS

Despite the previously mentioned limitations, several groups have used the SWA to track EE in athletes. In adolescent sprinters undergoing high-intensity exercise training, Aerenhouts et al. (2011) measured TDEE, ExEE, and activity patterns using the SWA. When compared to self-report, the SWA registered less time spent in high-intensity activity, although this difference did not result in differences in TDEE, which was within 6% of the TDEE derived from activity diaries. The authors also highlighted the need for additional information when athletes fail to wear the SWA for 24 h. The SWA was also used to record ExEE during the competitive season in volleyball players (Woodruff and Meloche, 2013). SWA-recorded ExEE was found to be higher during games when compared to practice and warm-up sessions. Combining SWA data with diet logs and body composition assessment, the authors further concluded that the majority of the athletes were in an energy-balanced state. Using the SWA to quantify non-exercise activity thermogenesis (NEAT) among endurance athletes undergoing periods of high and low training volume, Drenowatz et al. (2013) demonstrated that the high training volume did not result in a compensatory reduction in NEAT; instead, athletes reduced their sedentary activities to allow for more training time. In professional Australian Football players, the SWA was used to document the contribution of NEAT to TDEE, which was greater on training days (85%) when compared to match days (69%) (Walker et al., 2016).

Because the SWA can be worn continuously for several days, it has also been used for the assessment of sleep quantity and quality. In male elite rugby union players, SWA-derived sleep duration was shown to be lower during game nights when compared to non-game nights, although sleep efficiency was not different (Eagles and Lovell, 2016). In another trial comparing high-intensity interval training to strength training, SWA-derived sleep efficiency was lower in the high-intensity interval condition (Kölling et al., 2016). These applications demonstrate that the SWA is well-suited to capture other biological factors, such as characteristics of sleep and NEAT, that may have important implications for athletic performance.

### CONCLUSION AND SUMMARY

Considering that the SWA has been designed for a broad market, it is not surprising that the device tends to underestimate ExEE for periods of high-intensity exercise. Although most data have been obtained for aerobic exercise, the SWA seems to equally underestimate ExEE during other exercise forms. When energy expenditure is integrated over longer time periods, including rest and recovery, the measurement error becomes less pronounced and estimations of TDEE tend to be more accurate, even in athletic populations. Adjustments to the proprietary algorithm that is used to derive EE may further help to improve the validity of the SWA. Unfortunately, the sale of the SWA has been terminated. Recently, a new disposable device with similar functionality has been introduced but is not available for commercial application at this time (Welk et al., 2017). Another viable option is the combination of GPS data with accelerometry and heart rate to assess EE in outdoor sports (Costa et al., 2015), although the accuracy of such devices remains to be explored. Given the current lack of alternatives, the SWA continues to be used in research and practice, emphasizing the need for the continued development of wearable devices that reliably measure EE and related variables in athletic settings.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Koehler and Drenowatz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Respiratory Frequency during Exercise: The Neglected Physiological Measure

Andrea Nicolò<sup>1</sup>, Carlo Massaroni<sup>2</sup> and Louis Passfield<sup>3,4</sup>\*

<sup>1</sup> Department of Movement, Human and Health Sciences, University of Rome "Foro Italico", Rome, Italy, <sup>2</sup> Unit of Measurements and Biomedical Instrumentation, Departmental Faculty of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy, <sup>3</sup> Endurance Research Group, School of Sport and Exercise Sciences, University of Kent, Kent, United Kingdom, <sup>4</sup> Faculty of Kinesiology, University of Calgary, Calgary, Canada

The use of wearable sensor technology for athlete training monitoring is growing exponentially, but some important measures and related wearable devices have received little attention so far. Respiratory frequency (f<sub>R</sub>), for example, is emerging as a valuable measurement for training monitoring. Despite the availability of unobtrusive wearable devices measuring f<sub>R</sub> with relatively good accuracy, f<sub>R</sub> is not commonly monitored during training. Yet f<sub>R</sub> is currently measured as a vital sign by multiparameter wearable devices in the military field, clinical settings, and occupational activities. When these devices have been used during exercise, f<sub>R</sub> was used for limited applications like the estimation of the ventilatory threshold. However, more information can be gained from f<sub>R</sub>. Unlike heart rate, V̇O<sub>2</sub>, and blood lactate, f<sub>R</sub> is strongly associated with perceived exertion during a variety of exercise paradigms, and under several experimental interventions affecting performance like muscle fatigue, glycogen depletion, heat exposure and hypoxia. This suggests that f<sub>R</sub> is a strong marker of physical effort. Furthermore, unlike other physiological variables, f<sub>R</sub> responds rapidly to variations in workload during high-intensity interval training (HIIT), with potentially important implications for many sporting activities. This Perspective article aims to (i) present scientific evidence supporting the relevance of f<sub>R</sub> for training monitoring; (ii) critically review possible methodologies to measure f<sub>R</sub> and the accuracy of currently available respiratory wearables; (iii) provide preliminary indications on how to analyze f<sub>R</sub> data. This viewpoint is expected to advance the field of training monitoring and stimulate directions for future development of sports wearables.

Keywords: breathing, effort, wearable sensors, training monitoring, athletes

### INTRODUCTION

The wide diffusion of wearable devices has stimulated interest in athlete training monitoring, with the aim of maximizing performance and minimizing the risk of injury and illness (Düking et al., 2016). The development of sport-related technologies is occurring rapidly and is often guided by market forces rather than athlete or scientific needs. In this process, it is not uncommon that technological solutions and measures are available before the sport scientist or practitioner can appreciate their importance, and this can reduce the use of new technologies. Emblematic here is the example of respiratory frequency (f<sub>R</sub>), which may provide a better marker of physical effort compared to traditionally monitored physiological variables. However, despite the availability of unobtrusive wearable devices measuring f<sub>R</sub> with relatively good accuracy, the practice of measuring f<sub>R</sub> during training is not yet common.

#### Edited by:

Billy Sperlich, Integrative & Experimentelle Trainingswissenschaft, Universität Würzburg, Germany

#### Reviewed by:

Xiangrong Shi, University of North Texas Health Science Center, United States; Monoem Haddad, Qatar University, Qatar; Jordan A. Guenette, University of British Columbia, Canada

#### \*Correspondence:

Louis Passfield l.passfield@kent.ac.uk

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 14 July 2017 Accepted: 31 October 2017 Published: 11 December 2017

#### Citation:

Nicolò A, Massaroni C and Passfield L (2017) Respiratory Frequency during Exercise: The Neglected Physiological Measure. Front. Physiol. 8:922. doi: 10.3389/fphys.2017.00922

### CURRENT APPLICATIONS OF RESPIRATORY WEARABLES

For a long time, f<sub>R</sub> has also received little consideration in the clinical field, despite being recognized as a vital sign capable of predicting serious adverse events. A series of papers entitled "Respiratory rate: the neglected vital sign" (Cheng et al., 2008; Cretikos et al., 2008; Gandevia and McKenzie, 2008; Steichen et al., 2008) and "Rate of respiration: the forgotten vital sign" (Parkes, 2011) helped to redirect attention to f<sub>R</sub> in the clinical field. These contributions also inspired the present manuscript, which aims to draw attention to the potential of f<sub>R</sub> for monitoring training in sport. Due to its importance as a vital sign, f<sub>R</sub> is currently measured by unobtrusive multi-parameter wearable devices mainly in the military field, clinical setting, and during occupational activities. When these devices have been used during exercise, f<sub>R</sub> has typically been used for limited applications such as the estimation of the ventilatory threshold during incremental exercise (Hailstone and Kilding, 2011). Whilst the disproportionate and progressive increase in f<sub>R</sub>, which begins with attainment of the first ventilatory threshold, may be used as a practical non-invasive method for estimating the ventilatory thresholds (Cross et al., 2012), there are other important reasons why athletes should consider monitoring f<sub>R</sub> during training.

### RESPIRATORY FREQUENCY AS A MARKER OF PHYSICAL EFFORT

f<sub>R</sub> is often measured in exercise physiology as one of the two components (together with tidal volume) of minute ventilation. However, minute ventilation has typically received much more attention than its components, being the best single indicator of the ventilatory output. Nevertheless, recent evidence suggests that f<sub>R</sub> and tidal volume are regulated by different inputs during exercise, and that their differential responses contain valuable information (Nicolò et al., 2017a,b). f<sub>R</sub> plays an important role during exercise as a strong marker of physical effort, more so than other traditionally monitored physiological variables. The non-linear increase of f<sub>R</sub> during incremental exercise parallels the well-known time course of blood lactate (La<sup>−</sup>), resembling the change in physical effort and task difficulty experienced at exercise intensities above the first ventilatory threshold. In fact, f<sub>R</sub> better reflects physical effort than La<sup>−</sup> when an incremental test is performed after exercise-induced muscle damage (Davies et al., 2011) or glycogen depletion (Busse et al., 1991), and in patients with McArdle's disease (Voduc et al., 2004). This suggests that physical effort is more causally linked with f<sub>R</sub> than with La<sup>−</sup>.

Unlike V̇O<sub>2</sub>, heart rate (HR) and La<sup>−</sup>, f<sub>R</sub> shows an effort-like response during a variety of exercise paradigms. During both time-to-exhaustion and self-paced time trial protocols, f<sub>R</sub> increases approximately linearly over time and approaches maximal values at the end of exercise. This response is observed during both continuous (Nicolò et al., 2016a) and intermittent (Nicolò et al., 2014a,b, 2017b) exercise of different duration, and with a variety of experimental interventions that affect performance. Moreover, unlike other physiological variables, the time course of f<sub>R</sub> is closely associated with that of Rating of Perceived Exertion (RPE) (Nicolò et al., 2014a, 2016a, 2017b). This association is found even after locomotor muscle fatigue (Marcora et al., 2008) and damage (Davies et al., 2009), inspiratory (Mador and Acevedo, 1991) and expiratory (Taylor and Romer, 2008) muscle fatigue, muscle glycogen depletion (Busse et al., 1991), increases in body temperature (Hayashi et al., 2006), hypoxia (Koglin and Kayser, 2013), ingestion of sodium bicarbonate (Robertson et al., 1986), prior endurance exercise (Spengler et al., 2000), and after expiratory muscle training (Suzuki et al., 1995). Conversely, HR, V̇O<sub>2</sub>, and La<sup>−</sup> are partially dissociated from RPE under some of these experimental interventions. Therefore, f<sub>R</sub> appears to be sensitive to different fatigue states, and thus may have potentially important implications for training and recovery monitoring. Furthermore, f<sub>R</sub> may be a good predictor of time to exhaustion during constant-workload trials (Pires et al., 2011a,b) and can help understand how effort is distributed during self-paced time trials (Nicolò et al., 2014a, 2016a). The observation that f<sub>R</sub> is a stronger correlate of RPE than other physiological variables is not novel (Noble et al., 1973; Robertson et al., 1986), and it has previously been proposed as a variable to monitor during training (James et al., 1989; Neary et al., 1995). However, the importance of f<sub>R</sub> as a marker of physical effort has emerged from recent investigations (Nicolò et al., 2014a, 2016a, 2017b).

An important feature differentiating f<sub>R</sub> from other physiological variables is the very fast response at exercise onset and offset. During sustained all-out exercise, f<sub>R</sub> increases rapidly at the beginning of exercise and quickly reaches maximal values that are maintained throughout the trial, even where an exponential decrease in power output occurs (Nicolò et al., 2015). A rapid response of f<sub>R</sub> is also observed during the alternation of work and recovery phases characterizing high-intensity interval training (HIIT) (Nicolò et al., 2014b, 2017b). Furthermore, f<sub>R</sub> changes in proportion to workload variations in work and recovery across different HIIT sessions (Nicolò et al., 2017b). This makes f<sub>R</sub> a useful variable to describe the fast changes in effort that characterize HIIT (**Figures 1A–C**). In contrast, V̇O<sub>2</sub> and HR do not respond abruptly to such changes in workload (Nicolò et al., 2014b, 2017b).

The experimental evidence for f<sub>R</sub> as a marker of effort is substantiated by our understanding of the mechanisms underlying its regulation. One of the major regulators of ventilation during exercise is central command (Forster et al., 2012), i.e., the central neural drive associated with voluntary motor effort. Moreover, it has been suggested that central command preferentially regulates f<sub>R</sub> rather than tidal volume (Nicolò et al., 2017b). Central command is also the sensory signal for perceived exertion (Marcora, 2009), and this provides a neurophysiological explanation for the association observed between perceived exertion and f<sub>R</sub>. This is why in the present manuscript we refer to "physical effort" as a theoretical construct which is distinct from, but linked to, perceived effort. Physical effort can be defined as the degree of motor effort (i.e., the magnitude of central command) (Nicolò et al., 2016b). For applied sport scientists and practitioners, physical effort (and thereby f<sub>R</sub>) reflects how hard, heavy and strenuous a physical task is, whilst perception of effort is the conscious sensation of this physical task (Marcora, 2010).

FIGURE 1 | Typical subject performing a 20-s work 40-s rest self-paced intermittent cycling time trial lasting 30 min (i.e., 30 repetitions). Data are from Nicolò et al. (2014a). The time course of power output is depicted in (A). Of note, f<sub>R</sub> responds very rapidly to the alternation of the work and recovery phases, and increases progressively over time (B). The rapid change in f<sub>R</sub> according to variations in workload can be better observed by showing the time course of f<sub>R</sub> within the 60-s work-recovery cycle (C). The solid thick line represents the average of the entire trial, the dashed lines represent each repetition and the solid vertical line separates the 20-s work from the 40-s recovery. For details on this analysis see Nicolò et al. (2014b). This is also a convenient representation to show f<sub>R</sub> data in real time during HIIT. In order to synthesize the effort of the training session, the f<sub>R</sub> distribution (D) and concentration (E) profiles have also been constructed. The distribution profile describes the time spent above each f<sub>R</sub>-value, while the concentration profile describes the time spent at each f<sub>R</sub>-value. Both analyses can also be used to describe several training sessions. See Kosmidis and Passfield (2015) for more details on the two analyses.
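The within-cycle representation described for Figure 1C can be illustrated with a minimal sketch. It assumes a respiratory frequency series already resampled to 1-s intervals and a fixed 60-s work-recovery cycle; the function name `cycle_average` and the synthetic values are hypothetical and are not taken from the cited studies.

```python
def cycle_average(fr_per_sec, cycle_s=60):
    """Average a 1-s respiratory frequency series across repeated cycles.

    Returns one mean value per second of the cycle (the thick line in a
    within-cycle plot). Samples after the last complete cycle are ignored.
    """
    n_cycles = len(fr_per_sec) // cycle_s
    if n_cycles == 0:
        raise ValueError("need at least one complete cycle")
    cycles = [fr_per_sec[k * cycle_s:(k + 1) * cycle_s] for k in range(n_cycles)]
    return [sum(c[i] for c in cycles) / n_cycles for i in range(cycle_s)]


# Synthetic example (hypothetical values): three 20-s work / 40-s recovery
# cycles, with respiratory frequency drifting upwards by 1 breath/min per cycle.
session = []
for k in range(3):
    session += [40.0 + k] * 20  # work phase
    session += [30.0 + k] * 40  # recovery phase

profile = cycle_average(session)
```

Each dashed line in such a plot would correspond to one element of `cycles`; only the average is returned here to keep the sketch short.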

Sports scientists and practitioners are therefore encouraged to consider f<sub>R</sub> among the variables to monitor in training. Note that most of the evidence suggesting f<sub>R</sub> to be a valid marker of effort comes from studies that used cycling as exercise modality, while fewer data are available on other exercise modalities. A similar f<sub>R</sub> response was observed during incremental exercise performed either with legs or arms separately as well as with legs and arms combined, despite considerable differences in absolute V̇O<sub>2</sub>, workload and HR (Robertson et al., 1986). This suggests that f<sub>R</sub> reflects the effort exerted during exercise irrespective of absolute workload, metabolic demand, and muscle masses involved. On the other hand, different ventilatory responses have been found when comparing running with cycling (Elliott and Grace, 2010). A different degree of entrainment (coupling between locomotion and breathing rhythms) between cycling and running is often proposed as an explanation for between-modality differences in f<sub>R</sub>, but experimental evidence is conflicting. The entrainment phenomenon is well-documented in some sports like rowing, where high inter-individual variability in entrainment pattern is observed (Siegmund et al., 1999). Thus, for rowing a degree of caution is suggested in the interpretation of f<sub>R</sub> until more research is conducted.

### HOW TO MEASURE RESPIRATORY FREQUENCY IN THE FIELD

The limited consideration given to f<sub>R</sub> in sport should not be ascribed to technical limitations. It is the easiest ventilatory variable to measure during exercise and several respiratory wearables have been developed. Directly, f<sub>R</sub> can be measured with portable devices registering flow-rate at the mouth (e.g., flow sensors), but these require the use of a facemask. These devices (e.g., K5, Cosmed, Rome, Italy) are accurate but relatively obtrusive and not well-suited to training monitoring. However, they are widely used as criterion devices for validating less obtrusive respiratory wearables. Indirectly, f<sub>R</sub> can be measured using the strain and movements of the chest and abdomen induced by ventilation, the sound of breathing, or the effect that ventilation has on biosignals such as electrocardiogram (ECG) and photoplethysmogram (PPG). f<sub>R</sub> can also be measured with sensors monitoring exhaled carbon dioxide, air temperature or humidity, but these sensors are not commonly considered for wearable solutions used in sport.

The majority of commercially-available respiratory wearables register ventilation-induced thoracic and/or abdominal strain through sensors embedded into straps or clothes. Commonly used sensors are inductive (Hexoskin®, Carré Technologies Inc., Montreal, Que., Canada; LifeShirt®, Vivometrics, Inc., Ventura, CA, U.S.A.; Equivital™ EQ02 LifeMonitor™, Hidalgo, Cambridge, U.K.), piezo-electric (Pneumotrace II™, UFI, Morro Bay, CA, USA), capacitive (Zephyr™ BioHarness™, Zephyr Technology, Auckland, New Zealand), and piezo-resistive (Wearable Wellness System™, Smartex S.r.l., Italy). The accuracy of most of these respiratory wearables is good as assessed by comparison with a flow sensor criterion device. For instance, a mean difference (bias) ± limits of agreement (LoA) of ∼0.3 ± 2 and 0.2 ± 2.4 breaths·min<sup>−1</sup> was found for Hexoskin® during submaximal incremental walking (Villar et al., 2015) and for Equivital™ EQ02 LifeMonitor™ during moderate-intensity walking and running (Liu et al., 2013), respectively. A bias ± LoA of −0.1 ± 5.7 breaths·min<sup>−1</sup> was found for LifeShirt® during a maximal incremental running test (Witt et al., 2006). A bias ± LoA of −0.6 ± 5 and 0.2 ± 8.3 breaths·min<sup>−1</sup> was found for Zephyr™ BioHarness™ during a maximal incremental running test and a prolonged moderate-intensity running trial in the heat, respectively (Kim et al., 2013). However, direct comparison of the accuracy of different strain sensors in estimating f<sub>R</sub> during exercise is lacking, and requires further investigation.
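For readers less familiar with agreement statistics, the bias ± LoA values quoted above can be reproduced in principle from paired wearable and criterion readings. The sketch below assumes the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the paired differences); the cited validation studies may have used variations of this procedure, and the paired values shown are hypothetical.

```python
from statistics import mean, stdev


def bias_and_loa(device, criterion):
    """Return (bias, half_width) for paired measurements.

    bias is the mean of device minus criterion; the 95% limits of
    agreement are bias - half_width and bias + half_width.
    """
    if len(device) != len(criterion):
        raise ValueError("paired series must have the same length")
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, half_width


# Hypothetical paired respiratory frequency values (breaths/min)
wearable = [18.0, 22.5, 27.0, 31.5, 36.0, 41.0]
flow_sensor = [18.5, 22.0, 26.0, 32.0, 36.5, 40.0]

bias, half_width = bias_and_loa(wearable, flow_sensor)
print(f"bias ± LoA: {bias:.2f} ± {half_width:.2f} breaths/min")
```

A small bias combined with wide limits, as reported for some devices above, indicates that individual readings can still differ considerably from the criterion even when the average error is close to zero.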

Respiratory wearables positioned on the torso can be affected by non-respiratory chest and abdomen movements during locomotion. This problem is commonly addressed for respiratory wearables based on movement sensors, such as accelerometer-based devices registering chest and/or abdomen movements (i.e., inclination changes), and algorithms resilient to motion artifacts have been developed (Liu et al., 2011). Compared to the use of a single accelerometer, the estimation of f<sub>R</sub> improved with a sensor fusion method combining accelerometer and gyro-sensor outputs (Yoon et al., 2014). An improvement of 4.6 and 9.54% was observed during treadmill interval training and resistance exercise, respectively, and this method was found suitable for real-time f<sub>R</sub> monitoring (Yoon et al., 2014). Respiratory wearables based on magnetometers have also shown good agreement, with a bias ± LoA of ∼0.2 ± 3 breaths·min<sup>−1</sup> during moderate walking (McCool et al., 2002). The combination of strain sensors with movement sensors capable of detecting motion artifacts might be an attractive solution for future development of respiratory wearables.

The sound of breathing is used in the clinical field for estimating f<sub>R</sub>, but it has received little attention in sport (Peterson et al., 2014). Recording breathing sound during exercise may have some advantages in view of the relatively loud sounds produced, especially during high-intensity exercise. Anecdotally, athletes report monitoring the breathing sounds of their opponents as a gauge of their physical effort during endurance competitions. However, environmental noise can interfere with the quality of the acoustic registration and may explain why little attention has been devoted to breathing sound so far.

It is well-established that ventilation affects the morphology of the ECG signal, and that f<sub>R</sub> can be extracted from the ECG with different techniques (Helfenbein et al., 2014). A few encouraging attempts have also been made to derive f<sub>R</sub> from ECG during cycling exercise (Bailón et al., 2006; Schumann et al., 2016). It is also documented that ventilation affects the PPG signal (Meredith et al., 2012), from which f<sub>R</sub> can be extracted with appropriate computational processing (Charlton et al., 2016). The PPG signal is receiving growing attention in the sports wearable sector because of its simplicity of recording; for instance, it can be obtained from different body sites like the finger, the wrist and the earlobe. Nevertheless, data on the validity of f<sub>R</sub> extracted from the PPG signal during exercise are sparse. In an early attempt made during incremental cycling exercise, motion artifacts prevented a good estimation of f<sub>R</sub> and the error of estimation increased with the increase in exercise intensity (Nakajima et al., 1996). Some of these problems may be overcome with the application of robust filters and appropriate computing techniques (Lee et al., 2011). However, more research is needed to evaluate whether f<sub>R</sub> can be satisfactorily estimated from the ECG or the PPG signal during exercise.
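To give a sense of what "appropriate computational processing" can involve, the sketch below estimates a dominant frequency from a respiratory-modulated series by scanning a discrete Fourier transform over a plausible breathing band. This is a generic illustration and not the method of any of the studies cited above: the synthetic sine wave stands in for a modulation series (such as beat-to-beat amplitude) that would first have to be derived from real ECG or PPG data, and the band limits, frequency step and function name are assumptions. It does not address the motion artifacts discussed in the text.

```python
import math


def dominant_frequency(signal, fs, f_lo=0.1, f_hi=1.5, step=0.01):
    """Return the frequency (Hz) in [f_lo, f_hi] with the largest spectral power.

    signal: evenly sampled values; fs: sampling rate in Hz.
    The band limits and step are illustrative assumptions.
    """
    n = len(signal)
    mean_value = sum(signal) / n
    centred = [x - mean_value for x in signal]  # remove the constant offset
    best_f, best_power = f_lo, -1.0
    n_steps = int(round((f_hi - f_lo) / step))
    for k in range(n_steps + 1):
        f = f_lo + k * step
        re = sum(x * math.cos(2 * math.pi * f * i / fs) for i, x in enumerate(centred))
        im = sum(x * math.sin(2 * math.pi * f * i / fs) for i, x in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


# Synthetic modulation series: 60 s sampled at 4 Hz, oscillating at 0.5 Hz
fs = 4.0
series = [math.sin(2 * math.pi * 0.5 * i / fs) for i in range(240)]

fr_estimate = 60.0 * dominant_frequency(series, fs)  # convert Hz to breaths/min
```

With a 0.01-Hz step, the resolution of this estimate is 0.6 breaths per minute; shorter analysis windows would trade resolution for responsiveness.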

Work on the development of respiratory wearables is likely to increase from a technological point of view (including the computing sector), because a range of sensors and methods can be used to measure f<sub>R</sub>. Therefore, we expect growing interest in the development of f<sub>R</sub>-based wearables specifically designed for sporting activities, triggered by the understanding of the importance of f<sub>R</sub> for training monitoring. Among the wearables currently available, those measuring chest strain are the most numerous, and their accuracy is generally good. However, the wearability of some of these devices needs to improve before use in monitoring training. Further validation studies are needed to guide sport scientists and practitioners on the choice of a suitable device. Validation studies have generally targeted few exercise modalities (mainly walking and running), and some devices have only been tested during moderate-intensity exercise.

### HOW SHOULD RESPIRATORY FREQUENCY DATA BE ANALYZED?

Since we are at an early stage of training monitoring by means of f<sub>R</sub>, this section aims to provide some initial guidelines on how to deal with f<sub>R</sub> data. It is important to point out that the variability of f<sub>R</sub> is relatively high if compared to that of other physiological variables like HR (Faude et al., 2017). This is not necessarily a limitation because f<sub>R</sub> is also sensitive to variations in performance induced by a variety of experimental interventions, indicating its relatively high signal-to-noise ratio. However, the variability issue should be considered when analyzing and interpreting f<sub>R</sub> data. A breath-by-breath f<sub>R</sub> dataset should be filtered for errant breaths (i.e., values resulting after coughs, sighs, swallows, etc.), as commonly performed for gas exchange analysis (Lamarra et al., 1987). Subsequently, data can be interpolated to 1-s intervals and bin averaged according to experimental or practical needs. Due to the inherent variability of f<sub>R</sub>, the maximal value of f<sub>R</sub> (f<sub>Rmax</sub>) should not be taken from breath-by-breath values but from an average over no less than 10 s. For the same reason, average values rather than breath-by-breath values should be displayed in real time during training activities.
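The processing steps outlined above (filtering errant breaths, interpolating to 1-s intervals, and taking the maximum from a 10-s average) can be sketched as follows. The specific filtering rule used here, a comparison with the median of neighbouring breaths using a fixed tolerance, is an assumption made for illustration and is not the procedure of Lamarra et al. (1987); the function names, window sizes and synthetic data are likewise hypothetical.

```python
import math
from statistics import mean, median


def remove_errant_breaths(times, fr, window=5, tol=10.0):
    """Drop breaths deviating more than tol (breaths/min) from the local median.

    window and tol are illustrative assumptions, not published thresholds.
    """
    half = window // 2
    keep_t, keep_f = [], []
    for i, (t, f) in enumerate(zip(times, fr)):
        local = fr[max(0, i - half): i + half + 1]
        if abs(f - median(local)) <= tol:
            keep_t.append(t)
            keep_f.append(f)
    return keep_t, keep_f


def interpolate_1s(times, fr):
    """Linearly interpolate breath-by-breath values to whole seconds.

    times must be strictly increasing and contain at least two breaths.
    """
    out, j = [], 0
    for sec in range(math.ceil(times[0]), math.floor(times[-1]) + 1):
        while times[j + 1] < sec:
            j += 1
        weight = (sec - times[j]) / (times[j + 1] - times[j])
        out.append(fr[j] + weight * (fr[j + 1] - fr[j]))
    return out


def rolling_max(values, window=10):
    """Highest mean over any `window` consecutive 1-s samples."""
    if len(values) < window:
        raise ValueError("series shorter than the averaging window")
    return max(mean(values[i:i + window]) for i in range(len(values) - window + 1))


# Synthetic example: one breath every 2 s at 30 breaths/min, with a single
# errant value (e.g., after a cough) at index 10.
times = [float(t) for t in range(0, 41, 2)]
fr = [30.0] * len(times)
fr[10] = 90.0

clean_t, clean_f = remove_errant_breaths(times, fr)
per_sec = interpolate_1s(clean_t, clean_f)
fr_max = rolling_max(per_sec, window=10)
```

Without the filtering step, the single errant breath would inflate the 10-s maximum, which illustrates why the order of operations matters.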

The f<sub>Rmax</sub> reached during maximal effort exercise is similar across different exercise paradigms and durations (Kift and Williams, 2007; Nicolò et al., 2014a,b, 2016a, 2017b), with few extreme exceptions (Nicolò et al., 2015). Therefore, different maximal exercise protocols appear to be suitable for measuring f<sub>Rmax</sub>. It is convenient to normalize f<sub>R</sub> to f<sub>Rmax</sub> to develop prescription and monitoring strategies that can be generalized, since there is relatively high variability in f<sub>Rmax</sub> across different individuals, and the factors determining this variability are not well-understood. The first attempt to interpret f<sub>R</sub> data normalized to f<sub>Rmax</sub> was made by Nicolò et al. (2014a). They found a strong correlation between f<sub>R</sub> and RPE with similar values across a continuous and three different HIIT trials matched for effort and exercise duration. Therefore, values from the four trials were considered together, and the regression equation of the correlation obtained was used to associate f<sub>R</sub> normalized to f<sub>Rmax</sub> with the well-known 6–20 RPE scale (**Figure 2**). For instance, a value of 80% f<sub>Rmax</sub> approximately corresponded to an effort perceived as hard, and a value of 88% f<sub>Rmax</sub> to an effort perceived as very hard, with clear implications for training prescription and monitoring. Indeed, f<sub>R</sub> is an objective variable that can be measured continuously during exercise, while RPE is a subjective variable which can only be collected at discrete points in time. This approach could be improved further by normalizing f<sub>R</sub> to the range of possible f<sub>R</sub>-values available (from f<sub>R</sub> measured at rest to f<sub>Rmax</sub>), in a similar manner to the formula used to obtain the HR reserve (Karvonen and Vuorimaa, 1988). This normalization procedure could be used to provide objective real-time feedback on physical effort, with values conveniently ranging from 0 to 100. Real-time feedback could also allow athletes to voluntarily alter their breathing pattern as allegedly advised by some coaches, although the potential benefit of this practice is uncertain.
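The two normalization options discussed above can be written out explicitly. The sketch below assumes that a resting value and a maximal value of respiratory frequency are available for the athlete; the function names and the numerical values are hypothetical, and the reserve formula simply mirrors the structure of the HR reserve calculation rather than reproducing a validated equation.

```python
def percent_of_max(fr, fr_max):
    """Respiratory frequency expressed as a percentage of its maximal value."""
    if fr_max <= 0:
        raise ValueError("fr_max must be positive")
    return 100.0 * fr / fr_max


def percent_of_reserve(fr, fr_rest, fr_max):
    """Respiratory frequency scaled from 0 (rest) to 100 (maximal value),
    by analogy with the HR reserve formula."""
    if fr_max <= fr_rest:
        raise ValueError("fr_max must exceed fr_rest")
    return 100.0 * (fr - fr_rest) / (fr_max - fr_rest)


# Hypothetical athlete: 15 breaths/min at rest, 60 breaths/min at maximum
fr_rest, fr_max = 15.0, 60.0
current = 48.0

as_percent_max = percent_of_max(current, fr_max)                    # 80.0
as_percent_reserve = percent_of_reserve(current, fr_rest, fr_max)   # about 73.3
```

Because the resting value is subtracted, the reserve-based index reads lower than the percentage of maximum at the same respiratory frequency, so thresholds derived for one scale cannot be applied directly to the other.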

Different approaches may be used to synthesize f<sub>R</sub> data from one or more training sessions. Unlike for HR, average f<sub>R</sub> is similar across maximal-effort training sessions differing in the HIIT format of exercise or duration (Nicolò et al., 2014a, 2016a, 2017b). Therefore, average f<sub>R</sub> may provide a simple preliminary description of the overall physical effort of a training session. However, more comprehensive analyses are required to fully examine the potential of f<sub>R</sub> data. Two promising analyses conceived to analyze large datasets are the training distribution and the training concentration profiles described by Passfield and Hopker (2017). The training distribution profile shows the total session time spent above the reference f<sub>R</sub>-value (which can be interpreted as the reference level of effort), which assumes every possible value (**Figure 1D**). The training concentration profile is a concentration curve (i.e., the derivative of the distribution curve), which shows the cumulative time spent training at each f<sub>R</sub>-value (effort level) (**Figure 1E**). f<sub>R</sub> distribution and f<sub>R</sub> concentration profiles would therefore provide a breakthrough in understanding training effort, which is currently summarized by a single session value of RPE.
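A simple discrete version of the two profiles can be sketched from a 1-s respiratory frequency series. This is an illustrative simplification, not the implementation described in the cited work: values are rounded to whole breaths per minute, "above" is taken to mean strictly above, and the function names and example data are hypothetical.

```python
import math
from collections import Counter


def concentration_profile(fr_per_sec):
    """Seconds spent at each respiratory frequency value (whole breaths/min)."""
    counts = Counter(math.floor(v + 0.5) for v in fr_per_sec)
    lo, hi = min(counts), max(counts)
    return {v: counts.get(v, 0) for v in range(lo, hi + 1)}


def distribution_profile(fr_per_sec):
    """Seconds spent strictly above each respiratory frequency value."""
    conc = concentration_profile(fr_per_sec)
    remaining = sum(conc.values())
    profile = {}
    for value in sorted(conc):
        remaining -= conc[value]  # time at this value is no longer "above" it
        profile[value] = remaining
    return profile


# Hypothetical 6-s excerpt: 3 s at 30, 2 s at 40 and 1 s at 50 breaths/min
session = [30.0, 30.0, 30.0, 40.0, 40.0, 50.0]

conc = concentration_profile(session)
dist = distribution_profile(session)
```

In this discrete form, the time at each value equals the drop in the distribution profile between adjacent values, which corresponds to the derivative relationship mentioned in the text.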

### CONCLUSION

In this Perspective article, we aimed to present scientific evidence indicating the importance of monitoring f<sub>R</sub> during training, and to propose possible methodologies and wearable sensors currently available to measure f<sub>R</sub> in the field. We also provided indications on how to analyze and interpret f<sub>R</sub> data. This is expected to benefit athlete training monitoring and the advancement of applied research in this area of sports science, and to stimulate the development and use of respiratory wearables specifically designed for sporting activities. The case of f<sub>R</sub> is a good example of how wearable sensor development should follow athletes' needs and be informed by scientific findings.

### AUTHOR CONTRIBUTIONS

All authors (AN, CM, and LP) contributed to the conception and design of the work, drafted the work or revised it critically for important intellectual content and approved the final version of the manuscript. All authors (AN, CM, and LP) agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Nicolò, Massaroni and Passfield. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Case for Adopting a Multivariate Approach to Optimize Training Load Quantification in Team Sports

Dan Weaving<sup>1,2,3</sup>\*, Ben Jones<sup>1,2,4,5</sup>, Kevin Till<sup>1,2,4</sup>, Grant Abt<sup>3</sup> and Clive Beggs<sup>1</sup>

<sup>1</sup> Institute for Sport, Physical Activity and Leisure, Leeds Beckett University, Leeds, United Kingdom, <sup>2</sup> Leeds Rhinos Rugby League Club, Leeds, United Kingdom, <sup>3</sup> School of Life Sciences, University of Hull, Kingston upon Hull, United Kingdom, <sup>4</sup> Yorkshire Carnegie, Leeds, United Kingdom, <sup>5</sup> The Rugby Football League, Leeds, United Kingdom

Keywords: multivariate analysis, global positioning systems (GPS), training load, reductionism, external training load, orthogonal analysis

Professional sports teams are investing substantial resources in monitoring the training load (TL) in their players in an attempt to achieve favorable training outcomes such as increases in performance and a reduction in negative outcomes such as injury. This investment is likely to increase as organizations explore the most recent developments in wearable technology that allow a wide variety of objective physiological and other measures to be collected concurrently and over long periods of time. The question of how all of this data can be used is one that many in our field are now asking (Foster et al., 2017). To answer this, we have to start with a definition of TL. Soligard et al. (2016) recently defined TL as:

"the sport and non-sport burden (single or multiple physiological, psychological, or mechanical stressors) as a stimulus that is applied to a human biological system (including subcellular elements, a single cell, tissues, one or multiple organ systems, or the individual)"

#### Edited by:

Billy Sperlich, Integrative and Experimentelle Trainingswissenschaft, Universität Würzburg, Germany

#### Reviewed by:

Shaun James McLaren, Teesside University, United Kingdom; Lars Donath, University of Basel, Switzerland

#### \*Correspondence:

Dan Weaving d.a.weaving@leedsbeckett.ac.uk

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 22 June 2017 Accepted: 27 November 2017 Published: 12 December 2017

#### Citation:

Weaving D, Jones B, Till K, Abt G and Beggs C (2017) The Case for Adopting a Multivariate Approach to Optimize Training Load Quantification in Team Sports. Front. Physiol. 8:1024. doi: 10.3389/fphys.2017.01024

To quantify this construct, a common approach is to determine the ratio of a single measure across two moving-average time periods (e.g., the acute:chronic training load ratio [A:C]). Suboptimal (either too high or too low) TL is associated with an increased risk of injury (Hulin et al., 2016). However, although many TL measures (e.g., total distance, high-speed distance, session rating of perceived exertion [sRPE]) are collected, they are used individually as "predictor" variables in these analyses. Therefore, the initial consideration should be to determine the variable(s) that provide the most valid representation of the actual load imposed on each athlete.
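The ratio described above can be sketched as follows. This is a minimal illustration for one player and one TL measure; the 7- and 28-day window lengths, the function names and the sRPE values are assumptions made for the example and are not taken from the text.

```python
def moving_average(values, window):
    """Trailing moving average; None until a full window of data is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def acute_chronic_ratio(daily_load, acute_days=7, chronic_days=28):
    """Ratio of the acute to the chronic moving average of a single TL measure.

    The default window lengths are illustrative assumptions only.
    """
    acute = moving_average(daily_load, acute_days)
    chronic = moving_average(daily_load, chronic_days)
    ratio = []
    for a, c in zip(acute, chronic):
        if a is None or c is None or c == 0:
            ratio.append(None)  # not enough history, or no chronic load
        else:
            ratio.append(a / c)
    return ratio


# Hypothetical daily sRPE loads (arbitrary units): 28 steady days, then a harder week.
daily_srpe = [400.0] * 28 + [600.0] * 7
ac = acute_chronic_ratio(daily_srpe)
```

Note that the sketch processes one variable at a time, which is exactly the univariate use that the following paragraphs call into question.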

The validity of a TL measure is typically examined through its agreement with a criterion that represents the true value. For example, the speed derived from a global positioning system (GPS) device is compared to that derived from a radar gun (Roe et al., 2016). In this instance, the confidence that the criterion measure represents the true value is high. In contrast, determining the validity of internal TL methods is problematic because few physiological markers are available in the field and there is no criterion method of measuring the internal TL. In addition, the definition highlighted previously (Soligard et al., 2016) demonstrates the complexity of the internal TL construct. Therefore, despite sRPE having been reported to correlate highly with Banister's training impulse (TRIMP) (r = 0.75) (Lovell et al., 2013) and Edwards' TRIMP (r = 0.70) (Kelly et al., 2016) models, in these examples the shared variance is only 56 and 49%, respectively. This means that about half of the variance is unexplained. Are we therefore adopting a reductionist approach by assuming that by association, a single measure can capture the whole (true) internal TL imposed?
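As a quick check of the shared-variance figures quoted above: the shared variance is the square of the correlation coefficient, so the two reported correlations give

$$r^2 = 0.75^2 = 0.5625 \approx 56\%, \qquad r^2 = 0.70^2 = 0.49 = 49\%$$

The unexplained variance, 1 − r<sup>2</sup>, is therefore about 44 and 51%, respectively.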

Physiological systems are complex, with many disparate factors affecting the outcomes of training. In essence, every bout of exercise/training imposes specific physiological, biomechanical, and psychological demands which vary not only with the prescribed "dose" (i.e., sets, repetitions, duration, etc.) but also with the mode (e.g., strength training vs. sport-specific training) of exercise (Soligard et al., 2016; Cardinale and Varley, 2017). Therefore, it is unlikely that a single independent variable will be able to capture this complexity and provide a valid measure of TL (either internal or external) and consequently, a holistic representation of TL has been suggested (Cardinale and Varley, 2017). By taking a univariate approach, we are in danger of omitting valuable information that could contribute to explaining the relationships between the imposed TL and changes in fitness/performance/injury. For example, it is common practice to collect multiple TL variables concurrently. Recent investigations have shown that a single TL variable is unable to capture a meaningful proportion of the variance provided by multiple internal and external TL variables, which is exacerbated by the mode of training (e.g., technical-tactical, high-intensity-interval-training, sprint-training) (Weaving et al., 2014, 2017). Therefore, as the internal TL is governed largely by the external TL, external TL measures are likely to contribute "surrogate" information about the internal TL imposed and provide information that can also relate to training outcomes (Oxendale et al., 2016). In data science terms, the information contained collectively in, and between, these variables has great potential to inform and optimize our understanding of training dose-response relationships. However, appropriately unlocking this information (without statistical/mathematical violation) can be difficult. As the variables associated with TL are often strongly correlated, multicollinearity (i.e., the degree to which variables are similar to one another) is frequently a problem. In addition, because player cohorts are small, it is often the case that the number of measured variables can exceed the number of players. As such, TL datasets can pose a considerable challenge when using traditional techniques such as logistic and multiple linear regression, thereby limiting their applicability when adopting multivariate (rather than univariate) TL analyses. However, through the use of dimension reduction techniques such as principal component analysis (PCA) (Weaving et al., 2014, 2017) and singular value decomposition (SVD) (Till et al., 2016), which are immune to multicollinearity issues, it is possible to capture the complexity of a system in just a few orthogonal composite variables (i.e., variables that provide unique information). Because most of the variance in the system is captured in these orthogonal composite variables, it means that complex higher-dimensional systems can be represented on 2D and 3D scatter plots with minimal loss of information (Till et al., 2016). Furthermore, because the new variables are orthogonal it means that they are not correlated in any way, thus ensuring that they capture different attributes of the TL "system." Singular value decomposition and eigen-decomposition are at the heart of other useful data science techniques, such as partial least squares correlation analysis (PLSCA) (Beggs et al., 2016), which have great potential with respect to TL quantification. Rather than taking a conventional statistical approach, PLSCA utilizes the concept of shared information to gain new insights into the relationships between groups of variables (i.e., both predictor and response variables) in complex datasets. For example, using PLSCA, the relationship between multiple TL variables (e.g., total distance, high-speed distance, and sRPE) and multiple "fatigue" variables can be investigated in a single analysis, allowing stronger inferences to be made about the "dose-response" nature of these broad theoretical constructs that we wish to represent.
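The idea of orthogonal composite variables can be illustrated with a small sketch of PCA computed by eigen-decomposition of the correlation matrix. The session values, the choice of three TL variables and all function names are hypothetical, and power iteration with deflation is used here only to keep the example dependency-free; it is not the procedure of the cited studies.

```python
import math


def standardize(rows):
    """Column-wise z-scores (mean 0, sample SD 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(m, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(m)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-symmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(m[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[a] * sum(m[a][b] * v[b] for b in range(p)) for a in range(p))
    return value, v


def pca(rows, n_components=2):
    """Loadings, proportion of variance explained, and scores per component."""
    z = standardize(rows)
    m = correlation_matrix(z)
    p = len(m)
    components, explained = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(m)
        components.append(vec)
        explained.append(value / p)  # the trace of a correlation matrix equals p
        # Deflation: remove the component just found before extracting the next one.
        m = [[m[a][b] - value * vec[a] * vec[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components] for r in z]
    return components, explained, scores


# Hypothetical sessions: [total distance (m), high-speed distance (m), sRPE (AU)]
sessions = [
    [5200, 310, 420],
    [6100, 450, 510],
    [4300, 150, 300],
    [7000, 520, 640],
    [5600, 400, 380],
    [6500, 300, 560],
]
components, explained, scores = pca(sessions, n_components=2)
```

In this sketch, `scores` holds two uncorrelated composite variables per session, which could be plotted on a 2D scatter plot, and `explained` gives the proportion of the total variance that each composite variable retains.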

Despite the perceived increases in computational demands placed on practitioners, the authors feel that this multivariate approach warrants further investigation, at least initially in research, given the importance of TL measures in optimizing the preparation of team-sport players. It is then envisaged that this approach could be integrated into athlete monitoring software platforms to "combine" unique aspects of information provided by multiple TL variables. Although developing our understanding of what individual TL measures represent is important (i.e., validity), it is hoped that multivariate approaches will further develop our knowledge of the dose-response nature of TL monitoring with important training outcomes such as the changes in fitness, performance, and injury risk.

#### AUTHOR CONTRIBUTIONS

DW, GA, and BJ conceptualized the idea and wrote the introduction and rationale of the commentary. DW and CB wrote the discussion of the analysis approach to multivariate data. GA, KT, BJ, DW, and CB drafted the manuscript and revised it critically for important intellectual content.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Weaving, Jones, Till, Abt and Beggs. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Discovery of a Sweet Spot on the Foot with a Smart Wearable Soccer Boot Sensor That Maximizes the Chances of Scoring a Curved Kick in Soccer

#### Franz Konstantin Fuss <sup>1</sup> \*, Peter Düking<sup>2</sup> and Yehuda Weizman<sup>1</sup>

<sup>1</sup> Smart Equipment Engineering and Wearable Technology Research Program, Centre for Design Innovation, Swinburne University of Technology, Melbourne, VIC, Australia, <sup>2</sup> Integrative and Experimental Training Science, Institute for Sport Sciences, Julius-Maximilians University Würzburg, Würzburg, Germany

#### Edited by:

Kamiar Aminian, École Polytechnique Fédérale de Lausanne, Switzerland

#### Reviewed by:

Giovanni Messina, University of Foggia, Italy Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal

\*Correspondence:

Franz Konstantin Fuss fkfuss@swin.edu.au

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 14 November 2017 Accepted: 18 January 2018 Published: 13 February 2018

#### Citation:

Fuss FK, Düking P and Weizman Y (2018) Discovery of a Sweet Spot on the Foot with a Smart Wearable Soccer Boot Sensor That Maximizes the Chances of Scoring a Curved Kick in Soccer. Front. Physiol. 9:63. doi: 10.3389/fphys.2018.00063

This paper provides the evidence of a sweet spot on the boot/foot as well as the method for detecting it with a wearable pressure sensitive device. This study confirmed the hypothesized existence of sweet and dead spots on a soccer boot or foot when kicking a ball. For a stationary curved kick, kicking the ball at the sweet spot maximized the probability of scoring a goal (58–86%), whereas having the impact point at the dead zone minimized the probability (11–22%). The sweet spot was found based on hypothesized favorable parameter ranges (center of pressure in x/y-directions and/or peak impact force) and the dead zone based on hypothesized unfavorable parameter ranges. The sweet spot was rather concentrated, independent of which parameter combination was used (two- or three-parameter combination), whereas the dead zone, located 21 mm from the sweet spot, was more widespread.

Keywords: smart soccer boot, pressure sensor, sweet spot, dead spot, probability of scoring a goal, center of pressure, impact force, wearable technology

### INTRODUCTION

It was recently proposed that wearable sensor technology ("wearables") can help optimize athletes' performance by providing feedback about monitored context-specific parameters (Düking et al., 2017). This approach has been successfully implemented in different settings (Crowell and Davis, 2011; Windt et al., 2017). Yet, a field that has received little attention is the kicking action and, more precisely, the foot-to-ball impact phase in soccer. This is surprising, since soccer is the most popular sport in the world, and improving kicking actions is often part of soccer training (Kellis and Katis, 2007). The lack of research on wearables analyzing the foot-to-ball impact phase is surely explained by the lack of available sensor technologies to access relevant parameters.

In soccer, the direct free kick is one way of scoring a goal, accounting for up to 6.31% of all goals in elite (female) soccer (Alcock, 2010). Another, more challenging technique is the curved kick, where a stationary ball follows a curved trajectory around a human wall formed by defensive players in order to hit the goal. However, before this technique can be improved optimally in individual soccer players, characteristics of an ideal curved direct free kick must be analyzed and established. From a biomechanical point of view, soccer kicks can be analyzed from several kinematic and kinetic aspects, i.e., the approach, the supporting leg, the kicking leg, joint velocities, and the foot-to-ball impact (Kellis and Katis, 2007; Lees et al., 2010). However, it can be argued that the foot-to-ball impact phase is the paramount aspect of the kick since it is the only time players can influence the speed, spin and direction of the ball. In general, very little research has been conducted on curved direct free kicks and there is no single study available that addresses the differences between successful and unsuccessful curved direct free kicks in the foot-to-ball impact phase. This issue partly arises from methodological limitations related to evaluating the foot-to-ball impact phase.

The kicking action and particularly the foot-to-ball contact are usually investigated kinematically, by using high-speed cameras or motion analysis systems with body segment markers (Barfield et al., 2002; Dichiera et al., 2006; Nunome et al., 2006; Ishii and Maruyama, 2007; Shinkai et al., 2008; Scurr and Hall, 2009; Ball, 2011). The data sampling frequency or frame rate ranges from 50 Hz (Dichiera et al., 2006) to 5 kHz (Shinkai et al., 2008). Force plates are only useful to capture the action of the support leg (Ball, 2011). EMG (electromyography) was employed for analysing the muscle activity during kicking (Bauer, 1983; Dorge et al., 1999; Orchard et al., 2001). The kick impact force was estimated or derived in two different ways. Ishii and Maruyama (2007) assessed the deformation of the ball with high-speed cameras (2.5 kHz), as the force is a power function of the deformation based on Hertzian contact mechanics. The force calculated was ∼1,200 N. Shinkai et al. (2008) also used high-speed cameras (5 kHz) for estimating the velocity of the center of mass of the ball, the slope of which at the time of peak deformation (±1 ms) corresponds to the peak acceleration of the ball. The product of the latter and the mass of the ball yields the peak impact force. The average peak force reported by Shinkai et al. (2008) amounted to 2,847 ± 538 N. In summary, the limitation of kinematic methods is that the impact force can only be estimated from other parameters obtained from kinematic analysis, rather than measured directly. Moreover, the center of pressure (COP) during the foot-to-ball impact phase cannot be determined accurately from close-up ultra-high-speed camera data.
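The velocity-slope estimate of the peak impact force described above can be sketched in a few lines. The velocity samples, the ball mass of 0.43 kg and the function name are assumptions for illustration, and the simple forward difference stands in for whatever slope estimation the cited study actually used.

```python
def peak_impact_force(velocity, sample_rate_hz, ball_mass_kg):
    """Peak force as ball mass times the steepest slope of the velocity-time curve."""
    dt = 1.0 / sample_rate_hz
    # Forward differences between consecutive velocity samples approximate acceleration.
    accelerations = [(v1 - v0) / dt for v0, v1 in zip(velocity, velocity[1:])]
    return ball_mass_kg * max(accelerations)


# Hypothetical velocities (m/s) of the ball's center of mass, sampled at 5 kHz.
velocity = [0.0, 0.4, 1.4, 2.7, 3.9, 4.6, 4.9]
force = peak_impact_force(velocity, sample_rate_hz=5000.0, ball_mass_kg=0.43)
```

The sketch makes the limitation visible: the force is never measured, and any noise in the camera-derived velocity is amplified by the differentiation step.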

To the best of our knowledge, the only research using wearable sensor technology specifically aiming to analyse the foot-to-ball impact phase was performed by Hennig et al. (2009), who equipped two shoes (the best and the worst shoes in terms of instep kicking accuracy out of five commercially available soccer shoes) with a Pedar (Novel GmbH, Munich, Germany) pressure distribution measuring insole located outside of the shoe upper. The pressure was measured on every other sensor at a frequency of 571 Hz. From the pressure data, the summation center of pressure (COP) was calculated (Hennig, 2011), which was located more medially and more proximally in the shoe that delivered more accurate kicks. While these results are meaningful, from our perspective their transferability to practice is restricted by the high cost of the wearable sensor technology used, which puts this technology out of reach of amateur athletes. However, amateur athletes are likely the ones who benefit the most from biofeedback by wearables (Düking et al., 2017).

A pressure-sensitive wearable technology was recently developed with the purpose of analyzing players' kicking technique at the foot-to-ball impact phase (Weizman and Fuss, 2015a,b). This technology has several advantages over commercially available pressure sensor array systems: it is cheap (cheaper than the Pedar insole by a factor of ∼100); it is highly accurate in terms of impact COP measurement (far more accurate than a Kistler force plate); and it samples the data at 2–2.5 kHz per sensor. The pressure-sensitive wearable technology can be incorporated into athletes' footwear (which we will call from now on the "Smart Soccer Boot") to precisely measure the position of the COP and the magnitude of the impact force at each instant in time at the contact area between a player's kicking foot and the ball. The Smart Soccer Boot was originally developed for training purposes, specifically to monitor the training load of kicking.

The aim of this study is to use the Smart Soccer Boot for exploring the accuracy of curved kicks, evaluating the probability of scoring a goal, linking the chances of success to dynamic parameters obtained from the smart boot (such as impact force and location of the center of pressure), and analyzing whether there is a spot on the shoe (sweet spot) that maximizes the chances of success when kicking the ball. This leads to four hypotheses:

Hypothesis 1: There is no significant difference between the measured dynamic parameters (COPx, COPy, impact force) of all hits (scoring a goal) and those of all misses. The reason is that there is no single "sweet spot" on the shoe or foot that guarantees a success rate of 100% for scoring a goal. There might be ball-shoe or -foot contact spots or zones that offer more or fewer chances of scoring a goal with areas of average chances in between these specific spots. As the probabilities are distributed across these spots and the areas in between and as the chances at the specific spots are not exactly 100 or 0% either, the dynamic parameters associated with hits and misses might considerably overlap and therefore not exhibit a significant difference.

Hypothesis 2: There is a favorable parameter range that maximizes the probability of success as well as an unfavorable parameter range that minimizes this probability. A method for identifying such parameter ranges has to be developed, and this research is based on such a method. Furthermore, rather than comparing the parameters of all hits and all misses directly, extreme cases are compared, such as successful kicks and parameters within the favorable range vs. unsuccessful kicks and parameters within the unfavorable range. This approach separates the data and is expected to result in significant differences between COP locations that provide more or fewer chances of scoring a goal. The COP locations, however, are seen as a continuum across increasing/decreasing probabilities of success, and their extreme locations are spots with maximum chances and minimum chances of scoring a goal.

Hypothesis 3: If there is a favorable/unfavorable parameter range, then the extreme COP location related to the favorable range constitutes a well-defined sweet spot. If there really is a spot that maximizes chances then this will be a "sweet spot," the definition of which is the location of COP that maximizes chances of scoring a goal.

Hypothesis 4: If there is a sweet spot on the boot/foot, then there is also a dead spot or zone. The dead spot is a spot located differently from a sweet spot, whereas a dead zone is e.g., a ring around the dead spot if there are feasible contact points around the sweet spot. Otherwise, a sector of a ring could be found that minimizes the chances of scoring a goal if the ball-boot/foot contact is located within this dead zone.

The use of an accurate measurement device is indispensable for this task, which, naturally, must be in the form of a wearable device located at the medial and dorsal part of the foot. Although the Pedar insole (Novel GmbH, Munich, Germany) is wearable inside a shoe, wrapping it around a soccer boot (as done by Hennig et al., 2009; Hennig, 2011, for finding the average COP of two shoes with different kick accuracies) is difficult as it was designed to be worn inside a shoe for plantar pressure measurement. As such, a smart wearable device specifically designed for measuring the ball-to-boot or -foot impact force and COP with high accuracy (Weizman and Fuss, 2015a,b; Weizman, 2016) was used in this study.

The term "sweet spot" used in this paper is adapted from sports implements. In racquets, bats and clubs, hitting a ball with the sweet spot either maximizes the performance (increase in ball speed; e.g., power spot of tennis racquets), or minimizes the risk of overstrain injuries (node point that minimizes racquet vibrations, and center of percussion that minimizes the shock force at the hand; Fuss, 2011; Fuss et al., 2014). These features are not applicable to the "sweet spot" on a shoe, boot or foot; nevertheless, kicking a ball at the sweet spot hypothetically maximizes the player's performance by increasing the chances of scoring a goal.

### METHODOLOGY

#### Smart Soccer Boot

The sensor array system for the Smart Soccer Boot (Weizman and Fuss, 2015a) consists of 16 sensor cells (**Figure 1**), a programmable microcontroller and a compact electronics circuit board. All sensor cells are arranged in a 4 × 4 matrix, whereby each cell measures 20 × 20 mm and is separated from its neighbors by a 1 mm gap. The sensors were made of an off-the-shelf piezoresistive vinyl, which exhibited a linear calibration curve when the pressure was plotted against conductance data derived from force impact tests (Weizman, 2016). Each sensor was calibrated individually and validated against a Kistler force plate (type 9260AA6, Kistler, Winterthur, Switzerland) with impact forces ranging from 368 to 2,146 N (Weizman and Fuss, 2015b). The R<sup>2</sup> values when correlating measured sensor impact forces against measured impact force on the Kistler force plate ranged from 0.9333 to 0.9882 (0.9647 ± 0.0189; Weizman, 2016). The validation of the COP obtained from the force sensor against the one returned from the Kistler force plate failed, as the Kistler force plate was not able to measure the COP of impact forces accurately (**Figure 1**). In most cases, the COP obtained from the Kistler force plate was even outside the impacted sensors (impact on 4 adjacent sensor cells only, 2 × 2 matrix), even if the impact was confined to 4 sensors with a 10 mm thick wooden spacer, thereby preventing loading of adjacent areas (Weizman, 2016). The COP returned from the sensor was always very close to the center of the 4 sensor cells ("very close" because the impact force was applied manually and could not be centered precisely; Weizman, 2016; **Figure 1**). High precision in determining the center of pressure is paramount for the present study and its results.

#### Participants

Ten right-footed and experienced male soccer players (n = 10; age = 26 ± 1.71 years; body height: 177.1 ± 5.43 cm; body mass: 75.2 ± 8.36 kg; shoe size [EU]: 43 ± 1.4) volunteered to participate in the study after having been extensively informed about all testing procedures. The recruited players were trained midfielders or strikers with at least 6 years of soccer training at a non-professional level.

The study was granted ethics approval by the RMIT University Human Ethics Committee (approval no. ASEHAPP 28-14). All tests were carried out in accordance with the Declaration of Helsinki. No player suffered from injury, illness, and/or disease and all players were instructed to eat a light meal 1 h prior to testing, and to stay well hydrated. However, compliance was not specifically checked by the investigators of this study.

#### Sensor Placement

For the purpose of this study and for reasons of consistency and comparability, the sensor system had to be placed on specific anatomical landmarks to cover the contact area between the foot and the ball for the curved kicks. The sensor placement on the anatomical landmarks of the foot is visualized in **Figure 2a**. The sensor system is not implemented in a soccer boot yet and a design is warranted in which the sensor can be placed securely on the aforementioned anatomical landmarks.

To solve this issue, placement of the sensor system inside a boot, under the soccer shoe's upper, was tested first. However, this approach was unsatisfactory, since proper placement of the sensor cells could not be guaranteed, and it was consequently rejected.

Secondly, a pocket made of artificial leather was produced in which the sensor cells fit securely. This leather pocket could be fitted to the outer upper part of a soccer boot by Velcro tape. Although promising, this approach was rejected as well, for the same reason as the first approach. The method of placing a pressure sensor on the shoe upper was actually used by Hennig (Hennig et al., 2009; "The pressure measuring pads were adjusted on top of the shoes to the foot anatomy, guaranteeing that all sensors were matched to identical anatomical locations of the individual feet," Hennig, 2011). However, we experienced problems in identifying the anatomical landmarks by palpation through the shoe upper, and therefore abandoned this method.

In a third approach, an off-the-shelf sock (EU-size 42–44) with a hand-stitched thin layer of artificial leather on top was used to build a pocket in which the sensor cells fit properly (**Figures 2b,c**). Artificial leather was chosen to mimic the upper material of commercially available soccer shoes as closely as possible.

A snap fastener on one corner of the leather pocket allowed the sensor cells to be inserted and removed easily for maintenance if necessary. With this set-up, it is possible to equip different players easily with the sensor system, while simultaneously keeping the players' comfort high during kicking. Additionally, the design of the sock allowed a precise placement of the sensor system on the same anatomical landmarks for each participant, which is crucial for the purposes of this study. For these reasons, this approach was selected to analyze the characteristics of curved kicks with participants.

FIGURE 1 | Pressure sensor matrix and its validation against a Kistler force plate; COPx, COPy: position of the center of pressure in x- and y-direction of the coordinate system of the sensor matrix; (A) 4 × 4 sensor matrix and the positions (d1–d4, c1–c5) of the spacers for impact loading of nine 2 × 2 quarters for validating the position of the center of pressure (COP); (B) COPs obtained from the force plate with respect to the sensor matrix (dashed black square; note that COPs cannot be outside the sensor matrix; yet, the force plate returned impossible COP positions); (C) COPs obtained from the pressure sensor matrix (black dots: COP position; red dots: average position; green ellipse: area of one standard deviation of COPx and COPy with respect to the average, black elliptic contour: cluster of all COP positions per quarter; note that the average COPs are not exactly at the center of each quarter as it was impossible to impart the impact force exactly at the center of each spacer); (D) Comparison of average COPs obtained from the force plate (red) and the pressure sensor matrix (green). ©Yehuda Weizman, reproduced from Weizman (2016) with kind permission.

Sensor cell 1 was placed on the medial side of the metatarsophalangeal joint I. The medial edge of the sensor was aligned to the medial side of the metatarsal I in proximal direction to the medial cuneiform. The anterolateral corner of the sensor was located on the metatarsophalangeal joint IV. The lateral side of the sensor matrix was aligned to metatarsal IV.

#### Experiments

To test the hypotheses, each player conducted 8–18 curved direct free kicks in windless conditions on artificial grass with a standard size 5 ball with an internal pressure of 0.8 bar (∼11.6 psi). Players performed a standardized warm-up and were allowed to take several test kicks to familiarize themselves with the task prior to the actual testing. For all kicks, players were told to kick the ball as they would normally do in competition and not to alter their kicking technique in any way.

Slightly modified from a previously used set-up (Alcock et al., 2012), the ball was positioned at 20 m distance in a straight line from the right goal post (**Figure 3**). An artificial wall made out of polymer material with a height of 1.83 m was placed at a distance of 9.15 m away from the ball and was placed sideways by an experienced goalkeeper as he/she would do in competition, i.e., 1½ players are placed outside of an imaginary line between the ball and the goal post closer to the ball. The aim for each player was to curve the ball around the artificial wall on the right side, and to hit a target with the dimensions of 2.44 × 2.44 m (1/3 of a full-sized goal) which was placed on the right side of the goal. Consequently, the ball had to follow a left-curved trajectory in order to hit the target. The kick was recorded as unsuccessful (miss) if the ball did not hit the target or was not curved around the artificial wall properly. Missed kicks were found to be on the right side of the target area, but never on the left side.

FIGURE 2 | Sock with artificial leather stitched on top to secure the sensor matrix in place; (a) Placement of the sensor matrix (black contour; note that this black contour is not square as the sensor matrix is wrapped around the medial side of the foot, as seen in subfigure c), on anatomical landmarks and its coordinate system; (b) Instrumented sock and foot-to-ball contact; (c) Leather on top of the sensor matrix wrapped around the medial side of the foot, including snap fastener for securing the sensor matrix in place.

#### Data Analysis

The raw data of all 16 pressure sensor cells were collected at 2–2.5 kHz in ASCII format (10-bit analog to digital converter). The time series of the ASCII data was converted to voltage (drop voltage measured across the reference resistor of each of the 16 voltage dividers). From the voltage, the following parameters were calculated in sequence as a time series: the resistance of each pressure sensor (calculated from the voltage and the reference resistor); the conductance of each sensor (reciprocal of resistance); the pressure of each sensor (from the pressure-conductance-calibration curves); and the force on each sensor (from the sensor area and pressure). The overall impact force was determined from the sum of forces from the 16 individual cells. The center of pressure (COP) was calculated from individual forces and the position of the geometrical center of each sensor cell relative to the coordinate system of the sensor array (**Figure 1**) in x- and y-directions (COPx, COPy). The time derivatives of the distance between two consecutive COP positions delivered the instantaneous velocity of the COP.
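The processing chain described above (ADC count, voltage, resistance, conductance, pressure, force, COP) can be sketched for a single frame as follows. The supply voltage, reference resistor, calibration slope, row-by-row cell ordering and coordinate origin at a corner of the array are all assumptions chosen for illustration; only the 10-bit resolution, the 20 × 20 mm cell size, the 1 mm gap and the linear pressure-conductance relation come from the text.

```python
V_SUPPLY = 3.3      # volts; assumed supply voltage of each voltage divider
R_REF = 10_000.0    # ohms; assumed reference resistor
ADC_MAX = 1023      # full scale of a 10-bit analog-to-digital converter
CAL_SLOPE = 2.0e9   # Pa per siemens; hypothetical slope of the linear calibration
CELL_SIDE = 0.020   # m; 20 x 20 mm sensor cells
PITCH = 0.021       # m; cell side plus the 1 mm gap


def cell_force(adc_count):
    """Force (N) on one cell from the ADC count measured across the reference resistor."""
    if adc_count <= 0:
        return 0.0  # no voltage drop: treat the cell as unloaded
    if adc_count >= ADC_MAX:
        raise ValueError("ADC saturated; sensor resistance cannot be resolved")
    v_ref = V_SUPPLY * adc_count / ADC_MAX
    resistance = R_REF * (V_SUPPLY - v_ref) / v_ref   # sensor resistance
    conductance = 1.0 / resistance                    # reciprocal of resistance
    pressure = CAL_SLOPE * conductance                # linear calibration curve
    return pressure * CELL_SIDE ** 2                  # pressure times cell area


def frame_force_and_cop(adc_counts):
    """Total impact force and COP (x, y in m) for one frame of 16 cells, row by row."""
    forces = [cell_force(c) for c in adc_counts]
    total = sum(forces)
    if total == 0.0:
        return 0.0, None
    # Geometric center of each cell relative to one corner of the 4 x 4 array.
    xs = [(i % 4) * PITCH + CELL_SIDE / 2 for i in range(16)]
    ys = [(i // 4) * PITCH + CELL_SIDE / 2 for i in range(16)]
    cop_x = sum(f * x for f, x in zip(forces, xs)) / total
    cop_y = sum(f * y for f, y in zip(forces, ys)) / total
    return total, (cop_x, cop_y)


# Hypothetical frame: equal load on the four central cells only.
frame = [0] * 16
for idx in (5, 6, 9, 10):
    frame[idx] = 512
total_force, cop = frame_force_and_cop(frame)
```

Applying `frame_force_and_cop` to every frame of a kick would yield the COPx, COPy and force time series, from which the COP velocity follows by differencing consecutive COP positions.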

We used the following continuous data as time series for further calculations: COPx, COPy, and impact force (F). The parameters used for statistical purposes were:


All three parameters (quantitative data) were combined with the success data (qualitative binary data: hit = 1, miss = 0), the number of the participant and the number of kick. The latter two numbers served only for identification purposes, used for attributing parameter data to the participant and the type of kick. The success data served for calculation of the probability of success P, of scoring a goal.

#### Hits against Misses Analysis

The data of the parameters (listed above) of all hits were compared to the parameters of all misses with the Mann–Whitney U-test (as some of the data sets were not normally distributed, verified with the Shapiro–Wilk test if p < 0.05) and the p-values were determined. This procedure revealed whether there is a significant difference between parameter data obtained from kicking a successful or unsuccessful curve shot. The effect size was calculated in terms of the Rank-Biserial Correlation (Cureton, 1956), r, from the U-value: r = 1 − 2U/(n<sub>1</sub> n<sub>2</sub>), where n<sub>1</sub> and n<sub>2</sub> denote the number of data compared by the Mann–Whitney test, and U ≤ 0.5 n<sub>1</sub> n<sub>2</sub>. Note that the effect size r ranges from zero to unity.
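The effect-size formula above can be checked with a small sketch that computes U by direct pair counting. The sample values and function names are hypothetical, and the sketch does not compute the p-value of the test.

```python
def mann_whitney_u(sample_a, sample_b):
    """Smaller of the two Mann-Whitney U statistics; ties count as half."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)


def rank_biserial(sample_a, sample_b):
    """Effect size r = 1 - 2U / (n1 * n2), with U the smaller U statistic."""
    u = mann_whitney_u(sample_a, sample_b)
    return 1.0 - 2.0 * u / (len(sample_a) * len(sample_b))


# Hypothetical values of one parameter for hits and for misses.
hits = [31.0, 35.5, 36.0, 40.2]
misses = [28.4, 30.1, 33.0, 29.9, 37.5]
u_value = mann_whitney_u(hits, misses)
effect_size = rank_biserial(hits, misses)
```

Because the smaller U is used, U never exceeds 0.5 n<sub>1</sub> n<sub>2</sub>, which is why r stays between zero and unity. For the p-value, a library routine such as SciPy's `scipy.stats.mannwhitneyu` could be used instead of this hand-rolled count.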

#### Regression Analysis

The probability of success P, of scoring a goal, equals the average of hits h (1) and misses m (0) across a specified parameter range.

$$P = \frac{\sum \left( h, m \right)}{n} \tag{1}$$

where n is the total number of data.

The method used in this paper is an analogy to, and optimization of, the Median–Median Line method by Wald (1940). However, instead of dividing the data into two equal size subsamples, separated by the median of the independent parameter, the separation line divided the data sample into two unequal size subsamples, which was optimized based on the conditions explained subsequently.

The entire dataset of an independent parameter including the associated hit and miss data (dependent parameter), was divided into two subsamples (data ranges), separated by a threshold value s. The subsample on one side of s delivers a greater probability of success P, compared to the subsample on the other side of s. The preferred range, for maximizing the chances of success, is identified by the higher P. The absolute P-differential D of the two subsamples should be as high as possible.

$$D = P_1 - P_2 = \frac{\sum_{i=1}^{i_s} \{h, m\}}{i_s} - \frac{\sum_{i=i_s}^{n} \{h, m\}}{n - i_s} \tag{2}$$

where i<sub>s</sub> denotes the number of the datum just before or after the threshold value s; P<sub>1</sub> denotes P before s, and P<sub>2</sub> denotes P after s; by definition, the average P<sub>1</sub> is greater than the average P<sub>2</sub>, in order to fulfill the condition of a maximum or near-maximum D.
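The split described by Equation (2) can be illustrated with a short scan over all candidate thresholds. This is a sketch under assumptions: the data are sorted by the independent parameter, the two subsamples do not overlap, the threshold s is placed midway between neighboring values, and `min_size` is a hypothetical lower bound on the subsample size. Significance testing is not included, and D keeps its sign (below minus above) so that negative values remain visible.

```python
def scan_thresholds(values, hits, min_size=5):
    """Return (s, p_below, p_above, d) for every admissible split of the sorted data."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    pairs = sorted(zip(values, hits))  # sort kicks by the independent parameter
    n = len(pairs)
    results = []
    for i_s in range(min_size, n - min_size + 1):
        lower_value, upper_value = pairs[i_s - 1][0], pairs[i_s][0]
        if lower_value == upper_value:
            continue  # a threshold cannot separate two equal values
        below = [h for _, h in pairs[:i_s]]
        above = [h for _, h in pairs[i_s:]]
        p_below = sum(below) / len(below)  # probability of success below s
        p_above = sum(above) / len(above)  # probability of success above s
        s = 0.5 * (lower_value + upper_value)
        results.append((s, p_below, p_above, p_below - p_above))
    return results


def best_threshold(values, hits, min_size=5):
    """Pick the split with the largest absolute probability differential."""
    candidates = scan_thresholds(values, hits, min_size)
    if not candidates:
        return None
    return max(candidates, key=lambda c: abs(c[3]))
```

For example, `best_threshold([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0], min_size=2)` returns the split at s = 3.5 with probabilities 1.0 and 0.0.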

Yet, the probability data P, on either side of s, should be significantly different. This was determined with an independent t-test, by comparing the two samples of hit and miss data (h, m) on both sides of s. An F-test for the significance of the difference between the variances of the two samples determined whether a homoscedastic (F-test p > 0.05) or heteroscedastic (F-test p < 0.05) t-test had to be performed. These homo- and heteroscedastic p-values as well as the F-test p-value were computed with a moving average (smaller and greater s) across the entire dataset, i.e., for all possible s-values running across the entire range of a parameter (such as COPy, Fmax, etc.). The optimal threshold value s was determined from those D-data that are


The last requirement ensures that a sufficient number of data remain for the Kruskal–Wallis rank sum test, detailed in the next section. The optimal threshold value s divides the parameter range into two subsamples (data ranges), a favorable one (for maximizing the chances of success) and an unfavorable one (that minimizes the chances of success; **Figure 4**). When comparing the data of the two subsamples, the effect size is always at the maximum, as they are separated by s. **Figure 5** is an extension of **Figure 4**, showing a realistic dataset and a feasible (ideal) and an unfeasible separation line s. The feasibility is determined by the p-value and the magnitude of D.

FIGURE 4 | Principle of probability of success (P) against the range of parameter data; the black vertical line s divides the parameter range into two subsamples (smaller and greater than s); P<sub>1</sub> = average of hit and miss data for the parameter range smaller than the threshold value s (s = 6.5); P<sub>2</sub> = average of hit and miss data for the parameter range larger than the threshold value s; D = probability differential (P<sub>1</sub> – P<sub>2</sub>); P<sub>1</sub> and P<sub>2</sub> are significantly different (p = p-value); the parameter range associated with P<sub>1</sub>, the larger of the two P values, is the favorable range of the parameter tested; the parameter range associated with P<sub>2</sub>, the smaller of the two P values, is the unfavorable range of the parameter tested.

#### Two-Parameter Analysis

In contrast to the previous section, which treats each parameter individually, this section deals with the effect that two parameters have on each other, i.e., it addresses the question of whether the favorable ranges of two parameters influence each other positively (by improving the probability of scoring a goal) or negatively (by diminishing the probability of scoring a goal). When two parameters are selected, four combinations (quarters of a point cloud) and associated datasets of hit and miss data are obtained, based on the individual threshold values:


The four associated datasets of hit and miss data, resulting in four average probability (P) data, were tested for their significant differences. It was expected that the probability of success (P) of two parameters combined, both in their favorable ranges,

• was greater than P of either of these parameters individually in their favorable ranges; and

• was significantly greater than P of two parameters combined, both in their unfavorable ranges.

The significance of the latter expectation was tested with the Kruskal–Wallis rank sum test, and the significance of the individual differences in the four average probability (P) data was assessed with two post-hoc tests, Conover and Dunn, both adjusted by the Holm FWER (familywise error rate) and Benjamini-Hochberg FDR (false discovery rate) methods. The effect size was calculated in terms of the Rank-Biserial Correlation r, by comparing the data sets of the two parameters in their favorable and unfavorable ranges. It is evident that the effect size equals 1 for parameters A and B (when comparing data of the favorable and unfavorable ranges); however, the third parameter (C) is not optimized (in terms of favorable and unfavorable ranges), and its effect size is therefore smaller than unity.
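The grouping into four quarters can be sketched as follows. The function name, the argument names, and the flags indicating on which side of each threshold the favorable range lies are hypothetical; the groups are keyed by two booleans (first parameter favorable, second parameter favorable) rather than by the group codes of the paper.

```python
def quadrant_probabilities(a, b, hits, s_a, s_b, a_fav_below=True, b_fav_below=True):
    """Split kicks into four groups by two thresholds and return (P, count) per group.

    Keys are (a_is_favorable, b_is_favorable); P is None for an empty group.
    """
    groups = {(fa, fb): [] for fa in (False, True) for fb in (False, True)}
    for x, y, h in zip(a, b, hits):
        a_favorable = (x < s_a) == a_fav_below
        b_favorable = (y < s_b) == b_fav_below
        groups[(a_favorable, b_favorable)].append(h)
    return {
        key: ((sum(data) / len(data)) if data else None, len(data))
        for key, data in groups.items()
    }
```

The entry keyed `(True, True)` holds the probability of success with both parameters in their favorable ranges, and `(False, False)` the one with both in their unfavorable ranges.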

#### Three-Parameter Analysis

When selecting three parameters, then, based on their individual threshold values, eight combinations and associated datasets of hit and miss data are obtained:


The eight associated datasets of hit and miss data, resulting in eight average probability (P) data, were tested for significant differences. It was expected that the probability of success (P) of three parameters combined, all in their favorable ranges, was significantly greater than P of two parameters combined, both in their unfavorable ranges.

The significance of the latter expectation was tested with the Kruskal–Wallis rank sum test, and the significance of the individual differences in the average probability (P) data was assessed with the post-hoc tests specified above.
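For readers unfamiliar with the Kruskal–Wallis test on binary hit and miss data, the following sketch computes the tie-corrected H statistic for any number of groups. It is an illustration only: the function name is hypothetical, the p-value (from a chi-square distribution with k − 1 degrees of freedom) is not computed, and the post-hoc tests are omitted. Statistical packages provide complete implementations, for example `scipy.stats.kruskal`.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of non-empty groups."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group must contain at least one observation")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Assign the mid-rank to each distinct value (hit/miss data are heavily tied).
    rank_of = {}
    start = 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t + 1) / 2.0  # mean of ranks start+1 .. start+t
        start += t

    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)

    tie_correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if tie_correction == 0.0:
        raise ValueError("all observations are identical")
    return h / tie_correction
```

A larger H indicates a stronger difference between the group-wise probabilities of success; groups with identical hit rates yield H = 0.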

In the three-parameter analysis, the conditions for finding the optimal threshold value s were re-defined such that:


Note that one of the previous conditions of the regression analysis (single parameter), namely s close to, or at, the maximum D, was sacrificed for obtaining the highest chance of success of scoring a goal with all three parameters in their favorable ranges. The effect size (Rank-Biserial Correlation r) is always unity when comparing the data sets of the three parameters in their favorable and unfavorable ranges.

#### COP Analysis

In order to establish a difference in the COP path of successful and unsuccessful kicks, the average paths of COPx and COPy, and the average impact forces at each COP position, were calculated from the two-parameter analyses, by taking the successful kicks with the two parameters in the favorable range, and the unsuccessful kicks with the two parameters in the unfavorable range. Across several combinations of two parameters, the COP paths of the successful kicks, as well as those of the unsuccessful kicks, should be close to each other and thereby mutually validate the sweet spot on the boot. The COP path was visualized as a bubble plot, where the bubble size corresponded to the magnitude of the impact force.

The COPs (and also Fmax) of similar kicks (successful or unsuccessful) were averaged in the following way:


### RESULTS

The Results section is organized around the four main findings in consecutive order whereby one finding leads to the next one:


### Participant Statistics

The participants kicked the ball 8–18 times (12.9 ± 3.1). Their chances of success of scoring a goal ranged from 22.2 to 72.7% (30.3 ± 20.3%). Only two of the 10 players scored in more than 50% of the attempts.

### Comparison of Parameter Data of All Hits against All Misses

The peak force (Fmax) data of all misses and all hits were 1,682 ± 519 N (678–3,161), and 1,843 ± 628 N (769–3,365), respectively. The COPx data (at Fmax) were −7.9 ± 8.0 mm (−24.3 to +20.1 mm) and −10.2 ± 7.4 mm (−22.9 to +12.4 mm), respectively; and COPy data (at Fmax) were 3.7 ± 4.4 mm (−10.4 to +15.1 mm) and 3.0 ± 5.4 mm (−4.4 to +17.2 mm).

The p-values of the three parameters were >0.05 and therefore the parameter data of all hits were not different from the parameter data of all misses. Specifically, the p-value of COPx of all hits compared to COPx of all misses was 0.187; the corresponding p-value of COPy was 0.105; and the one of Fmax was 0.119. As there was no difference between parameters of hits and misses, only a very small (if 0.01 < r < 0.2; Sawilowsky, 2009) effect was observed: COPx effect size r = 0.149; COPy effect size r = 0.183; and Fmax effect size r = 0.176. Hypothesis 1 was therefore confirmed and the method of comparing the parameter data of all hits against all misses is considered unsuccessful.

#### Trend Analysis

For the three parameters defined in the Methodology section, the threshold values s, P<sub>1</sub> before and P<sub>2</sub> after the threshold, the probability differential D at the threshold, the p-value of D, the number of significant data, and the overall trend are listed in **Table 1**.

COPx exhibited three possible threshold values (**Figure 6A**), the highest one with the best D, p-value, and P<sub>2</sub>; and the smallest one with the best P<sub>1</sub>. In the two remaining parameters (**Figures 6B,C**), there was only one threshold value that satisfied the conditions of


P<sub>1</sub> ranged from 33 to 47% (the higher, the better); P<sub>2</sub> from 14 to 22% (the smaller, the better); and D from 16 to 28% (the higher, the better).

As favorable and unfavorable parameter ranges could be found, Hypothesis 2 was confirmed.

#### Two-Parameter Analysis

The hit/miss data were divided into 4 groups for pairwise comparison (**Table 2**), and the chances of success of the groups were compared with the Kruskal–Wallis rank sum test.

#### COPx at Fmax vs. COPy at Fmax

The chances of success of the groups I, II, III, and IV (**Table 2**) were:

COPx threshold value s at −0.0086 m: 10.81, 33.33, 25, and 58.33%, respectively (quarter analysis from **Figure 7**);

COPx threshold value s at −0.0070 m: 12.12, 30, 22.92, and 57.14%, respectively;

COPx threshold value s at −0.0036 m: 10.53, 20, 20.97, and 52.63%, respectively.

The difference between these group percentages of success was statistically highly significant as determined by the Kruskal–Wallis rank sum test (p < 0.0013 for the three COPx threshold values). The post-hoc tests revealed the individual differences, namely that the percentage of group IV (>50%) was significantly different from the percentages of groups I and III, whereas the remaining pairs, including II vs. IV, were not significantly different for any of the three COPx threshold values.

The individual percentages of the success probability of COPx and COPy positions (both in their favorable ranges) were 33–36.8 and 45.80%, respectively; their combined success probability exceeded the individual ones and was 52.6–58.3%.

#### COPx vs. Fmax

The chances of success of the groups I, II, III, and IV were:

COPx threshold value s at −0.0086 m: 21.74, 13.33, 23.08, and 81.25%, respectively (quarter analysis from **Figure 8**);

COPx threshold value s at −0.007 m: 21.05, 13.33, 23.33, and 81.25%, respectively;

COPx threshold value s at −0.0036 m: 20, 0, 23.08, and 68.18%, respectively.

TABLE 1 | Summary of trend analyses; s = parameter value at the separation line, separating the favorable parameter range from the unfavorable one; D = differential of P<sub>1</sub> (probability of scoring a goal if the parameter is in the favorable range) and P<sub>2</sub> (probability of scoring a goal if the parameter is in the unfavorable range); COPx, center of pressure in x-direction; COPy, center of pressure in y-direction; Fmax, maximum impact force.


(B): P against COPy (location of the COP in y-direction); (C): P against Fmax (peak impact force); note that D can be negative, if P<sub>2</sub> > P<sub>1</sub>.



The difference between these group percentages of success was statistically highly significant as determined by the Kruskal–Wallis rank sum test (p ≤ 0.00006 for the three COPx threshold values). The post-hoc tests revealed the individual differences, namely that the percentage of group IV (>68%) was significantly different from the three other percentages; whereas the other three were not significantly different.

The individual percentages of the success probability of the COPx position and Fmax magnitude (both in their favorable ranges) were 33–36.8 and 46.67%, respectively; their combined success probability exceeded the individual ones and was >68%.

FIGURE 7 | Two-parameter analysis: COPx vs. COPy; 0 = miss, 1 = hit; dashed lines = threshold values (cf. Table 1); I, II, III, IV = group codes from Table 2.

#### COPy vs. Fmax

The chances of success of the groups A, B, C, and D were 15.87, 34.29, 27.78, and 76.92%, respectively (quarter analysis from **Figure 9**). The difference between these group percentages of success was statistically highly significant as determined by the Kruskal–Wallis rank sum test (p = 0.000152). The post-hoc tests revealed the individual differences, namely that the percentage of group D (77%) was significantly different (p < 0.015) from the three other percentages, whereas the other three were not significantly different (p > 0.06).

FIGURE 9 | Two-parameter analysis: COPy vs. Fmax (peak impact force); 0 = miss, 1 = hit; dashed lines = threshold values (cf. Table 1); I, II, III, IV = group codes from Table 2.

The individual percentages of the success probability of the COPy position and Fmax magnitude (both in their favorable ranges) were 45.83 and 46.67%, respectively; their combined success probability exceeded the individual ones and was 77%.

### Three-Parameter Analysis

The hit/miss data were divided into eight groups initially for pairwise comparison:


Note that the suffixes "a" and "b" refer to Fmax within the unfavorable and favorable ranges, respectively. Group IIb was excluded as it consisted only of 2 data (both were misses; cube analysis from **Figure 10**, red zeros in quadrant II). The chances of success of the groups were compared with the Kruskal–Wallis rank sum test.

The threshold values for separating favorable and unfavorable ranges were re-defined in order to obtain the highest chance of success (percentage) of group IVb. The separation values for max percentage of success (85.71%) were found for COPx at −0.0036 m, COPy at 0.0027 m, and for Fmax at 2,000 N; thereby satisfying the conditions stated in the Methodology section of the three-parameter analysis. At these separation values, the chances of success for scoring a goal with two parameters in their favorable ranges out of these three parameters were: COPx vs. COPy, 51.16%; COPx vs. Fmax, 55.17%; COPy vs. Fmax: 75%. These two-parameter percentages were smaller than the optimal ones found in the two-parameter analysis, namely COPx vs. COPy: 52.63–58.33% (three different s for COPx), COPx vs. Fmax: 68.18–81.25% (three different s for COPx), COPy vs. Fmax: 76.92%.

The chances of success of the groups Ia, Ib, IIa, IIIa, IIIb, IVa, and IVb were 20, 0, 20, 16.67, 20.67, 34.48, and 85.71%, respectively (cube analysis from **Figure 10**). The difference between these group percentages of success was statistically highly significant as determined by the Kruskal–Wallis rank sum test (p = 0.000067). The post-hoc tests revealed the individual differences, namely that the percentage of group IVb (85.71%, all three parameters in their favorable ranges) was significantly different from all the other groups (p < 0.01). In contrast, the percentages of the other groups (excluding IVb) were not significantly different from each other.

The individual percentages of the success probability of COPx and COPy positions and Fmax magnitude (all in their favorable ranges) were 33–36.8, 45.83, and 46.67%; their combined success probability in the two-parameter analysis was 51.16–75% (see above); and their combined success probability in the three-parameter analysis reached 85.71%, which exceeded both the individual and the two-parameter values.

## Path of the COP

**Figure 11** shows eight datasets, numbered from 1 to 8 from medial (left) to lateral (right):


FIGURE 11 | Centre of pressure in y-direction (COPy) against Centre of pressure in x-direction (COPx); the bubble size of the 8 bubble plots corresponds to the impact force; the graphs are aligned to the coordinate system of the sensor matrix: positive COPy data = distal; negative COPy data = proximal; negative COPx data = medial; positive COPx data = lateral; the COP moves from distal to proximal (= downward on the graph) during impact; 1: average COP of successful kicks, COPy and Fmax within the favorable range; 2: average COP of successful kicks, COPx and COPy within the favorable range; 3: average COP of successful kicks, COPx and Fmax within the favorable range; 4: average COP of all successful kicks; 5: average COP of all unsuccessful kicks; 6: average COP of unsuccessful kicks, COPy and Fmax within the unfavorable range; 7: average COP of unsuccessful kicks, COPx and COPy within the unfavorable range; 8: average COP of unsuccessful kicks, COPx and Fmax within the unfavorable range.


#### COP of all Successful Kicks Compared to COP of All Unsuccessful Kicks

The COP moves from distal to proximal, with a slight movement to the medial side (**Figure 11**, datasets 4 and 5). The COP of all successful kicks appears to be located more medially (at least after the force peak) compared to the COP of all unsuccessful kicks; this apparent difference, however, is not statistically significant and may therefore be due to chance. From the Results section 2, COPx at Fmax, COPy at Fmax, and Fmax of all successful and unsuccessful kicks are similar.

#### COP of Successful Kicks with COPx and COPy in Their Favorable Ranges Compared to COP of Unsuccessful Kicks with COPx and COPy in Their Unfavorable Ranges

The average COP paths of successful (dataset 2, **Figure 11**) and unsuccessful (dataset 7, **Figure 11**) kicks were clearly separated: the COPx data at Fmax (medio-laterally) by ∼13 mm (**Figure 11**), and the COPy data at Fmax (proximo-distally) by ∼6 mm. The peak forces of the two datasets (2, 7) did not differ (Mann–Whitney U-test p = 0.984; negligible effect size of r = 0.005).

#### COP of Successful Kicks with COPx and Fmax in Their Favorable Ranges Compared to COP of Unsuccessful Kicks with COPx and Fmax in Their Unfavorable Ranges

The average COP paths of successful (dataset 3, **Figure 11**) and unsuccessful (dataset 8, **Figure 11**) kicks were clearly separated in the x-direction; however, there was no significant difference between the COPy at Fmax data of both datasets (Mann–Whitney U-test p = 0.2113; small effect size of r = 0.212).

#### COP of Successful Kicks with COPy and Fmax in Their Favorable Ranges Compared to COP of Unsuccessful Kicks with COPy and Fmax in Their Unfavorable Ranges

The average COP paths of successful (dataset 1, **Figure 11**) and unsuccessful (dataset 6, **Figure 11**) kicks were clearly separated; surprisingly, there was a significant difference between the COPx at Fmax data of both datasets (Mann–Whitney U-test p = 0.0033; medium effect size of r = 0.521), even if the COPx data were not optimized in this analysis (only COPy and Fmax were).

#### COP of Successful Kicks with COPx, COPy, and Fmax in Their Favorable Ranges Compared to COP of Unsuccessful Kicks with COPx, COPy, and Fmax in Their Unfavorable Ranges

**Figure 12** shows two further datasets:


The average COP paths of successful (dataset 9, **Figure 12**) and unsuccessful (dataset 10, **Figure 12**) kicks were clearly separated; the COPx data at Fmax (medio-laterally) by ∼19 mm (**Figure 11**); and the COPy data at Fmax (proximo-distally) by ∼9 mm.

#### Comparison of Plots of COP Paths

The four COP locations (green in **Figure 13**) of optimized parameters (favorable range) and successful kicks, i.e., datasets 1–3 and 9, are not significantly different (p > 0.05) and are practically superimposed.

The COPx values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 0.8411 (Kruskal–Wallis rank sum test), and were therefore not significantly different from each other.

The COPy values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 0.5896 (Kruskal–Wallis rank sum test), and were therefore not significantly different from each other.

FIGURE 12 | Centre of pressure in y-direction (COPy) against Centre of pressure in x-direction (COPx); the bubble size of the 10 bubble plots corresponds to the impact force; the graphs are aligned to the coordinate system of the sensor matrix: positive COPy data = distal; negative COPy data = proximal; negative COPx data = medial; positive COPx data = lateral; 1–8: cf. legend of Figure 11; 9: average COP of successful kicks, COPx, COPy, and Fmax within the favorable ranges; 10: average COP of unsuccessful kicks, COPx, COPy, and Fmax within the unfavorable ranges.

The Fmax values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 0.1933 (Kruskal–Wallis rank sum test), and were therefore not significantly different from each other.

The four COP locations (red in **Figure 13**) of un-optimized parameters (unfavorable range) and unsuccessful kicks, i.e., datasets 6–8 and 10, however, are, in 2 cases, not identical and clearly separated.

Fmax values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 1.7306e-11 (Kruskal–Wallis rank sum test). The reason for this result was that Fmax was optimized in dataset 7, and therefore exhibited a higher average (post-hoc tests p < 0.001 for dataset 7 compared to datasets 6, 8, 9).

The COPx values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 0.0022 (Kruskal–Wallis rank sum test). The post-hoc p-value of dataset 6 vs. dataset 10 was p < 0.005, which are the two datasets that are furthest apart in the x-direction in **Figure 12**.

The COPy values of the three 2-parameter combinations and one 3-parameter combination showed a p-value of 0.0001 (Kruskal–Wallis rank sum test). The post-hoc p-value of dataset 6 vs. dataset 8 was p = 0.001, which are the datasets that are furthest apart in the y-direction in **Figure 12** (this result does not include dataset 10 which had a slightly higher standard deviation than set 6, and therefore was insignificantly different from set 8).

#### Definition and Position of the Sweet- and Dead Spots

We define the sweet spot as the impact zone between ball and boot/foot that maximizes the chance of scoring a goal (with a curve ball in this case).

Equally, we define the dead spot as the impact zone between ball and boot/foot that minimizes the chance of scoring a goal (with a curve ball in this case).

**Figure 13** shows the COP of datasets 1, 2, 3, 9 and 6, 7, 8, 10 superimposed on 6 ellipses, the center of which is located at the average COPx at Fmax and average COPy at Fmax, and the semi-major and semi-minor axes correspond to one standard deviation of COPx and COPy data, respectively. The location of the 4 ellipses of datasets 1, 2, 3, 9 illustrates the finding detailed in the previous section, namely that there is no significant difference between the COPx data of the 4 sets nor COPy data of the 4 sets.
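The ellipse construction described above reduces to a mean and a standard deviation per axis. The sketch below illustrates it with the Python standard library; the function names are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, as the text does not state which estimator was used.

```python
from math import hypot
from statistics import mean, stdev


def cop_ellipse(cop_x, cop_y):
    """Ellipse centred at the mean COP with semi-axes of one standard deviation."""
    return {
        "cx": mean(cop_x),       # average COPx at Fmax
        "cy": mean(cop_y),       # average COPy at Fmax
        "semi_x": stdev(cop_x),  # sample SD of COPx (assumed estimator)
        "semi_y": stdev(cop_y),  # sample SD of COPy (assumed estimator)
    }


def centre_distance(ellipse_a, ellipse_b):
    """Euclidean distance between the centres of two ellipses."""
    return hypot(ellipse_a["cx"] - ellipse_b["cx"], ellipse_a["cy"] - ellipse_b["cy"])
```

Applied to the COP data of two datasets, `centre_distance` gives the separation between their ellipse centres in the units of the input coordinates.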

The 4 ellipses of sets 1, 2, 3, 9 define the location of a sweet zone, specifically as all 4 ellipses are superimposed rather than separated. This sweet zone can be reproduced with any of the three 2-parameter datasets (1–3). The ellipse of the three-parameter analysis constitutes the actual sweet spot, the location of which is almost identical to the ellipse of set 2 (COPx and COPy in their favorable ranges). The sweet spot is located more medially and proximally than the dead spot.

The three ellipses of sets 6–8 are more separated and define the dead zone, on the lateral and distal side of the sweet spot. The ellipse of the three-parameter analysis (dataset 10) constitutes the location of the actual dead spot, which is further lateral and distal of the ellipse of set 7 (COPx and COPy in their unfavorable ranges).

Note that the distance between the centers of the two ellipses is merely 21.3 mm (**Figure 13**), a distance that decides between high and low chances of scoring a goal with a curved kick. These chances are, according to the three-parameter analysis, 86% in the sweet spot and 20% in the dead spot, the percentages of which are significantly different. However, the 20% value was not significantly different from all three-parameter combinations other than the one with all three parameters in their favorable ranges (85.71%; **Figure 10**). Hypotheses 3 and 4 were confirmed in terms of the existence of clearly defined sweet and dead spots.

#### DISCUSSION

To the authors' best knowledge, the present study was the first one ever on sweet and dead spots on the foot or boot that maximize and minimize the chances of scoring a goal, respectively. This paper hypothesized that a spot exists on the shoe upper or dorsum of the foot that, when kicking a ball at this very spot, would maximize the chances of scoring a goal. The main result of this study was that a sweet spot was found on the medio-proximal aspect of the foot kicking a soccer ball. In contrast, the location of the dead spot was seen to be more latero-distal.

The term "sweet spot" was used in soccer shoes for the first time, to the best of our knowledge, when the Air Zoom Total 90 III was introduced with a side lacing system. This design was supposed to enlarge the sweet spot, defined as "the area of the boot that makes contact with the ball when shooting" (Wilson, 2006). However, as detailed in the Results section, there are regions within the contact area that provide higher or lower kicking accuracy. The term "sweet spot" should therefore be confined to the position of the COP that is associated with the highest chance of scoring. It is obvious that there is no single spot on a boot or foot that guarantees a 100% success rate. The "sweet spot," enabling players to maximize their chances of scoring a goal, should depend on at least two parameters: COPx and COPy. A third parameter, e.g., the kick force, can be considered if it correlates with the chance of success statistically and if it has a mechanically explicable influence on kicking accuracy. In curved shots, the higher the kick force (normal force), the higher is the friction force. The latter improves the spin rate of the ball, resulting in the Magnus Effect and the aerodynamic sideward force. The greater the latter, the more pronounced is the curve of the ball.

Even if a "dead spot" was found in this research, with all three parameters in their unfavorable range, the low success chances when hitting this spot were not significantly different from the success chances if at least one of the three parameters is in its unfavorable range. This fact explains the sweet spot as a multiparameter-dependent location, which is, therefore, more specialized than the dead spot, but also more difficult to achieve when kicking a ball. The dead spot would be better defined by the entire area outside the sweet spot, or confined to the part of the area outside the sweet spot, where any possible ball contact actually occurs. This corresponds to that area that is actually used by the players for kicking. The fact that the dead zone was more widespread than the concentrated sweet spot supports hypothesis 4.

The problem that arises in this paper is whether the sweet spot is success- or player-controlled. In essence, the data could be skewed toward the better players, and therefore represent the kicking style of only the successful players. In the worst case, the sweet spot could be dominated by only one specific player. There are two counterarguments that stand against this assumption:


range of parameters). The question that arises now is: why do participants A–C share the same COP? This could either be a coincidence or be based on what participants A–C have in common. This common parameter would then be a higher success rate. The reason for this would be that the ball-to-foot contact in the sweet spot guarantees a higher success rate in the first place. The same principle becomes evident from Hennig's (2011) study, describing the results of 20 participants kicking with two different shoes, shoe C with better kick accuracy and contact points more medially and proximally, shoe B with worse kick accuracy and contact points more laterally and distally. The two different ball-shoe contact points, determined with a pressure sensor, reflect different levels of kick accuracy.

The participants in our study contributing to the sweet spot (from most to least contribution) were: 5, 3, 10, 6+7, 1+4+8; and those contributing to the dead spot were: 1, 2, 9, 4, 8, 7, 10, 6, 3. It is evident that the contribution to the sweet spot was made more by participants with better kick accuracy than by participants with less kick accuracy contributing to the dead spot. However, participants with a better kick accuracy, contributing to the sweet spot, share the same average COP, as shown in **Figure 13** (black dots in the center of ellipses of sets 9 and 10).

Interestingly, our study revealed the same contact point distribution with respect to kicking accuracy as Hennig's studies (Hennig et al., 2009; Hennig, 2011), namely that the contact point providing better kick accuracy (our sweet spot) is located more medially and proximally with respect to the one providing less accuracy (our dead spot).

The frontal plane curvature in the dorso-medial part of the forefoot is more horizontal at the forefoot center (aligned to the transverse axis of the body), and more vertical on the medial side of the forefoot (aligned to the longitudinal axis of the body). Thus, the tangent to this curvature becomes more inclined from the center (top) of the forefoot to its medial edge. Consequently, kicking a ball with a contact point located more on the medial side generates a more vertical spin axis of the ball. The more vertical the spin axis, the stronger is the Magnus effect and the more pronounced is the curved flight path of the ball. This is consistent with the observed outcome of all missed kicks in our study, with the ball ending up on the right side of the goal.

The data obtained from this study are true only for the cohort examined and cannot necessarily be extrapolated to professional soccer players. It could very well be that in professional players, the gap between sweet and dead spots is more pronounced. Equally, the chances of scoring a goal when kicking the ball at the sweet spot are expected to be higher in professional players. It is nevertheless remarkable that a significant difference between sweet and dead spot could be found (thereby establishing sweet and dead spots as such) in an amateur soccer cohort, consisting of players of different kick accuracy.

A limitation of the present study is that we cannot be fully certain that the sensor did not move while kicking. Yet, we controlled the sensor location by visual inspection and palpation after every kick. Further evidence that the sensor remained immobile was that the four ellipses of the sweet spot (**Figure 13**) were superimposed with insignificantly different COP locations at Fmax.

This study revealed that the wearable device used (smart soccer boot) is suitable not only for measuring the training load of kicking, but also for assessing the consistency of kicking in terms of how close the impact points lie to the sweet and dead spots. In the future, we envisage a smart soccer boot with a fully integrated pressure matrix that displays, on its digital twin, the distribution of impact points, their impact force, and success (hits/misses) in real time, while calculating the position of sweet and dead spots. This will add another angle to the measurement and management of training loads. Furthermore, instantaneous biofeedback informing athletes of relevant parameters (i.e., distribution of impact points, impact force and probability of success) can be used to improve players' kicking performance beyond the abilities of an experienced coach.

#### PRACTICAL APPLICATIONS AND CONCLUSIONS

The hypothesized existence of sweet and dead spots on a boot or foot when kicking a soccer ball was confirmed; however, the data comparison of all hits and all misses proved unsuccessful for establishing sweet and dead spots. As a consequence, a new method was used to investigate whether the data of COPx, COPy, and Fmax can be separated into favorable and unfavorable ranges. Accordingly, the sweet and dead spots were found based on the hypothesized favorable/unfavorable parameter ranges (center of pressure in x/y-directions and/or peak impact force). These ranges maximized/minimized the chances of scoring a goal. Kicking the ball with the sweet spot maximized the probability of scoring a goal (58–86%), whereas having the impact points at the dead spot/zone minimized the probability (11–22%). The sweet spot was rather concentrated, independent of which parameter combination was used (two- or three-parameter combination), whereas the dead spot, located 21 mm from the sweet spot, was more widespread. The sweet spot was located more medial and proximal than the more scattered dead spots.

Based on the parameters analyzed and the discovery of the sweet and dead spots, we believe that in the future, the Smart Soccer Boot will be able to improve players' kicking performance by real-time biofeedback. Future studies should examine the application of the smart soccer boot in other types of kicks and investigate the existence of sweet/dead spots similar to the present study. Additionally, the sensor needs to be implemented in a boot and real-time biofeedback methods have to be developed.

From a practical point of view, we believe that this would allow players to directly analyze and alter their kicking technique based on the biofeedback signals (Düking et al., 2017) in order to hit the ball with the sweet spot established here and thereby increase the probability of a successful kick. Consequently, players could likely train to improve their kicking technique without a coach being present.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### ACKNOWLEDGMENTS

The authors would like to thank the participants who volunteered in the kicking experiments. The authors would like to thank the two reviewers for their valuable comments and suggestions to improve the quality of the paper.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Fuss, Düking and Weizman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Player Monitoring in Indoor Team Sports: Concurrent Validity of Inertial Measurement Units to Quantify Average and Peak Acceleration Values

Mareike Roell<sup>1</sup>\*, Kai Roecker<sup>1,2</sup>, Dominic Gehring<sup>1</sup>, Hubert Mahler<sup>1</sup> and Albert Gollhofer<sup>1</sup>

#### Edited by:

Kamiar Aminian, École Polytechnique Fédérale de Lausanne, Switzerland

#### Reviewed by:

Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal; Antonio Dello Iacono, Zinman College for Physical Education and Sport, Israel

\*Correspondence:

Mareike Roell mareike.roell@sport.uni-freiburg.de

Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 09 November 2017 Accepted: 12 February 2018 Published: 27 February 2018

#### Citation:

Roell M, Roecker K, Gehring D, Mahler H and Gollhofer A (2018) Player Monitoring in Indoor Team Sports: Concurrent Validity of Inertial Measurement Units to Quantify Average and Peak Acceleration Values. Front. Physiol. 9:141. doi: 10.3389/fphys.2018.00141

<sup>1</sup> Department for Sports and Sport Science, Albert-Ludwigs-University Freiburg, Freiburg im Breisgau, Germany, <sup>2</sup> Applied Public Health, Furtwangen University, Furtwangen im Schwarzwald, Germany

The increasing interest in assessing physical demands in team sports has led to the development of multiple sports-related monitoring systems. Due to technical limitations, these systems could primarily be applied to outdoor sports, whereas an equivalent indoor locomotion analysis is not yet established. Technological development of inertial measurement units (IMU) broadens the possibilities for player monitoring and enables the quantification of locomotor movements in indoor environments. The aim of the current study was to validate IMU-based measurement of average and peak human acceleration under indoor conditions in team sport specific movements. Data of a single wearable tracking device including an IMU (Optimeye S5, Catapult Sports, Melbourne, Australia) were compared to the results of a 3D motion analysis (MA) system (Vicon Motion Systems, Oxford, UK) during selected standardized movement simulations in an indoor laboratory (n = 56). A low-pass filtering method for gravity correction (LF) and two sensor fusion algorithms for orientation estimation [Complementary Filter (CF), Kalman-Filter (KF)] were implemented and compared with MA system data. Significant differences (p < 0.05) were found between LF and MA data but not between sensor fusion algorithms and MA. Regarding the magnitude of the resulting vector, higher precision and lower relative errors were found for CF (RMSE = 0.05; CV = 2.6%) and KF (RMSE = 0.15; CV = 3.8%), both compared to the LF method (RMSE = 1.14; CV = 47.6%); these results strongly emphasize the implementation of orientation estimation to accurately describe human acceleration. Comparing both sensor fusion algorithms, CF revealed slightly lower errors than KF and additionally provided valuable information about positive and negative acceleration values in all three movement planes with moderate to good validity (CV = 3.9 – 17.8%). Compared to the x- and y-axes, superior results were found for the z-axis.
These findings demonstrate that IMU-based wearable tracking devices can successfully be applied for athlete monitoring in indoor team sports and have the potential to accurately quantify accelerations and decelerations in all three orthogonal axes with acceptable validity. Future research should specifically pursue an increase in accuracy by taking magnetometers into account.

Keywords: locomotion analysis, orientation estimation, inertial measurement unit, complementary filter, physical demands, indoor team sports

### INTRODUCTION

Knowledge about physical demands in team sports has become increasingly important to optimize training programs, to enhance physical performance and to prevent injuries (Fox et al., 2017; Vanrenterghem et al., 2017). Several monitoring systems have been developed to simultaneously quantify multiple players' position, velocity and acceleration during sport specific locomotion (Chambers et al., 2015; Li et al., 2016). In order to rely on a monitoring system's output for player monitoring, data should be both valid and reliable. It is important to note that high consistency in measurements of a system indicates its ability to determine evident and meaningful changes in an athlete's performance. The amount of error caused by high variability within or between monitoring tools as well as the agreement between measured and true values should therefore always be taken into account when performance is assessed and evaluated. GPS-based systems in particular have recently been evaluated as applicable monitoring tools and are commonly applied in outdoor sports (Cummins et al., 2013; Johnston et al., 2014). Reduced signal quality, however, prevents their use in indoor environments. Alternatively, indoor monitoring systems have been developed and can be subdivided into vision-based or microtechnological systems. Although vision-based motion analysis is widely applied for player monitoring, findings about validity and reliability are inconsistent due to the multitude of existing systems and their dependency upon manual intervention, quality of video footage or camera positioning (Duthie et al., 2005; Barris and Button, 2008). Permanently installed microtechnological local positioning systems are able to overcome these problems, showing high reliability (CV < 2%) and validity with reported typical errors of 1.2–9.3% for distance, speed and acceleration (Leser et al., 2014; Rhodes et al., 2014; Serpiello et al., 2017).
However, notable errors were found for mean and peak deceleration (TE<sub>mean</sub> = 84%, TE<sub>peak</sub> = 20%) as well as a decrease in accuracy for actions at the side of the court (Serpiello et al., 2017). Furthermore, high costs and local restrictions due to their fixed installation limit the application of local positioning systems (Hedley et al., 2010; Stevens et al., 2014). Lately, the technological development of inexpensive and portable Micro Electro Mechanical Systems (MEMS) has made it possible to quantify physical loads with a robust method even during games or training sessions in different sports facilities. Most sport specific tracking devices nowadays include a 9 degree of freedom triaxial inertial measurement unit (IMU) containing an accelerometer, gyroscope and magnetometer within a GPS tracking device. Applied in indoor environments, these devices enable sampling of acceleration-based data in high resolution during sporting activities without the support of GPS signals. Extracting sports relevant data from the sensor's signals indoors without the aid of external references is complicated by multiple sources of noise, mainly the earth's gravitational acceleration. Several approaches have therefore been proposed to estimate the tracking device's orientation with respect to the earth's coordinate system, e.g., sensor fusion algorithms without GPS (Madgwick et al., 2011; Sabatini, 2011a; Valenti et al., 2015). Such algorithms commonly combine accelerometer and gyroscope signals to compute the device's attitude (pitch and roll angles) relative to the direction of gravity. Including magnetometer readings in the algorithm enables the computation of the sensor's heading, meaning its deviation from magnetic north. Highly dynamic changes of the device's orientation, as they frequently occur during sporting activities, challenge an algorithm's accuracy.
As the accepted standard, stochastic Kalman-Filter-based techniques (KF) are commonly applied as an effective tool in human motion analysis (Sabatini et al., 2006; Sabatini, 2011a), giving a probabilistic determination of modeled state estimations with the goal of minimizing errors from the true value. Owing to multiple tuning parameters, their main advantage lies in high accuracy and a broad field of applications exceeding the purpose of orientation estimation. Complementary Filters (CF) serve as an equivalent frequency-based alternative because of their algorithmic simplicity, effective performance and less difficult implementation process. Due to their dependency upon the single sensors' frequency characteristics, the potential applications of CFs are restricted, but they provide equally accurate results for orientation estimation (Madgwick et al., 2011; Tian et al., 2013; Valenti et al., 2015). Quantitatively, the time required for the necessary linear regression iterations in KFs results in slower convergence compared to CFs (Ricci et al., 2016). Considering the high frequency of movement changes observed in court-based team sports (Abdelkrim et al., 2007; Luteberget and Spencer, 2017), which leads to frequent changes of the device's orientation, the immediate convergence found for CFs might serve as an appropriate foundation for accurate orientation estimation in indoor team sports. Although CFs have not been evaluated thoroughly, especially in comparison to KF-based techniques for sport specific purposes, their effectiveness has already been shown for the analysis of human movements (Bachmann et al., 2001; Tian and Tan, 2012; Tian et al., 2013). The validity of a commercially available IMU-based monitoring system that relies on KF techniques has been shown regarding the magnitude of the resulting acceleration vector or the instantaneous rate of change of acceleration (Wundersitz et al., 2013, 2015a,b).
Based on those parameters, activity profiles and quantification of loads during games and training have been proposed for indoor team sports (Montgomery et al., 2010; Schelling and Torres, 2016; Luteberget and Spencer, 2017). More elaborate discriminant analysis, however, is lacking because continuous information about gravity-corrected accelerations in the global anterior-posterior, lateral and vertical directions is typically not provided by manufacturers. Exact coordinates of the resulting vector with respect to the earth's coordinate system, however, would be desirable for profound game analyses, as they are already standard in outdoor team sports, leading to a deeper understanding of physical and underlying physiological demands. Recently, a new approach has been proposed describing the relation between power output and time duration of movement as a general function for GPS-based analyses of soccer games (Roecker et al., 2017). The function is independent of arbitrary or experience-based intensity thresholds, which apparently allows its transfer to acceleration-dominant indoor-sport analyses with the use of IMUs and appropriate sensor fusion. Added value could be provided through additional information regarding the amount and intensity of acceleration components in the global x-, y- and z-direction as well as the distinction between positive and negative acceleration. On this basis, interpretation of individual locomotion might be beneficial for individualization of training programs, supervision of rehabilitation processes or control of each player's injury risk.

The aim of the current study was to compare the concurrent validity of a recently published CF algorithm with that of a KF provided by the sensor's manufacturer and of a low-pass filtering method, each applied to IMU signals to obtain average and peak acceleration values in all movement planes. Data recorded from an IMU-based tracking device during simulated team sport specific movements are set against a 3D motion capture system.

## MATERIALS AND METHODS

### Preliminary Investigation

In order to use IMUs for the purpose of orientation estimation, data output should be not only valid but also reliable. A number of studies have already evaluated the tracking device used in this study (Optimeye S5, Catapult Sports, Melbourne, Australia) regarding the accelerometer's intra- and inter-device variability under laboratory as well as field conditions in handball, ice hockey and Australian football (Boyd et al., 2011; Luteberget et al., 2017; van Iterson et al., 2017). Coefficient of variation (CV) values well below the corresponding smallest worthwhile difference were found during dynamic, mechanical motion (CVinter < 1.04%; CVintra < 1.05%) and sporting activities performed by subjects in the laboratory (CVinter < 6.7%) or field (CVinter < 2.1%; CVintra < 26.6%). While these results indicate good within- as well as between-device reliability of accelerometers, evidence regarding the gyroscope's reliability is missing. As the gyroscope's data output is critical for accurate orientation estimation, we evaluated within- and between-device reliability using a platform rotating at constant angular velocities of 199°/s and 270°/s, respectively. A device mounted on the platform was rotated around either its x-, y-, or z-axis. Ten consecutive trials of 30 s rotation were recorded in each axis for a total of 8 devices at both 199°/s and 270°/s. Between trials the turntable stood still for 30 s. In both conditions a CV < 1% was found for mean and peak angular velocity within as well as between devices, indicating excellent reliability of the gyroscope. Overall, both the accelerometer and the gyroscope contained within the tracking device show low intra- and inter-device variability, indicating sufficient reliability of the underlying technology. As the output of the single inertial sensors can be equated with the sensor fusion algorithms' input, the applied tracking device can be considered reliable enough for further validation research.
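The reliability figures above rest on the coefficient of variation. As a minimal sketch, assuming the common definition (sample SD divided by the mean, in percent), the within-device CV over repeated turntable trials could be computed as follows. The trial values and the function name are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample SD divided by the mean, times 100."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical mean angular velocities (deg/s) of one device over repeated
# 30 s trials at a nominal 199 deg/s; illustrative numbers only.
trials_199 = [198.9, 199.1, 199.0, 199.2, 198.8]

cv_within = coefficient_of_variation(trials_199)
print(f"within-device CV = {cv_within:.3f}%")
```

The same function applied to the per-device means of several devices would give a between-device CV under the same assumption.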

#### Procedure

Data of a single wearable tracking device including an IMU with a sampling frequency of 100 Hz (Optimeye S5, Catapult Sports, Melbourne, Australia) were compared to the results of a 3D motion analysis (MA) system operating at 200 Hz (Vicon Motion Systems, Oxford, UK) during several standardized movement simulations in an indoor laboratory.

To eliminate unintentional artifacts, the device was clamped into a stiff wooden frame that was adapted to the dimensions of the device. The investigator manually moved the frame according to predefined movement simulations inside the capturing volume of the MA system. The simulations were chosen to imitate orientations and changes of orientation as they would occur during team sport specific movements. Constant mono- or multi-planar motion of the investigator was combined with different orientations of the device including rotations around the x-, y-, and z-axis between or during each trial (**Table 1**). Before and after each trial, the frame was struck against the ground to evoke a trigger signal for synchronization of the IMU and the MA system. Each of the 28 movement simulations was performed and recorded two times within one recording session. The device was not turned off and on between trials to simulate long-term usage as it would occur during training sessions or games. Calibration had been performed by the manufacturer and was therefore not repeated manually prior to recording.

Three retro-reflective markers (Ø 14 mm) were attached to the edges of the rectangular wooden frame to capture the device's local coordinate system (LCS) optically and to calculate a single virtual marker at the estimated position of the IMU's sensor within the dimensions of the tracking device. Calculation of the virtual marker was done with a custom written script (Bodybuilder, Vicon Motion Systems, Oxford, UK). This virtual marker's trajectory was used for further analysis.
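The construction of a virtual marker from three physical markers can be illustrated with a short sketch. This is not the BodyBuilder script used in the study. It assumes that one marker defines the origin, a second lies on the local x-axis and a third lies in the local x-y plane, and that the sensor offset in the LCS is known. All function names, marker positions and offsets are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)


def local_frame(origin, on_x, in_plane):
    """Build orthonormal LCS axes (expressed in global coordinates)."""
    x_axis = normalize(sub(on_x, origin))
    z_axis = normalize(cross(x_axis, sub(in_plane, origin)))
    y_axis = cross(z_axis, x_axis)  # completes the right-handed frame
    return x_axis, y_axis, z_axis


def virtual_marker(origin, on_x, in_plane, offset_local):
    """Global position of a point given by its offset in the LCS."""
    axes = local_frame(origin, on_x, in_plane)
    return tuple(
        origin[i] + sum(axis[i] * d for axis, d in zip(axes, offset_local))
        for i in range(3)
    )


# Hypothetical frame, rotated 90 degrees about the vertical and translated.
sensor = virtual_marker((1.0, 1.0, 0.0), (1.0, 2.0, 0.0), (0.0, 1.0, 0.0),
                        offset_local=(0.02, 0.03, 0.01))
print(sensor)
```

Applying the function to every captured frame would yield the trajectory of the virtual marker.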

For the purpose of our study, ethical approval and written informed consent were not mandatory since it involved neither human subject research nor recruitment of human subjects, physical or psychological interventions or clinical research practices. Movements of the IMU in the laboratory were performed by the investigator, who was well aware of the executed simulation movements. At no point during data collection were any risks to the investigator's physical or psychological health apparent.

#### Data Processing

Raw data for both the MA system and the IMU were exported to Microsoft Excel (Microsoft Excel 2013, Version 15.0, Redmond, USA) through the according manufacturer-supplied software (Nexus 2, Vicon Motion Systems, Oxford, UK; Sprint 5.1.7, Catapult Sports, Melbourne, Australia).

TABLE 1 | Common orientations occurring during team sports were manually simulated.

Intensities varied between movements from slow (walking) to fast (sprint, jump) to cover intensities that would equally occur during training or game. Observed absolute acceleration values covered a wide range of intensities varying from smallest values found ∼1 m·s<sup>−2</sup> to highest values ∼21 m·s<sup>−2</sup>. Each trial was performed two times (n = 56).

After frequency reduction of the MA system data to 100 Hz (Biomechanics Toolbar Version 1.02, Liverpool John Moores University, UK), a fourth-order, zero-lag, low-pass digital Butterworth filter was applied to reduce noise in the x, y and z positional data. According to a residual analysis (Winter, 2009), an optimal cut-off frequency of 5 Hz was chosen. Because the standard deviation (SD) between all trials was <1.0 Hz, the same cut-off frequency was applied to all trials. To exclude phase shift, dual-pass filtering with a correction of the cut-off frequency to 6.23 Hz was applied (Winter, 2009). Acceleration values (m·s<sup>−2</sup>) in all three orthogonal planes were calculated through double numerical differentiation of the smoothed data. Data of the accelerometer (g) were converted to m·s<sup>−2</sup>, and data of the x- and y-axes were inverted once due to the tracking device's orientation within the wooden frame.
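The corrected cut-off frequencies quoted in this section can be reproduced with the dual-pass correction factor described by Winter (2009), C = (2^(1/n) − 1)^0.25 for n passes of a second-order Butterworth filter. The sketch below also shows a three-point second difference as one common way to obtain acceleration from position. The exact differentiation scheme used by the authors is not stated, so this is an assumption, and the function names are illustrative.

```python
def corrected_cutoff(cutoff_hz, passes=2):
    """Per-pass cut-off so that `passes` passes of a second-order
    Butterworth filter have their combined -3 dB point at cutoff_hz."""
    correction = (2.0 ** (1.0 / passes) - 1.0) ** 0.25
    return cutoff_hz / correction


def second_difference(position, dt):
    """Three-point central estimate of acceleration for interior samples."""
    return [
        (position[i + 1] - 2.0 * position[i] + position[i - 1]) / dt ** 2
        for i in range(1, len(position) - 1)
    ]


print(f"{corrected_cutoff(5.0):.2f} Hz")  # cut-off used for MA, LF and CF data
print(f"{corrected_cutoff(8.0):.2f} Hz")  # cut-off used for KF data

# Synthetic check: constant acceleration of 2 m/s^2 sampled at 100 Hz.
DT = 0.01
positions = [0.5 * 2.0 * (k * DT) ** 2 for k in range(101)]
accelerations = second_difference(positions, DT)
```

For the dual-pass case the correction factor is about 0.802, so 5 Hz and 8 Hz map to approximately 6.23 Hz and 9.97 Hz.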

#### Low-Pass Filter (LF)

As a first option to separate gravity from the sensor's readings, a traditional method for gravity correction was applied. Relying on the assumption that gravitational signals contain only low-frequency components, body-induced and gravity-induced accelerations can be separated (Bartlett, 2013; Mönks, 2017). By low-pass filtering (LF) the acceleration data in each axis with a cut-off frequency of 0.3 Hz (Butterworth 4th order), the constant gravity vector of the earth was extracted and afterwards subtracted from the original acceleration values. The resulting signal was smoothed (Butterworth 4th order) using a cut-off frequency of 5 Hz (6.23 Hz corrected) for all trials (SD < 1.0 Hz), chosen after visual inspection of residual analysis outputs (Winter, 2009) and for the smallest mean bias relative to the MA reference data.
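The principle of the LF method, and its weakness when the device changes orientation, can be sketched on a single axis. The sketch deliberately swaps the fourth-order Butterworth filter for a simple first-order recursive low-pass filter to stay dependency-free, so it illustrates the idea only and does not reproduce the authors' filter. The signals and the function name are hypothetical.

```python
import math

G = 9.81  # m/s^2


def remove_gravity_lowpass(signal, fs_hz=100.0, cutoff_hz=0.3):
    """Subtract a low-pass estimate of gravity from one accelerometer axis.

    First-order recursive low-pass filter (a stand-in for the Butterworth
    filter described in the text), initialised with the first sample.
    """
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    gravity = signal[0]
    body = []
    for sample in signal:
        gravity += alpha * (sample - gravity)
        body.append(sample - gravity)
    return body


# Static device, axis aligned with gravity: the residual is zero.
static = remove_gravity_lowpass([G] * 500)

# Device rotated after 5 s so that this axis no longer senses gravity:
# the slow gravity estimate lags and leaves a large spurious "acceleration".
tilted = remove_gravity_lowpass([G] * 500 + [0.0] * 500)
print(f"residual right after the rotation: {tilted[500]:.2f} m/s^2")
```

The second case shows why a fixed low-frequency gravity estimate struggles with the frequent orientation changes of team sport movements, which is the motivation for the orientation filters described next.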

#### Complementary Filter (CF)

As a second option, a sensor fusion algorithm (CF) that was originally developed to navigate unmanned aerial vehicles (Valenti et al., 2015) was implemented and adapted to human motion. This algorithm determines the orientation of the tracking device's LCS with respect to the global coordinate system (GCS) using a quaternion-based approach. The applied CF has been developed to estimate the device's absolute orientation in two consecutive steps. In the first step, accelerometer and gyroscope data are used to correct the LCS for tilt. By low-pass filtering the accelerometer signal and high-pass filtering the integrated gyroscope readings with the same cut-off frequency, the complementary filter creates "complement" signals that are fused together to estimate the sensor's orientation. The resulting intermediate coordinate system, with its x- and y-axes coplanar with those of the GCS, represents the computed attitude estimation as relative orientation. The algorithm enables the estimation of an absolute orientation in a second step by correcting the intermediate coordinate system's yaw angle. This second step is only performed if magnetometer data are included in the calculations and results in a GCS with the positive x-axis always pointing toward magnetic north (**Figure 1**).
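The complementary principle can be illustrated with the textbook scalar form of the filter for a single tilt angle. This is a simplified sketch, not the quaternion-based algorithm of Valenti et al. (2015): the gain is held fixed instead of being adapted, and only one angle is estimated. The gain of 0.0072 is the initial value quoted in this section; the gyroscope bias and all names are hypothetical.

```python
def complementary_filter(gyro_rates, accel_angles, dt, gain):
    """Fuse gyroscope rates (rad/s) with accelerometer tilt angles (rad).

    Each step integrates the gyroscope (trusted at high frequencies) and
    then pulls the result toward the accelerometer angle (trusted at low
    frequencies) by the fraction `gain`.
    """
    angle = accel_angles[0]
    estimates = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        predicted = angle + rate * dt
        angle = (1.0 - gain) * predicted + gain * accel_angle
        estimates.append(angle)
    return estimates


# Hypothetical static device: true tilt 0 rad, gyroscope biased by 0.01 rad/s.
DT = 0.01      # 100 Hz sampling
GAIN = 0.0072  # fixed here; the CF in the text adapts it
BIAS = 0.01
N = 6000       # 60 s of data

fused = complementary_filter([BIAS] * N, [0.0] * N, DT, GAIN)
gyro_only_drift = BIAS * DT * N                 # error of pure integration
steady_state = (1.0 - GAIN) * BIAS * DT / GAIN  # bounded error of the CF
print(f"gyro only: {gyro_only_drift:.3f} rad, fused: {fused[-1]:.4f} rad")
```

In this example the accelerometer term keeps the gyroscope drift bounded, while a small gain leaves short-term changes to the gyroscope.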

To compare inertial data with the MA system, only the first step of the proposed CF was implemented to calculate the device's relative orientation, as the MA system's x-axis had not been aligned to magnetic north during calibration. The cut-off frequency for the accelerometer data is continuously determined by an adaptive gain algorithm within the CF. An initial filtering gain of 0.0072 was chosen, based on another CF that has been applied in human motion analysis (Tian et al., 2013). Accelerometer readings were converted to m·s<sup>−2</sup> and multiplied with the quaternion of attitude estimation to rotate x-, y-, and z-vectors into the intermediate coordinate system. Constant gravitational acceleration was removed by subtracting 9.81 m·s<sup>−2</sup> from the intermediate z-vector. All calculations were performed using routines written in C++ (compiled and edited with Microsoft Visual C++ 2017, Redmond, USA). Resulting acceleration vectors were then low-pass filtered (4th order Butterworth) with a cut-off frequency of 5 Hz (6.23 Hz corrected; SD < 1.0 Hz). The cut-off frequency was chosen as the one yielding the lowest mean bias between CF data filtered at cut-off frequencies from 4 to 10 Hz and the MA system. Due to the tracking device's orientation within the wooden frame, the resulting acceleration values in the x- and y-axes had to be inverted once.
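The rotation and gravity subtraction described above can be sketched as follows. The sketch is written in Python rather than the C++ used by the authors and is not their routine. It assumes a unit quaternion (w, x, y, z) that maps vectors from the LCS into the intermediate coordinate system; other conventions require the conjugate. Function names and test vectors are hypothetical.

```python
import math

G = 9.81  # m/s^2


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, u = q[0], q[1:]
    t = cross(u, v)
    tt = cross(u, t)
    return tuple(v[i] + 2.0 * w * t[i] + 2.0 * tt[i] for i in range(3))


def gravity_free_acceleration(q_local_to_intermediate, accel_local):
    """Rotate an accelerometer sample (m/s^2) and remove gravity from z."""
    ax, ay, az = rotate(q_local_to_intermediate, accel_local)
    return (ax, ay, az - G)


# Hypothetical static device pitched so that its local x-axis points up:
# the accelerometer reads 1 g on x, and the result should vanish.
half = -math.pi / 4.0  # half of a -90 degree rotation about the y-axis
q_pitched = (math.cos(half), 0.0, math.sin(half), 0.0)
at_rest = gravity_free_acceleration(q_pitched, (G, 0.0, 0.0))

# Level device accelerating upward by 2 m/s^2.
q_level = (1.0, 0.0, 0.0, 0.0)
lifting = gravity_free_acceleration(q_level, (0.0, 0.0, G + 2.0))
```

With a correct attitude estimate the static reading reduces to approximately zero in all axes regardless of how the device is tilted.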

#### Kalman-Filter (KF)

As the current standard for sensor fusion, a Kalman-Filter (KF) was not implemented explicitly since it is provided by the manufacturer's software (Sprint 5.1.7, Catapult Sports, Melbourne, Australia). The manufacturer's results were chosen since they have previously been validated and are known to be designed for sport specific environments (Wundersitz et al., 2013, 2015b). The manufacturer's software provides one continuous Kalman-filtered parameter, which is the magnitude of the resultant vector representing the combined effects of x-, y- and z-vectors corrected for gravity. This variable was exported to Microsoft Excel for further analysis and was low-pass filtered to reduce unwanted noise (4th order Butterworth). A cut-off frequency of 8 Hz (9.97 Hz corrected) was chosen for all trials after residual analysis (SD < 1.0 Hz) and for the lowest mean bias compared to the MA system criterion. The same parameter was calculated for MA system data as well as for LF and CF data.
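For the LF, CF and MA data, the corresponding parameter can be derived from the gravity-corrected axis components as the Euclidean norm. A minimal sketch with hypothetical sample values and illustrative function names:

```python
import math


def resultant_magnitude(ax, ay, az):
    """Magnitude of the resultant acceleration vector for each sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def mean_and_peak(values):
    """Average and peak value of one trial."""
    return sum(values) / len(values), max(values)


# Hypothetical gravity-corrected components (m/s^2) of a two-sample trial.
magnitude = resultant_magnitude([3.0, 0.0], [4.0, 0.0], [0.0, 2.0])
average, peak = mean_and_peak(magnitude)
print(average, peak)
```

Because the norm discards direction, this parameter cannot distinguish positive from negative acceleration, which is why the axis-wise values are analyzed separately.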

### Data Analyses

After data processing, CF, KF, LF, and MA system data of each trial were synchronized by overlaying peaks of the triggering signals. Trigger signals were then excluded from further analysis so that only movement sequences were included. For each trial the average magnitude of the overall acceleration (total<sub>x/y/z</sub>), positive acceleration (acceleration<sub>x/y/z</sub>), and negative acceleration (deceleration<sub>x/y/z</sub>) as well as the peak magnitude of positive and negative acceleration values were calculated for CF, LF, and MA system data in x-, y-, and z-axes. Average magnitude as well as peak magnitude of the resultant vector (resultant<sub>x/y/z</sub>) were calculated for CF, KF, LF, and MA system data. To assess the agreement between IMU-based variables and MA system variables, mean bias, root mean square error (RMSE; Barnston, 1992), 95% limits of agreement (Atkinson and Nevill, 1998), Spearman's correlation coefficient and the percentage difference in the mean between criterion (MA) and measurement (CF, KF, LF) expressed as coefficient of variation (CV; Hopkins, 2000) were calculated for mean and peak acceleration values in x-, y- and z-axes (CF, LF, MA) as well as average and peak magnitude of the resulting vector (CF, KF, MA). According to previous research evaluating the relative error of IMU-based acceleration variables, a CV ≤5% was considered small, a CV >5% and <20% moderate and a CV ≥20% large (Wundersitz et al., 2015b; Alexander et al., 2016). To approve the acceptable use of MEMS-based sensors in the field, a CV <20% was intended (Tran et al., 2010; Wundersitz et al., 2015a).
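Three of the agreement measures named above have standard definitions and can be sketched directly: mean bias, RMSE and the 95% limits of agreement (bias ± 1.96 SD of the differences). The CV is not computed here because the exact procedure following Hopkins (2000) is not spelled out in the text; only the classification thresholds are encoded. The data values and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def agreement(measured, criterion):
    """Mean bias, RMSE and 95% limits of agreement against a criterion."""
    diffs = [m - c for m, c in zip(measured, criterion)]
    bias = mean(diffs)
    rmse = math.sqrt(mean([d * d for d in diffs]))
    half_width = 1.96 * stdev(diffs)
    return {"bias": bias, "rmse": rmse,
            "loa": (bias - half_width, bias + half_width)}


def classify_cv(cv_percent):
    """Thresholds from the text: small (<=5%), moderate (<20%), else large."""
    if cv_percent <= 5.0:
        return "small"
    if cv_percent < 20.0:
        return "moderate"
    return "large"


# Hypothetical per-trial mean accelerations (m/s^2); not the study's data.
imu = [1.0, 2.0, 3.0, 4.0]
reference = [1.1, 1.9, 3.2, 3.8]
result = agreement(imu, reference)
print(result)
print(classify_cv(2.6), classify_cv(17.8), classify_cv(47.6))
```

The three CV values passed to `classify_cv` are the ones reported in the abstract for the CF resultant, the upper bound of the CF axis-wise results and the LF resultant.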

### Statistical Analyses

All statistical analyses were performed using JMP Version 13.1.0 (SAS Institute Inc., Cary, NC, USA). Data are presented as mean ± SD with statistical significance set at p ≤ 0.05 unless otherwise stated. Shapiro-Wilk tests revealed heteroscedastic data sets for mean and peak accelerations and magnitude variables. Therefore, a nonparametric one-way ANOVA on ranks (Kruskal-Wallis test) was applied to determine differences in mean and peak variables between CF, KF, LF, and MA system data. Mann-Whitney-U tests were additionally performed post-hoc to determine if differences in the means of measurement systems were evident for each variable. The α-level was adjusted to α = 0.017 after Bonferroni correction to compare mean and peak magnitude acceleration values in all three axes between CF, LF and MA data. To identify differences in mean and peak magnitude values of the resulting vector between CF, KF, LF and MA system data, the α-level was set at α = 0.013 after Bonferroni correction. Effect sizes (r) for all performed statistical tests were calculated and interpreted according to Cohen (1992). Bland-Altman plots for all CF mean and peak variables against the MA system were used to visually evaluate the CF data in all axes (Bland and Altman, 1999).
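The adjusted α-levels follow from dividing the nominal level by a number of comparisons. The divisors are not stated in the text; assuming 3 for the axis-wise comparison and 4 for the resultant vector reproduces the reported values after rounding. The function name is illustrative.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni-adjusted significance level."""
    return alpha / n_comparisons


# Assumed divisors (not given in the text): 3 and 4.
alpha_axes = bonferroni_alpha(0.05, 3)       # about 0.0167, reported as 0.017
alpha_resultant = bonferroni_alpha(0.05, 4)  # 0.0125, reported as 0.013
print(alpha_axes, alpha_resultant)
```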

### RESULTS

Regarding average acceleration, significant differences were found between CF, LF and MA system data for total<sub>y</sub>, acceleration<sub>x</sub>, acceleration<sub>y</sub> and deceleration<sub>y</sub> (p < 0.05; r = 0.12 – 0.87; **Table 2**). Peak values showed significant differences in acceleration<sub>x</sub>, acceleration<sub>y</sub> and deceleration<sub>y</sub> (p < 0.05, r = 0.10 – 0.26). Post-hoc Mann-Whitney-U tests revealed that without orientation estimation the gravity component could not accurately be eliminated in all three axes, leading to significant differences for total<sub>y</sub>, mean/peak acceleration<sub>y</sub>, mean/peak deceleration<sub>y</sub>, peak deceleration<sub>x</sub> and peak deceleration<sub>z</sub> between LF data and MA system (p < 0.017, r = 0.39 – 0.50), whereas no significant differences were found between CF data and MA system. The LF and CF methods differed significantly regarding total<sub>y</sub>, mean/peak acceleration<sub>x</sub>, mean/peak acceleration<sub>y</sub> as well as mean/peak deceleration<sub>y</sub> (p < 0.017, r = 0.34 – 0.55).

Analysis of agreement supports these findings with a high relative error regarding mean/peak acceleration values of the LF technique in x-, y- and z-axis (CV = 27.4 – 80.7%, RMSE = 0.37 – 0.72 m·s<sup>−2</sup>). The LF method also showed poor results regarding accuracy, precision, correlation coefficient and limits of agreement (**Table 2**). Implementation of the proposed CF clearly improved measurement indices when compared to the LF data, with a relative error of 8.0–15.9% for average magnitude values and 3.9–17.9% for peak magnitude values. A low RMSE was found for mean acceleration values (RMSE = 0.04 – 0.22 m·s<sup>−2</sup>), whereas a higher error was found for peak values (RMSE = 0.23 – 0.59 m·s<sup>−2</sup>). Bland-Altman plots for mean acceleration values in all axes are shown in **Figure 2** and indicate improved agreement of positive acceleration values compared to deceleration. No systematic bias could be observed for mean acceleration values or for peak acceleration values (**Figures 2**, **3**). Limits of agreement are wider for peak acceleration values in all axes (**Figure 3**) but are still within an acceptable range.

TABLE 2 | Analysis of agreement of CF data and LF data with MA system data.

Mean and peak acceleration values are presented for overall acceleration (total), positive acceleration (acceleration), and negative acceleration (deceleration) in x-, y- and z-axis. \*Significant differences in the mean between LF and MA (p < 0.017). SD, standard deviation; 95% LoA, 95% limits of agreement; rs, Spearman's correlation coefficient; RMSE, root mean square error; CV, coefficient of variation.

Dashed lines: 95% LoA, solid line: mean bias.

Comparison of the resulting vector's magnitude between CF, KF, LF, and MA system data revealed no significant differences for average or peak resulting magnitude values (p > 0.013). Although no significant differences were found, the agreement analysis indicates poor accuracy, precision, limits of agreement and relative error of the LF method for mean and peak variables. For both orientation filters (CF, KF), however, the analysis of agreement indicates high accuracy in quantifying mean and peak resulting magnitude values. In contrast to the LF method, low RMSE and CV values were found for CF and KF (**Table 3**). Slightly smaller errors were noted for the CF data than for the manufacturer's KF in all reported parameters.
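The agreement indices named above (mean bias, 95% limits of agreement, RMSE) can be sketched for paired device and criterion values as follows. This is a minimal illustration rather than the analysis code used in the study: the function name `agreement`, the sample numbers, and the choice of 1.96 sample standard deviations for the limits are assumptions, and the CV and Spearman coefficient reported in the tables are left out because their exact computation is not given here.

```python
import math
import statistics


def agreement(device, criterion):
    """Return mean bias, 95% limits of agreement and RMSE for paired samples."""
    if len(device) != len(criterion) or len(device) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    # Difference of each pair: device value minus criterion value
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    rmse = math.sqrt(statistics.mean([x * x for x in diffs]))
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,  # assumed Bland-Altman convention
        "loa_upper": bias + 1.96 * sd,
        "rmse": rmse,
    }


# Hypothetical paired accelerations in m/s^2 (not data from the study)
imu = [1.0, 2.0, 3.0, 4.0]
reference = [1.1, 1.9, 3.2, 3.8]
print(agreement(imu, reference))
```

A bias near zero combined with wide limits, as in this toy input, corresponds to the situation described for peak values: no systematic offset, but larger scatter for individual measurements.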

TABLE 3 | Analysis of agreement between KF data, LF data, CF data, and MA system data.

Mean and peak acceleration values are presented for the magnitude of the resulting acceleration vector. \*Significant differences in the mean between data processing method and criterion (MA) (p < 0.013). SD, standard deviation; 95% LoA, 95% limits of agreement; r<sub>s</sub>, Spearman's correlation coefficient; RMSE, root mean square error; CV, coefficient of variation.

### DISCUSSION

### Main Findings

The aim of this study was to evaluate the concurrent validity of two standard sensor fusion algorithms for accurately quantifying and normalizing team sport specific accelerations as well as decelerations under indoor conditions. Furthermore, the study was intended to obtain, alongside the resulting acceleration vector, information about positive and negative acceleration values in all three movement planes.

Our findings show that after implementation of a sensor fusion algorithm, the IMU-derived data do not substantially differ from the motion capture system data. Analysis of agreement indicates that the CF algorithm seems capable of quantifying average acceleration magnitude (CV = 8.0 – 15.9%, RMSE = 0.04 – 0.22 m·s<sup>−2</sup>) and peak acceleration magnitude (CV = 3.9 – 17.8%, RMSE = 0.23 – 0.59 m·s<sup>−2</sup>) in the x-, y-, and z-axes within a good to moderate range. Validity was shown for both sensor fusion algorithms regarding the magnitude of the resultant acceleration vector (**Table 3**). Although no significant differences were evident between CF and KF, the analysis of agreement showed slight advantages for the CF. Overall, MEMS-based tracking devices seem to provide promising information for continuously calculating human acceleration and deceleration through the application of adequate orientation filters and smoothing techniques, even without the aid of external references.

### Comparison of LF, CF, and KF

Previous studies with relevance for team sports activities have reported that raw accelerometer data show insufficient accuracy as a measure of impacts during jumping movements or average acceleration during high-speed running (Tran et al., 2010; Alexander et al., 2016). The authors assumed these discrepancies to result from a lack of gravity compensation. As IMU-based sensors are sensitive to all kinematic phenomena occurring within an inertial frame fixed in time and space, earth's constant gravity and rotation are apparent in the sensor's reading. While earth's rotation of 15°/h is negligible compared to sensor noise for the current issue of interest (Sabatini, 2011b; Groves, 2013), a precise separation of human-induced accelerations and external bias, including earth's gravity, is essential to accurately describe an athlete's locomotion. Our results indicate that simple low-pass filtering to extract the gravity-induced low-frequency component does not provide acceptable results (CV<sub>mean</sub> > 20%, RMSE<sub>mean</sub> = 0.69 – 1.03 m·s<sup>−2</sup>; CV<sub>peak</sub> > 20%, RMSE<sub>peak</sub> = 1.29 – 3.98 m·s<sup>−2</sup>). In contrast to sensor fusion techniques, the exact direction of gravitational acceleration acting on the tracking device remains unknown, which seems to hinder accurate distinction between gravity and body acceleration. While the standard low-pass filtering method might be sufficient in primarily static environments, our results reveal serious errors when it comes to quantifying human acceleration during sport specific simulations including frequent orientation and movement changes. In contrast, both sensor fusion algorithms clearly improved the accuracy and precision of the tracking data. Regardless of the filtering technique (stochastic vs. complementary), a high concurrent validity in measuring the resultant vector's magnitude was observed for mean and peak values.
These observations strongly emphasize that future analyses must consider the orientation of the athlete with regard to the global coordinate system via sensor fusion. The KF parameter provided by the manufacturer's software has previously been validated (Wundersitz et al., 2015a,b) during linear movements (CV = 6.5 – 9.5%) and a team sport specific circuit that included jumping, change of direction tasks and tackling (CV = 5.5%). Our findings support these results, indicating good to acceptable validity of the KF not only for quantifying peak values (CV = 7.1%, RMSE = 0.83 m·s<sup>−2</sup>) but especially average values (CV = 3.8%, RMSE = 0.15 m·s<sup>−2</sup>) during a variety of team sports related movement simulations. Although both orientation filters show good results, the applied CF slightly outperformed the KF regarding mean bias, limits of agreement, RMSE, correlation coefficient and relative error (**Table 3**). Bergamini et al. (2014) found errors in orientation estimation during locomotor trials that depended on the task and type of orientation but were independent of the type of sensor fusion. More detailed analyses under highly controlled conditions revealed slight differences arising from the sensor fusion algorithm itself, although the main dependency still resulted from the performed movement (Ricci et al., 2016). During dynamic trials with a robotic arm imitating human movements, the implemented KF indeed showed a better overall performance but also a remarkably slower rate of convergence during static trials. While the CF immediately adapted to stops after a motion, the KF technique required about 10 s to reach a stable signal. As movement changes in court-based sports occur about every 3 s (Abdelkrim et al., 2007; Luteberget and Spencer, 2017) and likely induce pauses of short duration, a faster rate of convergence might be beneficial, in particular for following these intermittent changes between highly dynamic motion and momentary stops.
Since most of our trials were of short duration (<30 s) and included temporary pauses, e.g., during change of direction movements, the observed advantages of the CF could be explained by its faster adaptation. However, we did not examine the algorithms' convergence rates directly. Furthermore, the properties of the manufacturer's algorithm are unknown, while the choice of tuning parameters is critical for an algorithm's accuracy (Ricci et al., 2016). The exact reason for the discrepancies between the stochastic and complementary approaches therefore cannot be explained completely by our work.
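To make the idea of gravity compensation through a complementary filter concrete, the following sketch blends gyroscope-integrated roll and pitch with the tilt implied by the accelerometer, then rotates the reading into a tilt-corrected frame and subtracts gravity. It is an illustration under stated assumptions, not the filter evaluated in the study: the axis convention (z pointing up at rest, ZYX Euler angles), the blending weight `alpha = 0.98`, the value of `G`, and all function names are hypothetical, and yaw is deliberately left uncorrected.

```python
import math

G = 9.81  # m/s^2, assumed local gravity


def accel_tilt(ax, ay, az):
    """Roll and pitch (rad) implied by the accelerometer reading alone."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


def complementary_step(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One update: gyroscope dominates short-term, accelerometer long-term."""
    p, q, r = gyro  # body rates in rad/s
    # Euler-angle rates for a ZYX (yaw-pitch-roll) convention
    roll_rate = p + math.tan(pitch) * (q * math.sin(roll) + r * math.cos(roll))
    pitch_rate = q * math.cos(roll) - r * math.sin(roll)
    acc_roll, acc_pitch = accel_tilt(*accel)
    roll = alpha * (roll + roll_rate * dt) + (1 - alpha) * acc_roll
    pitch = alpha * (pitch + pitch_rate * dt) + (1 - alpha) * acc_pitch
    return roll, pitch


def remove_gravity(accel, roll, pitch):
    """Rotate into the tilt-corrected frame, then subtract gravity from z."""
    ax, ay, az = accel
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    # Undo roll (rotation about x), then pitch (rotation about y)
    y1 = cr * ay - sr * az
    z1 = sr * ay + cr * az
    x2 = cp * ax + sp * z1
    z2 = -sp * ax + cp * z1
    return x2, y1, z2 - G
```

For a sensor held still at a fixed tilt, repeated calls to `complementary_step` pull the angle estimate toward the accelerometer tilt, and `remove_gravity` then returns values close to zero on all three axes. A plain low-pass filter has no such orientation estimate, which is the limitation of the LF approach discussed above.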

### Analysis in Movement Planes

The proposed CF makes it possible to overcome the restriction of analyzing primarily the magnitude of the resulting acceleration. Continuous discrimination between average acceleration and deceleration can be provided with moderate to good accuracy in all axes. For peak values, RMSE indicates high accuracy in the vertical direction, but an increase in the magnitude of the error for x- and y-values compared to average accelerations. Still, these values allow a good approximation of peak values within an error range of 0.39 – 0.59 m·s<sup>−2</sup> and are lower than RMSE values found for resultant peak impacts in team sport movements (Wundersitz et al., 2015b). However, given the relatively high CV values, practitioners should be aware of limited accuracy when analyzing single maximum values in both horizontal axes. Previous research examining the validity of MEMS-based sensors during sporting activities applied a maximum CV of 20% as the limit for acceptable validity (Tran et al., 2010; Wundersitz et al., 2013, 2015a). Therefore, relative errors of 6.7% (acc<sub>z</sub>) and 3.9% (dec<sub>z</sub>) found in this study for peak acceleration values in the z-axis indicate good validity of the CF data. In the horizontal plane, RMSE values generally speak for the CF's validity, and the corresponding relative errors could objectively be described as acceptable (CV < 20%). However, the measures in the x- and y-axes cannot be considered accurate enough, especially when quantifying high peak acceleration values. Relative errors of up to 17.8% (acc<sub>x</sub>) could in fact equal errors >1 m·s<sup>−2</sup> for high-intensity acceleration efforts (>6 m·s<sup>−2</sup>) and lead to large misinterpretations of a player's true performance. An internal non-orthogonality of the tracking device's axes could explain these findings, but is usually prevented with the use of calibration routines (Groves, 2013).
For the purpose of this study, no calibration was performed prior to the recording, as the manufacturer recommends relying on the built-in calibration of the devices. More likely, the described error results from a misalignment of the x- and y-axes of the computed intermediate frame with the reference axes of the MA coordinate system. Since no correction of the yaw angle was performed, deviations in the horizontal plane of the intermediate coordinate system with respect to the MA coordinate system could account for the higher relative errors. This hypothesis is supported by the lowest CVs and RMSEs being found for peak vertical parameters (RMSE<sub>peak</sub> = 0.23 – 0.38 m·s<sup>−2</sup>, CV<sub>peak</sub> = 3.9 – 6.7%), indicating a good correction of pitch and roll angles. Similar values are expected for the x- and y-axes if a perfect alignment of the calculated and the reference coordinate system is accomplished. Overall, our results show that the implementation of a complementary filtering technique results in a good level of validity when determining average and peak acceleration values in the vertical direction, as well as promising precision for the horizontal plane.
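The statement that a 17.8% relative error can exceed 1 m·s<sup>−2</sup> for efforts above 6 m·s<sup>−2</sup> follows from simple arithmetic, assuming the CV is read as a fraction of the true peak value (the symbols Δa and a<sub>peak</sub> are introduced here only for illustration):

$$
\Delta a \approx \mathrm{CV} \times a_{\mathrm{peak}} = 0.178 \times 6\ \mathrm{m\,s^{-2}} \approx 1.07\ \mathrm{m\,s^{-2}}
$$

By the same reading, the vertical-axis values of 3.9 – 6.7% would correspond to roughly 0.23 – 0.40 m·s<sup>−2</sup> at the same effort level.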

### Practical Applications

Although accuracy should be further improved in the anterior-posterior and lateral directions, our results suggest a successful application for developing discriminant acceleration-based activity profiles in indoor sports. Recent studies emphasize the importance of accelerations during team sports for the external load imposed on the athlete. Accelerations and decelerations are known to cause higher metabolic loads (Osgnach et al., 2010) and greater muscular damage due to their eccentric loading (Nosaka and Newton, 2002; Lakomy and Haydon, 2004). Both could account for the decrease in acceleration efforts over time observed in football matches, which is assumed to indicate an increase in fatigue (Akenhead et al., 2013; Mara et al., 2017). A greater amount of physical load, including acceleration- and deceleration-based movements, can be assumed for indoor court-based sports, as an increase in physical demands and acceleration patterns has been observed with the reduction of pitch sizes during soccer games (Hodgson et al., 2014). Although the importance of this topic is widely accepted, only a limited number of studies have examined the proportions of accelerations and decelerations in indoor sports (Manchado et al., 2013; Luteberget and Spencer, 2017; Puente et al., 2017). With the use of a complementary filtering technique, corresponding information about the acceleration-based locomotion required in each sport can easily be provided to sport scientists and coaches, helping them to execute well-directed player replacements, adapt training programs, individualize recovery protocols and optimize athletic conditioning. In contrast to assessing the resultant vector's magnitude alone, this could lead to a deeper understanding of players' movements during training and competition indoors. IMUs are assumed to hold promising potential not only for quantifying locomotion but also for distinguishing between distinct movement patterns.
A number of studies in this relatively new field of interest have previously evaluated the validity of IMU-based variables, mainly PlayerLoad<sup>®</sup> and the resultant acceleration vector, with respect to different movement patterns (Wundersitz et al., 2015a,b). Comparing peak acceleration values during walking, jogging and running, a slight increase in the relative error was found for running (CV = 9.3%) compared to walking (CV = 6.5%) and jogging (CV = 7.5%) (Wundersitz et al., 2015b). In contrast, no clear differences were observed when subjects performed 7 different team sport movements within a circuit, where CVs ranged only from 3.7 to 6.9% (Wundersitz et al., 2015b). When comparing the validity of IMU-based accelerations during 3 different tackling tasks, no differences in accuracy were apparent between two of the three movements (Wundersitz et al., 2015a). Overall, the literature does not currently suggest any obvious differences in the validity of IMUs based on the performed movement itself. However, our results indicate that more detailed analysis in single movement planes seems to be possible and might thereby allow movement patterns to be discriminated and quantified with IMUs in future, overcoming the restriction to the resulting acceleration vector alone. Still, this study focused on the more general concurrent validity of sensor fusion algorithms under sport specific conditions and showed the potential of the applied complementary filtering technique to correctly estimate sports-related orientations in principle rather than for distinct movement patterns. Our findings are therefore not sufficient to answer this question properly but should be taken into account in future research regarding the discrimination of movement patterns based on IMU output.

### Limitations

As a limitation of the study, the previously described misalignments between the tilt-corrected coordinate system and the reference axes of the MA system probably affect accuracy in the anterior-posterior and lateral directions, mainly through the relative error. Including only accelerometer and gyroscope data results in an orientation estimate relative to the direction of earth's gravity vector. Calculating absolute orientation with respect to the court's coordinates might be possible with the aid of magnetometer data and calibration trials, but this has yet to be validated. Ferromagnetic disturbances, as they might occur on game days due to electronic sound systems around the court, have to be considered in the calculations. A certain amount of error also has to be acknowledged in the derivatives of the MA system data, as the MA system directly measures displacement, not acceleration itself. Numerical differentiation of positional data can increase high-frequency noise in the MA data. Despite the attempt to partially dampen the resulting inaccuracies, a potential influence on the criterion data has to be considered (Cole et al., 2014). However, MA system data are accepted as the standard validation criterion for multiple player monitoring systems, including acceleration estimates (Stevens et al., 2014; Vickery et al., 2014; Wundersitz et al., 2015a). Due to the short duration of the recorded movement simulations, we could not detect drift phenomena as they might appear during longer trials. Drift errors were not apparent during two recorded long-duration trials of 1.5 min; however, this should be confirmed in a larger number of samples. When monitoring players during training or games, another source of error might arise from increased vibrations of the device. For our experiments, the sensor was fixed within a wooden frame to reduce unintentional whipping, which could occur when placing the sensor in a looser harness.
This could require different tuning parameters of the complementary filter as well as adaptations of the smoothing cut-off frequencies. Other microtechnological monitoring systems such as GPS devices showed a decrease in accuracy during short-distance or high-acceleration movements (Akenhead et al., 2014; Johnston et al., 2014). In view of these findings, it has to be mentioned that the validity of IMUs in quantifying acceleration and deceleration efforts might also vary between specific movement patterns or intensities. This was not part of this study, since we focused on the simulation of orientations as they might occur during team sport activities rather than on actual movement performances. Therefore, our results do not allow a conclusion about the advantages or disadvantages of IMUs regarding the quantification of acceleration efforts during distinct movements.
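The concern raised above about differentiating positional criterion data can be illustrated with a central second difference. This is a sketch under assumptions, not the processing applied to the MA data in the study: the function names, the 200 Hz sampling rate and the 0.1 mm marker noise in the usage note are hypothetical, and the noise formula holds only for uncorrelated position noise.

```python
import math


def second_derivative(position, dt):
    """Central second difference; returns accelerations for interior samples."""
    return [
        (position[i + 1] - 2.0 * position[i] + position[i - 1]) / dt**2
        for i in range(1, len(position) - 1)
    ]


def noise_gain(sigma_pos, dt):
    """SD of acceleration noise caused by uncorrelated position noise.

    The stencil weights (1, -2, 1) give a variance of 6 * sigma^2 / dt^4.
    """
    return math.sqrt(6.0) * sigma_pos / dt**2


# Noise-free check: constant acceleration of 3 m/s^2 sampled at 100 Hz
dt = 0.01
track = [0.5 * 3.0 * (i * dt) ** 2 for i in range(50)]
print(second_derivative(track, dt)[:3])
```

With a hypothetical position noise of 0.1 mm at 200 Hz, `noise_gain(1e-4, 0.005)` gives about 9.8 m·s<sup>−2</sup>, which shows why the positional data have to be smoothed before differentiation and why the cut-off frequency matters for the criterion values.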

### CONCLUSION

The findings of this study show that wearable tracking devices containing a MEMS-based sensor have great potential to be applied indoors as well, as a valid tool to determine accelerations and decelerations during team sport specific movements including walking, running, jumping and change of direction simulations. The possibility of continuously analyzing acceleration values in horizontal and vertical planes broadens the field of player monitoring and the comprehension of physical demands in indoor court-based sports. Coaches and sports scientists should be aware of the applied sensor fusion algorithm, its tuning parameters and the correct smoothing technique, and should avoid analyzing raw accelerometer data, in order to accurately determine the athlete's acceleration. Future research should aim to increase the accuracy of accelerometer-derived data with the aid of magnetometers, especially in the x- and y-axes. Based on this, emphasis should be given to developing appropriate tools to detect an athlete's exact orientation on the court and the direction of performed movements in relation to the court's coordinate system. Discrimination between single movement patterns such as backward and forward movements, but also lateral motions, and their proportion to each other should be investigated in future research and help to develop distinct activity profiles. Therefore, it would be critical to assess the validity and reliability of sensor fusion algorithms during actually performed movement patterns and in different intensity zones. Further, numerical integration of acceleration values enables the calculation of the corresponding velocity, which would lead to a deeper understanding of external loads in indoor team sports. For this purpose, drift, which occurs due to the additive integration of noise within the IMU signal, has to be eliminated by appropriate algorithms. By providing comprehensive information about locomotion that goes beyond the resulting acceleration vector, IMUs could become a meaningful tool for player monitoring in indoor team sports in future.

### AUTHOR CONTRIBUTIONS

MR, KR, and HM designed this study. Methodology was planned by MR, KR, and DG. MR and DG collected the data. MR analyzed and interpreted the data. AG provided funding acquisition and resources. MR drafted the manuscript. All authors revised the manuscript and approved the final version to be published.

### FUNDING

Adidas AG (Future Team) provided financial support in the form of salaries for authors MR and HM.

### ACKNOWLEDGMENTS

We acknowledge the financial support by the German Research Foundation (DFG) and the University of Freiburg within the funding programme Open Access Publishing.

### REFERENCES

players during competitive matches. J. Sci. Med. Sport 20, 867–872. doi: 10.1016/j.jsams.2016.12.078

vertical and resultant force during running and change-of-direction tasks. Sports Biomech. 12, 403–412. doi: 10.1080/14763141.2013.811284

**Conflict of Interest Statement:** KR is a scientific consultant for Adidas and is the owner of a software company (ergonizer.com) for performance diagnostics. HM and MR receive research grants from Adidas. AG is a scientific consultant for Adidas. This did not play any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright © 2018 Roell, Roecker, Gehring, Mahler and Gollhofer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Potential Usefulness of Virtual Reality Systems for Athletes: A Short SWOT Analysis

#### Peter Düking<sup>1,2</sup>\*, Hans-Christer Holmberg<sup>2,3,4</sup> and Billy Sperlich<sup>1</sup>

<sup>1</sup> Integrative & Experimental Exercise Science & Training, Institute for Sport Sciences, University of Würzburg, Würzburg, Germany, <sup>2</sup> Swedish Winter Sports Research Centre, Mid Sweden University, Östersund, Sweden, <sup>3</sup> School of Sport Sciences, UiT The Arctic University of Norway, Tromsø, Norway, <sup>4</sup> School of Kinesiology, University of British Columbia, Vancouver, BC, Canada

#### Keywords: telemedicine, eHealth, mHealth, telerehabilitation, wearable, internet of sports

Virtual reality (VR) systems (Neumann et al., 2017), which are currently receiving considerable attention from athletes, create a two- or three-dimensional environment in the form of emulated pictures and/or video recordings in which, in addition to being mentally present, the athlete often even feels physically present. As he/she interacts with and/or reacts to this environment, movement is captured by sensors, allowing the system to provide feedback.

As with every newly evolving technology related to human movement and behavior, it is important to be aware of the strengths, weaknesses, opportunities and threats (SWOT) associated with the use of this particular type of technology. SWOT analyses are widely utilized for strategic planning of developmental processes (Pickton and Wright, 1998; Tao and Shi, 2016), and it is of great interest to consider whether VR systems should be adopted by athletes or not. Aspects more inherent to the technologies employed in VR systems are considered as strengths/weaknesses, and aspects more related to the application of VR systems with athletes as opportunities/threats. A SWOT analysis of another emerging technology involving sensors of individual parameters (i.e., "implantables") has been performed analogously (Sperlich et al., 2017).

#### Edited by:

Luca Paolo Ardigò, University of Verona, Italy

#### Reviewed by:

David L. Neumann, Griffith University, Australia Nicola Luigi Bragazzi, Università di Genova, Italy

> \*Correspondence: Peter Düking peterdueking@gmx.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 19 October 2017 Accepted: 07 February 2018 Published: 05 March 2018

#### Citation:

Düking P, Holmberg H-C and Sperlich B (2018) The Potential Usefulness of Virtual Reality Systems for Athletes: A Short SWOT Analysis. Front. Physiol. 9:128. doi: 10.3389/fphys.2018.00128

### STRENGTHS

VR systems allow individualization of training (Kim et al., 2013) and can be applied even in everyday settings, such as when traveling, lying in bed or working. Moreover, (bio-)feedback (Düking et al., 2017) can be provided by continuous learning algorithms to athletes directly in real time (Kim et al., 2013) and/or even remotely to coaches (Neumann et al., 2017).

Inherent to the nature of VR is the potential to design and manipulate freely an almost infinite number of procedures for training athletes individually (Hoffmann et al., 2014). For example, manipulation of the visual environment (e.g., fog, light reflections, darkness, dust, rain, snow) allows many different conditions to be experienced. In addition, a large number of repetitions per training session can be achieved, which is likely to be beneficial in connection with sports where this is not possible in real life (e.g., ski jumping, downhill skiing, sky-jumps, and many more). In VR, an individual may compete against or train with any other athlete around the world (Capin et al., 1997; Neumann et al., 2017), regardless of their relative levels of performance, gender, ages and even if the other athlete is injured.

### WEAKNESSES

Realistic environments, which enhance the sense of immersion, are key to optimizing training and learning (Vignais et al., 2015).

The level of immersion depends on the feeling of "being present" in VR (place illusion) and the illusion that what is happening is real (plausibility illusion) (Slater, 2009). Consequently, the haptic, tactile, visual, and/or audio (bio-)feedback provided must be as realistic as possible, and movements in the real world need to be synchronized with those in the virtual world (Vignais et al., 2015); otherwise, "seasickness" can be induced (Faisal, 2017). However, current VR systems cannot always achieve these goals (Katz et al., 2006).

Moreover, certain VR applications designed to capture the motion of athletes in real time require massive computational power, as well as a broad bandwidth for the transfer of data. Real video footage requires a relatively extensive database, whereas animated video footage may result in the "uncanny valley" effect, i.e., realistic graphical representations of characters that evoke unpleasant feelings (Vignais et al., 2015).

For a more realistic experience, the technology should be non-obtrusive, as small and light-weight as possible, allowing the athlete to execute movements without restriction or harming him/herself or others.

Finally, the costliness of setting up VR systems can limit their usage.

### OPPORTUNITIES

VR systems enable athletes to learn remotely from any coach and at a time and place of their own choosing, improving a wide variety of skills such as decision-making and pacing strategies that optimize utilization of energy (Hoffmann et al., 2014; Murray et al., 2015; Romeas et al., 2015; Gokeler et al., 2016). Creative behavior, involving a wide variety of patterns of movement and tasks (Santos et al., 2016), can be stimulated by providing a plethora of appropriate exercises. Exercising in VR can lower the level of perceived exertion while simultaneously enhancing enjoyment (Mestre et al., 2011), which could increase the willingness to exercise, as well as performance while exercising.

Prior to competitions, VR systems can probably be employed to optimize warm-up procedures (Calatayud et al., 2010), for example, by enhancing motor imagery (Louis et al., 2008). Stress and certain dimensions of (competitive) anxiety could potentially be managed more efficiently with such systems (Parsons and Rizzo, 2008; Stinson and Bowman, 2014). With VR, athletes can train for competitions under the conditions predicted for the actual event, thereby achieving more realistic preparation (Swaren et al., 2012).

VR might also help injured athletes in two ways: First, it could aid the diagnosis of certain aspects of sport-related injuries (Teel and Slobounov, 2015). And secondly, recovery could be promoted by providing exercises designed to maintain mental alertness and readiness through simulation of real-life scenarios from a first-person perspective (Craig, 2014) and/or by helping athletes to maintain appropriate movements during rehabilitation (Fitzgerald et al., 2007; Gokeler et al., 2016).

From an employment perspective, specialized coaches will most likely have to be hired to implement and handle the more complicated VR systems of the future.

For researchers, VR provides exceptional opportunities for highly reliable field-testing of athletes (Gokeler et al., 2016), e.g., their perception-action-loops (Bideau et al., 2010; Craig, 2014). In the future, such diagnostic tests could also be applied routinely to young athletes, e.g., for earlier identification of talent.

### THREATS

The transferability of skills, tactics, creative behavior and diagnostic procedures from the virtual to the real world remains to be established scientifically, although there is already evidence for the transferability of skills (Tirp et al., 2015). Some VR sensations (e.g., of g-forces, 3-D orientation) are currently not realistic, which could lead to unnatural patterns of movement, as well as under-/overuse and/or injury.

As with every novel technology, VR must first prove its value in order to convince rehabilitation specialists, athletes, coaches and others to adopt it (Katz et al., 2006; Akenhead and Nassis, 2015).

From an economic perspective, certain coaching jobs could be jeopardized by VR systems and, moreover, the cost of certain of these systems is still quite high.

Furthermore, VR systems may pose a threat to certain aspects of health, e.g., mental or visual (Spiegel, 2017). Proper hygiene must be given high priority, especially with respect to avoiding the spread of bacteria and/or viruses among team members (Davies et al., 2017). When exercising in VR, an athlete may be more prone to falling or collision with nearby objects, a risk which appears to be particularly great in connection with visual restriction due to a head-mounted display (Neumann et al., 2017). Another real risk associated with extensive use of VR systems in general is social isolation (Spiegel, 2017).

Finally, the personal data collected by VR systems must be protected from outside access and misuse (Spiegel, 2017).

### SUMMARY

To summarize, VR systems show considerable promise for improving certain aspects of athletic performance, such as tactics or creative behavior, as well as in connection with rehabilitation and research. Current technological limitations restrict sophisticated application of VR by athletes, and both the transferability from the virtual to the real world and certain related health concerns require detailed further investigation.

Although SWOT analyses have potential limitations (e.g., by being too subjective; Pickton and Wright, 1998), we believe that this opinion article offers a valuable starting point for those who want to know more about the use of VR systems by athletes.

We have pointed out only the most prominent strengths, weaknesses, opportunities and threats associated with the use of VR systems in connection with sports (**Table 1**), and there are surely many more. It is noteworthy that most current research in this area focuses on aerobic sports and that more emphasis on skill-based sports is needed (Neumann et al., 2017). Moreover, VR systems are still in their infancy, and the substantial improvements and other alterations certain to come in the near future, as well as the applicability of VR systems to the athletic population, must be monitored continuously and carefully.

TABLE 1 | Strengths, weaknesses, opportunities, and threats associated with the use of VR systems by athletes.



### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This publication was funded by the German Research Foundation (DFG) and the University of Wuerzburg in the funding programme Open Access Publishing.

### REFERENCES

rehabilitation exercises. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2007, 4870–4874. doi: 10.1109/IEMBS.2007.4353431


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Düking, Holmberg and Sperlich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Impact of Web-Based Feedback on Physical Activity and Cardiovascular Health of Nurses Working in a Cardiovascular Setting: A Randomized Trial

Jennifer L. Reed<sup>1,2</sup>\*, Christie A. Cole<sup>1</sup>, Madeleine C. Ziss<sup>1</sup>, Heather E. Tulloch<sup>1</sup>, Jennifer Brunet<sup>2</sup>, Heather Sherrard<sup>1</sup>, Robert D. Reid<sup>1</sup> and Andrew L. Pipe<sup>1</sup>

<sup>1</sup> Division of Cardiac Prevention and Rehabilitation, University of Ottawa Heart Institute, Ottawa, ON, Canada, <sup>2</sup> Faculty of Health Sciences, School of Human Kinetics, University of Ottawa, Ottawa, ON, Canada

#### Edited by:

Billy Sperlich, University of Würzburg, Germany

#### Reviewed by:

Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal

Rodney P. Joseph, Arizona State University, United States

#### \*Correspondence:

Jennifer L. Reed, jreed@ottawaheart.ca

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 04 September 2017 Accepted: 12 February 2018 Published: 06 March 2018

#### Citation:

Reed JL, Cole CA, Ziss MC, Tulloch HE, Brunet J, Sherrard H, Reid RD and Pipe AL (2018) The Impact of Web-Based Feedback on Physical Activity and Cardiovascular Health of Nurses Working in a Cardiovascular Setting: A Randomized Trial. Front. Physiol. 9:142. doi: 10.3389/fphys.2018.00142

A disconcerting proportion of Canadian nurses are physically inactive and report poor cardiovascular health. Web-based interventions incorporating feedback and group features may represent opportune, convenient, and cost-effective methods for encouraging physical activity (PA) in order to improve the levels of PA and cardiovascular health of nurses. The purpose of this parallel-group randomized trial was to examine the impact of an intervention providing participants with feedback from an activity monitor coupled with a web-based individual, friend or team PA challenge, on the PA and cardiovascular health of nurses working in a cardiovascular setting.

Methods: Nurses were randomly assigned in a 1:1:1 ratio to one of the following intervention "challenge" groups: (1) individual, (2) friend or (3) team. Nurses wore a Tractivity® activity monitor throughout a baseline week and 6-week intervention. Height, body mass, body fat percentage, waist circumference, resting blood pressure (BP) and heart rate were assessed, and body mass index (BMI) was calculated, during baseline and within 1 week post-intervention. Data were analyzed using descriptive statistics and general linear model procedures for repeated measures.

Results: 76 nurses (97% female; age: 46 ± 11 years) participated. Weekly moderate-to-vigorous intensity PA (MVPA) changed over time (F = 4.022, df = 4.827, p = 0.002, η<sup>2</sup> = 0.055), and was greater during intervention week 2 when compared to intervention week 6 (p = 0.011). Daily steps changed over time (F = 7.668, df = 3.910, p < 0.001, η<sup>2</sup> = 0.100), and were greater during baseline and intervention weeks 1, 2, 3, and 5 when compared to intervention week 6 (p < 0.05). No differences in weekly MVPA or daily steps were observed between groups (p > 0.05). No changes in body mass, BMI or waist circumference were observed within or between groups (p > 0.05). Decreases in body fat percentage (−0.8 ± 4.8%, p = 0.015) and resting systolic BP (−2.6 ± 8.8 mmHg, p = 0.019) were observed within groups, but not between groups (p > 0.05).

Conclusions: A web-based intervention providing feedback and a PA challenge initially impacted the PA, body fat percentage and resting systolic BP of nurses working in a cardiovascular setting, though increases in PA were short-lived. The nature of the PA challenge did not differentially impact outcomes. Alternative innovative strategies to improve and sustain nurses' PA should be developed and their effectiveness evaluated.

Keywords: nurses, physical activity, cardiovascular, web-application, activity monitor, challenges

### INTRODUCTION

Nurses are the largest professional group within the health care workforce. Several investigators in varied settings have assessed the self-reported and objective physical activity (PA) levels of nurses and shown low levels of PA (Kaewthummanukul et al., 2006; Sveinsdottir and Gunnarsdottir, 2008; Ratner and Sawatzky, 2009; James et al., 2013; Babiolakis et al., 2015; Perry et al., 2015; Reed et al., 2018). The National Survey of the Work and Health of Nurses in Canada showed that a disconcerting proportion of nurses are overweight or obese (45%) and smokers (16%); have high blood pressure (13%), high cholesterol (10%) and diabetes (3%); and, experience fair/poor mental health (6%) (Shields and Wilkins, 2006). The irrefutable evidence demonstrating the effectiveness of regular PA in the prevention and management of cardiovascular disease and associated risk factors (Warburton et al., 2006, 2010; Haskell et al., 2007; Reed and Pipe, 2016) highlights the opportunity afforded by targeted PA interventions to promote positive behavior change within this unique and large professional population.

As adults worldwide embrace modern technologies, web-based innovations may represent opportune, convenient, and cost-effective methods to target suboptimal PA levels and poor cardiovascular health of nurses. Web-based interventions that can be delivered anytime and anywhere warrant particular attention because they are accessible 24 h a day, 7 days a week, which may be ideal for nurses working long (i.e., 12 h) and rotating (i.e., days, evenings, nights, weekdays, weekends) shifts. Several reviews have shown that web-based interventions can increase PA levels and reduce body mass, waist circumference and blood pressure in adults (van den Berg et al., 2007; Liu et al., 2013; Joseph et al., 2014; Seo and Niu, 2015; Direito et al., 2017; Sorgente et al., 2017).

The primary purpose of this parallel-group randomized trial was to examine the impact of a web-based intervention providing specific feedback derived from an activity monitor on the PA levels and cardiovascular health of nurses working in a Canadian cardiovascular setting. We hypothesized that nurses' PA levels and cardiovascular health would improve in response to the receipt of personalized feedback regarding their PA levels derived from an activity monitor. Further, as modern informatics capabilities enable us to harness strategies designed to initiate and support positive behavior change, the secondary purpose of this trial was to assess whether nurses' PA levels are enhanced when they work together to meet their PA goals (i.e., friend or team PA challenge) compared to when they work alone to meet their goals (i.e., individual PA challenge). We hypothesized that nurses assigned to a friend or team PA challenge (and thus having their weekly PA levels displayed to others) would become more physically active when compared to nurses assigned to an individual PA challenge. This hypothesis is based on work suggesting that people perform better when they are in front of others than when they perform alone (Hausenblas et al., 2014). One explanation for this is based on the self-presentation theory (Leary, 1992), which suggests that the desire to enhance oneself and make positive impressions in front of others is an important motivator of human behavior. From this perspective, one might expect that nurses will increase their PA levels more if they know that others will see their levels than if no one else will see their PA levels.

### MATERIALS AND METHODS

#### Study Design

This parallel-group randomized trial was conducted at the University of Ottawa Heart Institute (UOHI), a tertiary care cardiovascular institute. This study was carried out in accordance with the consolidated standards of reporting trials (CONSORT) and intervention description and replication (TIDieR) checklists (Hoffmann et al., 2014; Boutron et al., 2017). All participants provided written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the UOHI Human Ethics Board (Protocol No. 20130429).

### Protocol

#### Recruitment

A convenience sample of participants was recruited between September and November 2013. Research staff informed nurses, administrative staff and nursing-leaders of the study by attending nursing meetings and morning rounds, and by distributing recruitment posters throughout the hospital (e.g., nursing lounges and stations, information boards, cafeterias). The posters contained a brief description of the study and contact information for the research staff. Hospital administrative staff and nursing-leaders assisted in distributing recruitment materials. Nurses interested in participating in the study contacted the research staff; screening was performed on-site.

Eligible participants were: (1) registered nurses; (2) able to walk unassisted; (3) willing to wear a stretchable ankle band which contained a PA monitoring device (i.e., accelerometer) and had access to the internet; and, (4) able and willing to provide written informed consent. Participants who: (1) were pregnant or lactating; (2) were unable to read and understand English; (3) had medical contraindications to exercise; and/or, (4) were already using an activity monitor to track their PA levels were not eligible.

#### Randomization and Intervention Groups

Motivation is a principal factor prompting changes in health behaviors (Teixeira et al., 2012). Framing behavior change interventions as games or competitive endeavors may be an effective strategy to motivate change (Baranowski et al., 2008). Given that previous studies have shown that employing game design features in non-game contexts is effective in improving health and well-being (Johnson et al., 2016) and that the presence of others can increase performance, participants were randomly assigned in a 1:1:1 ratio to one of the following intervention groups: (1) individual, (2) friend or (3) team challenge, which allowed us to examine if the groups facilitated or inhibited behavior change. Research staff randomly allocated participants to intervention groups using the "RAND" function of a software spreadsheet program (Excel, Microsoft, Washington, USA), and notified them of their group assignment via email. Participants were provided with: (1) a unique username and password to access the online Tractivity® program which contained their individual, friend or team challenge; and, (2) a Bluetooth USB key which enabled them to upload their activity monitor data into the online Tractivity® program.
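The exact spreadsheet procedure is not described beyond the use of "RAND". As a minimal sketch of one common way to obtain a 1:1:1 allocation from uniform random numbers, the following Python example draws one random key per participant, sorts by that key, and deals participants into the three groups in rotation. The function name `allocate`, the participant identifiers, and the seed are hypothetical and are not taken from the study.

```python
import random

GROUPS = ("individual", "friend", "team")

def allocate(participant_ids, groups=GROUPS, seed=None):
    """Assign participants to groups in equal proportions via random sort keys."""
    rng = random.Random(seed)
    # One uniform random number per participant (the role RAND plays in a spreadsheet).
    keyed = sorted((rng.random(), pid) for pid in participant_ids)
    # Deal the randomly ordered participants into the groups in rotation,
    # which keeps group sizes equal when the total is a multiple of len(groups).
    return {pid: groups[i % len(groups)] for i, (_, pid) in enumerate(keyed)}

# Hypothetical example: 75 participant IDs, arbitrary seed for reproducibility.
allocation = allocate(range(1, 76), seed=2013)
group_sizes = {g: sum(1 for v in allocation.values() if v == g) for g in GROUPS}
```

With 75 identifiers, each group receives 25 participants regardless of the seed; only the membership of each group changes.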

Participants could monitor their distance (km), steps (number), active time (minutes) and calories (kcal) expended on an hourly, daily, weekly, and monthly basis in a graphical format in the online Tractivity® program (see **Supplementary Figure 1**). In the friend and team challenge groups, group features were added such that participants' PA levels were displayed to others in their group as a means to enhance motivation to perform well. Specifically, participants randomized to the friend challenge could also monitor the total distance (km) and steps (number) of another participant randomized to the friend challenge in a graphical format in the online Tractivity® program (see **Supplementary Figure 2**). Participants randomized to the team challenge could also monitor the total distance (km) and steps (number) of their team and other teams in a graphical format in the online Tractivity® program (see **Supplementary Figure 3**). For the team challenge, five groups of five participants were created, totaling 25 participants. Participants were blinded such that no one knew the identity of the other person or persons in their group (to comply with ethical codes of conduct).

#### Study Assessments

#### **Physical activity**

Participants wore a Tractivity® activity monitor (Tractivity®, Vancouver, BC) held in a stretchable ankle band during waking hours throughout a baseline week and 6-week intervention, excluding periods when they engaged in water-related activities (e.g., bathing, swimming). The Tractivity® activity monitor is a lightweight, compact accelerometer that uses a proprietary signal processing algorithm to determine step counts in 1-min intervals. The activity monitor provides no visible feedback on the device and stores up to 30 days of data (i.e., distance, steps, active time, calories).

Research staff uploaded the participants' activity data into the online Tractivity® program at the end of the baseline week and 6-week intervention. Participants uploaded their activity data at times and frequencies of their choosing throughout the 6-week intervention. The Tractivity® activity monitor has been shown to be a valid measure of step counts in comparison to direct observation with less than a 0.5% error across a range of walking speeds (2.4, 3.1, 3.5, and 4.1 mph) (Warburton et al., 2013). Activity monitors were calibrated for stride length prior to the baseline week by having nurses walk 10 steps (at their usual walking speed) in a straight line on a large indoor track. These measures were performed in triplicate, and the average was entered into the online Tractivity® program to assist the proprietary signal processing algorithm in calculating step counts.

Tractivity® provided us with consecutively ordered minute-by-minute activity monitor data [i.e., steps, distance (km), active time (minutes), calories (kcal)] for each day of the baseline and intervention phases for all participants. We used a Hypertext Preprocessor (PHP, version 7.0) script to process the data. All activity monitor data were screened to identify valid and non-valid days. Only days with at least 10 h of wear-time were retained for analyses, as a minimum accelerometer wear-time of 10 h has been used to provide a valid measure of daily PA (Troiano et al., 2008). Activity monitor-determined step counts were used to calculate steps, minutes of MVPA and the number of days PA guidelines of ≥150 min/week of MVPA in bouts of ≥10 min were met (Canadian Society for Exercise Physiology, 2011; World Health Organization, 2011). Using published guidelines (Tudor-Locke et al., 2005), a threshold value of at least 100 steps/minute was used to define MVPA. Weekly MVPA minutes in bouts of ≥10 min and daily steps were calculated.
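The processing rules described above (a 10 h wear-time criterion, a cadence threshold of 100 steps/minute, and bouts of at least 10 min) can be illustrated with a short Python sketch. This is not the PHP script used in the study, which is not reproduced in the article. The function names are hypothetical, and two details are assumptions: wear-time is taken as an already computed number of minutes per day, and a bout is treated as strictly consecutive minutes at or above the threshold, with no tolerance for brief interruptions.

```python
MVPA_STEPS_PER_MIN = 100  # cadence threshold stated in the text
MIN_BOUT_MIN = 10         # minimum bout length stated in the text
MIN_WEAR_MIN = 10 * 60    # 10 h of wear-time, expressed in minutes

def is_valid_day(wear_minutes):
    """A day is retained only if the monitor was worn for at least 10 h."""
    return wear_minutes >= MIN_WEAR_MIN

def mvpa_bout_minutes(steps_per_minute):
    """Sum the minutes that fall in runs of >= 10 consecutive MVPA minutes."""
    total = 0
    run = 0
    for steps in steps_per_minute:
        if steps >= MVPA_STEPS_PER_MIN:
            run += 1
        else:
            if run >= MIN_BOUT_MIN:
                total += run
            run = 0
    # Close a run that lasts until the end of the record.
    if run >= MIN_BOUT_MIN:
        total += run
    return total

# Hypothetical minute-by-minute record: a 12-min bout, a 9-min run
# (too short to count), and a 10-min bout.
record = [0] * 5 + [110] * 12 + [40] * 3 + [120] * 9 + [0] * 2 + [105] * 10
bout_minutes = mvpa_bout_minutes(record)  # 12 + 10 = 22
```

Weekly MVPA would then be the sum of `mvpa_bout_minutes` over the valid days of each week.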

#### **Cardiovascular health indicators**

Cardiovascular health measures were taken between 0630 and 1000 hours. Height was measured to the nearest 0.1 cm, body mass was measured to the nearest 0.1 kg, and body mass index (BMI) was calculated (kg/m<sup>2</sup>). Waist circumference was measured to the nearest 0.5 cm (Seca 201) at the narrowest point of the torso while participants stood with arms at their sides, feet together and abdomen relaxed (American College of Sports Medicine, 2017). Body fat percentage was measured using bioelectrical impedance (BIA) (UM-041, Tanita, Roxton Industries Inc., Kitchener, Ontario). Participants were asked to adhere to the following prior to their anthropometric measurements: (1) no eating or drinking for 4 h; (2) no MVPA for 12 h; (3) no alcohol consumption for 48 h; (4) to void their bladder (within 30 min); (5) to refrain from caffeine consumption and diuretic use unless prescribed by a physician; and, (6) to postpone measurement if retaining water due to changes in menstrual cyclicity. Resting blood pressure (BP) and heart rate were assessed in a seated position after a 5-min rest period using an automated, non-invasive BP monitor (Bp-TRU, Coquitlam, BC, Canada). All measures were performed in triplicate at baseline and within 1 week post-intervention, and the average was reported for descriptive purposes. Research staff collecting cardiovascular health measures were blinded to participants' group assignment.
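The two calculations mentioned above, BMI in kg/m<sup>2</sup> and the average of triplicate measures, can be sketched as follows. The function names and all numeric values are hypothetical illustrations, not study data.

```python
from statistics import mean

def bmi(mass_kg, height_cm):
    """Body mass index in kg/m^2 from mass in kg and height in cm."""
    height_m = height_cm / 100
    return mass_kg / height_m ** 2

def triplicate_mean(readings):
    """Average of exactly three repeated measurements."""
    if len(readings) != 3:
        raise ValueError("expected three readings")
    return mean(readings)

# Hypothetical participant: 70.0 kg, 165.0 cm, three systolic BP readings (mmHg).
example_bmi = bmi(70.0, 165.0)                   # about 25.7 kg/m^2
example_sbp = triplicate_mean([118, 120, 122])   # 120 mmHg
```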

#### Statistical Analyses

Analyses were performed using SPSS for Windows (version 24; IBM Corp, Armonk, NY, USA). A complete case analysis was performed; only 4% (n = 3/75) of the randomized participants did not complete the intervention. All outcome variables were tested for normality using Shapiro-Wilk tests of normality; number of days activity monitors were worn, MVPA levels, steps, and cardiovascular health indicators [body mass, BMI, waist circumference, resting BP (baseline phase)] were not normally distributed.

Friedman's two-way analysis of variance (ANOVA) by ranks was performed to examine changes in the number of days the activity monitors were worn throughout the baseline and intervention phases. A two-step approach for transforming continuous non-normalized variables to normal was applied to the MVPA levels and steps variables (Templeton, 2011). A one-way ANOVA was performed to examine differences in the normalized weekly MVPA levels and daily steps between intervention groups at baseline. A two-way repeated measures ANOVA was performed to examine changes in the normalized MVPA levels and steps variables throughout the baseline and intervention phases both within and between groups (i.e., individual, friend and team); significant values were adjusted using Bonferroni correction for multiple tests. Wilcoxon signed rank tests were performed to compare cardiovascular health indicators between time points (i.e., baseline and within 1 week post-intervention), and Kruskal Wallis tests were performed to compare changes (i.e., post-intervention values − baseline values) in cardiovascular health indicators between groups. Non-normalized values are presented in the results for descriptive purposes. Data are reported as means ± standard deviations, unless otherwise noted, and p < 0.05 was considered statistically significant. Our post-hoc power analysis revealed that, with an eta-squared value of 0.022 (i.e., small effect size) and an alpha of 0.05, a sample size of 76 participants provides adequate power (1 − β = 0.92) to detect significant differences in PA within (i.e., baseline and intervention weeks 1–6) and between groups.
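As a brief illustration of the Bonferroni correction mentioned above, the following Python sketch multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. It assumes, for illustration only, that all pairwise comparisons among the seven time points (baseline plus six intervention weeks) are tested; the article does not state which comparisons the statistical software adjusted for, and the p-values shown are hypothetical.

```python
from math import comb

def bonferroni(p_values, n_comparisons=None):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]

# Assumption: every pair among 7 time points is compared.
n_pairs = comb(7, 2)  # 21 pairwise comparisons

# Hypothetical unadjusted p-values for three of those comparisons.
adjusted = bonferroni([0.0005, 0.001, 0.03], n_comparisons=n_pairs)
```

Under this assumption, an unadjusted p-value must fall below 0.05/21 (about 0.0024) for the adjusted value to remain below 0.05.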

### RESULTS

#### Participants

All 76 screened participants met study eligibility criteria and consented to participate; 75 were randomized to the individual, friend and team PA challenges (see **Figure 1**).

Nurses' demographics, anthropometrics, types of work shifts, and nursing roles are presented in **Table 1**. On average, nurses were categorized as being overweight, normotensive, with a low-risk waist circumference according to the American College of Sports Medicine (ACSM) guidelines (American College of Sports Medicine, 2017). Most were female (97%), working days (53%), and performing clinical duties (72%). They spent an average of 27.4 ± 49.1 min/week in MVPA in bouts of ≥10 min; only three (4%) nurses met current PA guidelines at baseline.

#### Dropouts

One participant dropped out after baseline due to a damaged device, and three (4%) participants dropped out during the intervention due to pregnancy (n = 1), loss of interest (n = 1), or time constraints (n = 1). Overall, 72 (96%) of the randomized participants completed all study assessments, including 23 (92%) assigned to the individual challenge, 25 (100%) assigned to the friend challenge, and 24 (96%) assigned to the team challenge.

#### Adherence to Intervention

Nurses wore the activity monitor for at least 10 h/day for an average of 31 of the total 42 intervention days (overall compliance rate of 74%). The number of days the nurses wore the activity monitor decreased significantly throughout the baseline and intervention phases (p < 0.05). Nurses wore the activity monitor for ≥10 h/day for an average of 6.0 ± 1.9 (baseline), 6.0 ± 2.0 (intervention week 1), 5.8 ± 1.8 (intervention week 2), 5.9 ± 1.9 (intervention week 3), 4.6 ± 2.0 (intervention week 4), 4.8 ± 2.4 (intervention week 5), and 3.5 ± 3.0 (intervention week 6) days. No significant differences in the number of days the nurses wore the activity monitor were observed between intervention groups (p > 0.05).
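The overall compliance rate follows from the weekly figures reported above. As a worked check in Python, summing the mean number of valid days for intervention weeks 1 to 6 gives about 30.6 days, which rounds to the reported 31 of 42 intervention days (74%). The variable names are illustrative only.

```python
# Mean number of valid days (>= 10 h/day of wear) in intervention weeks 1-6,
# as reported in the text.
weekly_valid_days = [6.0, 5.8, 5.9, 4.6, 4.8, 3.5]
intervention_days = 6 * 7  # 42 days

mean_valid_days = sum(weekly_valid_days)          # about 30.6 days
rounded_valid_days = round(mean_valid_days)       # 31 days
compliance_pct = round(100 * rounded_valid_days / intervention_days)  # 74
```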

#### Effects of Intervention on Physical Activity

No significant differences in nurses' weekly MVPA levels (F = 0.407, p = 0.667, η<sup>2</sup> = 0.01) or daily steps (F = 1.696, p = 0.191, η<sup>2</sup> = 0.046) were observed between intervention groups at baseline. Nurses' weekly MVPA levels changed significantly over time (F = 4.022, df = 4.827, p = 0.002, η<sup>2</sup> = 0.055), and were greater during intervention week 2 when compared to intervention week 6 (p < 0.05; see **Figure 2**). No significant differences in MVPA levels were observed between intervention groups (F = 1.199, df = 9.654, p = 0.292, η<sup>2</sup> = 0.034; see **Figure 3**). Nurses' daily steps changed significantly over time (F = 7.668, df = 3.910, p < 0.001, η<sup>2</sup> = 0.100), and were greater during baseline and intervention weeks 1, 2, 3, and 5 when compared to intervention week 6 (p < 0.05) (see **Figure 4**). No significant differences in daily steps were observed between intervention groups (F = 1.146, df = 7.819, p = 0.333, η<sup>2</sup> = 0.032; see **Figure 5**). Two (3%) nurses met current PA guidelines post-intervention.

#### Effects of Intervention on Cardiovascular Health Indicators

Nurses' cardiovascular health parameters are presented in **Table 2**. No significant changes in body mass, BMI or waist circumference were observed between baseline and within 1 week post-intervention (p > 0.05). Significant decreases in body fat percentage and resting systolic BP were observed within 1 week post-intervention when compared to baseline (p < 0.05). No significant differences in changes in cardiovascular health indicators were observed between groups (p > 0.05).

### DISCUSSION

This is the first randomized trial, to our knowledge, to examine the impact of a web-based intervention incorporating feedback and group features on the PA levels and cardiovascular health of nurses working in a cardiovascular setting. We observed initial increases in nurses' PA levels (i.e., weekly MVPA levels and daily steps), though these were not sustained over the 6-week intervention and few met the current PA guidelines (≥150 min/week in bouts of ≥10 min). We also observed improvements in nurses' body fat percentage and resting systolic BP. Introducing web-based group features (i.e., friend and team PA targets) for motivation did not differentially impact PA or cardiovascular outcomes.

#### TABLE 1 | Participant characteristics.


BMI, body mass index; SD, standard deviation. \*, missing n = 2 for females for types of shifts and nursing roles.

Nurses reach a large proportion of the population, making them a critically important element of the health-care workforce. Nursing practice is physically and psychologically demanding (Chin et al., 2016). Physical inactivity and cardiovascular disease have been shown to be related to lower ability to work and a greater incidence of absenteeism (Burton et al., 2014; van den Berg et al., 2017). Improving the lifestyle and overall well-being of nurses is important in order to permit optimal patient care. E-health broadly refers to the use of emerging information and communication technology to improve or enable health and health care (Government of Canada, 2010). E-health encompasses a wide range of services or systems, including electronic health records, e-prescribing, telemedicine (e.g., online and telephone coaching), consumer health informatics (e.g., on demand educational content), wearable devices (e.g., Tractivity™, Fitbit™) and real-time monitoring of user health and behavioral data. Evidence has suggested that e-health interventions may improve the PA levels and health outcomes of adults (Beratarrechea et al., 2014; Joseph et al., 2014; Direito et al., 2017). Providing web-based feedback from wearable devices is acceptable and can increase the PA levels of inactive overweight and obese women (Cadmus-Bertram et al., 2015a,b). Our work extends these findings to nurses and provides support for the use of e-health interventions to target PA and cardiovascular outcomes in nurses working in a cardiovascular setting, as revealed by the good compliance with the intervention and initial effects on behavior and health outcomes.

We found that web-based feedback from an activity monitor resulted in immediate increases in nurses' PA levels (i.e., weekly MVPA levels and daily steps). Yet, consistent with other web-based PA interventions demonstrating short-lived increases in PA levels (Vandelanotte et al., 2007; Kernot et al., 2014; Fjeldsoe et al., 2015; Joseph et al., 2015), nurses' PA levels decreased mid-way through the current intervention. Our 6-week intervention was relatively short in duration and incorporated a limited range of motivational strategies, namely group features. It could be argued that a longer intervention that combines feedback with additional strategies (e.g., social support, autonomy support, offering value-based rationales for PA) within the web platform is needed to achieve and sustain improvements in PA levels, particularly for those who are not meeting current PA guidelines. Whether web-based interventions that encompass more strategies that can intrinsically motivate PA are more effective in sustaining nurses' PA levels requires further investigation.

We attempted to assist nurses in achieving PA targets by having participants share data with other nurses to motivate the initiation and maintenance of PA; this appeared to lack sustained appeal. We chose friend and team PA challenges based on research showing that: (1) the act of self-monitoring can improve PA levels (Michie et al., 2009); (2) social factors (e.g., social learning, comparison, normative influence, facilitation, cooperation, recognition) can be a powerful tool for increasing the effectiveness of web-based interventions (Matthews et al., 2016); (3) social competition via the web can motivate participants to become more physically active when compared to self-monitoring only (Prestwich et al., 2017); (4) gamification can have a positive impact on behavioral and health outcomes (Johnson et al., 2016); and, (5) the desire to enhance oneself and make positive impressions is an important motivator of human behavior (Leary, 1992). The integration of technologically-mediated group features did not, however, impact nurses' PA levels or cardiovascular health when compared to those who did not have access to these features. One explanation for the null finding is that the anonymity of nurses within the groups prevented social support and relatedness between participants. It is possible that nurses would have accumulated greater PA levels if the friend and team challenge conditions allowed them to feel connected and accountable to their friend or team members. It is also possible that this type of group feature is insufficient to change PA unless cash or prize incentives are provided. Finally, feedback may have undermined nurses' intrinsic motivation to perform PA because providing social pressures and rewards can give rise to extrinsic motivation (Deci and Ryan, 2012; DeSmet et al., 2014). Strategies which foster intrinsic motivation (e.g., when the behavior is done for enjoyment and personal satisfaction) may better promote behavior change and maintenance (Teixeira et al., 2012; Hancox et al., 2017; Quested et al., 2017). Future research developing and testing strategies to motivate nurses and engage them in PA is warranted.

We observed statistically significant improvements in body fat percentage (−0.8%), yet no changes in body mass, BMI or waist circumference. These latter findings were not surprising given that increases in nurses' PA levels were short-lived over the 6-week intervention and likely produced minimal, if any, deficits in energy expenditure (Reed et al., 2013). We also observed a significant improvement in resting systolic BP (−2.6 mmHg). Strong evidence from a meta-analysis of randomized controlled trials of exercise training in healthy adults (5,223 participants: 3,401 exercise training participants and 1,822 sedentary controls) suggests that resting systolic BP is reduced (−3.5 mmHg) after endurance exercise (Cornelissen and Smart, 2013). The decrease (−2.6 mmHg) we observed may not be clinically significant, yet the direction of change is nevertheless favorable and associated with reduced cardiovascular morbidity and mortality (Hansson, 1996; Padwal et al., 2016). Our findings contrast those of a pedometer-based PA program for nurses in a Canadian multi-site health care center which reported no changes in resting systolic BP (Lavoie-Tremblay et al., 2014). This study, however, used self-reported BP measures, which have been shown to have moderate agreement with measured BP (Taylor et al., 2010), and did not observe significant increases in PA levels.

TABLE 2 | Participants' cardiometabolic health at baseline and post-intervention.

BMI, body mass index; SBP, systolic blood pressure; SD, standard deviation.

Our study has several strengths. It is the first to examine the impact of web-based feedback from an activity monitor on the PA levels and cardiovascular health of nurses working long and rotating shifts in a cardiovascular health center. This is particularly important as innovative interventions are needed to address at-risk nursing populations (Reed et al., 2018). Second, we integrated technologically-mediated social participation into the web-based intervention to increase participants' motivation, although this did not impact PA levels or cardiovascular health indicators. Third, nurses' PA levels were objectively measured in 1-min increments throughout a baseline week and 6-week intervention using a valid activity monitor. Fourth, we observed a low dropout rate of 5% (n = 4/76). A review of internet- and web-based PA interventions in which the majority of participants were women revealed a dropout rate of 21% for interventions <6 months in duration (Joseph et al., 2014). Further, a pedometer-based PA program for nurses in a Canadian multi-site health care center reported a response rate of only 55% (Lavoie-Tremblay et al., 2014).

Several limitations warrant discussion. First, the generalizability of our findings to male nurses is limited as 97% of our sample were female, which is characteristic of the Canadian nursing population (Shields and Wilkins, 2006). Second, the generalizability of our findings to older nurses and those working nights only is limited given most of our nurses were middle-aged and working days only. Third, this was a single-center study. Replication of this study across several hospitals is needed to confirm our findings. Fourth, we recruited 19% of nurses from the hospital (total nursing population = approximately 400 nurses); it is possible that nurses interested in participating in a PA and health study may be "healthier" and more active than average, thus limiting the scope for a PA intervention to improve PA and cardiovascular health. Finally, we cannot affirm that participants did not disclose their group assignment to one another, and consequently contaminate the group effects. No differences in PA or cardiovascular outcomes were, however, observed between groups.

### CONCLUSIONS

Web-based PA interventions may be effective in initiating, but not sustaining optimal PA levels among Canadian nurses working in a cardiovascular setting. Improvements in nurses' body fat percentage and resting systolic BP were observed following the intervention. Embedding technologically-mediated social participation did not appear to impact nurses' PA levels or cardiovascular health. Nurses working in a cardiovascular setting do not appear to be meeting PA guidelines. Future larger multi-site randomized controlled trials are needed to confirm our findings. If our findings are replicated, alternative novel and multi-faceted interventions are needed to address the low PA levels and poor cardiometabolic health of at-risk Canadian nurses.

The growth in e-health interventions is occurring rapidly. It is foreseeable that new technologies (e.g., global positioning systems, smart watches, video games) will provide additional means of improving and/or maintaining PA, which is a known modifiable risk factor of cardiometabolic health. Consumers will be able to monitor their time spent in MVPA, daily steps and bouts of sedentary time. The cost of such technologies will, in all likelihood, continue to decrease as companies strive to provide competitive, accessible and affordable products for consumers. Future work is needed to synthesize all available data regarding the effectiveness of e-health interventions in improving PA levels and cardiometabolic health in adults, particularly in women, as over half of women lack knowledge of cardiovascular disease risk factors and the majority are uninformed when it comes to their own level of risk (McDonnell et al., 2014; Reed et al., 2015).

### AUTHOR CONTRIBUTIONS

JR: Conceptualized and designed the study, performed the analyses and interpretations of the data, drafted the initial manuscript, and revised and approved the final manuscript as submitted; CC and MZ: Assisted with the acquisition of data, drafting and revising the manuscript; JB: Assisted in drafting and revising the manuscript; HT, HS, RR, and AP: Assisted in designing the study, selecting outcome measures, and provided critical revision of the manuscript.

### FUNDING

This was investigator initiated research. Funding was provided by the University of Ottawa Heart Institute to purchase the equipment required for this study. JR is currently supported by a New Investigator Award in Clinical Rehabilitation by the Canadian Institutes of Health Research (CIHR).

### ACKNOWLEDGMENTS

We would like to thank Angelica Blais, Asha Varughese, and Bryce Bongfeldt for their assistance in conducting this study. We would also like to thank Fraser Reed for building the database and assisting in computing the physical activity variables.


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2018.00142/full#supplementary-material

Supplementary Figure 1 | Online Tractivity® program, which displayed participants' distance, steps, active time and calories expended on an hourly, daily, weekly, and monthly basis.

Supplementary Figure 2 | Friend challenge in online Tractivity® program which displayed the total distance and steps of another participant randomized to the friend challenge.

Supplementary Figure 3 | Team challenge in online Tractivity® program, which displayed the total distance and steps of other teams randomized to the team challenge.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Reed, Cole, Ziss, Tulloch, Brunet, Sherrard, Reid and Pipe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Application of dGNSS in Alpine Ski Racing: Basis for Evaluating Physical Demands and Safety

Matthias Gilgien<sup>1,2</sup>\*, Josef Kröll<sup>3</sup>, Jörg Spörri<sup>3,4</sup>, Philip Crivelli<sup>5</sup> and Erich Müller<sup>3</sup>

<sup>1</sup> Department of Physical Performance, Norwegian School of Sport Sciences, Oslo, Norway, <sup>2</sup> St. Moritz Health and Innovation Foundation, Center of Alpine Sports Biomechanics, St. Moritz, Switzerland, <sup>3</sup> Department of Sport Science and Kinesiology, University of Salzburg, Hallein, Austria, <sup>4</sup> Department of Orthopedics, Balgrist University Hospital, University of Zurich, Zurich, Switzerland, <sup>5</sup> Group for Snowsports, WSL - Institute for Snow and Avalanche Research SLF, Davos, Switzerland

### Edited by:

Billy Sperlich, University of Würzburg, Germany

#### Reviewed by:

Giovanni Messina, University of Foggia, Italy; Gerald Allen Smith, Colorado Mesa University, United States

\*Correspondence: Matthias Gilgien, matthias.gilgien@nih.no

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 21 September 2017 Accepted: 13 February 2018 Published: 06 March 2018

#### Citation:

Gilgien M, Kröll J, Spörri J, Crivelli P and Müller E (2018) Application of dGNSS in Alpine Ski Racing: Basis for Evaluating Physical Demands and Safety. Front. Physiol. 9:145. doi: 10.3389/fphys.2018.00145

External forces, such as ground reaction force or air drag acting on athletes' bodies in sports, determine the sport-specific demands on athletes' physical fitness. In order to establish appropriate physical conditioning regimes, which adequately prepare athletes for the loads and physical demands occurring in their sports and help reduce the risk of injury, sport- and/or discipline-specific knowledge of the external forces is needed. However, due to methodological shortcomings in biomechanical research, data comprehensively describing the external forces that occur in alpine super-G (SG) and downhill (DH) are so far lacking. Therefore, this study applied new and accurate wearable sensor-based technology to determine the external forces acting on skiers during World Cup (WC) alpine skiing competitions in the disciplines of SG and DH and to compare these with those occurring in giant slalom (GS), for which previous research knowledge exists. External forces were determined using WC forerunners carrying a differential global navigation satellite system (dGNSS). Combining the dGNSS data with a digital terrain model of the snow surface and an air drag model, the magnitudes of ground reaction forces were computed. It was found that the applied methodology may not only be used to track physical demands and loads on athletes, but also to simultaneously investigate safety aspects, such as the effectiveness of speed control through increased air drag and ski–snow friction forces in the respective disciplines. Therefore, the component of the ground reaction force in the direction of travel (ski–snow friction) and air drag force were computed.
This study showed that (1) the validity of high-end dGNSS systems allows meaningful investigations such as characterization of physical demands and effectiveness of safety measures in highly dynamic sports; (2) physical demands were substantially different between GS, SG, and DH; and (3) safety-related reduction of skiing speed might be most effectively achieved by increasing the ski–snow friction force in GS and SG. For DH an increase in the ski–snow friction force might be equally as effective as an increase in air drag force.

Keywords: physical fitness, strength training, physical conditioning, external forces, air drag, ground reaction force, global navigation satellite systems, GPS

## INTRODUCTION

The physical demands on athletes in sport are primarily driven by the external forces acting in the interface between the athlete and the athlete's physical surroundings. In sport, the surroundings typically include the field of play where interaction forces occur between the athlete and the ground, sports apparatus (for example, high bars in gymnastics), sports gear (such as rackets in tennis) or fluids, such as air and water (water sports) (Knudson and White, 1989; Kolmogorov and Duplishcheva, 1992; Gastin et al., 2013). Hence, to quantify physical demands in sports, we first need to quantify the external forces acting on athletes. The validity of physical demand is therefore strongly related to the validity of the quantification of force. Validity of force measurement has two aspects: internal and external validity (Atkinson and Nevill, 2001). To maximize external validity, the forces need to be captured in the natural sporting setting, preferably during competition, using measurement devices that provide minimal obstruction to athletes in the execution of their sport. Internal validity is achieved if precision and repeatability of force measurement are maximized. Alpine ski racing is an example of a sport that challenges both types of validity to a significant degree. The sport is executed in rough surroundings, athletes move at high speed over large distances (Kraemer et al., 2002; Kröll et al., 2016c), and safety and external validity aspects limit the force measurement equipment that can be mounted on athletes. Hence, the measurement of force is a difficult but important challenge in alpine skiing research and practice. Ground reaction forces are most commonly measured using pressure insoles or force plates (Mote, 1987; Lüthi et al., 2004; Federolf et al., 2008; Stricker et al., 2010; Nakazato et al., 2011; Kröll et al., 2016b; Falda-Buscaiot et al., 2017).
Air drag force has been analyzed using wind tunnel testing (Luethi and Denoth, 1987; Savolainen, 1989; Thompson et al., 2001; Barelle et al., 2004; Meyer et al., 2011). However, to gain a holistic understanding of the external forces acting in skiing, these external forces need to be determined simultaneously and under field conditions. Since the measurement of ground reaction forces alone does not describe the entire physical demand, air drag force needs to be determined at the same time. Therefore, modeling has been applied to kinematic data to simultaneously derive air drag force and ground reaction forces in on-snow skiing for SL and GS (Brodie et al., 2008; Reid, 2010; Meyer et al., 2011; Supej et al., 2012; Gilgien et al., 2013). Such analysis has not so far been conducted for SG and DH, since methodologic limitations have not allowed for the measurement of skier kinematics over large capture volumes; hence, such knowledge is very limited in the speed disciplines (Gerritsen et al., 1996; Schiestl et al., 2006; Gilgien, 2014; Gilgien et al., 2014a, 2015a,b, 2016; Heinrich et al., 2014; Schindelwig et al., 2014; Yamazaki et al., 2015).

Recent advances in wearable measurement technology have allowed the reconstruction of skier kinematics across large capture volumes. These new methods combine differential global navigation satellite system technology (dGNSS) (Lachapelle et al., 2009; Andersson et al., 2010; Supej and Holmberg, 2011; Gilgien et al., 2014b) with digital terrain models (DTM) (Supej et al., 2012; Gilgien et al., 2013, 2015c; Nemec et al., 2014) or with inertial measurement technology (Brodie et al., 2008; Supej, 2010; Zorko et al., 2015; Fasel et al., 2016). Applying kinetic models to the captured kinematic data, both air drag force and ground reaction force and its components can be calculated simultaneously (Supej et al., 2012; Gilgien et al., 2013) without obstructing the athletes and thus ensuring high external validity (Atkinson and Nevill, 2001; Thomas et al., 2005), since skiers only wear a dGNSS unit on the body. This type of wearable technology allows the determination of skier kinematics and kinetics in skiing competitions across large capture volumes, such as entire SG and DH races, over several kilometers. The application of this new methodology is illustrated here by computing the physical demands with respect to adequate conditioning and by an example taken from injury prevention for GS, SG, and DH.

### Physical Demands and Appropriate Physical Preparation

To prepare athletes for a certain sport, the athlete's physical training needs to meet the coordinative affinity of the sport in competition (Muller et al., 2000). Specifically, the extent to which athletes engage in static and dynamic muscular work, and the magnitude and nature of this muscular work, need to correspond between training and competition. To ensure coordinative affinity between training and competition, the prevalence, magnitude and time–force pattern of the external forces need to be quantified and compared for the specific sport in training and competition. The physiological responses to alpine skiing in training and competition have been assessed quite broadly (Andersen and Montgomery, 1988; Neumayr et al., 2003; Turnbull et al., 2009; Ferguson, 2010). However, the scientific knowledge of the physical demands in alpine ski racing is limited to the technical disciplines slalom (SL) and giant slalom (GS) (Reid, 2010; Spörri et al., 2012b; Kröll et al., 2014, 2016c; Supej et al., 2014). Hence, to allow coaches and athletes to target their physical training specifically to the speed disciplines, the prevalence, magnitude and time–force patterns of the external forces need to be quantified for the speed disciplines SG and DH.

### External Forces and Injury Prevention

The ability to withstand external forces in alpine ski racing is not only beneficial from a performance perspective (Raschner et al., 2012); if external forces exceed those an athlete's body can withstand, they lead to injuries. Therefore, the external forces acting in alpine skiing were not primarily examined with respect to physical demands on the athletes, but as a cause of injury (Mote, 1987; Bally et al., 1989; Quinn and Mote, 1992; Read and Herzog, 1992; Herzog and Read, 1993; Gerritsen et al., 1996; Yee and Mote, 1997; Hame et al., 2002; Raschner et al., 2012; Spörri et al., 2015). To prevent injuries, a good understanding is first needed of the contribution of external forces, and second of the consequences of changes in external factors, such as course setting and equipment, on external forces and injuries. Investigations were therefore conducted into how external forces are related to injury rates in the ski racing disciplines GS, super-G (SG) and downhill (DH) (Gilgien et al., 2014a), and how changes in ski geometry (Zorko et al., 2015; Gilgien et al., 2016; Kröll et al., 2016a,b; Spörri et al., 2016), course setting (Reid, 2010; Spörri et al., 2012c; Gilgien et al., 2014a, 2015a,b) and terrain (Supej et al., 2014; Gilgien et al., 2015a,b; Falda-Buscaiot et al., 2017) alter speed and external forces in these alpine skiing disciplines. However, one possibility for reducing speed and external forces, which was suggested by expert stakeholders in the ski racing community (Spörri et al., 2012a), was not investigated scientifically: increasing air drag by raising the air drag coefficient through changes in the materials used in athletes' clothing. An increase in air drag may increase the air drag force and the share of mechanical energy that is dissipated to the skier's surroundings, which in turn has the potential to lead to a reduction in skier speed (Bardal and Reid, 2014). 
Reduced skier speed might reduce the risk of injuries, especially in the case of high-impact accidents (Gilgien et al., 2014a, 2016). Therefore, we need to understand to what extent the braking forces in skiing, which are the air drag force and the ski–snow friction force, contribute to energy dissipation to the surroundings and subsequent speed reduction. Knowing the relative contributions of air drag and ski–snow friction forces to energy dissipation will allow us to understand whether an increase in air drag force or in ski–snow friction force is more effective in reducing speed and impact forces in accidents in each skiing discipline.

In the current study a new, validated and wearable dGNSS measurement-based method (Gilgien et al., 2013, 2015c) was applied to capture the external forces acting on forerunners skiing World Cup (WC) races in GS, SG and DH. The collected data were applied to illustrate the potential of such technology to enhance knowledge for scientists and practitioners on the physical demands of alpine skiing and injury prevention. For the first time, (i) the physical demands on the athletes in alpine skiing were assessed for GS, SG, and DH; and (ii) the effectiveness of energy dissipation and hence the ability to reduce skier speed was assessed for both air drag and ski–snow friction forces for GS, SG, and DH.

### METHODS

#### Measurement Protocol

During the WC seasons 2010/11 and 2011/12, one male forerunner was equipped with a wearable dGNSS in various races. The forerunner was part of the official forerunner group and started directly prior to the respective WC races. Seven male WC giant slalom (GS) races—in total 14 runs—(Sölden (twice), Beaver Creek, Adelboden (twice), Hinterstoder, Crans Montana), 5 super-G (SG) races—in total 5 runs—[Kitzbühel, Åre, Hinterstoder, Crans Montana (twice)] and 5 downhill (DH) races—in total 16 runs including training runs—(Lake Louise, Beaver Creek, Wengen, Kitzbühel, Åre) were included in the analysis. In GS, each single competition run, and in DH, all training and competition runs were measured and analyzed. The forerunners were former male WC or current European Cup racers (age: 25.1 ± 3.6 years, mass: 86.1 ± 10.0 kg). This study was approved by the Ethics Committee of the Department of Sport Science and Kinesiology at the University of Salzburg and the athletes were informed of the investigation's purpose and procedures and signed written informed consent.

### Data Collection Methodology

The forerunner's head trajectory was captured using kinematic dGNSS with the antenna (G5Ant-2AT1, Antcom, Canada) mounted on the helmet, and a GPS/GLONASS dual frequency (L1/L2) receiver (Alpha-G3T, Javad, USA) was carried in a small cushioned backpack (**Figure 1**). The total weight of the measurement equipment carried by the skier was 940 g (receiver 430 g, backpack 350 g, antenna 160 g). Differential kinematic carrier phase position solutions of the skier's trajectory were computed at 50 Hz using the data from two base stations consisting of antennas (GrAnt-G3T, Javad, USA) and receivers (Alpha-G3T, Javad, USA) mounted on tripods. The geodetic postprocessing software GrafNav (NovAtel Inc., Canada) was used to compute differential kinematic carrier phase position solutions (Gilgien et al., 2014b).

The entire course width of the snow surface geomorphology was captured from start to finish using static dGNSS (Alpha-G3T receivers with GrAnt-G3T antenna, Javad, USA) and a Leica TPS 1230+ (Leica Geosystems AG, Switzerland). The number of points captured to describe the snow surface was dependent on the uniformity of the terrain. The less uniform the terrain, the more points were captured per area (on average across the entire course, 0.3 points per m<sup>2</sup>). Based on the surveyed snow surface points, a DTM was computed by applying Delaunay triangulation (de Berg et al., 2008) and smoothing using bi-cubic spline functions (Gilgien, 2014; Gilgien et al., 2015a,b).

## Parameter Computation

#### Computation of the External Forces

The antenna trajectory of the skier and the DTM were used as input parameters in a mechanical model (Gilgien et al., 2013) from which the ground reaction force (**F**SKI) and its component in the tangential direction to the skier's trajectory (**F**SKI-FRICTION) were computed. The model also derived the air drag force (**F**AIR-DRAG). For a detailed description of the force computations, see Gilgien et al. (2013). **F**AIR-DRAG was derived from body extension (obtained from the GNSS antenna position, a pendulum model attached to the antenna and the DTM), from skier speed (obtained from position data), and from an air drag coefficient model. The derivation of **F**SKI and **F**SKI-FRICTION was based on three steps: (1) the center of mass position was reconstructed from the antenna position, the pendulum model attached to the antenna, and the DTM; (2) from the center of mass position, the resultant force was calculated using time derivatives and the mass of the athlete; and (3) **F**SKI and **F**SKI-FRICTION were calculated as the difference between the resultant force and the sum of **F**AIR-DRAG and gravity.
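Steps (2) and (3) of this derivation can be sketched numerically. The following is a minimal illustration only and not the mechanical model of Gilgien et al. (2013): it assumes that center of mass positions are already available at a fixed sampling interval, that the air drag force vector is known for each sample, and that central finite differences are an acceptable stand-in for the time derivatives. The function names and the sign convention (z axis pointing up) are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2), assumed constant

def central_diff(series, dt):
    """Central finite-difference derivative of a list of 3D vectors (interior samples only)."""
    return [
        tuple((series[i + 1][k] - series[i - 1][k]) / (2.0 * dt) for k in range(3))
        for i in range(1, len(series) - 1)
    ]

def ground_reaction_force(com_positions, air_drag, mass, dt):
    """F_SKI (N) for position samples 2..n-3, as resultant force minus air drag minus gravity."""
    velocity = central_diff(com_positions, dt)      # valid for position samples 1..n-2
    acceleration = central_diff(velocity, dt)       # valid for position samples 2..n-3
    gravity = (0.0, 0.0, -mass * G)                 # assumption: z axis points up
    forces = []
    for j, acc in enumerate(acceleration):
        drag = air_drag[j + 2]                      # align drag with the position index
        resultant = tuple(mass * a for a in acc)    # Newton's second law: F = m * a
        forces.append(tuple(resultant[k] - drag[k] - gravity[k] for k in range(3)))
    return forces

def tangential_component(force, velocity):
    """Component of a force along the direction of travel (negative when braking)."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return sum(f * v for f, v in zip(force, velocity)) / speed
```

For a skier gliding at constant velocity with no air drag, this sketch returns a ground reaction force equal to body weight, which is a quick sanity check on the sign convention.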

### Characterization of the Physical Demands

FIGURE 1 | A forerunner equipped with a differential global navigation satellite system antenna on the helmet and a receiver in the cushioned backpack that was carried below a number bib during racing.

For characterization of the physical demands, **F**SKI was considered. The maximum **F**SKI (**F**SKIMAX) was calculated for each turn in GS and SG as the average of the highest 10% of **F**SKI, according to the method of Gilgien et al. (2014a). To approximate the fraction of time skiers were doing work in extended or crouched positions, the time in which skiers were skiing in a tucked position was approximated using the following criteria: (1) the CoM turn radius was larger than 125 m, and (2) the shortest distance from the GNSS antenna (which was mounted on the skier's helmet) to the local terrain surface was less than 0.6 × body length + 6 cm. The time skiers were turning was defined as the periods when the CoM turn radius was smaller than 125 m. The time skiers were skiing straight but in an upright body posture (non-turning and non-tucked) was calculated as the run time remaining after subtracting the time in tucked position and the time spent turning, expressed as a percentage of run time. CoM turn radius and distance to the local DTM were computed according to the methods of Gilgien et al. (2015a,b,c).
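The posture criteria can be expressed as a simple per-sample classification. The sketch below is illustrative and assumes that the CoM turn radius and the antenna-to-terrain distance have already been computed for every sample; the function names are hypothetical, and the handling of a radius of exactly 125 m (treated here as non-turning) is an assumption, since the text does not define that boundary case.

```python
TURN_RADIUS_LIMIT_M = 125.0  # CoM turn radius threshold stated in the text

def classify_sample(turn_radius_m, antenna_height_m, body_length_m):
    """Label one sample as 'turning', 'tucked', or 'upright_straight'."""
    if turn_radius_m < TURN_RADIUS_LIMIT_M:
        return "turning"
    tuck_limit_m = 0.6 * body_length_m + 0.06   # 0.6 x body length + 6 cm
    if antenna_height_m < tuck_limit_m:
        return "tucked"
    return "upright_straight"                    # non-turning and non-tucked

def posture_shares(samples, body_length_m):
    """Percentage of samples per label; samples are (turn_radius_m, antenna_height_m) pairs."""
    counts = {"turning": 0, "tucked": 0, "upright_straight": 0}
    for radius, height in samples:
        counts[classify_sample(radius, height, body_length_m)] += 1
    total = len(samples)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

With a constant sampling rate, the share of samples in each class equals the share of run time, so the three percentages sum to 100.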

To characterize the timing of **F**SKI through a turn cycle in GS and SG, the durations of the following phases were calculated for each turn and averaged across all turns: from turn transition at the beginning of the turn (switch1) to gate passage; from switch1 to the time of **F**SKIMAX; from gate passage to turn transition at the end of the turn (switch2); and the overall turn cycle time (from switch1 to switch2). Turn transition was calculated as the deflection point of the CoM trajectory between turns (Gilgien et al., 2015a,b). Run time is a rough estimation of total workload, while impulse (the integration of air drag and ground reaction force over the run time) is a more direct measure of the total workload. Impulse and run time were calculated according to the methods of Gilgien et al. (2014a).
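The two load measures described in this section, the per-turn maximum force and the impulse, can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure of Gilgien et al. (2014a): forces are assumed to be scalar magnitudes sampled at a constant interval, the integral is approximated with the trapezoidal rule, and the function names are hypothetical. If forces are given in BW, the impulse is in BW·s.

```python
def f_ski_max(f_ski_turn):
    """Mean of the highest 10% of F_SKI samples within one turn."""
    ordered = sorted(f_ski_turn, reverse=True)
    n_top = max(1, round(0.10 * len(ordered)))   # assumption: use at least one sample
    return sum(ordered[:n_top]) / n_top

def impulse(force_sum, dt):
    """Trapezoidal time integral of (F_AIR-DRAG + F_SKI) sampled every dt seconds."""
    return sum(0.5 * (a + b) * dt for a, b in zip(force_sum, force_sum[1:]))
```

Averaging the top 10% of samples instead of taking the single largest value makes the peak estimate less sensitive to isolated noisy samples.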

### Contribution of External Forces to Energy Dissipation

The instantaneous energy dissipation due to ski–snow friction, EDISSSKI and energy dissipation due to air drag, EDISSAIR were computed according to Equations (1) and (2). The relative contributions of EDISSSKI and EDISSAIR to the total instantaneous energy dissipation (sum of EDISSSKI and EDISSAIR) were expressed as percentages of total instantaneous energy dissipation.

$$EDISS_{\text{SKI}} = \int F_{\text{SKI-FRICTION}}(t)\,v(t)\,dt \tag{1}$$

$$EDISS_{\text{AIR}} = \int F_{\text{AIR-DRAG}}(t)\,v(t)\,dt \tag{2}$$

where *v*(*t*) is the instantaneous skier speed.
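Equations (1) and (2) can be evaluated numerically from sampled force and speed data. The sketch below assumes a constant sampling interval, force magnitudes in newtons, speed in m/s, and trapezoidal integration; these choices and the function names are illustrative assumptions, not details taken from the study.

```python
def dissipated_energy(force, speed, dt):
    """Trapezoidal integral of force(t) * speed(t) dt, in joules for N and m/s inputs."""
    power = [f * v for f, v in zip(force, speed)]   # instantaneous dissipation power (W)
    return sum(0.5 * (p0 + p1) * dt for p0, p1 in zip(power, power[1:]))

def relative_contributions(e_ski, e_air):
    """Percentage shares of ski-snow friction and air drag in total energy dissipation."""
    total = e_ski + e_air
    return 100.0 * e_ski / total, 100.0 * e_air / total
```

For example, a constant friction force of 100 N and a constant air drag of 50 N at 20 m/s would give shares of two thirds and one third, independent of the duration of the interval.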

#### Statistical Analysis

Normality of instantaneous data from all races in each discipline was tested using a Lilliefors test (α = 0.05). No parameter was found to be normally distributed, so non-parametric statistics were applied to compare all parameters between disciplines. Median and inter-quartile range (IQR) were computed for all parameters and disciplines. The relative sizes of parameters for GS and SG compared to DH were computed from the medians of each discipline and were expressed as percentages of DH medians. In addition, mean and standard deviation were calculated for the time skiers were in tucked position, the time skiers were turning, the time spent skiing in non-turning and non-tucked position, impulse, and run time for all disciplines. The medians of the disciplines were compared using a Kruskal–Wallis one-way analysis of variance (p = 0.01), followed by a Friedman's test (p = 0.01) if the Kruskal–Wallis test revealed significant differences.
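The descriptive part of this analysis (median, IQR, and values relative to DH) can be sketched with the Python standard library. The quartile definition used here (the "inclusive" method of `statistics.quantiles`) is an assumption, as the quartile convention is not stated in the text, and the function names are hypothetical.

```python
import statistics

def median_iqr(values):
    """Median and inter-quartile range of one parameter for one discipline."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1

def relative_to_dh(medians):
    """Express each discipline's median as a percentage of the DH median (DH = 100%)."""
    return {name: 100.0 * m / medians["DH"] for name, m in medians.items()}
```

As a usage illustration with normalized placeholder values, medians of 1.22 (GS), 1.15 (SG) and 1.00 (DH) map to 122%, 115% and 100%, which corresponds to a median that is 22% and 15% larger than in DH.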

For GS and SG mean and standard deviations were also computed for turn cycle time characteristics, number of direction changes and **F**SKIMAX. For **F**SKI, turn cycle means were computed for each 10% increment of the time-normalized turn cycles for SG and GS.
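One way to obtain turn cycle means per 10% increment is to time-normalize each turn, average the samples falling in each tenth of the turn, and then average those values across turns. The sketch below follows that reading; whether the study used bin averages or interpolated values at fixed percentages is not stated, so the binning approach, the requirement of at least ten samples per turn, and the function names are all assumptions.

```python
def decile_means(f_ski_turn):
    """Mean F_SKI in each of ten equal bins of normalized turn time (0-10%, ..., 90-100%)."""
    n = len(f_ski_turn)                     # assumption: at least 10 samples per turn
    bins = [[] for _ in range(10)]
    for i, value in enumerate(f_ski_turn):
        bins[(i * 10) // n].append(value)   # integer arithmetic avoids rounding issues
    return [sum(b) / len(b) for b in bins]

def average_across_turns(turns):
    """Average the per-turn decile means over all turns of a discipline."""
    per_turn = [decile_means(turn) for turn in turns]
    return [sum(column) / len(column) for column in zip(*per_turn)]
```

Because each turn is normalized to its own duration before averaging, turns of different lengths contribute equally to the mean curve.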

## RESULTS

### Overview of External Forces

The median, IQR and the percentage values for GS and SG in relation to DH are given in **Table 1**. The medians were significantly different (p = 0.01) between disciplines for all forces. **Figure 2** illustrates the differences in forces [expressed in body weight (BW)] between disciplines in histograms. The median **F**SKI was 22% larger for GS and 15% larger for SG compared to DH. The IQRs were largest for GS, followed by SG and DH. In GS and SG skiers skied for about 40% of the time with **F**SKI values larger than 1.5 BW, while in DH values above 1.5 BW were achieved for less than 20% of the time. The **F**SKI-FRICTION median was doubled for GS compared to DH and 52% larger for SG compared to DH. The IQR was largest for GS, followed by SG and DH. The median **F**AIR-DRAG was largest for DH, followed by SG and GS, and was approximately twice as large for DH as for GS. The IQR was largest for DH, followed by SG and GS. In DH, **F**AIR-DRAG was larger than 0.2 BW for ∼25% of the time, while this magnitude occurred for less than 2% of the time in GS.

### Characterization of the Physical Demands

The measures for total load on athletes, run time and impulse, showed the highest values in DH, followed by SG and GS (**Table 2**). The percentage of total run time in which athletes were turning was longest in GS, followed by SG and DH. The total time athletes were in tucked position was longest in DH, followed by SG and GS. The time when skiers were not turning and were in an upright position did not differ between disciplines. For results see **Table 2**.

An SG run consisted of 41 turns, while a GS run consisted of 51 turns, which indicates that SG consists of a highly cyclic turn pattern where skiers turn for 79.4% of the run time, while in GS they turn for 92.8% of the run time (**Table 2**). Force–time characteristics are illustrated in **Figure 3** and **Tables 3**, **4**. **Figure 3** shows **F**SKI and CoM turn radius as a function of mean turn time for GS and SG, with the mean drawn in solid lines and standard deviations in dashed lines for **F**SKI. To allow quantitative reconstruction of the **F**SKI–turn cycle time relationships in GS and SG, these are provided as 10% turn cycle time increments in **Table 3**. Turn timing characteristics, along with the number of direction changes and **F**SKIMAX characteristics, are provided in **Table 4**.

TABLE 1 | Median and interquartile range (IQR) of the absolute values for all disciplines and the relative values for Giant slalom and Super-G compared to Downhill.

\*The value of DH is equal to 100%.

FSKI (ground reaction force), FAIR-DRAG (air drag force), FSKI-FRICTION (ski–snow friction).

### Contribution of External Forces to Energy Dissipation

For the dissipative forces **F**AIR-DRAG and **F**SKI-FRICTION, median energy dissipation to the surroundings was not significantly different between GS and SG for EDISSSKI. All other median values were significantly different between disciplines for both energy dissipation types (**Table 5**). The median EDISSSKI was 41% (GS) and 42% (SG) larger than for DH. The median for EDISSAIR was found to be 41% (GS) and 71% (SG) of the median for DH. DH also had the largest IQR. The relative contributions of energy dissipation (median) due to air drag and ski–snow friction were found to be 23% (EDISSAIR) and 77% (EDISSSKI) in GS, 35% (EDISSAIR) and 65% (EDISSSKI) in SG and 51% (EDISSAIR) and 49% (EDISSSKI) in DH.

**Figure 4** illustrates the relative contribution of **F**AIR-DRAG and **F**SKI-FRICTION to the total energy dissipation (EDISS) as a percentage contribution of EDISSAIR to total energy dissipation for GS, SG and DH. The horizontal axis shows the contribution of EDISSAIR as a percentage of total energy dissipation, while the vertical axis shows the frequency of occurrence of these contribution patterns. The percentage contribution of EDISSSKI to total energy dissipation was complementary to the percentage contribution of EDISSAIR to total EDISS, since **F**SKI-FRICTION and **F**AIR-DRAG are the only sources for EDISS. For more than 80% of the time EDISSSKI had a larger contribution to total EDISS than EDISSAIR in GS, while in DH the contribution of EDISSSKI was larger than the contribution of EDISSAIR to total EDISS for less than 40% of the run time.
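The quantities underlying such a distribution can be computed per sample. The following sketch assumes sampled magnitudes of both dissipative forces and of speed, a non-zero total dissipation at every sample, and a constant sampling rate so that sample counts are proportional to time; the function names are hypothetical.

```python
def air_share_percent(f_air, f_friction, speed):
    """Per-sample share (%) of total dissipation power that is due to air drag."""
    shares = []
    for fa, ff, v in zip(f_air, f_friction, speed):
        p_air, p_ski = fa * v, ff * v                     # instantaneous powers (W)
        shares.append(100.0 * p_air / (p_air + p_ski))    # assumes non-zero total power
    return shares

def time_fraction_ski_dominant(shares):
    """Percentage of samples in which ski-snow friction dissipates more than air drag."""
    return 100.0 * sum(1 for s in shares if s < 50.0) / len(shares)
```

Since both forces are multiplied by the same speed, the instantaneous share reduces to the ratio of the two force magnitudes; the speed is kept in the sketch only to mirror Equations (1) and (2).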

### DISCUSSION

The study revealed that: (1) the method was effectively applied to capture external force data from WC races; (2) the physical demands in alpine ski racing were mainly characterized by fluctuations in the ground reaction force, which followed a cyclic pattern and was most pronounced for GS, followed by SG and DH; and (3) injury prevention measures using an increase in air drag would be about equally effective as measures that cause an increase in ski–snow friction for DH, while for GS and SG measures that cause an increase in ski–snow friction would be most effective.

#### The Application of dGNSS Technology to Capture External Force Data From WC Races in Alpine Skiing

It has been shown that if high-end dGNSS devices are carefully applied, antenna position accuracy to less than 5 cm can be reached even in highly dynamic sports such as alpine skiing (Gilgien et al., 2014b, 2015c). It has also been shown that the position accuracy of a dGNSS allows valid derivation of velocity and of the external forces acting on skiers simultaneously (Gilgien et al., 2013, 2015c). The present study showed that the high validity of the wearable technology allowed detailed investigation of aspects of physical fitness and injury prevention that are relevant for practitioners of a sport where athletes move at high speed through rough surroundings and over large distances. Also, the method proved to be valid and practicable to be applied in a large number of WC races.

TABLE 2 | Mean and standard deviation for run time, impulse per run; percentage of time skiers are turning per run; percentage of time skiers are not turning but are not in tucked position per run; percentage of time skiers are in tucked position per run for all disciplines.

### Characterization of the Physical Demands

To get a rough idea of the physical demands of a sport or a discipline, the total physical load may serve as a good indication. Run time provides limited information, since the intensity of the work done is not measured. Measuring impulse (the integration of the external forces **F**AIR-DRAG and **F**SKI over the run time) might describe the total load better. Comparing the three disciplines, impulse was highest in DH, followed by SG and GS if only one run was considered in GS (Gilgien et al., 2014a). In GS, athletes actually ski two runs, if they qualify for the second run. Hence, the impulse for the first run, the 3 h break between the two runs and the warm-up to the second run define the demands for physical recovery between runs for that discipline.

FIGURE 3 | Turn cycle characteristics for ground reaction force (FSKI) for Giant slalom in black and Super-G in gray as a function of mean turn cycle time. Instantaneous mean in solid line, Standard deviations in thin line.

TABLE 5 | Median and interquartile range (IQR) of the absolute values for all disciplines and the relative values for Giant slalom and Super-G compared to Downhill.

\*The value of DH is equal to 100%.

EDISSSKI (energy dissipation due to ski–snow friction), EDISSAIR (energy dissipation due to air drag force).

To understand the total physical load on athletes in more detail, we need to compare the factors contributing to the impulse. These are run time, **F**AIR-DRAG and **F**SKI. Run time was longest in DH followed by SG and GS, while the sum of median **F**AIR-DRAG and **F**SKI was highest for GS (1.53 BW), followed by SG (1.51 BW) and DH (1.34 BW). Hence, despite the higher external forces in GS and SG compared to DH, run time seems to have a major impact and lead to higher impulses and total physical loads per run for the speed disciplines compared to GS.

Comparing the type of work between the disciplines there is an obvious difference between GS and SG compared to DH. Inspecting the histogram for **F**SKI in **Figure 2**, **F**SKI is overrepresented in the small and high force ranges for GS and SG compared to DH. This might be a consequence of more pronounced repeated loading-unloading patterns and higher peak forces in GS and SG compared to DH. DH consists of longer sections of straight skiing, while SG and GS consist of more or less continuous turning. In GS, skiers turned for 92.8% of the run time, in SG for 79.4% of the run time, and in DH skiers turned for only 54.8% of the run time (**Table 2**). These differences in the amount of direction alteration in skier trajectory are reflected in the higher median **F**SKI for GS and SG compared to DH, and also indicate substantial differences in the type of physical work athletes conduct in the different disciplines. GS consists of 51 direction changes (**Table 4**), meaning that GS involves 51 body extension-contraction cycles, while SG consists of 41 direction changes and extension-contraction cycles. Therefore, GS and SG consist of more or less continuous turning and dynamic muscular work, while in DH skiers ski straight for about 45% of the run time and spend 36.8% of the run time in a tucked position (**Table 2**). The amount of skiing in the tucked position might be a consequence of both the extent of sections in which skiers can ski straight, and also the higher speed compared to the other disciplines, which increases the significance of air drag force as a dissipative force (**Table 5** and **Figure 4**). Therefore, skiers try to reduce the time skiing in upright body posture, since this is likely to increase the drag area exposed to wind and increase air drag (Barelle et al., 2004; Supej et al., 2012). Hence, in DH skiers try to reduce speed loss through energy dissipation by air drag force. 
Because of the lower number of direction changes, skiers spend more of the total run time in the tucked position, undertaking work of a more static nature with less pronounced and less frequent unloading phases over a longer period than in GS, for instance. An earlier comparative study on SG, GS and SL revealed that the more static nature of movement in SG results in a deeper knee angle and is accompanied by significantly higher EMG activity (Berg and Eiken, 1999) compared to GS and SL. While EMG activity of the quadriceps muscle reached values of 120% of maximal voluntary contraction (MVC) in SG, only values in the order of 70% MVC were observed in GS and SL. Hence, the tucked body position is associated with more static muscular work and increased muscular activity.

Comparing GS and SG, which consist of more or less consecutive turning (Gilgien et al., 2014a, 2015a,b), with 51 turns, or 51 loading–unloading cycles, in GS compared to 41 in SG, the duration of an average turn cycle in SG (2.28 s) is about 55% longer than in GS (1.47 s). However, mean **F**SKI and **F**SKIMAX are lower in SG compared to GS. Therefore, in SG athletes need to withstand a lower **F**SKI but over a longer period of time. **Figure 3** shows that the mean **F**SKI is larger than 1.5 BW for 1.18 s (from 0.50 to 1.68 s after switch1) in SG, while in GS mean **F**SKI is larger than 1.5 BW for 0.81 s (from 0.47 to 1.31 s after switch1). In short, in SG athletes need to withstand a force larger than 1.5 BW for 0.37 s longer than in GS. In both disciplines, **F**SKIMAX occurs at gate passage, and the time from turn initiation (switch1) to the gate, and hence to the occurrence of **F**SKIMAX, is longer than the time from the gate to turn completion (switch2). This means that building up the maximal force takes longer than completing the turn in both disciplines. The time to build up **F**SKIMAX is substantially shorter in GS compared to SG. Therefore, athletes in GS face a substantially more pronounced loading–unloading pattern than in SG, with a higher **F**SKI but a shorter time to the next unloading phase. The loading–unloading pattern is even more pronounced in slalom, where the loading–unloading time is shortest and **F**SKI is highest compared to the other disciplines (Reid, 2010; Kröll et al., 2016a). These substantial differences in **F**SKI characteristics between disciplines probably call for different physical preparation to maximize performance and minimize injury risk.
The **F**SKI and turn cycle timing information might be useful for coaches and athletes in adapting dryland training to the discipline-specific **F**SKI–time pattern, since dryland training that simulates the physical demands of competitive skiing might lead to an adequate physiological adaptation (Kraemer et al., 2002; Kröll et al., 2016c). In order to imitate the physical demands of alpine skiing in dryland training, skiing simulators (Nourrit et al., 2003; Deschamps et al., 2004; Hong and Newell, 2006; Teulier et al., 2006; Panizzolo et al., 2013; Moon et al., 2015; Lee et al., 2016) and skiing carpets (Fasel et al., 2017) are used to a certain extent. The data provided in this study might help to adapt these devices to the physical demands of competitive on-snow skiing with respect to the discipline-specific force–time pattern.
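
The time for which mean **F**SKI exceeds 1.5 BW within a turn cycle, as compared above for SG and GS, can be read off a sampled force–time curve. The following is a minimal sketch, assuming the curve is piecewise linear between samples. The function name `time_above` and the triangular example curve are hypothetical and are not the data behind Figure 3.

```python
def time_above(times_s, force_bw, threshold_bw):
    """Total time a piecewise-linear force curve spends above threshold_bw."""
    total = 0.0
    for i in range(len(times_s) - 1):
        dt = times_s[i + 1] - times_s[i]
        a = force_bw[i] - threshold_bw
        b = force_bw[i + 1] - threshold_bw
        if a > 0 and b > 0:
            total += dt                      # whole segment above the threshold
        elif a > 0 or b > 0:
            # Segment crosses the threshold: keep the fraction lying above it.
            total += dt * max(a, b) / (abs(a) + abs(b))
    return total


# Hypothetical triangular turn cycle: 1 BW -> 2 BW -> 1 BW over 2 s.
example_times_s = [0.0, 1.0, 2.0]
example_force_bw = [1.0, 2.0, 1.0]
example_duration_s = time_above(example_times_s, example_force_bw, 1.5)
```

Applied to discipline-specific mean curves, the same routine would give a duration above 1.5 BW per turn cycle that a dryland exercise could be matched against.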

### Contribution of External Forces to Energy Dissipation

The analysis of the dissipative forces contributing to total EDISS (**Figure 4**) confirmed the finding from another study (Supej et al., 2012) that EDISS in GS is mainly determined by **F**SKI-FRICTION. In SG, **F**SKI-FRICTION was still clearly the major contributor to EDISS, while the contributions of **F**SKI-FRICTION and **F**AIR-DRAG were approximately balanced in DH. Hence, for slalom (Reid, 2010), GS and SG, a certain percentage increase of **F**SKI-FRICTION would have a larger effect on performance than a corresponding increase of **F**AIR-DRAG, while in DH the effects of an increase in **F**SKI-FRICTION and of an increase in **F**AIR-DRAG (for example, through a higher air drag coefficient of the clothing) would be about equal. The contribution of **F**AIR-DRAG to total EDISS seems clearly smaller in DH than in speed skiing, where skiers do not turn but ski straight along the fall line to reach maximal speed, and where **F**AIR-DRAG contributes up to 80% of total EDISS when maximal speed is reached (Thompson et al., 2001). Hence, among the alpine ski racing disciplines, an increase in **F**AIR-DRAG might only be an option for DH.

### LIMITATIONS

One potential drawback of the applied method is that ground reaction forces cannot be determined for single legs, but only for the sum of both legs. In addition, high-frequency force components cannot be determined with the method used in this study. However, the method was chosen since it allows the measurement of all external forces and their components at the same time, allowing unique insight into their relationship, as shown in this study. The applied method does not measure, but rather models, the external forces based on kinematic data and was validated against the gold standard for GS (Gilgien et al., 2013, 2015c). Therefore, comparison of the findings from this study with previous findings reported in the literature, where forces were obtained with other methods, is of interest with respect to validity. An experimental GS study using a video-based photogrammetric method to compute skier kinematics, from which forces were derived in steep terrain (26°), found mean turn **F**SKI values of between 1.52 and 1.56 BW (Spörri et al., 2016). The maximal **F**SKI values found in that study ranged from 2.01 to 2.11 BW (Spörri et al., 2016), while a comparable study in 23° inclined terrain found a range of 2.32–2.44 BW for the maximal **F**SKI using pressure insoles to measure **F**SKI (Kröll et al., 2016a). Comparing these **F**SKI values with the **F**SKI values obtained in the current study for GS, we conclude that the **F**SKI values are comparable to those found for competitive skiing in previous studies and obtained with different methods. This finding increases confidence in the kinetic method applied in this study for SG and DH, for which no **F**SKI data are available in the literature for comparison. The applied method does not allow analysis of the distribution of **F**SKI between legs. This might be interesting for the speed disciplines, since previous studies found that the distribution changes from SL to GS (Kröll et al., 2016c).

### CONCLUSION

This study (1) illustrated that the validity of high-end dGNSS systems allows meaningful investigations such as characterization of physical demands and safety measures in highly dynamic sports; and (2) showed that the physical demands were substantially different between GS, SG and DH (specifically, the ground reaction force fluctuations followed a cyclic pattern, which was most pronounced for GS, followed by SG and DH, while median and peak ground reaction forces were highest for GS, followed by SG and DH); and (3) revealed that safety-related reduction of skiing speed might be most effectively achieved by increasing the ski–snow friction force in GS and SG. For DH an increase in the ski–snow friction force might be equally as effective as an increase in air drag force.

### AUTHOR CONTRIBUTIONS

MG, JK, JS, and EM designed the study. MG, JK, and JS collected the data. MG and PC analyzed the data. All authors contributed to the writing.

### FUNDING

This study was financially supported by the International Ski Federation (FIS) Injury Surveillance System (ISS). The funding source had no involvement in the study design, the collection, analysis and interpretation of the data, the writing of the report or the decision to submit this article for publication.

### ACKNOWLEDGMENTS

We would like to thank Julien Chardonnens, Geo Boffi, the organizers of the World Cup alpine ski races and the International Ski Federation for their support. Parts of the present manuscript were included in the PhD thesis of the first author (Gilgien, 2014).

#### REFERENCES


skier speed in world cup alpine ski racing. PLoS ONE 10:e0128899. doi: 10.1371/journal.pone.0128899


knee injuries in alpine giant slalom ski racing. Br. J. Sports Med. 50, 14–19. doi: 10.1136/bjsports-2015-095737


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Gilgien, Kröll, Spörri, Crivelli and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Whole-Body Vibrations Associated With Alpine Skiing: A Risk Factor for Low Back Pain?

Matej Supej<sup>1</sup>\*, Jan Ogrin<sup>1</sup> and Hans-Christer Holmberg<sup>2,3,4</sup>

<sup>1</sup> Faculty of Sport, University of Ljubljana, Ljubljana, Slovenia, <sup>2</sup> School of Sport Sciences, UiT Arctic University of Norway, Tromsø, Norway, <sup>3</sup> School of Kinesiology, University of British Columbia, Vancouver, BC, Canada, <sup>4</sup> Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden

Alpine skiing, both recreational and competitive, is associated with high rates of injury. Numerous studies have shown that occupational exposure to whole-body vibrations is strongly related to lower back pain and some suggest that, in particular, vibrations of lower frequencies could lead to overuse injuries of the back in connection with alpine ski racing. However, it is not yet known which forms of skiing involve stronger vibrations and whether these exceed safety thresholds set by existing standards and directives. Therefore, this study was designed to examine whole-body vibrations connected with different types of skiing and the associated potential risk of developing low back pain. Eight highly skilled ski instructors, all former competitive ski racers and equipped with five accelerometers and a Global Navigation Satellite System to measure vibrations and speed, respectively, performed six different forms of skiing: straight running, plowing, snow-plow swinging, basic swinging, short swinging, and carved turns. To estimate exposure to periodic, random and transient vibrations, the power spectrum density (PSD) and standard ISO 2631-1:1997 parameters [i.e., the weighted root-mean-square acceleration (RMS), crest factor, maximum transient vibration value and the fourth-power vibration dose value (VDV)] were calculated. Ground reaction forces were estimated from data provided by accelerometers attached to the pelvis. The major novel findings were that all of the forms of skiing tested produced whole-body vibrations, with the highest PSD values at 1.5–8 Hz. Intensified PSD between 8.5 and 35 Hz was observed only when skidding was involved. The RMS values for 10 min of short swinging or carved turns, as well as all 10-min equivalent VDV values, exceeded the limits set by European Directive 2002/44/EC for health and safety. 
Thus, whole-body vibrations, particularly in connection with high ground reaction forces, contribute to a high risk for low back pain among active alpine skiers.

Keywords: biomechanics, injury prevention, kinematics, kinetics, recreational skiing, shock, ski racing

### INTRODUCTION

Although physical activity is beneficial to human health, for example by reducing the risk of chronic disease, among the most common injuries in modern Western societies are those related to sports (Parkkari et al., 2001). For instance, alpine skiing is associated with high rates of injury for both recreational and competitive athletes

#### Edited by:

Luca Paolo Ardigò, University of Verona, Italy

#### Reviewed by:

Marco Tarabini, Politecnico di Milano, Italy Paolo Capodaglio, Istituto Auxologico Italiano (IRCCS), Italy

#### \*Correspondence:

Matej Supej matej.supej@fsp.uni-lj.si

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 21 November 2017 Accepted: 23 February 2018 Published: 09 March 2018

#### Citation:

Supej M, Ogrin J and Holmberg H-C (2018) Whole-Body Vibrations Associated With Alpine Skiing: A Risk Factor for Low Back Pain? Front. Physiol. 9:204. doi: 10.3389/fphys.2018.00204

(Hunter, 1999; McBeth et al., 2009; Hebert-Losier and Holmberg, 2013; Soligard et al., 2015; Stenroos and Handolin, 2015; Weber et al., 2015; Haaland et al., 2016; Müller et al., 2016; Supej et al., 2016). In addition, problems caused by overuse are also recurrent in alpine skiing, with low back pain (LBP) being the most common (Hildebrandt and Raschner, 2013; Spörri et al., 2015; Supej et al., 2016).

It has been proposed that such overuse injuries to the lower back might be reduced by controlling and/or reducing frontal and lateral bending, as well as torsion of the trunk and peak load while skiing (Spörri et al., 2015, 2016). Moreover, these studies found no differences in low-back kinematics when skis with different side-cut radii were utilized. Most of the underlying deteriorations of the spine develop early in the career of the alpine skier (Rachbauer et al., 2001), when on-snow training is not usually performed on courses resembling those used in competitions. Nevertheless, little is presently known about the nature and frequency of overuse injuries in alpine skiing, including when and why they occur (Supej et al., 2016).

On the other hand, exposure to whole-body vibrations (WBV) in connection with various occupations is strongly related to low back pain (Hulshof and van Zanten, 1987; Bovenzi and Hulshof, 1999; Lings and Leboeuf-Yde, 2000; Burström et al., 2015), which is one reason for the establishment of international health and safety standards, namely ISO 2631-1:1997 (ISO, 1997) and the European Directive 2002/44/EC (EU, European Parliament and the Council of the European Union, 2002), in this context (Griffin, 2004).

Commonly, the dynamic response to vibrations of the body of an individual seated or standing still is expressed in terms of mechanical impedance or apparent mass (i.e., the ratio between motion and force at the driving point) and transmissibilities (i.e., the ratio between two motions at distant points) (Matsumoto and Griffin, 1998). Recent studies have demonstrated that when standing still while barefoot or wearing regular shoes without any additional load, the dynamic response to vibrations (apparent mass) depends on posture (Subashi et al., 2006, 2008; Tarabini et al., 2013). More specifically, accelerations of higher frequencies at the driving point were found to be significantly more attenuated within the body with the knees bent than when erect. Nevertheless, higher spinal loads were caused by low and higher frequency WBV in both of these postures (Rohlmann et al., 2014).

During slalom and giant slalom ski racing, the most powerful vibrations had a frequency of less than 30 Hz, with the root mean square vibrational values being higher in the case of giant slalom (Spörri et al., 2017). In addition, that investigation suggested that vibrations of lower frequencies, i.e., between 4 and 10 Hz, might be particularly prone to cause injuries of the back in connection with alpine ski racing.

Furthermore, one beginner and one skilled skier skiing under uncontrolled conditions were found to be exposed to WBV that exceeded the 2-h equivalent values set by the European Directive 2002/44/EC (Tarabini et al., 2015), but no generalization could be made due to the small sample size. Therefore, it remains to be determined which forms of skiing (e.g., skidding, carved turns) are more likely to produce vibrations and whether these exceed the thresholds set by the ISO standard and European Directive. The current investigation was designed to answer these questions.

### METHODS

#### Measurements and Collection of Data

Eight highly skilled ski instructors, all former competitive racers, performed six different types of skiing on 165-cm slalom/carving skis (SLX, Elan d.d., Begunje, Slovenia) with a 14.5-m side-cut radius. These skis complied with International Ski Federation (FIS) regulations for slalom. These six types of skiing corresponded to the core stages of progression typically followed by ski instructors: straight running, plowing, snow-plow swinging, basic swinging, short swinging, and carved turns.


It should be noted that snow-plowing, as well as basic and short swinging, by definition involve skidding, where the ends of the skis glide out to the side, while with carving turns the tip and end of the ski follow the same trajectory. The ski course for testing was well prepared and groomed, the snow natural and well packed, and the air temperature between −3 and −7 °C with partially sunny weather that provided good visibility.

An integrated electronic piezoelectric accelerometer (sensitivity: 100 mV/g, range: 50 g, mass: 4.3 g) (3097A2, Dytran Instruments Inc., Chatsworth, CA, USA) was firmly attached to each ski boot to record accelerations along the superior-inferior axis. In addition, three variable capacitance accelerometers (sensitivity: 80 mV/g, range: 50 g, mass: 12 g) (7300A5, Dytran Instruments Inc., Chatsworth, CA, USA) attached to a belt tightened around and taped to the body measured accelerations of the sacrum in three dimensions aligned with the orientation of the trunk. A 10-Hz Global Navigation Satellite System (GNSS) (ST 1612 G, Locosys Technology Inc., Taipei, Taiwan) with external antennae tracking both United States (GPS) and Russian (GLONASS) satellites (1240, Locosys Technology Inc., Taipei, Taiwan) was positioned at the level of the upper thoracic spine (T2–T4) to track the skier's speed. These accelerometers and the GNSS were wired to a 24-bit, 200-kHz data acquisition system (DEWE-43 & DEWESoft X2, DEWESoft d.o.o., Trbovlje, Slovenia).

The data on accelerations, sampled at 5 kHz, were used to calculate power spectrum densities (PSD) and ground reaction forces, while the positioning data allowed monitoring of speed. The accuracy and tolerance of the entire set-up for determining WBV adhered to the requirements of the ISO 8041:2005 (ISO, 2005). The study design was pre-approved by the Regional Ethics Committee of the University of Ljubljana and informed written consent obtained from all the volunteers prior to testing.

#### Computation of the Parameters

To determine the frequencies of WBV, the single-sided nonparametric Fast-Fourier-Transform was first used to calculate power spectrum densities (PSD) from the ski boot accelerations for each skier and each type of skiing. The average PSD curves for both legs of each skier were created and then the data for all participants while performing the same type of skiing were combined to calculate the average power spectrum. For more effective illustration, the PSD graphs presented display a logarithmic scale and have been smoothed with an equally weighted, zero-lag moving average filter with a length of five.
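
The PSD estimation and smoothing described above can be sketched as follows. This is not the authors' DEWESoft/Matlab implementation: for a dependency-free example a direct discrete Fourier transform (O(n²)) stands in for the FFT, removal of the mean and the periodogram scaling are assumptions, and the function names and the 5-Hz test signal are hypothetical.

```python
import cmath
import math


def single_sided_psd(x, fs_hz):
    """Single-sided periodogram PSD (units^2/Hz) of a real signal via a direct DFT."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]            # assumption: remove the DC offset
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centred))
        power = abs(coeff) ** 2 / (fs_hz * n)
        if 0 < k < n / 2:                      # fold negative frequencies over
            power *= 2.0
        freqs.append(k * fs_hz / n)
        psd.append(power)
    return freqs, psd


def moving_average(values, length=5):
    """Equally weighted, centred (zero-lag) moving average.

    Near the ends the window shrinks symmetrically so that it stays centred.
    """
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        h = min(half, i, len(values) - 1 - i)
        window = values[i - h:i + h + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical test signal: 2 s of a unit-amplitude 5-Hz sine sampled at 100 Hz.
fs_hz = 100.0
signal = [math.sin(2 * math.pi * 5.0 * i / fs_hz) for i in range(200)]
freqs, psd = single_sided_psd(signal, fs_hz)
peak_hz = freqs[psd.index(max(psd))]
smoothed_psd = moving_average(psd, length=5)
```

For data sampled at 5 kHz, an FFT-based routine would be the practical choice; the direct transform is used here only to keep the scaling of the single-sided spectrum explicit.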

Further evaluation of WBV was based on the approach described in ISO 2631-1:1997 and in accordance with the European Directive 2002/44/EC. The raw data transmitted from the principal surface supporting the accelerometers on the ski boots were first bandpass-filtered from 0.5 to 80 Hz. Subsequently, frequency weighting of these accelerations in combination with the multiplier for a vertical z-axis for a standing position was applied, as required by the ISO 2631-1:1997 standard, to calculate the following exposures to periodic, random or transient vibrations:

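1. the weighted root-mean-square acceleration (RMS):

$$\text{RMS} = \left[ \frac{1}{T} \int_0^T a_w^2(t) \, dt \right]^{\frac{1}{2}},$$

where T was the duration of measurement and a<sub>w</sub>(t) the weighted acceleration as a function of time t;

2. the crest factor (CF), i.e., the ratio of the maximal instantaneous peak value of the frequency-weighted acceleration signal to the corresponding RMS value;

3. the maximum transient vibration value (MTVV); and

4. the fourth-power vibration dose value (VDV):

$$\text{VDV} = \left[ \int_0^T a_w^4(t) \, dt \right]^{\frac{1}{4}}.$$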
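
For sampled data, the RMS and VDV integrals reduce to sums. The sketch below is a minimal illustration, assuming the input is already band-pass filtered and frequency weighted, uniformly sampled, and integrated with a simple rectangle rule. The crest factor follows the peak-to-RMS definition used in this article, and the margin of 9 is the ISO 2631-1:1997 value cited in the Results. All function names are hypothetical.

```python
def weighted_rms(a_w, dt_s):
    """RMS = [ (1/T) * integral of a_w(t)^2 dt ]^(1/2), rectangle rule."""
    duration_s = dt_s * len(a_w)
    return (sum(a * a for a in a_w) * dt_s / duration_s) ** 0.5


def vibration_dose_value(a_w, dt_s):
    """VDV = [ integral of a_w(t)^4 dt ]^(1/4), rectangle rule."""
    return (sum(a ** 4 for a in a_w) * dt_s) ** 0.25


def crest_factor(a_w, dt_s):
    """Ratio of the maximal instantaneous peak to the RMS value."""
    return max(abs(a) for a in a_w) / weighted_rms(a_w, dt_s)


def exceeds_cf_margin(a_w, dt_s, margin=9.0):
    """True if CF > margin, i.e., VDV and MTVV should be evaluated as well."""
    return crest_factor(a_w, dt_s) > margin
```

A steady signal has a crest factor close to 1, whereas a single shock in an otherwise quiet record drives the crest factor up, which is why the additional measures are required in that case.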

For each type of skiing involving turning and each skier, the first and last turns were excluded from the analysis, resulting in ∼15-s periods of data collection. In accordance with the standards, vibrations from both ski boots were considered. As required by the standard, the VDV values were expressed as 8-h and 10-min exposures for direct comparison to the action and limit values formulated in the European Directive 2002/44/EC (Griffin, 2004), while the RMS values were compared to the action and limit exposure values set by this same directive.
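
Expressing a VDV measured over ∼15 s as a 10-min or 8-h equivalent amounts to a time scaling. A minimal sketch, assuming the vibration is stationary so that the integral of the fourth power of the weighted acceleration grows linearly with time and the VDV therefore grows with the fourth root of the exposure duration. The function name `vdv_equivalent` is hypothetical.

```python
def vdv_equivalent(vdv_measured, measured_s, target_s):
    """Scale a VDV to another exposure duration, assuming stationary vibration."""
    return vdv_measured * (target_s / measured_s) ** 0.25


# Factor between a 10-min and an 8-h equivalent: (480 / 10) ** 0.25.
ten_min_to_eight_h = vdv_equivalent(1.0, 10 * 60.0, 8 * 3600.0)
```

The resulting factor of about 2.63 agrees with paired 10-min and 8-h VDV values listed in Table 1 (for example, 22.19 and 58.41).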

The three variable capacitance accelerometers positioned at the pelvis allowed estimation of ground reaction forces (GRF) as multiples of body weight (BW). These calculations involved the assumptions that the pelvis was the center of mass and air drag negligible. The GRF values were smoothed with a zero-lag third-order digital Butterworth filter employing a cut-off frequency of 7 Hz. Thereafter, the peak GRF values for turns were calculated and utilized for further evaluation. From the GNSS data mean skiing speeds were calculated. All calculations were performed with the DEWESoft X2 and Matlab 7.7 software (Mathworks Inc., Natick, MA, USA).

#### Statistical Analyses

The means and standard deviations for all parameters are presented. The Shapiro-Wilk test was used to explore the normality of distributions and, when necessary, the Box-Cox power transformation was performed to achieve normality. One-way ANOVA with repeated measures was used to test for differences between parameters. Mauchly's W-test was used to indicate whether the assumption of sphericity had been violated and, if so, this was corrected for with the epsilon value, utilizing either the Huynh-Feldt or Greenhouse-Geisser procedure. For post-hoc analysis, paired sample t-tests were applied to test for differences between parameters. The false discovery rate for a family of hypotheses was controlled for by the Benjamini–Hochberg–Yekutieli procedure (Benjamini and Yekutieli, 2001). The level of statistical significance was set at p < 0.05 and all statistical analyses were carried out with the Matlab software.
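
The false discovery rate control applied to the family of post-hoc t-tests can be illustrated with a short step-up procedure. The sketch assumes the dependency-robust variant of Benjamini and Yekutieli (2001), in which the Benjamini–Hochberg thresholds are divided by the harmonic sum c(m) = Σ 1/i; the text does not state which variant or implementation was used in Matlab, and the function name and the example p-values are hypothetical.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))       # harmonic correction term
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
example_rejected = benjamini_yekutieli([0.001, 0.04, 0.03, 0.5])
```

With four tests the thresholds are roughly 0.006, 0.012, 0.018 and 0.024, so only the smallest p-value in this example remains significant.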

### RESULTS

#### Whole-Body Vibrations

Representative raw time-courses for acceleration at the ski boots of one subject while skiing with sub-techniques that involved turning are presented in **Figure 1** and the overall average PSD values for frequencies up to 80 Hz for the six types of skiing examined in **Figure 2**. Straight running and carved turns demonstrated similar patterns (**Figure 2A**), with highest densities between ∼3 and ∼8 Hz and continuous attenuation at increasing frequencies. At the same time, plowing, snowplow swinging, basic swinging and short swinging exhibited two regions of intensified PSD (**Figure 2B**), the first between ∼1.5 and ∼8 Hz and the second between ∼8.5 and ∼35 Hz, above which the PSD values declined. These attenuations of the PSD curves continued for all six types of skiing until ∼70 Hz, above which the values remained steady until 200 Hz, followed by another attenuation (∼200–500 Hz), finally remaining more or less constant until 2.5 kHz, where the power densities were 2–3 orders of magnitude lower than the maximal values.

Evaluation of the exposure to periodic, random and transient WBV associated with all six forms of skiing is shown in **Table 1**. In the case of MTVV, one subject had considerably higher values (outliers) for snow-plow swinging and short swinging and had to be eliminated from evaluation of these particular forms of skiing, in order to achieve normality and sphericity of the data. Overall, the highest levels of exposure were observed in connection with short swinging followed by carved turns and the lowest with snow-plow swinging and straight running. The RMS and VDV values for plowing were also high, but not the MTVV.

ISO 2631-1:1997 states that if CF > 9, then not only RMS, but also VDV and MTVV should be taken into consideration. The mean crest factors for straight running and snow plowing exceeded the ISO safety margin, while the maximal CF values for all six types of skiing except plowing also exceeded this margin (**Table 2**).

#### Speed and Ground Reaction Forces

The mean values and standard deviations for speed and peak GRF are documented in **Figures 3**, **4**, respectively. In these cases, the Shapiro-Wilk test confirmed that all data were distributed normally and the condition of sphericity was also satisfied (in the case of speed data after appropriate correction). Application of paired t-tests revealed that the peak GRFs for straight downhill and plowing did not differ significantly (p = 0.11), as was also the case for plowing vs. snow-plow swinging (p = 0.27) and basic swinging vs. carved turns (p = 0.37). Comparison of all other possible pairs demonstrated significant differences (p < 0.0001). Overall, the mean GRFs were lowest for straight running (1.23 BW) and plowing (1.18 BW) and highest for short swinging (1.89 BW) and carved turns (1.93 BW).

With respect to speed, plowing and carved turns were associated with means that differed significantly from all the other types of skiing, while the mean values for straight downhill, basic and short swinging did not differ from each other. The lowest mean speed was observed in connection with snow-plow swinging (4.8 m/s) and the highest with carved turns (13.3 m/s).

#### DISCUSSION

The major novel findings of the present investigation were as follows: (i) all types of skiing examined produced whole-body vibrations (WBV), with the highest power spectrum densities (PSD) ranging from ∼1.5–8 Hz; (ii) intensified PSD between 8.5 and 35 Hz was observed only with the types of skiing that involved skidding; (iii) the RMS values for 10 min of short swinging and carved turns and all 10-min equivalent VDV values exceeded the limit values formulated by the European Directive 2002/44/EC for health and safety; and, finally, (iv) measurement of the WBV, particularly in connection with high ground reaction forces, revealed an important high-risk factor for low back pain in active alpine skiers.

### Whole-Body Vibrations Associated With Different Forms of Alpine Skiing

Our present findings demonstrate that all forms of alpine skiing produce vibrations. The WBV in the PSD spectrum below 8 Hz was associated with absence of both turning and skidding (straight running), presence of skidding (snow-plow swinging and short swinging), as well as carved turns. At the same time, higher frequency vibrations (8–35 Hz) were intensified only with skiing techniques that by definition involved side-skidding, in line with a previous pilot study (Supej, 2013). Furthermore, the


TABLE 1 | Comparison of the mean weighted root-mean-square accelerations (RMS, n = 8), the fourth-power vibration dose values (VDV, n = 8) and maximum transient vibration value (MTVV, n = 7) for the six types of skiing.

| Parameter | Straight running | Plowing | Snow-plow swinging | Basic swinging | Short swinging | Carved turns | ANOVA F | Sphericity p-value | Paired sample t-tests<sup>§</sup> |
|---|---|---|---|---|---|---|---|---|---|
| RMS | 2.55 ± 0.70 | 6.05 ± 1.38 | 2.78 ± 0.25 | 7.73 ± 0.50 | 12.92 ± 1.49 | 9.69 ± 0.63 | 147.06 | 0.0002 | SR/SPS: 0.79 |
| VDV 10 min | 22.19 ± 6.69 | 44.80 ± 11.97 | 21.44 ± 3.49 | 56.77 ± 4.98 | 96.00 ± 13.01 | 67.48 ± 5.42 | 92.48 | 0.0054 | SR/SPS: 0.78 |
| VDV 8 h | 58.41 ± 17.61 | 116.10 ± 31.50 | 56.44 ± 9.17 | 149.43 ± 13.10 | 252.67 ± 34.24 | 177.62 ± 14.27 | 92.48 | 0.0054 | SR/SPS: 0.78 |
| MTVV<sup>#</sup> | 5.11 ± 1.71 | 7.85 ± 2.63 | 5.29 ± 0.49 | 13.11 ± 1.38 | 19.18 ± 2.06 | 14.99 ± 1.20 | 78.17 | 0.6284 | SR/SPS: 0.40 |

Overall means ± standard deviations and the results of the ANOVA and paired sample t-test are presented. 10 min, 10-min equivalent; 8 h, 8-h equivalent; #, one skier was excluded from the MTVV statistics; §, only pairs for which p ≥ 0.05 are shown; SR/SPS, straight running vs. snow-plow swinging.


TABLE 2 | Crest factors for all six forms of skiing.

SD, standard deviation.

regions of intensified and attenuated power spectrum densities associated with skidding here were similar to those reported for slalom and giant slalom ski racing (Spörri et al., 2017). This indicates that even competition skiing involves skidding, despite the fact that elite athletes strive for carving turns.

Interestingly, the peak PSD values for neither the low-frequency vibrations associated with all skiing forms nor the "skidding vibration" were centered around the first two typical eigen-frequency values reported previously for skis, i.e., f<sub>1</sub> = ∼10–13 Hz and f<sub>2</sub> = ∼40–50 Hz (Piziali and Mote, 1972; Fischer et al., 2007). This discrepancy has two interesting implications. First, the frequencies measured here are not caused by the ski's own chattering, but rather by movements (e.g., turning, skidding) during skiing. Second, in order to optimize performance and kinaesthetic feeling, manufacturers appear to have developed skis with properties that avoid resonances in the two most dominant frequency ranges of the PSD.

### Comparison of the Whole-Body Vibrations Associated With Alpine Skiing to Recommendations for Health and Safety

In comparison to the 8-h limit values set by the European Directive 2002/44/EC for health and safety, the RMS and 8-h equivalent VDV values here were 2–11- and 49–220-fold higher, respectively (**Table 1**). However, a typical descent from a ski lift or racing run lasts ∼1 min and most skiers perform 10 or more runs daily, making such comparisons to 8-h exposures somewhat problematic. On the other hand, even comparison to 10-min equivalents (Griffin, 2004) revealed that short swinging and carved turns exceeded the limit values, while basic swinging was close to this limit and plowing exceeded the action value (**Table 1**). For example, 10 min of short swinging was found to result in exposure to vibration equivalent to ∼18 min of carved turns or 216 min of snow-plow swinging. The 10-min equivalent VDV values here were ∼7–32-fold higher than the corresponding limits.
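
The stated equivalence between 10 min of short swinging and ∼18 min of carved turns or 216 min of snow-plow swinging is consistent with an equal-energy time dependence for RMS exposure, i.e., a² · T = constant. The sketch below assumes that relation. The RMS inputs (12.92, 9.69 and 2.78) are values from Table 1 that reproduce the stated durations; their assignment to short swinging, carved turns and snow-plow swinging is inferred here from that agreement. The function name is hypothetical.

```python
def equivalent_duration_min(rms_ref, minutes_ref, rms_other):
    """Duration at rms_other giving the same a^2 * T exposure as the reference."""
    return minutes_ref * (rms_ref / rms_other) ** 2


# Assumed assignment of Table 1 RMS values (see the note above).
carved_min = equivalent_duration_min(12.92, 10.0, 9.69)
snow_plow_swinging_min = equivalent_duration_min(12.92, 10.0, 2.78)
```

Because the relation is quadratic in acceleration, a form of skiing with roughly one-fifth of the RMS acceleration can be sustained for more than twenty times as long before reaching the same exposure.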

These observations reveal that WBV constitute an important risk factor for LBP in alpine skiers (Seidel and Griffin, 2001; Burström et al., 2015), particularly since substantial acceleration of the spine typically occurs between 4 and 10 Hz, during alpine skiing as well (Kiiski et al., 2008; Spörri et al., 2017). Interestingly, the average 10-min equivalent VDV values for plowing and basic swinging obtained here corresponded closely to the 2 h equivalent VDV values (scaled to 10-min equivalent values for comparison) reported previously for those two skiers under uncontrolled conditions (Tarabini et al., 2015).

Although the vibrations with all skiing forms examined here exceeded the WBV threshold limits for safety, there were substantial differences between these forms in this respect. Importantly, carved turns involved significantly less WBV than short swinging and only slightly more than basic swinging, the first and simplest form of "parallel skiing." Even though our measurements were performed during "free skiing" by former competitive ski racers, this observation should probably be taken into account when regulations concerning equipment, slope preparation and course setup are formulated with the aim of making competitive skiing safer.

Due to random bumps on uneven terrain, transient vibrations (occasional shocks or short-term vibrations) are also to be expected during alpine skiing. Indeed, these occurred in all forms of skiing investigated here, particularly during short swinging (**Figure 1**). However, the mean crest factor values here (**Table 2**) were not as high as expected. Surprisingly, only in the cases of straight running and snow-plow swinging did these values exceed the margin (CF > 9 according to ISO 2631-1:1997) above which evaluation of VDV and MTVV to verify exposure to WBV is obligatory. This reflected the fact that for all other types of skiing, the RMS values were so high that the ratio of the maximal instantaneous peak value of the frequency-weighted acceleration signal to the corresponding RMS value (i.e., CF) remained below the threshold. Note that the latter situation, in combination with the high VDV and MTVV values observed, demonstrates clearly that our skiers were actually exposed to both short- and longer-term vibrations.
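The relationship between RMS, crest factor, VDV and MTVV discussed above can be made concrete with a short sketch. It assumes that the acceleration signal has already been frequency-weighted (the weighting filters themselves are not implemented here), that the signal is at least one integration window long, and that MTVV is taken as the maximum running RMS over a 1 s rectangular window. The function name `wbv_metrics` and the synthetic test signal are hypothetical.

```python
import numpy as np


def wbv_metrics(a_w, fs, window_s=1.0):
    """Basic whole-body vibration metrics for a frequency-weighted signal.

    a_w      : 1-D array of frequency-weighted acceleration (m/s^2)
    fs       : sampling frequency (Hz)
    window_s : integration time of the running RMS used for MTVV (s)
    """
    a_w = np.asarray(a_w, dtype=float)
    dt = 1.0 / fs

    rms = np.sqrt(np.mean(a_w ** 2))
    # Crest factor: peak absolute value divided by the RMS value.
    crest_factor = np.max(np.abs(a_w)) / rms
    # Vibration dose value: fourth root of the integral of a_w^4 over time.
    vdv = (np.sum(a_w ** 4) * dt) ** 0.25
    # Running RMS over a rectangular window; MTVV is its maximum.
    n = max(1, int(round(window_s * fs)))
    running_mean_square = np.convolve(a_w ** 2, np.ones(n) / n, mode="valid")
    mtvv = np.sqrt(np.max(running_mean_square))

    return {"rms": rms, "crest_factor": crest_factor, "vdv": vdv, "mtvv": mtvv}


# Synthetic example: a 10 Hz sine of amplitude 1 m/s^2 with one added shock.
fs = 1000.0
t = np.arange(0.0, 10.0, 1.0 / fs)
signal = np.sin(2.0 * np.pi * 10.0 * t)
signal[5000] += 8.0  # hypothetical transient bump
print(wbv_metrics(signal, fs))
```

A single shock raises the peak value (and hence the CF) much more than the RMS value, whereas a uniformly high RMS keeps the CF low even when shocks are present, which is the situation described above for most skiing forms.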

### Ground Reaction Force and Speed

Peak resultant forces on the spine are on the average 24% higher in the presence than absence of vibrations (Rohlmann et al., 2014). Therefore, the high GRF values observed both here and earlier in connection with competitive skiing (Supej et al., 2004, 2015; Supej and Holmberg, 2010; Vaverka and Vodickova, 2010; Spörri et al., 2015, 2016), in combination with intensive WBV (Burström et al., 2015), support the conclusion that vibrations are at least partially responsible for the high incidence of low back pain in alpine skiers. In addition, alpine skiing typically involves relatively extensive flexion of the hip joint, during which muscle forces, while maintaining trunk equilibrium, increase compression and shear forces on the spine substantially (Seidel and Griffin, 2001; Wang et al., 2010).

On the other hand, flexion, particularly at the knee joint, exerts an important influence on apparent mass behavior in response to WBV (Subashi et al., 2006, 2008; Tarabini et al., 2013). More specifically, the resonance frequency is reduced significantly as the knees become more bent. The static conditions employed in previous studies, with no additional load and standing either barefoot or in everyday shoes, differ considerably from those encountered during alpine skiing; thus, future studies should be designed to elucidate the effect of apparent mass in this context.

FIGURE 3 | Peak ground reaction forces (GRF) for the six different types of skiing. In each box, the central line indicates the mean and the bottom and top edges the standard deviation. The whiskers extend to the maximal and minimal data points. Note that for more effective presentation, the p-values for pairs that did not differ significantly are the only ones shown.

FIGURE 4 | Speeds with the six different types of skiing. In each box, the central line indicates the mean and the bottom and top edges the standard deviation. The whiskers extend to the maximal and minimal data points. Note that for more effective presentation, the p-values for pairs that did not differ significantly are the only ones shown.

From our current findings, speed itself does not appear to have contributed directly to the WBV, since the highest exposure was observed even at quite low speeds during short swinging and, on the other hand, the highest speeds during carved turns were associated with substantially lower WBVs than short swinging. This is somewhat contradictory to the previous findings on one beginner and one skilled skier (Tarabini et al., 2015). More systematic investigations in the future will help to further elucidate the connection between WBV and speed.

#### Methodological Considerations

It was challenging to position the accelerometers here for measurement of WBV during alpine skiing. Nevertheless, the ISO 2631-1:1997 specifies that the transducers should be positioned so as to indicate the vibration at the interface between the human and source of vibration, or more specifically, measurements on the feet should be made at the surface where the feet are most supported. From this perspective, positioning on the ski boots, although not immediately obvious, was fully in line with this standard.

It is not yet known whether the ISO 2631-1:1997 and European Directive 2002/44/EC recommendations for health and safety are also appropriate for sports. However, these threshold values were set for occupations involving standing up, while not necessarily stationary or fully extended, as well as when bearing heavy loads. This is undeniably similar to the situation during alpine skiing, and the exposure of active alpine skiers, including competitors, to WBV (e.g., days per year) is comparable to that associated with certain occupations covered by the standard and the directive.

Estimation of the ground reaction forces using accelerometers here involves the assumption that the pelvis was the center of mass and may therefore be somewhat biased. The basic concepts of Newtonian mechanics dictate that the reliability of our estimation of the GRF on the basis of a single-point acceleration depends on the extent to which this acceleration matches the "model acceleration" of the center of mass. Since the largest contribution to the GRF during the alpine skiing turns can be attributed to the radial forces and body flexion-extension, for the purpose of this study this estimation of overall load was considered to be sufficient.
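The Newtonian reasoning behind this single-point estimate can be sketched as follows. The sketch assumes that the accelerometer output is the specific force at the pelvis (i.e., it reads about 9.81 m/s² when the skier stands still), that the pelvis acceleration equals that of the center of mass, and that air drag is negligible, so that the GRF magnitude is approximately body mass times the magnitude of the measured signal. The function names and the sample values are hypothetical and do not reproduce the processing pipeline of this study.

```python
import numpy as np

G = 9.81  # standard gravitational acceleration (m/s^2)


def estimate_grf(specific_force, mass_kg):
    """Approximate GRF magnitude (N) from pelvis accelerometer samples.

    specific_force : array of shape (n, 3), accelerometer output in m/s^2
                     (includes the reaction to gravity)
    mass_kg        : mass of the skier including equipment (kg)

    Assumes pelvis acceleration = center-of-mass acceleration and no air drag.
    """
    f = np.atleast_2d(np.asarray(specific_force, dtype=float))
    return mass_kg * np.linalg.norm(f, axis=1)


def grf_in_body_weights(grf_newton, mass_kg):
    """Express a GRF in multiples of body weight (BW)."""
    return np.asarray(grf_newton, dtype=float) / (mass_kg * G)


# Hypothetical samples: quiet standing, then a loaded phase of a turn.
samples = np.array([[0.0, 0.0, 9.81],
                    [6.0, 0.0, 18.0]])
grf = estimate_grf(samples, mass_kg=80.0)
print(grf_in_body_weights(grf, mass_kg=80.0))
```

Any mismatch between the pelvis and the true center of mass, for example during rapid trunk or leg movements, enters this estimate directly, which is why the estimate may be biased as noted above.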

Finally, since our measurements were performed on a moderate incline under nearly ideal conditions of snow and weather, generalization is not straightforward. However, it can be speculated that, for example, icy conditions and/or more difficult slopes would result in more vigorous WBV.

### FUTURE PERSPECTIVES

The ISO 2631-1:1997 and underlying directives such as European Directive 2002/44/EC currently provide the only verified limits for health and safety concerning WBV. Accordingly, to fully comprehend the impact of WBV on the incidence of LBP in alpine skiers, additional systematic and/or epidemiological studies are required. Furthermore, various slopes, snow conditions, ski racing disciplines, ages of the skiers/athletes, etc. need to be investigated in a standardized manner to enable more focused preventive measures. In particular, monitoring the training load of the athletes at highest intensity, as suggested earlier (Spörri et al., 2017), with miniaturized equipment would be beneficial, but the equipment must, of course, comply with the ISO 8041:2005 requirements.

### CONCLUSIONS

Here, we show that with all types of alpine skiing examined, WBV exceeded health and safety limits, with the more advanced forms such as short swinging or carved turns exceeding these limits by as much as ∼30-fold. Thus, alpine skiing, where active participants can train 100–150 days each year with demanding snow conditions and slopes and high loads, appears to be associated with high long-term risks to health. One appropriate preventive measure would be to reduce the number of skiing days and/or at least the number of runs and skiing days involving conditions where WBV are strongest (e.g., with side-skidding). This is particularly important for younger skiers, since many deteriorations of the spine develop early in adolescence (Rachbauer et al., 2001). At the same time, alpine skiing has several positive effects on health (Müller et al., 2011a,b). Therefore, for most recreational skiers, with relatively few skiing days each year, the preventive measures would be to ski on natural (not icy) snow and slopes where employing skiing techniques associated with weaker WBV is possible and safe.

### AUTHOR CONTRIBUTIONS

MS designed the study. MS and JO prepared the equipment for data collection, performed the measurements and prepared the platform for computations. JO performed the statistical analyses. MS, JO, and H-CH performed the data analysis and interpretation. MS wrote the first draft and all authors contributed substantially to and approved the final version.

### FUNDING

This study was supported financially by the Foundation for Financing Sport Organisations in Slovenia (grant No. RR-17-532) and the Slovenian Research Agency (grant No. P5-0147).

### ACKNOWLEDGMENTS

The authors would like to thank all of the participants sincerely for their involvement and Elan d.d. for contributing the skis used for testing.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Supej, Ogrin and Holmberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Validity and Reliability of 10-Hz Global Positioning System to Assess In-line Movement and Change of Direction

Pantelis T. Nikolaidis<sup>1</sup>, Filipe M. Clemente<sup>2,3</sup>, Cornelis M. I. van der Linden<sup>4</sup>, Thomas Rosemann<sup>5</sup> and Beat Knechtle<sup>5,6</sup>\*

<sup>1</sup> Exercise Physiology Laboratory, Nikaia, Greece, <sup>2</sup> Instituto Politécnico de Viana do Castelo, Escola Superior de Desporto e Lazer, Viana do Castelo, Portugal, <sup>3</sup> Instituto de Telecomunicações, Lisbon, Portugal, <sup>4</sup> JOHAN Sports, Department of Sport Sciences, Noordwijk, Netherlands, <sup>5</sup> Institute of Primary Care, University of Zurich, Zurich, Switzerland, <sup>6</sup> Mebase St. Gallen Am Vadianplatz, St. Gallen, Switzerland

#### Edited by:

Billy Sperlich, University of Würzburg, Germany

#### Reviewed by:

Brendan Richard Scott, Murdoch University, Australia

Chiara Milanese, Dipartimento di Neuroscienze, Biomedicina e Movimento, Università degli Studi di Verona, Italy

> \*Correspondence: Beat Knechtle beat.knechtle@hispeed.ch

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 October 2017 Accepted: 01 March 2018 Published: 15 March 2018

#### Citation:

Nikolaidis PT, Clemente FM, van der Linden CMI, Rosemann T and Knechtle B (2018) Validity and Reliability of 10-Hz Global Positioning System to Assess In-line Movement and Change of Direction. Front. Physiol. 9:228. doi: 10.3389/fphys.2018.00228

The objectives of the present study were to examine the validity and reliability of the 10 Hz Johan GPS unit in assessing in-line movement and change of direction. The validity was tested against the criterion measure of 200 m track-and-field (track-and-field athletes, n = 8) and 20 m shuttle run endurance test (female soccer players, n = 20). Intra-unit and inter-unit reliability was tested by intra-class correlation coefficient (ICC) and coefficient of variation (CV), respectively. An analysis of variance examined differences between the GPS measurement and five laps of 200 m at 15 km/h, and a t-test examined differences between the GPS measurement and 20 m shuttle run endurance test. The difference between the GPS measurement and 200 m distance ranged from −0.13 ± 3.94 m (95% CI −3.42; 3.17) in the first lap to 2.13 ± 2.64 m (95% CI −0.08; 4.33) in the fifth lap. A good intra-unit reliability was observed in 200 m (ICC = 0.833, 95% CI 0.535; 0.962). Inter-unit CV ranged from 1.31% (fifth lap) to 2.20% (third lap). The difference between the GPS measurement and 20 m shuttle run endurance test ranged from 0.33 ± 4.16 m (95% CI −10.01; 10.68) in 11.5 km/h to 9.00 ± 5.30 m (95% CI 6.44; 11.56) in 8.0 km/h. A moderate intra-unit reliability was shown in the second and third stage of the 20 m shuttle run endurance test (ICC = 0.718, 95% CI 0.222; 0.898) and good reliability in the fifth, sixth, seventh and eighth (ICC = 0.831, 95% CI −0.229; 0.996). Inter-unit CV ranged from 2.08% (11.5 km/h) to 3.92% (8.5 km/h). Based on these findings, it was concluded that the 10 Hz Johan system offers an affordable, valid and reliable tool for coaches and fitness trainers to monitor training and performance.

Keywords: GPS, team sport, tracking, direction, change

### INTRODUCTION

A global positioning system (GPS) is a satellite-based navigational technology that has been used extensively in outdoor team sports to track the players' activity (Cummins et al., 2013). Small portable GPS units have been progressively used to quantify players' locomotion and to characterize the external load (work performed) of training sessions and matches (Portas et al., 2010; Bourdon et al., 2017). Based on the information of GPS technology, it is possible to measure basic components of players' patterns of movement, speed, distance covered and accelerations/decelerations in combination with an inertial measurement unit, thus characterizing the physical impact of the session and evaluating the training programs (Cummins et al., 2013; Malone et al., 2017). Such metrics can be used in real-time or post data processing to control the training impact and to adjust the stimulus to find the "sweet-spot" of progressive training load and avoid injury risk situations (Gabbett, 2016).

Despite the practical applications of this technology, some issues have been discussed (Malone et al., 2017): (i) reliability and validity of the device; (ii) data collection and processing; (iii) satellite connection and horizontal dilution of precision; and (iv) data exclusion criteria. GPS trackers are often commercialized and used before essential independent information about the precision and accuracy of the data is known (Russell et al., 2016). Both validation and accuracy are important contributors to ensure the quality of the information; thus, independent studies are essential to confirm the usability of the data (Vickery et al., 2014). GPS devices are currently manufactured with 5- and 10-Hz sampling rates, with higher sampling rates suggested to provide greater validity for measuring distance (Cummins et al., 2013). Usually, GPS trackers are validated by using a tape measure to measure the distance between the timing gates at the start and finish to compare speed (Waldron et al., 2011). Comparisons with other tracking technologies such as a semiautomatic system or local position measurement have also been conducted (Buchheit et al., 2014; Beato et al., 2016). Frequency rates of 5 Hz seem to be enough to guarantee an acceptable level of accuracy and reliability for total distance (∼10% of variance) (Coutts and Duffield, 2010), although not satisfactory to measure high-speed running (Rampinini et al., 2015) or rapid directional change (Rawstorn et al., 2014). Based on that, 10-Hz units or higher combined with an inertial measurement unit (>100 Hz) have now been recommended to ensure the necessary level of accuracy and precision (Aughey, 2011; Rampinini et al., 2015).

standard error of measure. The dashed line represents 200 m distance.

TABLE 1 | GPS recorded distance for each participant in the five laps of 200 m.

Validation of GPS devices is usually done by completing a standard circuit, running at a linear sprint or with changes of direction, and uses specific tasks that simulate the game (Beato et al., 2016). In most cases, the validation studies only focus on one specific analysis (total distance or high-speed running), one kind of task (circuit, linear sprint or change of direction) and one type of comparison (tape measure, timing gates or other tracking methods) (Coutts and Duffield, 2010; Portas et al., 2010; Buchheit et al., 2014; Vickery et al., 2014). However, there is limited research that uses an integrative approach with multiple analyses, kinds of tasks and types of comparisons to test the validity and reliability of GPS units. Based on that, the purpose of this article was to determine the validity and reliability of the 10-Hz JOHAN sports tracker during straight line running and multi-direction movement patterns by comparing with a tape measure.

### METHODS

The present cross-sectional study included two parts; in the first part, participants (female, n = 6, and male, n = 2, track-and-field athletes; age 13.1 ± 1.1 years, weight 49.9 ± 5.8 kg, height 163 ± 8 cm) performed five 200-m runs across a 200-m track-and-field stadium, whereas in the second part, participants (female soccer players, n = 20, age 15.5 ± 2.7 years, weight 60.9 ± 9.5 kg, height 162 ± 4 cm) performed the 20 m shuttle run endurance test. All participants' parents or guardians provided consent after having been informed about the content of the study. The study design was approved by the local institutional review board (Ethics Committee, Exercise Physiology Laboratory, Nikaia, Greece). In the first study, participants were eight young track-and-field athletes who performed five laps of 200 m high-intensity running (∼48 s per lap, 15 km/h) with a 1 min break wearing the Johan GPS (JOHAN Sports, Noordwijk, Netherlands) consisting of a GPS sensor (10 Hz, including EGNOS correction), accelerometer, gyroscope and magnetometer (100 Hz, 3 axis, ±16 g). In the second study, participants were 20 female soccer players, members of a club participating in the first national league. All participants received the motion trackers before the warm-up to become familiarized with them. The motion trackers were worn in a tight-fitting vest between the scapulae.

In the first study, participants were instructed to start the 200 m runs from a standstill and to slow down immediately at the finish. They ran in a single group consisting of four pairs and were asked to stay close to each other throughout. The 200 m runs were captured separately and were repeated for each participant. The start of the 200 m run was chosen when the speed started to increase exponentially, whereas the end of the 200 m run was marked when the speed started to decrease. In the 20 m shuttle run test, participants were instructed to run between two lines 20 m apart at a pace dictated by an audio signal. The test started at 8.0 km/h with the speed increasing by 0.5 km/h every minute. It finished when the participants either stopped due to fatigue or failed to follow the pace on two consecutive occasions (Vanhelst et al., 2017). The number of shuttles (20 m) varies as the test progresses, e.g., seven shuttles (i.e., 140 m) are performed at 8.0 km/h and eight shuttles at 8.5 km/h. There were light clouds during the two testing days and there were no high buildings in the surroundings.

Motion data from the trackers were uploaded post-experimentally to the JOHAN Sports online analysis platform. For both studies the JOHAN Software was used to capture the motion data of the 200-m runs and shuttle runs. This capturing was executed using 1 s data resolution (aggregated from 10 Hz motion data) by one person who had three years of experience working with JOHAN Software. In the second study, participants ran multiple sets of shuttles with different speeds in the context of the 20 m shuttle run test. The capturing of the 200 m runs and 20 m shuttles was carried out for each player separately. The start of one set of shuttles was chosen when the speed started to increase exponentially, whereas the end of one set of shuttles was marked by the dip in the speed (before the next set of shuttles started). Finally, all the captured data were exported from JOHAN to Excel for statistical analyses.

#### Statistical Analyses

All statistical analyses were performed using SPSS and GraphPad. The validity was tested against the gold standard of real distance (200 and 20 m with change of direction in the first and second study, respectively). An athletic track was also previously used as the criterion measure in the validation of a GPS system (Petersen et al., 2009). A repeated measures analysis of variance (ANOVA) examined differences between GPS measurements and five laps of 200 m at 15 km/h. The magnitude of these differences was examined using eta squared (η<sup>2</sup>) and evaluated as: small (0.010 < η<sup>2</sup> ≤ 0.059), moderate (0.059 < η<sup>2</sup> ≤ 0.138) and large (η<sup>2</sup> > 0.138) (Cohen, 1988). The paired samples t-test examined differences between GPS measurements and the 20 m shuttle run endurance test. The magnitude of the differences in the t-test was determined using the following criteria of Cohen's d: d ≤ 0.2, trivial; 0.2 < d ≤ 0.6, small; 0.6 < d ≤ 1.2, moderate; 1.2 < d ≤ 2.0, large; and d > 2.0, very large (Batterham and Hopkins, 2006). Validity was assessed using the standard error of the estimate (SEE), which was calculated as the SD (±90% CI) of the % difference between the known distance and the GPS recorded distance for each trial (Jennings et al., 2010). The percentage difference between the known distance and the GPS recorded distance was also calculated to indicate bias (Petersen et al., 2009). The percentage difference between the GPS recorded and the known distance was calculated as 100 × (GPS recorded distance − known distance)/known distance. In addition, the GPS recorded distance and the known distance were compared using a Bland-Altman plot, where the difference was calculated as recorded minus known distance and the average as (recorded + known distance)/2. Intra-class correlation coefficient (ICC) tested intra-unit reliability among exercises of the same distance, i.e., in study 1, among the five laps, and in study 2, between stages of 160 m (8.5 and 9.0 km/h) and among stages of 200 m (10, 10.5, 11.0, and 11.5 km/h). ICC was interpreted as poor (<0.5), moderate (0.5–0.75), good (0.75–0.90), and excellent (>0.90). Inter-unit reliability was tested using coefficient of variation (CV) considering the performance of the same movements by different participants (Duffield et al., 2010). Statistical significance for all calculations was set at alpha = 0.05.
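The bias, Bland-Altman and CV calculations described above can be illustrated with a short sketch. The analyses in this study were run in SPSS and GraphPad; the code below is only an equivalent illustration with hypothetical function names and hypothetical example distances. It takes the pairwise average as the conventional Bland-Altman mean of the recorded and known values, assumes the usual limits of agreement of bias ± 1.96 SD (not stated in the text), and computes the CV from the sample SD.

```python
import numpy as np


def percent_difference(recorded, known):
    """Bias in %: 100 * (GPS recorded distance - known distance) / known distance."""
    recorded = np.asarray(recorded, dtype=float)
    return 100.0 * (recorded - known) / known


def bland_altman(recorded, known):
    """Return pairwise means, differences, bias and limits of agreement.

    Differences are recorded minus known; limits of agreement use the
    conventional bias +/- 1.96 * SD of the differences (an assumption here).
    """
    recorded = np.asarray(recorded, dtype=float)
    known = np.broadcast_to(np.asarray(known, dtype=float), recorded.shape)
    diff = recorded - known
    mean_pair = (recorded + known) / 2.0
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return mean_pair, diff, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def coefficient_of_variation(values):
    """Inter-unit CV in %: sample SD divided by the mean, times 100."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


# Hypothetical GPS distances (m) recorded by eight units over one 200 m lap.
lap = [198.0, 202.0, 204.0, 196.0, 200.0, 203.0, 199.0, 201.0]
print(percent_difference(lap, 200.0).mean())
print(bland_altman(lap, 200.0)[2:])
print(coefficient_of_variation(lap))
```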

### RESULTS

#### Study 1

No statistically significant difference was observed among the five 200 m GPS recorded distance trials and the known 200 m distance (p = 0.436, η<sup>2</sup> = 0.119). The difference between GPS measure and 200 m distance was −0.13 ± 3.94 m (95% CI −3.42; 3.17) in the first, 0.38 ± 3.42 m (95% CI −2.48; 3.23) in the second, 1.63 ± 4.44 m (95% CI −2.09; 5.34) in the third, 0.75 ± 3.99 m (95% CI −2.59; 4.09) in the fourth and 2.13 ± 2.64 m (95% CI −0.08; 4.33) in the fifth lap (**Figure 1**, **Table 1**). The mean difference between the GPS recorded distance and the reference distance was less than ∼1%. The Bland-Altman plot for each lap is shown in **Figure 2**. A good intra-unit reliability was observed at 200 m (ICC = 0.833, 95% CI 0.535; 0.962). Inter-unit CV ranged from 1.31% (fifth lap) to 2.20% (third lap) (**Table 1**).
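As a consistency check, the reported 95% confidence intervals for study 1 can be approximated from the reported mean ± SD differences. The sketch below assumes that the intervals are t-based intervals of the mean with n = 8 participants (7 degrees of freedom); this is an assumption about how the statistics software computed them, and the function name is hypothetical.

```python
import math

# Two-sided 95% critical value of Student's t for n - 1 = 7 degrees of freedom.
T_CRIT_DF7 = 2.365


def ci95_of_mean(mean, sd, n, t_crit):
    """Confidence interval of a mean: mean +/- t_crit * SD / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Reported mean and SD of the differences (m) for laps 1 and 5, n = 8.
for lap, mean, sd in [(1, -0.13, 3.94), (5, 2.13, 2.64)]:
    low, high = ci95_of_mean(mean, sd, 8, T_CRIT_DF7)
    print(f"lap {lap}: 95% CI {low:.2f}; {high:.2f}")
```

Under this assumption the computed limits agree with the reported ones to within about 0.01 m, a discrepancy attributable to rounding of the reported means and SDs.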

#### Study 2

A statistically significant difference was observed between the GPS recorded distance and the known distance at 8.0 km/h (p < 0.001, d = 1.85), 8.5 km/h (p = 0.002, d = 1.13) and 9.0 km/h (p = 0.006, d = 1.09), but not at 9.5 km/h (p = 0.167, d = 0.53), 10.0 km/h (p = 0.274, d = 0.59), 10.5 km/h (p = 0.821, d = 0.15), 11.0 km/h (p = 0.794, d = −0.24) and 11.5 km/h (p = 0.902, d = 0.11). The difference between GPS measure and 20 m shuttle run endurance test was 9.00 ± 5.30 m (95% CI 6.44; 11.56) at 8.0 km/h, 7.11 ± 6.55 m (95% CI 3.95; 10.26) at 8.5 km/h, 4.59 ± 5.98 m (95% CI 1.51; 7.66) at 9.0 km/h, 2.13 ± 5.67 m (95% CI −1.01; 5.27) at 9.5 km/h, 1.20 ± 8.23 m (95% CI −9.02; 11.42) at 10.0 km/h, 0.60 ± 5.55 m (95% CI −6.29; 7.49) at 10.5 km/h, −1.33 ± 7.77 m (95% CI −20.63; 17.96) at 11.0 km/h and 0.33 ± 4.16 m (95% CI −10.01; 10.68) at 11.5 km/h (**Table 2**). The mean difference between the GPS recorded distance and the reference distance was less than ∼5%. The Bland-Altman plot for each stage is shown in **Figure 3**. A moderate intra-unit reliability was shown in the second and third stage of the 20 m shuttle run endurance test (ICC = 0.718, 95% CI 0.222; 0.898) and good reliability in the fifth, sixth, seventh, and eighth (ICC = 0.831, 95% CI −0.229; 0.996). Inter-unit CV ranged from 2.08% (11.5 km/h) to 3.92% (8.5 km/h) (**Table 2**).

TABLE 2 | GPS recorded distance for each participant in the 20-m endurance shuttle run test.

\*Distance covered in each speed varies due to the different number of shuttles performed. CV, inter-unit coefficient of variation.
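The effect sizes reported for study 2 can be mapped onto the qualitative Cohen's d categories given in the Statistical Analyses section (Batterham and Hopkins, 2006). The sketch below applies those thresholds to the reported values; the function name is hypothetical, and classifying by the absolute value of d (so that d = −0.24 counts as small) is an assumption.

```python
def classify_cohens_d(d):
    """Qualitative magnitude of Cohen's d using the thresholds from the Methods."""
    magnitude = abs(d)  # assumption: the sign only indicates direction
    if magnitude <= 0.2:
        return "trivial"
    if magnitude <= 0.6:
        return "small"
    if magnitude <= 1.2:
        return "moderate"
    if magnitude <= 2.0:
        return "large"
    return "very large"


# Cohen's d reported for each stage speed (km/h) of the 20 m shuttle run test.
reported_d = {8.0: 1.85, 8.5: 1.13, 9.0: 1.09, 9.5: 0.53,
              10.0: 0.59, 10.5: 0.15, 11.0: -0.24, 11.5: 0.11}

for speed, d in reported_d.items():
    print(f"{speed:4.1f} km/h: d = {d:5.2f} ({classify_cohens_d(d)})")
```

With these thresholds, the overestimation is large at 8.0 km/h, moderate at 8.5 and 9.0 km/h, and small or trivial at all faster stages.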

### DISCUSSION

The main findings of the present study were that the Johan GPS system (i) accurately measured the distance in the 200 m and in the relatively fast stages of the 20 m shuttle run test; (ii) had inter-unit CV no higher than 3.92% at short distances and 2.20% at longer distances; and (iii) had moderate-to-good intra-unit reliability in short and long distances, and the reliability was higher at relatively faster speeds. These results suggest that the 10-Hz JOHAN sports GPS is valid and reliable for linear movements typically observed in team sports such as soccer. However, these properties differed between running long and short distances.

We examined the validity of the Johan GPS system against the gold standard of real distance (Muñoz-Lopez et al., 2017). Overall, the GPS showed accurate values, since no difference was observed between measured and real distance in 200 m and in the relatively fast speeds of the 20 m shuttle run test. On the other hand, the GPS overestimated the distance in the low speeds of the test, which might be attributed to the participants' behavior. In particular, the participants might have performed excess movements in the change of direction during the first slow stages of the test, whereas, as the test proceeded, they became more careful in order to avoid unnecessary movements that would result in additional fatigue. The ability to change direction successfully is related to speed, reactive strength, power and balance (Sheppard and Young, 2006) and characterizes athletes of team sports such as soccer (St Clair Gibson et al., 1998). Although the soccer players participating in the present study were experienced and were accustomed to the 20 m shuttle run test from previous testing sessions, the excess movements in the first levels of the test might partially explain the lower accuracy of the GPS in this part of the test.

With regard to the reliability of the GPS, a previous review on acceptable error in GPS suggested CV values <5% can be classified as good, 5.1–10% moderate and greater than 10% poor results (Scott et al., 2016). In study 1, the inter-unit CV ranged from 1.31% (fifth lap) to 2.20% (third lap) and in study 2, inter-unit CV ranged from 2.08% (11.5 km/h) to 3.92% (8.5 km/h), thus suggesting that the 10-Hz GPS (Johan Sports) ensures good results and can be classified as reliable to measure both long and short distances. The lower inter-unit reliability in the shorter distance might be due to the effect of acceleration and the change of direction. Previous research has shown that the validity of 10 Hz GPS is inversely related to acceleration (Akenhead et al., 2014). Moreover, it has been observed that fast change of direction reduces the accuracy of GPS (Rawstorn et al., 2014). For instance, a comparison of linear and non-linear 200 m courses showed larger error in the latter (Gray et al., 2010).

GPS units sampling at 10 Hz are more valid than units with lower sampling frequencies such as 1 Hz (Coutts and Duffield, 2010) or 5 Hz (Duffield et al., 2010). A comparison between 1 and 5 Hz showed that a higher frequency rate improved validity (Jennings et al., 2010). A 10 Hz unit has been shown to be three times more valid and six times more reliable than a 5 Hz unit (Varley et al., 2012). However, a comparative study of 10 and 15 Hz showed higher validity in the former than in the latter (Johnston et al., 2014). An explanation of the improved validity of GPS with increased sampling frequency might be that a higher sampling frequency theoretically allows more precise identification of motion. For instance, a 10 Hz unit can resolve motion with a temporal precision of 0.1 s, whereas a 5 Hz unit can resolve it with a precision of 0.2 s.
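The link between sampling frequency and temporal precision can be illustrated with simple arithmetic. The sketch below converts a sampling rate into the time between position fixes and, for a given running speed, into the distance covered between two consecutive fixes; the function names are hypothetical, and the 15 km/h speed is taken from the 200 m protocol of study 1.

```python
def sample_interval_s(rate_hz):
    """Time between two consecutive position fixes (s)."""
    return 1.0 / rate_hz


def distance_between_fixes_m(speed_kmh, rate_hz):
    """Distance covered between two consecutive fixes at a constant speed (m)."""
    speed_ms = speed_kmh / 3.6  # km/h to m/s
    return speed_ms * sample_interval_s(rate_hz)


for rate in (1.0, 5.0, 10.0):
    print(f"{rate:4.1f} Hz: {sample_interval_s(rate):.1f} s, "
          f"{distance_between_fixes_m(15.0, rate):.2f} m between fixes at 15 km/h")
```

At 15 km/h a runner covers roughly 0.4 m between fixes at 10 Hz and roughly 0.8 m at 5 Hz, so a turn in the shuttle run is described by about twice as many position samples at the higher rate.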

A limitation of this study was that it focused on linear movements of moderate intensity; thus, the findings should be generalized with caution to other modes of movements (such as multi-directional) and different speeds. One strength of this study is that it included 20 m with change of direction as well as linear running, and both are relevant for soccer. Considering the wide use of GPS units to monitor training and performance in team sports (Aughey and Falloon, 2010; Castellano and Casamichana, 2010; Wisbey et al., 2010; Clemente et al., 2017), the results of the present study will help coaches and trainers optimize their work. The results are of great practical value for professionals (e.g., coaches, fitness trainers, exercise physiologists, analysts) working with team sport players, especially soccer, as they demonstrate that a 10-Hz GPS system is a valid and reliable tool to monitor training. The error found by the GPS unit can be used by soccer professionals for detecting changes in performance (Waldron et al., 2011). Furthermore, this particular model offers an inexpensive solution compared to other commercially available models. Future studies should examine the validity and reliability of this GPS unit in larger samples of athletes performing more sport-specific movements.

### CONCLUSION

Based on the findings of the present study, we conclude that the 10-Hz Johan GPS system is a valid and reliable tool that professionals working with team sport players and endurance runners can use to monitor training involving linear in-line movement and change of direction. Moreover, those using this equipment should be aware of the differences in its accuracy between monitoring long distances and short distances with change of direction.

### AUTHOR CONTRIBUTIONS

FC: Writing paper; CvdL: Data analysis; PN: Data collection and drafting paper; TR: Writing paper; BK: Writing paper.

### ACKNOWLEDGMENTS

We thank Patricia Villiger for her help in translation.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Nikolaidis, Clemente, van der Linden, Rosemann and Knechtle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Estimation of Vertical Ground Reaction Forces and Sagittal Knee Kinematics During Running Using Three Inertial Sensors

Frank J. Wouda<sup>1</sup>\*, Matteo Giuberti<sup>2</sup>, Giovanni Bellusci<sup>2</sup>, Erik Maartens<sup>1,3</sup>, Jasper Reenalda<sup>1,3</sup>, Bert-Jan F. van Beijnum<sup>1,4</sup> and Peter H. Veltink<sup>1</sup>

<sup>1</sup> Institute for Biomedical Technology and Technical Medicine (MIRA), University of Twente, Enschede, Netherlands, <sup>2</sup> Xsens Technologies B.V., Enschede, Netherlands, <sup>3</sup> Roessingh Research and Development, Roessingh Rehabilitation Hospital, Enschede, Netherlands, <sup>4</sup> Centre for Telematics and Information Technology, University of Twente, Enschede, Netherlands

Analysis of running mechanics has traditionally been limited to a gait laboratory using either force plates or an instrumented treadmill in combination with a full-body optical motion capture system. With the introduction of inertial motion capture systems, it becomes possible to measure kinematics in any environment. However, kinetic information could not be provided with such technology. Furthermore, numerous body-worn sensors are required for a full-body motion analysis. The aim of this study is to examine the validity of a method to estimate sagittal knee joint angles and vertical ground reaction forces during running using an ambulatory minimal body-worn sensor setup. Two concatenated artificial neural networks were trained (using data from eight healthy subjects) to estimate the kinematics and kinetics of the runners. The first artificial neural network maps the information (orientation and acceleration) of three inertial sensors (placed at the lower legs and pelvis) to lower-body joint angles. The estimated joint angles in combination with measured vertical accelerations are input to a second artificial neural network that estimates vertical ground reaction forces. To validate our approach, estimated joint angles were compared to both inertial and optical references, while kinetic output was compared to measured vertical ground reaction forces from an instrumented treadmill. Performance was evaluated using two scenarios: training and evaluating on a single subject, and training on multiple subjects and evaluating on a different subject. The estimated kinematics and kinetics of most subjects show excellent agreement (ρ > 0.99) with the reference for single subject training. Knee flexion/extension angles are estimated with a mean RMSE < 5°. Ground reaction forces are estimated with a mean RMSE < 0.27 BW. Additionally, peak vertical ground reaction force, loading rate and maximal knee flexion during stance were compared; however, no significant differences were found. With multiple subject training the accuracy of estimating discrete and continuous outcomes decreases; however, good agreement (ρ > 0.9) is still achieved for seven of the eight different evaluated subjects. The performance of multiple subject learning depends on the diversity in the training dataset, as differences in accuracy were found for the different evaluated subjects.

Keywords: machine learning, artificial neural networks, reduced sensor set, inertial motion capture, running, kinetics

#### Edited by:

Kamiar Aminian, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

#### Reviewed by:

Leonardo Alexandre Peyré-Tartaruga, Federal University of Rio Grande do Sul (UFRGS), Brazil Jean Slawinski, Université Paris Nanterre, France

> \*Correspondence: Frank J. Wouda f.j.wouda@utwente.nl

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 November 2017 Accepted: 26 February 2018 Published: 22 March 2018

#### Citation:

Wouda FJ, Giuberti M, Bellusci G, Maartens E, Reenalda J, van Beijnum BJF and Veltink PH (2018) Estimation of Vertical Ground Reaction Forces and Sagittal Knee Kinematics During Running Using Three Inertial Sensors. Front. Physiol. 9:218. doi: 10.3389/fphys.2018.00218

# 1. INTRODUCTION

Running is a very popular form of physical activity that is often accompanied by a high occurrence of lower extremity injuries (incidence rate varies between 19.4 and 79.3%; van Gent et al., 2007). It is assumed that there is a correlation between the development of these injuries and a runner's technique (Goss et al., 2012). Additionally, improvements in running technique could lead to improved running performance (Kyröläinen et al., 2001; Tartaruga et al., 2012; Folland et al., 2017). Identifying the parameters in running technique that might be associated with injury development and/or running performance improvement requires a biomechanical analysis. This has traditionally been performed inside a gait laboratory using a three-dimensional optical motion capture system and force plates (Novacheck, 1998). The most relevant kinematic and kinetic parameters analyzed are joint angles (Devita and Skelly, 1992; Edwards et al., 2012) and ground reaction forces (Cavanagh and Lafortune, 1980), as these are important determinants of running technique (Goss et al., 2012). Discrete kinetic parameters that are related to running injuries and/or performance are loading rate and peak vertical ground reaction forces (Crowell and Davis, 2011; Goss et al., 2012; Schmitz et al., 2014), whereas maximal knee flexion during stance is a relevant discrete kinematic parameter (Edwards et al., 2012). However, a lab setting is not identical to the regular running environment and may therefore result in different kinematics and kinetics (Sinclair et al., 2013). Previous studies have confirmed this, showing significant differences between running on a treadmill and outdoors (Nigg et al., 1995). Furthermore, dissimilarities in running kinematics can also occur as a result of force plate targeting in overground lab running (Challis, 2001). Therefore, a system capable of measuring relevant parameters outside of a laboratory may address these shortcomings.

Kinematic analysis can be performed in an ambulatory setting using inertial measurement units (IMUs) (see for instance, Roetenberg et al., 2013). Reenalda et al. (2016) have used IMUs to measure the effects of fatigue on running mechanics during an actual marathon. However, this approach requires one sensor to be attached to each main body segment along a continuous "kinematic chain," and therefore results in a large number of sensors and extensive subject preparation. Data-driven approaches were shown to have potential for reducing the number of sensors in motion capture. Tautges et al. (2011) proposed a method for full-body motion capture using a limited number of accelerometers; however, their nearest neighbor approach requires a database of prerecorded movements to be available at run-time. Wouda et al. (2016) showed comparable performance with a reduced sensor setup using an artificial neural network (ANN), trained to map five orientations to a full-body pose. ANNs have the advantage of creating a "model" for mapping certain inputs to outputs based on the dataset used for training (Alpaydin, 2009). Running applications using a minimal inertial sensor set have mainly focused on temporal outcomes, such as the use of gyroscopes on the feet to estimate temporal running parameters (McGrath et al., 2012). Bailey and Harle (2014, 2015) showed that with foot-mounted IMUs this can be extended to estimate spatiotemporal running parameters.

Ground reaction forces are also relevant outcome parameters for running analysis (e.g., Cavanagh and Lafortune, 1980; Novacheck, 1998; Riley et al., 2008; Caekenberghe et al., 2013; Clark et al., 2014), since abnormal peak and/or loading rate values can lead to impact and overuse injuries when the stress/frequency combination is above the runner's threshold (Hreljac, 2004; Milner et al., 2007). However, none of the approaches discussed above provided users with kinetic information. Efforts to move kinetic analyses out of the laboratory setting have proven to be effective for trunk bending (Faber et al., 2016), gait (Karatsidis et al., 2017), dance (Shippen and May, 2012), and running (Pavei et al., 2017). However, these kinetic approaches require full-body kinematic information. The peak vertical ground reaction force (vGRF) estimation approach of Charry et al. (2013) relied only on tibial accelerations, but was not suitable for estimation of kinetics during the whole stance phase. An approach relying only on trunk accelerations was not sufficient for vGRF estimation using a mass-spring-damper model (Nedergaard et al., 2017).

To the best of our knowledge, there is no system that can provide runners with insight into both their kinematics and kinetics in an outdoor setting. The aim of this study is to assess the validity of a method to estimate knee joint angles and vertical ground reaction forces during running using an ambulatory minimal body-worn sensor setup. An ANN is trained to estimate joint angles based on lower leg orientations relative to the pelvis, similar to the approach presented in previous work (Wouda et al., 2016). Corresponding performance is evaluated using both inertial and optical full-body motion capture data. The estimated joint angles in combination with sensor accelerations are then fed into a second ANN, which estimates (vertical) ground reaction forces. The proposed method was evaluated using continuous outcomes (vGRF and knee angle profiles) and discrete outcomes (peak vGRF, loading rate, and maximal knee flexion during stance). The findings of this study could have potential for future applications in prevention of running injuries and improvement of running performance.

### 2. MATERIALS AND METHODS

### 2.1. Measurement Protocol

Eight healthy experienced runners (8 males; age: 25.1 ± 5.2 years; height: 183.7 ± 4.5 cm; weight: 77.7 ± 9.4 kg; body mass index: 23.0 ± 2.5 kg/m<sup>2</sup> ) voluntarily participated in this research. The runners were recruited from a local track and field club and all reported no recent injuries. Subjects were instructed to run at 3 different speeds (10, 12, and 14 km/h, in this order) for 3 min each on an instrumented treadmill, located at the gait laboratory of the Roessingh Research and Development (Enschede, the Netherlands). A warm-up session at a self-selected running speed (of approximately 3 min) was performed by all subjects preceding the measurements. The ethics committee of the Faculty of Electrical Engineering, Mathematics and Computer Science at the University of Twente approved this protocol and all subjects provided written informed consent prior to the measurements.

### 2.2. Measurement Setup

Reference kinematics were recorded with an optical motion capture system using the Plug-in Gait protocol<sup>1</sup> (Nexus 1.8.5, Vicon, Oxford, UK), with 41 retroreflective markers placed directly on the runners' skin, as shown in **Figure 1**. The position of these markers was captured (at 100 Hz) by six high-speed infrared cameras (MX-13, Vicon, Oxford, UK) placed around the treadmill. Any object that could block the camera view or produce undesired reflections was removed from the measurement environment. Additionally, kinematics were synchronously captured using the Xsens MVN Link inertial motion capture system (Xsens, Enschede, the Netherlands), consisting of 17 IMUs placed at both shoulders, upper arms, lower arms, hands, upper legs, lower legs, feet, head, sternum, and pelvis (Roetenberg et al., 2013). The required full-body Lycra suit (for IMU placement) was modified with holes to reduce motion artifacts of the retroreflective markers, which are placed directly on the subject's skin. Full-body kinematics were exported using the accompanying software (MVN studio 4.3.7, Xsens, Enschede, Netherlands) at a selected sampling frequency of 240 Hz. Subjects ran on a S-Mill instrumented treadmill (ForceLink, Culemborg, the Netherlands), with a running area of 250 × 100 cm, which can be seen in **Figure 1**. The treadmill was equipped with a 1-dimensional force plate, able to measure reference vGRF at 1,000 Hz. Data of the different systems were synchronized using an analog synchronization signal.

### 2.3. Data Processing

The different trials were cropped to contain only kinematic and kinetic data of running at a steady speed, i.e., starting and stopping of the treadmill were disregarded. Optical kinematic data were processed using Plug-in Gait (Kadaba et al., 1990; Davis et al., 1991). The optical and inertial motion data did not require coordinate system alignment, as the outcome measures were expressed in the joint frame, according to ISB conventions (Wu et al., 2002). The vGRFs were low-pass filtered at 20 Hz using a zero-phase 6th order Butterworth filter to remove noise artifacts such as vibrations of the treadmill (Sloot et al., 2015), while neither the optical nor the inertial motion capture data were filtered. Besides the temporal alignment (achieved with an analog synchronization signal), the data were resampled at 120 Hz using linear interpolation (for the optical data) and downsampling (for the inertial and vGRF data), such that all synchronized data could be used in the proposed machine learning approach. This data resampling does not significantly influence the measured kinematics and kinetics, as was also concluded by Pavei et al. (2017). For analysis, the kinematic and kinetic data were segmented into stance phases using a 20 N threshold (Milner and Paquette, 2015). All data processing and statistical analyses were performed in MATLAB R2017a (Mathworks, Inc., Natick, MA, USA).

<sup>1</sup>https://www.vicon.com/downloads/documentation/plug-in-gait-product-guide
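To make the resampling and stance segmentation steps more concrete, the following is a minimal Python sketch (the study itself used MATLAB). The function names `resample_linear` and `stance_phases` are hypothetical and not taken from the original work; the sketch only assumes uniformly sampled signals and a vGRF trace in newtons, and it omits the Butterworth filtering step.

```python
from typing import List, Tuple


def resample_linear(samples: List[float], fs_in: float, fs_out: float) -> List[float]:
    """Resample a uniformly sampled signal to fs_out using linear interpolation."""
    if len(samples) < 2:
        return list(samples)
    duration = (len(samples) - 1) / fs_in        # time of the last input sample (s)
    n_out = int(duration * fs_out + 1e-9) + 1    # output samples that fit in that span
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out                 # fractional index into the input
        i = min(int(pos), len(samples) - 2)      # left neighbour, kept inside the array
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


def stance_phases(vgrf_newton: List[float], threshold: float = 20.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the vGRF exceeds the threshold.

    `end` is exclusive, so vgrf_newton[start:end] is one stance phase.
    """
    phases = []
    start = None
    for i, force in enumerate(vgrf_newton):
        if force > threshold and start is None:
            start = i                            # initial contact
        elif force <= threshold and start is not None:
            phases.append((start, i))            # toe-off
            start = None
    if start is not None:                        # recording ended during stance
        phases.append((start, len(vgrf_newton)))
    return phases


# Illustrative values only: a 100 Hz ramp resampled to 120 Hz,
# and a short made-up force trace in newtons.
ramp_120 = resample_linear([float(i) for i in range(11)], fs_in=100, fs_out=120)
contacts = stance_phases([0, 5, 30, 800, 1500, 600, 10, 0, 25, 900, 15])
```

With the made-up trace above, `contacts` holds two stance phases, one for each interval in which the force stays above 20 N.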

### 2.4. Learning Approach

The proposed learning approach relies on data from three body-worn sensors (placed at the pelvis and lower legs), which are fed to a concatenation of two ANNs, as schematically represented in **Figure 2**. The first artificial neural network (ANN<sub>1</sub>) maps orientations (in quaternions) of the lower legs relative to the pelvis to joint angles, whereas the second artificial neural network (ANN<sub>2</sub>) maps the estimated joint angles in combination with vertical sensor accelerations (in the global frame) to vertical ground reaction forces. This architecture was chosen to allow for independent training of the two ANNs. Additionally, the proposed architecture separates the learning problems, allowing for "selective" re-training of the ANNs (for instance, additional running environments can be included in the dataset of ANN<sub>1</sub> without measuring GRFs simultaneously).
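As an illustration of the orientation input described above, the sketch below computes the orientation of a lower leg sensor relative to the pelvis sensor from two unit quaternions. It assumes quaternions in (w, x, y, z) order that express sensor-to-global rotations, so that the relative orientation is the conjugate of the pelvis quaternion multiplied by the segment quaternion. The function names and the 8-element input layout are hypothetical; the source does not specify them.

```python
import math
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z), unit norm assumed


def q_conj(q: Quaternion) -> Quaternion:
    """Conjugate, which is the inverse rotation for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_mul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def relative_orientation(q_pelvis: Quaternion, q_segment: Quaternion) -> Quaternion:
    """Orientation of a segment expressed in the pelvis frame (assumed convention)."""
    return q_mul(q_conj(q_pelvis), q_segment)


def ann1_input(q_pelvis: Quaternion, q_left: Quaternion, q_right: Quaternion) -> List[float]:
    """Hypothetical input layout: left and right relative quaternions concatenated."""
    return list(relative_orientation(q_pelvis, q_left)) + list(
        relative_orientation(q_pelvis, q_right)
    )


# Illustrative values: pelvis rotated 90 degrees about z, left shank 180 degrees about z.
half = math.sqrt(0.5)
pelvis = (half, 0.0, 0.0, half)
left_shank = (0.0, 0.0, 0.0, 1.0)
features = ann1_input(pelvis, left_shank, pelvis)
```

Expressing the lower leg orientations relative to the pelvis removes the dependence on the heading of the runner in the global frame, which is one plausible reason for using relative rather than absolute orientations.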

TABLE 1 | The training and testing schemes for both the kinematic and kinetic estimations are represented.


The input to ANN<sub>1</sub> is in all cases the measured relative orientations from the three on-body IMUs (placed at the pelvis and both lower legs), and the output can be from the inertial (IMU) or optical (Plug-In Gait) measurements. This output is then the input to the kinetic estimation part (ANN<sub>2</sub>), for which the output is in all cases the vertical ground reaction forces (vGRF) measured using the force plates (FP).

Estimated kinematic outputs were compared to measured reference kinematics, which were obtained from both inertial and optical motion capture systems. To that end, two training schemes were evaluated, as shown in **Table 1**, to test the proposed method irrespective of the motion capture technology.

Previous studies have achieved varying performance in GRF estimation (Shippen and May, 2012; Charry et al., 2013; Faber et al., 2016; Karatsidis et al., 2017; Nedergaard et al., 2017; Pavei et al., 2017). Therefore, several ANNs were trained using combinations of different input features (joint angles, pelvis, and lower leg vertical accelerations) to select the best set of input features. The selection of these input features is based on their physical relation to the ground reaction forces, where joint angles define the continuous kinematic chain (Faber et al., 2016; Karatsidis et al., 2017) and accelerations are related to force according to Newton's second law of motion.

In accordance with previous work of the authors (Wouda et al., 2016), a two-layer (with 250 and 100 neurons) function fitting neural network architecture was used for both ANNs, capable of mapping non-linearities between input and output. The networks were trained for a maximum of 2,000 iterations, and training was stopped early if the gradient did not decrease for 6 consecutive iterations or if the gradient was smaller than 1 × 10<sup>−6</sup>. The neural network toolbox of MATLAB R2017a (Mathworks, Inc., Natick, MA, USA) was used to design, train, and evaluate the ANNs described above.
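The data flow through the two concatenated networks can be sketched in plain Python as follows. This is only an illustration of the forward pass with two hidden layers of 250 and 100 neurons: the weights are random and untrained, the tanh hidden activation and linear output are assumptions, and the number of joint-angle outputs (`N_ANGLES`) and the input layout are hypothetical. The authors used the MATLAB neural network toolbox, not this code.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights; each row has one extra trailing entry for the bias."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)] for _ in range(n_out)]


def dense(weights: Matrix, x: Sequence[float], activate: bool) -> List[float]:
    """One fully connected layer, optionally followed by tanh."""
    out = []
    for row in weights:
        # zip stops at len(x), so the trailing bias entry is added separately
        s = row[-1] + sum(w * xi for w, xi in zip(row, x))
        out.append(math.tanh(s) if activate else s)
    return out


def make_network(n_in: int, n_out: int, rng: random.Random,
                 hidden: Sequence[int] = (250, 100)) -> List[Matrix]:
    sizes = [n_in, *hidden, n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(network: List[Matrix], x: Sequence[float]) -> List[float]:
    """Hidden layers use tanh; the last layer is linear."""
    for i, layer in enumerate(network):
        x = dense(layer, x, activate=(i < len(network) - 1))
    return list(x)


rng = random.Random(0)
N_ANGLES = 6                                   # hypothetical number of joint-angle outputs
ann1 = make_network(8, N_ANGLES, rng)          # two relative quaternions -> joint angles
ann2 = make_network(N_ANGLES + 3, 1, rng)      # angles + 3 vertical accelerations -> vGRF


def estimate_vgrf(rel_quats: List[float], vertical_acc: List[float]) -> float:
    """Concatenation of the two networks for a single time sample."""
    joint_angles = forward(ann1, rel_quats)
    return forward(ann2, joint_angles + vertical_acc)[0]


sample = estimate_vgrf([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [9.81, 9.81, 9.81])
```

Because the second network only consumes joint angles and accelerations, either network could be retrained without touching the other, which mirrors the independent training described in the text.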

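Two evaluation scenarios were used to show single subject (section 3.1) and multiple subject (section 3.2) performance. In scenario 1, the ANNs are trained and evaluated on data of a single subject, with different running speeds in the training and evaluation sets. In scenario 2, the ANNs are trained on data of multiple subjects and evaluated on a different subject.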


Scenario 1 would require every new user to perform a training phase. Scenario 2 could potentially produce a more generic model, although the lack of personalization of the network may result in decreased performance.

### 2.5. Outcome Measures

The performance of the proposed method was evaluated by comparing both discrete and continuous outcomes, as commonly done in similar works about biomechanical analysis of running (Cavanagh and Lafortune, 1980; Devita and Skelly, 1992; Crowell and Davis, 2011; Edwards et al., 2012; Schmitz et al., 2014). For the knee flexion/extension (F/E) the similarity between the estimates and reference was calculated using the Pearson's correlation coefficient (ρ) and Root Mean Squared Error (RMSE) (as defined by Ren et al., 2008). The mean ρ over these different strides was calculated using a Fisher transformation to obtain a more representative average Pearson's correlation coefficient (Corey et al., 1998). Additionally, the maximum knee F/E angle during the stance phase was evaluated using a paired t-test (significance level of 0.05) and Bland-Altman plot (Bland and Altman, 1986). Estimated vGRFs (normalized to body weight, BW) were also evaluated using both continuous (ρ and RMSE) and discrete metrics (loading rate and peak vGRF). The kinetic analysis was however limited to the stance phase of each leg (as there is no contact during swing phase). Since the passive vGRF peak is not clearly defined for mid- or forefoot strikers, this event was determined using the peak acceleration from the lower leg IMUs (Willy et al., 2008). Using this event the loading rate was calculated as the slope of vGRF between 20 and 80 percent of the passive vGRF peak time (Willy et al., 2008; Crowell and Davis, 2011).
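The continuous outcome measures can be illustrated with a short Python sketch. It uses the standard definitions of Pearson's correlation coefficient and RMSE, which are assumed here to match the definitions cited above, and averages per-stride correlations in Fisher z-space. The function names are hypothetical, and the clipping of correlations just below 1 is an added safeguard (the inverse hyperbolic tangent is undefined at exactly ±1), not something described in the source.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long signals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between an estimate and its reference."""
    n = len(reference)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / n)


def fisher_mean(correlations: Sequence[float], limit: float = 0.999999) -> float:
    """Average correlations in Fisher z-space and transform back."""
    z = [math.atanh(max(-limit, min(limit, r))) for r in correlations]
    return math.tanh(sum(z) / len(z))


# Illustrative per-stride correlations (made-up numbers).
mean_rho = fisher_mean([0.90, 0.99])
```

For correlations close to 1, the Fisher-averaged value is larger than the arithmetic mean, because the transformation stretches the scale near the upper bound before averaging.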

### 3. RESULTS

Section 3.1 shows performance of the proposed method for training and evaluating on a single subject, where the difference between both sets is the running speed (scenario 1). Section 3.2 is about generalization of this approach over different subjects (scenario 2).

### 3.1. Single Subject Learning

#### 3.1.1. Kinematics Estimation

The accuracy of estimated knee F/E angles based on different references (full-body IMU motion capture system or optical Plug-In Gait output) is presented in **Table 2**. The estimates provided by most individually trained ANNs have excellent agreement (ρ > 0.99) with the reference joint angles. Furthermore, only subject eight shows significant differences in performance between the different references.

Mean (and standard deviation) of the estimated knee F/E angle profiles are shown in **Figure 3** for a representative subject (S03). The largest difference between the estimate and its respective reference can be seen at the largest flexion angle, which is overestimated in all cases. As observed before in **Table 2**, differences between the estimates based on the various references are limited (4° on average).

**Table 3** shows the mean (and standard deviation) of the maximal knee F/E angle for each subject. Only inertial results and the corresponding estimates are presented in this table for conciseness. The mean difference in maximal knee flexion angle during stance between the estimate and its reference is < 2° for all subjects, and this result shows no significant differences (p > 0.05). A small bias of 0.4° was found with limits of agreement –4.1 to 4.9° for the comparison between the estimated maximal knee F/E angle during stance and the corresponding reference. **Figure 4A** shows the related Bland-Altman plot. Occasional outliers (for three of the evaluated subjects) can be observed, which mostly overestimate the maximal knee F/E angle during stance.
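The bias and limits of agreement reported above follow from a Bland-Altman analysis, which can be sketched as below. The sketch assumes the common definition of the limits as the mean difference ± 1.96 standard deviations of the differences; the function name and the example angles are hypothetical and are not data from the study.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(estimate: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [e - r for e, r in zip(estimate, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)               # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up maximal knee flexion angles in degrees, for illustration only.
estimated = [41.0, 39.5, 44.0, 42.5]
measured = [40.0, 40.0, 43.0, 42.0]
bias, lower, upper = bland_altman(estimated, measured)
```

The limits are symmetric around the bias by construction, which is consistent with the reported bias of 0.4° lying midway between –4.1 and 4.9°.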

#### 3.1.2. Kinetics Estimation

**Table 4** shows an overview of performance when different combinations of input features (joint angles, pelvis and lower leg accelerations) are evaluated. On average the best results (marked in bold for individual subjects) were achieved using a combination of all vertical accelerations and joint angles as input features. Therefore, results presented below are obtained when ANN<sub>2</sub> was trained using these features.

TABLE 2 | Accuracy of estimated knee flexion/extension (F/E) angles (using ANN<sub>1</sub>) with different training outputs (namely: IMU or Plug-in Gait-based), using single subject training and evaluation.

Pearson's correlation coefficient (ρ) is calculated for each stride and averaged over approximately 200 strides for each subject (S01, S02, S03, S04, S05, S06, S07, and S08). The Root Mean Squared Error (RMSE) is calculated similarly over all strides. Training of the artificial neural networks was performed using running data at 10 and 14 km/h, while 12 km/h running data was used for evaluation.

FIGURE 3 | Mean (and standard deviation band) of the flexion/extension knee joint angle (in degrees) estimates are presented (normalized to the stride cycle) compared to their respective references (IMU and Plug-In Gait output). These estimates were obtained from training (using running data at 10 and 14 km/h) and evaluating (using running data at 12 km/h) on a single subject; similar results were obtained for the other subjects. The top row shows the angles of the left side and the bottom row presents the right side. At the top of each graph Pearson's correlation coefficient, root mean square error (RMSE) and the standard deviation (between the brackets) are specified, which were calculated for the estimate compared to its respective reference kinematics.

TABLE 3 | The mean (and standard deviation) of discrete outcome measures for both the estimate and its corresponding reference (based on inertial full-body motion capture data) of all subjects.

These estimates were obtained by training and evaluating on a single subject. Outcomes are averaged over approximately 400 steps (left and right combined). P-values are calculated using a paired t-test with the subject mean values.

The estimated ground reaction profiles of a representative subject (S03) are shown in **Figure 5** for ANN<sub>2</sub> based on both reference kinematics (IMUs and Plug-In Gait). Similarly to what was observed for the estimated knee F/E angles, differences between the networks (ANN<sub>2</sub>) trained on the various references are minimal. The largest differences between the estimated and reference vGRF can be seen at the beginning of the stance phase. However, peak values are estimated with high accuracy, resulting in correlation coefficients larger than 0.96.

Results for the discrete outcomes (peak vGRF and loading rate) can be found in **Table 3**. Mean peak vGRF differences between the estimate and its reference are within 0.09 BW for all subjects, which resulted in no significant differences (p > 0.05). Variation between the estimate and its reference is larger for the loading rate, however this difference is still not significant (p > 0.05). **Figures 4B,C** show the Bland-Altman plots for both the peak vGRF and loading rate. A small bias of 0.01 BW is present in the estimated peak vGRF, with limits of agreement –0.17 to 0.18 BW. The loading rate is estimated with a bias of –2.9 BW/s with limits of agreement –16 to 10 BW/s. Both plots show occasional outliers for multiple subjects.

#### 3.1.3. Variation in Running Speeds

Extrapolation capabilities of the proposed approach were investigated by evaluating different running speeds for subject 3. **Figure 6** shows RMSEs for the evaluated speeds, where the remaining trials are in the training dataset. This figure shows that the most accurate continuous estimation can be achieved when an intermediate speed (12 km/h) is evaluated, rather than a speed that is slower (10 km/h) or faster (14 km/h) than those in the respective training dataset.


TABLE 4 | Accuracy of the estimated vertical ground reaction force (vGRF) using different input features (namely: joint angles (θjoint), pelvis vertical acceleration (aP), all (pelvis, left and right lower leg) vertical accelerations (aP+L ) or a combination of these).

The evaluated set of features is shown above each column. These results were obtained using single subject training and evaluation. Pearson's correlation coefficient (ρ) is calculated for each contact and averaged over approximately 200 stance phases for each subject (S01, S02, S03, S04, S05, S06, S07 and S08). The Root Mean Squared Error (RMSE) is calculated similarly over all contacts, and the standard deviation of the RMSE is shown in brackets. The highest correlations (ρ) and smallest RMSE are shown in bold.

Additionally, discrete outcome measures were evaluated for the same subject, which are presented in **Table 5**. The peak vGRF and maximal knee flexion during stance also show that interpolating speeds results in more accurate outcomes than extrapolating. However, this trend is not present for the loading rate accuracy.

### 3.2. Multiple Subject Learning

The generalization performance of both ANNs was evaluated by training with all different combinations of subjects in the training and evaluation datasets. **Table 6** (top-half) shows the results of kinematics for the different evaluated subjects. Seven out of the eight subjects show correlations larger than 0.9, indicating good agreement. However, the RMSE is, as expected, larger than for single subject learning (section 3.1). The estimated knee F/E angles for subjects 1 and 3 are significantly less accurate. Additionally, the mean estimated knee F/E angle profiles of subject 4 are shown in **Figure 7**, with the measured references used for comparison. The stance phase (until approximately 30% of the stride cycle) is estimated with higher accuracy than the swing phase; the same behavior can be seen for single subject learning (**Figure 3**).
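The multiple subject evaluation, in which the networks are trained on all subjects except one and evaluated on the remaining subject (as stated in the caption of Table 6), corresponds to a leave-one-subject-out split. A minimal sketch of generating these splits is given below; the function name is hypothetical and the subject labels simply follow the S01 to S08 naming used in the tables.

```python
from typing import Iterator, List, Tuple


def leave_one_subject_out(subjects: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training subjects, evaluation subject) for every subject in turn."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out


subjects = [f"S{i:02d}" for i in range(1, 9)]   # S01 ... S08
splits = list(leave_one_subject_out(subjects))

for train, test in splits:
    # Here one would train both ANNs on `train` and evaluate on `test`.
    assert test not in train
```

Keeping all data of the evaluation subject out of the training set is what makes the reported accuracy an estimate of performance on an unseen runner.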

Results of the kinetic estimations can be seen in **Table 6** (bottom-half). Similar to the joint angles, vGRFs are mostly estimated with correlations larger than 0.9, indicating good agreement with the measurements. However, subjects 1 and 3 show lower correlation coefficients, as was also seen for the kinematics. Vertical ground reaction force profiles of one representative subject (S04) are shown in **Figure 8**, which shows an increase in RMSEs compared to the single subject learning (**Figure 5**). The maximum estimated ground reaction forces are mostly comparable to the reference.

The accuracy of estimating discrete outcome measures is shown in **Table 7**. The estimation accuracy varies between different subjects and outcome measures. However, in most cases an increase in error can be seen when comparing to the single subject training (**Table 3**). Additionally, an increase in the standard deviations of the different estimated outcome measures can be seen. However, the estimated outcome measures and the corresponding references were not found to be significantly different.

### 4. DISCUSSION

This work shows that sagittal knee kinematics and vGRF can be estimated using only three inertial sensors placed on the lower legs and pelvis, in particular, the peak vGRF, maximal knee F/E

FIGURE 5 | Mean (and standard deviation band) of the estimated ground reaction forces (in BW) are presented (normalized to the stance phase) compared to their respective references (IMU and Plug-In Gait joint angle output). These estimates were obtained from training and evaluating on a single subject, similar results were obtained for the other subjects. The top row shows the forces of the left contacts and the bottom row presents the right contacts. At the top of each graph Pearson's correlation coefficient, root mean square error (RMSE) and the standard deviation (between the brackets) are specified, which were calculated for the estimate compared to its respective reference kinematics.

FIGURE 6 | Accuracy of the estimated vertical ground reaction force (vGRF) and knee flexion/extension (F/E) angle for different evaluated speeds, hence the other speeds are part of the training dataset, using single subject training and evaluation, as described in section 2.4. The artificial neural networks were trained with and evaluated relative to a full-body inertial kinematic measurement (Table 1, training scheme 1). The results for a representative subject are shown in this graph. The Root Mean Squared Error (RMSE) is calculated over all stride/stance phases and averaged over approximately 200 strides for each different evaluated speed (10, 12, and 14 km/h).

TABLE 5 | The variation in discrete outcome measures for different speeds in subject 3.


The mean (and standard deviation) of peak vGRF, loading rate and max knee flexion during stance are shown for both the estimate and its corresponding reference (based on inertial full-body motion capture data), these are calculated over approximately 400 steps (left and right combined). The artificial neural networks were trained using running data of two speeds (different from the evaluation speed), while the shown speed was used for evaluation.

TABLE 6 | Accuracy of the estimated knee flexion/extension (F/E) angles (by ANN1) and vertical ground reaction forces (vGRF) (by ANN2) using different training outputs (namely: IMU or Plug-in Gait-based) by training on data of all subjects except for one which is used for the evaluation at 12 km/h.


vGRF accuracy


Pearson's correlation coefficient (ρ) is calculated for each stride and averaged over approximately 200 strides for each different test subject (S01, S02, S03, S04, S05, S06, S07 and S08). The Root Mean Squared Error (RMSE) is calculated similarly over all strides.

FIGURE 7 | Mean (and standard deviation band) of the flexion/extension knee joint angle (in degrees) estimates are presented (normalized to the stride cycle) compared to their respective references (IMU and Plug-In Gait joint angle output). These estimates were obtained from training on multiple subjects and evaluating on a different subject, and were comparable to the other evaluated subjects. The top row shows the angles of the left side and the bottom row presents the right side. At the top of each graph Pearson's correlation coefficient, root mean square error (RMSE) and the standard deviation (between the brackets) are specified, which were calculated for the estimate and its respective reference kinematics.

FIGURE 8 | Mean (and standard deviation band) of the estimated vertical ground reaction forces (in BW) are presented (normalized to the stance phase) compared to the measured reference. These estimates were obtained from training on multiple subjects and evaluating on a different subject, and were comparable to the other evaluated subjects. The top row shows the forces of the left contacts and the bottom row presents the right contacts. At the top of each graph Pearson's correlation coefficient, root mean square error (RMSE) and the standard deviation (between the brackets) are specified, which were calculated for the estimate and its respective reference kinematics.


TABLE 7 | The mean (and standard deviation) of discrete outcome measures for both the estimate and its corresponding reference (based on inertial full-body motion capture data) of all subjects.

These estimates were obtained by training on multiple subjects and evaluating on a different subject (using running data at 12 km/h). Outcomes are averaged over approximately 400 steps (left and right combined). P-values are calculated using a paired t-test with the subject mean values.
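The paired comparison on subject mean values can be illustrated with the short sketch below. It computes only the t statistic and the degrees of freedom; converting these to a P-value requires the Student t distribution, which a statistics package (for example `scipy.stats.ttest_rel`) provides. The function name and the example layout are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(estimated_means, reference_means):
    """Paired t statistic and degrees of freedom for per-subject mean values."""
    if len(estimated_means) != len(reference_means):
        raise ValueError("both lists need one value per subject")
    diffs = [e - r for e, r in zip(estimated_means, reference_means)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

Each list holds one value per subject (for example, the mean peak vGRF over that subject's steps), so with eight subjects the test has seven degrees of freedom.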

angles during stance, and the knee F/E angles and vGRF profiles are estimated with no significant differences with respect to the reference.

Estimation of joint angles for a single subject has been shown to be more accurate (average RMSE < 5°) than was achieved in previous work of the authors (average RMSE ≈ 7°) (Wouda et al., 2016). This can partly be explained by the difference in composition of the training databases between both methods, since the current dataset had less variation of motions, i.e., only running. This approach requires obtaining reference kinetics and kinematics of each subject, i.e., each subject has to run on an instrumented treadmill.

Additionally, multiple subject learning results showed good agreement (ρ > 0.9) for most subjects in the continuous outcomes. However, as expected, the ANNs could not generalize over all idiosyncrasies of the individual subjects: RMSEs and differences in discrete outcomes increased. Subjects had different landing patterns (heel, mid, or forefoot striking), which may be a reason for the degraded performance shown, for example, in subject 1. By including more subjects, separate models could be trained for each landing phenotype. Alternatively, larger soft-tissue artifacts of the inertial sensors compared to the other subjects may explain the degraded performance.
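The multiple subject evaluation (training on all subjects except one, then testing on the held-out subject) follows a leave-one-subject-out scheme. A minimal sketch of how such splits can be generated is given below; the subject labels follow the S01–S08 naming used in the tables, while the function name is hypothetical and the training step itself is omitted.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, held_out_subject) for every subject in turn."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out


subjects = [f"S{i:02d}" for i in range(1, 9)]  # S01 ... S08

for training, held_out in leave_one_subject_out(subjects):
    # An ANN would be trained on `training` and evaluated on `held_out` here.
    print(f"train on {len(training)} subjects, evaluate on {held_out}")
```

Because no data from the held-out subject enters training, the resulting errors reflect how well the networks generalize to an unseen runner.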

No significant differences were found between any of the reference and estimated discrete outcome measures, for both evaluation scenarios. However, the required accuracy would largely be defined by the application of interest. An example of such an application could be tracking kinematic/kinetic changes due to fatigue, since they may relate to increased chance of injury (Reenalda et al., 2016). However, more data (specific for such an application, e.g., running under fatigue) should be acquired to evaluate if the proposed approach can track such differences.

The running mechanics in this work are estimated based on inertial or optical motion capture data. Each of these technologies has its advantages and disadvantages (Field et al., 2011). Differences in the reference knee F/E profiles for the different technologies are observed for the results in section 3.1.1, which can be explained by differences in the underlying models of the human body and their assumptions (Kainz et al., 2016). However, the estimated kinematics based on the different technologies are similar to their respective measured kinematics. This shows that the method has potential to be applied in this context irrespective of the preferred technology for recording training data. Therefore, the proposed method has potential to estimate output based on other kinematic references, such as biomechanical models driven by optical data (Delp et al., 2007; Stief et al., 2013).

The measured dataset contains only treadmill running; however, the proposed method is not limited to these conditions. Evaluating the proposed method in a different setting (e.g., outdoor running) might result in less accurate estimations of knee F/E angles and vGRFs. To improve such results, the dataset can be extended by including running at different slopes of the treadmill. Furthermore, 3D ground reaction forces could be measured using pressure insoles, for example (Rouhani et al., 2010), which would enable training data collection in any running environment. Extrapolating kinematic and kinetic data outside of the training dataset appears to be more difficult than interpolating such data. This was shown by the degraded performance after training with different running speeds or extrapolating over various subjects. This indicates that careful construction of the training dataset is required to obtain the best possible performance.

A limitation of the proposed method is that only vertical kinetics can be estimated. This can be attributed to the available measurement setup, since estimating three-dimensional kinetics would require a treadmill instrumented with a force plate that can measure three-dimensional forces. However, our proposed method could be extended using the three-dimensional GRF estimation approach of Karatsidis et al. (2017) using full-body inertial motion capture. Furthermore, only sagittal plane knee kinematics could be estimated in the proposed approach; estimating kinematics of other joints and/or planes would require additional research.

The concatenated ANN approach allows for training ANN1 (kinematics) independently of ANN2 (kinetics). This enables the use of only inertial motion capture data in various environments for training ANN1. Instead of concatenating two ANNs, a single ANN could be trained to map relative orientations and vertical accelerations to ground reaction forces and joint angles. Initial tests show comparable results for single subject training; however, multiple subject training was less successful. When one ANN is trained to estimate both kinematics and kinetics, cross-dependencies between features and outputs become important, which is less the case for concatenated ANNs. This can be seen in the differences in accuracy between estimation of kinematics (ANN1) and kinetics (ANN2) for multiple subject training in section 3.2.

**Figure 5** shows differences in the measured reference vGRF between left and right stance phases, which can also be seen from the estimated output. This could indicate that the proposed method is capable of detecting differences between left and right kinetics. Note that, given the relatively short duration of the running sessions, effects of fatigue could not be evaluated using the current setup, but it is an interesting future development.

The estimated vertical ground reaction forces (ρ > 0.99 and RMSE < 0.27 BW) using the proposed method are comparable to those of Faber et al. (2016) (R<sup>2</sup> > 0.981 and RMSE < 10 N), who estimated GRFs during a bending task by using a full-body inertial motion capture system. Karatsidis et al. (2017) evaluated a similar approach on walking using inertial sensors, where the errors are comparable to the ones reported in the proposed method. Charry et al. (2013) showed that by exploiting only tibial accelerations to estimate peak vGRFs an approximate RMSE of 6% can be achieved; however, this method was only applied to training and testing on individual subjects. Shippen and May (2012) estimated vGRF more accurately (3% error) than the proposed method, by relying on full-body optical motion capture for their method. Pavei et al. (2017) reported similar performance in estimation of the loading rate, while our proposed method was shown to estimate peak vGRFs more accurately. Charry et al. (2013) reported peak vGRF estimation errors of approximately 6%, whereas our proposed method is able to estimate peak vGRF with an accuracy of <0.10 BW (≈3.5%).

### 5. CONCLUSIONS

This work has shown the potential of estimating kinetics (vGRF) and kinematics (knee F/E angles) during running using a minimal on-body sensor setup (namely, three sensor devices placed on the lower legs and pelvis). Best performance can be obtained when the proposed approach is applied to a single subject. Training over multiple subjects was shown to be possible, since good agreement between the estimates and references was achieved; however, the RMSEs are larger than for single subject training. In other words, the proposed method has potential to be applied for individual subjects, and with additional research can be extended for running in various environments.

#### AUTHOR CONTRIBUTIONS

The study design was conceptualized by FW, MG, GB, EM, and JR. The data collection was conducted by FW and EM. The data was analyzed by FW under the supervision of all authors. The manuscript was drafted by FW, and all authors contributed significantly to revisions, literature review and the discussion of results. All authors approved the final version and agreed to be accountable for all aspects of this work.

### FUNDING

This research (project No. 13917) is supported by the Dutch Technology Foundation STW, which is part of the Netherlands Organization for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs.

#### ACKNOWLEDGMENTS

The authors would like to thank Roessingh Research & Development for the availability of the gait laboratory for the measurements, and in particular the lab manager, Leendert Schaake, who helped significantly with the measurement setup and optical data processing.



**Conflict of Interest Statement:** MG and GB are employed by Xsens Technologies BV.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wouda, Giuberti, Bellusci, Maartens, Reenalda, van Beijnum and Veltink. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Intra-session and Inter-day Reliability of the Myon 320 Electromyography System During Sub-maximal Contractions

#### Graeme G. Sorbie<sup>1,2</sup>, Michael J. Williams<sup>1,3</sup>, David W. Boyle<sup>1</sup>, Alexander Gray<sup>1</sup>, James Brouner<sup>4</sup>, Neil Gibson<sup>3</sup>, Julien S. Baker<sup>1</sup>, Chris Easton<sup>1</sup> and Ukadike C. Ugbolue<sup>1,5</sup>\*

<sup>1</sup> School of Science and Sport, Institute for Clinical Exercise and Health Science, University of the West of Scotland, Hamilton, United Kingdom, <sup>2</sup> Division of Sport and Exercise Sciences, Abertay University, Dundee, United Kingdom, <sup>3</sup> Oriam: Scotland's Sports Performance Centre, Heriot-Watt University, Edinburgh, United Kingdom, <sup>4</sup> School of Life Sciences, Pharmacy, and Chemistry, Kingston University, Kingston upon Thames, United Kingdom, <sup>5</sup> Department of Biomedical Engineering, University of Strathclyde, Glasgow, United Kingdom

#### Edited by:

Kamiar Aminian, Eidgenössische Technische Hochschule Lausanne, Switzerland

#### Reviewed by:

Marco Alessandro Minetto, Università degli Studi di Torino, Italy

Vincent Gremeaux, Centre Hospitalier Regional Universitaire De Dijon, France

#### \*Correspondence:

Ukadike C. Ugbolue u.ugbolue@uws.ac.uk

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 20 October 2017 Accepted: 14 March 2018 Published: 29 March 2018

#### Citation:

Sorbie GG, Williams MJ, Boyle DW, Gray A, Brouner J, Gibson N, Baker JS, Easton C and Ugbolue UC (2018) Intra-session and Inter-day Reliability of the Myon 320 Electromyography System During Sub-maximal Contractions. Front. Physiol. 9:309. doi: 10.3389/fphys.2018.00309

Electromyography systems are widely used in scientific and clinical practice. The reliability of these systems is paramount when conducting research. The reliability of the Myon 320 Surface Electromyography System is yet to be determined. This study aims to determine the intra-session and inter-day reliability of the Myon 320 Surface Electromyography System. Muscle activity from fifteen participants was measured at the anterior deltoid muscle during a bilateral front raise exercise, the vastus lateralis muscle during a squat exercise and the extensor carpi radialis brevis (ECRB) muscle during an isometric handgrip task. Intra-session and inter-day reliability were calculated using the intraclass correlation coefficient, standard error of measurement and coefficient of variation (CV). The normalized root mean squared (RMS) surface electromyographic signals produced good intra-session and inter-day testing intraclass correlation coefficient values (range: 0.63–0.97) together with low standard error of measurement (range: 1.49–2.32) and CV (range: 95% Confidence Interval = 0.36–12.71) measures for the dynamic and isometric contractions. The findings indicate that the Myon 320 Surface Electromyography System produces good to fair reliability when examining intra-session and inter-day reliability. Findings of the study provide evidence of the reliability of electromyography between trials, which is essential during clinical testing.

#### Keywords: sEMG, ICC, squat, front raise, handgrip

### INTRODUCTION

Electromyography (EMG) is the study of electrical activity produced by skeletal muscles. EMG analysis has become an important tool in many areas of scientific and clinical research (Norali and Som, 2009). EMG signals can be recorded in several ways: with electrodes placed under the skin but over the muscle (subcutaneous EMG), in the muscles between the fibers (intramuscular EMG), or on the skin over the belly of the muscle (surface EMG) (Enoka, 2008). Surface EMG (sEMG) is a non-invasive technique that has been used to analyse muscle activity. The sEMG method has been used to diagnose muscle dysfunction for clinical purposes (Wakeling et al., 2007), and to provide insight into the neural control of gait (Byrne et al., 2007) and different muscular contraction types (Troiano et al., 2008). It can also be used to determine muscle activation levels when performing athletic actions. The usability of sEMG data, however, depends on the reproducibility of the signal detection both within and between recording sessions (Hashemi Oskouei et al., 2013).

Intra-session sEMG measurements largely show good relative reliability (intraclass correlation coefficient, ICC > 0.80) (Worrell et al., 1998; Dankaerts et al., 2004; Hashemi Oskouei et al., 2013; Jobson et al., 2013; Carius et al., 2015). During intra-session testing, variability in skin preparation and electrode placement is excluded, which makes the repeated measurements less variable (Carius et al., 2015). Intra-session reliability of the sEMG signal has been previously measured during isometric and dynamic contractions (Larsson et al., 1999; Pincivero et al., 2000; Larivière et al., 2004; Meskers et al., 2004; Hashemi Oskouei et al., 2013). Previous studies that have investigated sub-maximal isometric contractions during intra-session testing generally report good reproducibility of the sEMG signal (ICC > 0.80) (Allison et al., 1993; Larsson et al., 2003; Dankaerts et al., 2004). Few studies have compared the reproducibility of the sEMG signal during intra-session testing of dynamic contractions. Those that have report fair (ICC = 0.60–0.79) to good (ICC = 0.80–1.00) reproducibility for EMG amplitude and mean power frequency (Larsson et al., 1999; Dorel et al., 2008). Dorel et al. (2008) reported that no significant differences were found between test and retest for 10 lower limb muscles investigated during a cycling task. Larsson et al. (1999) also reported good levels of reproducibility (ICC > 0.80) during submaximal shoulder flexion movements when recording muscle activity from the deltoid muscle.

Studies examining inter-day reliability often report reduced ICC and increased coefficient of variation (CV) measures (Worrell et al., 1998; Hashemi Oskouei et al., 2013; Jobson et al., 2013). It has been suggested in the literature that skin preparation and electrode placement, even if care is taken to reposition electrodes, are major influences on inter-day variance (Veiersted, 1991). Jobson et al. (2013) marked participants with henna in an attempt to replicate the electrode position for inter-day testing; however, this method still displayed variability within the sEMG signal (CV: 15.8–41.5%). Hashemi Oskouei et al. (2013) also reported poor inter-day reliability when testing various isometric handgrip forces (ICC < 0.60). With regards to inter-session reliability for dynamic movements, the literature is limited and contrasting (Hashemi Oskouei et al., 2013). Larivière et al. (2000) reported acceptable ICC values (range: 0.70–0.88) from the trunk muscles during lateral bending movements. However, Jobson et al. (2013) reported low reliability of the sEMG signal during cycling in inter-day testing (ICC < 0.60).

Literature discussing intra- and inter-session reliability often reports ICC as a measure of relative reproducibility or CV as a measure of absolute reliability (Dankaerts et al., 2004; Hashemi Oskouei et al., 2013; Jobson et al., 2013). Standard error of measurement (SEM) is also often reported to quantify the absolute consistency of the measurement (Weir, 2016). Previous studies have conducted experiments using sEMG systems such as Delsys, Noraxon and Bortec (Dankaerts et al., 2004; Mathur et al., 2005; Hashemi Oskouei et al., 2013; Jobson et al., 2013; Carius et al., 2015). These systems are popular amongst researchers due to their proven reliability in peer-reviewed research (Mathur et al., 2005; Auchincloss and McLean, 2009; Hashemi Oskouei et al., 2013; Jobson et al., 2013). This study was designed to enable future research to be conducted with the Myon 320 sEMG System. With the Myon AG Company being relatively new to the EMG market, only a limited amount of research has been published using this system (Konrad and Tilp, 2014a,b; Rashid et al., 2015). Previously published studies have investigated stretching techniques in addition to engineering and textile related works. While these studies provide insightful information on the efficacy of the Myon 320 sEMG System, there is still a limited amount of biomechanical research to support the reliability of the Myon 320 sEMG System as a useful tool for sEMG assessment. The reliability of the sEMG system employed during clinical and research trials is paramount in order to provide reliable and accurate findings in clinical settings, as it can be used to guide diagnosis or therapeutic options.

Therefore, the aim of the study was to determine the intra-session and inter-day reliability of the Myon 320 sEMG System and Prophysics Software using dynamic and isometric sub-maximal voluntary contractions (sub-MVC).

### METHODS

Fifteen healthy male participants (Mean ± SD: age 23 ± 3 years, stature 180.8 ± 7.5 cm, mass 80.6 ± 9.6 kg), who were physically active, with no history of knee, hip or shoulder surgery or neuromuscular conditions volunteered for this study. Participants were asked to refrain from physical activity 24 h prior to taking part in the experiment in order to avoid the effects of cumulative muscular fatigue. All participants completed a physical readiness questionnaire and consent form before participating in the study. Ethical approval was granted by the University of the West of Scotland, School of Science and Sport Ethics Committee.

Participants were required to attend the laboratory on two separate occasions. The interval between the two sessions was required to be greater than 2 days but no longer than 10 days. At the first visit to the laboratory the participants were familiarized with the environment and the exercises prior to data collection. All visits were performed at the same time of day to minimize the effects of diurnal variation and any variation of the procedure. Experimental data preparation and collection were performed by the same researcher to eliminate researcher variation. The order in which the exercises were performed was randomized for all testing conditions.

The sEMG activity was recorded using surface electrodes (AMBU, Cambridgeshire, UK) and a set of 6 Surface EMG Transmitters (Myon 320, Schwarzenberg, Switzerland). Prior to the sEMG data collection for the dynamic and isometric contractions, the skin was prepared by hair removal from the tested area, as well as skin abrasion and alcohol cleaning. This skin preparation procedure is essential in order to reduce the impedance of the interface between the skin and electrode. Pairs of sEMG electrodes were attached to the skin no more than 2 cm apart (center to center) over the dominant side of the anterior deltoid (AD), vastus lateralis (VL) and extensor carpi radialis brevis (ECRB) muscles (**Figure 1**). To standardize the placement of the electrodes for the AD muscle, electrodes were placed one finger width distal and anterior to the acromion process, in the direction of the line between the acromion process and the thumb. For the VL muscle, electrodes were placed at two thirds on the line from the anterior superior iliac spine to the lateral side of the patella in the direction of the muscle fibers. These placement positions are in accordance with the surface EMG for non-invasive assessment of muscles (SENIAM) guidelines. For the ECRB muscle electrode placement, a line was marked between the lateral epicondyle and the radial styloid process. The ECRB is located in the proximal half of the forearm, just lateral to the line (Basmajian, 1989; Sorbie et al., 2017). In order to ensure repeatable sensor placement between the days of testing, the location of the sensor was marked using a surgical dermographic skin marking pen. Participants were instructed not to wash the markings off between the testing days.

For the dynamic contractions, two separate movement patterns were assessed: one for the upper and one for the lower extremity. For the upper extremity, a bilateral front raise exercise (lifting an object in front of the body) was performed with sEMG electrodes placed on the right AD muscle. All participants completed the bilateral front raise exercise with a calibrated 10 kg Taishan bumper plate weight (Taishan Sports Industry Group Co., Ltd, Leling, China). To execute the exercise, and standardize procedures, participants were instructed to stand with their feet shoulder width apart, holding the bumper plate with both hands around the waist line. From this position, participants raised the arms up in front of the body until the weight was directly above the head, with only a slight bend in the elbows, which was maintained throughout the movement. The shoulder at this stage of the exercise was required to be between 170° and 190° anterior to the body. The weight was then returned to the start position. Three trials of the front raise exercise were performed, with each trial consisting of three repetitions. Each repetition was performed at a rate of 4 s for the concentric phase and 4 s for the eccentric phase of the exercise, so each trial lasted a total of 24 s. This timing sequence was regulated through an interval timer, which enabled participants to move at a constant pace over the three trials, therefore making the movements more reliable. Between each trial, participants rested for 5 min to limit the effect of muscular fatigue. Retroreflective markers were applied to the shoulder and hip area. This enabled the researchers to identify joint angles required to complete the movement.

For the lower extremity, sEMG data was collected from the right VL muscle during the unloaded squat exercise. During the squat, participants were instructed to have their feet shoulder width apart, whilst looking straight ahead. They were then asked to flex their knees to between 80° and 100°, before returning to full knee extension, keeping their back as straight as possible. Three trials of the squat exercise were performed, with each trial consisting of three repetitions. The timing sequence as detailed above for the front raise exercise was implemented for the squat exercise, with the 5 min rest period between trials. Retroreflective markers were applied to the hip, knee and ankle joints to enable the researchers to identify joint angles at the start and end of the exercise.

Isometric contractions were performed via three sub-MVC recordings from the right ECRB forearm muscle during a handgrip strength test. Following electrode placement and signal verification, participants were seated with their right arm firmly strapped into the experimental rig described below. Grip strength was recorded with a handheld dynamometer (Medical research Ltd digital analyzer, Leeds, UK). Firstly, participants were asked to perform two maximal voluntary isometric contractions (MVICs) in order to normalize the sEMG data. Fifty percent of the greatest MVIC reading for the handheld dynamometer was selected for the three reproducibility trials. Participants had to build up to the sub-MVC in 3 s and then hold it for a further 3 s (Hoozemans and van Dieën, 2005). Participants were permitted to rest for 5 min between each trial to limit the effects of muscular fatigue on the ECRB muscle and surrounding forearm muscles.
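The target-setting step for the handgrip task can be sketched as follows. The 50% fraction comes from the protocol above; the tolerance of ±5 percentage points mirrors the band reported in the Results, and its interpretation as an absolute band (45–55% of MVIC) is an assumption. Function names and the force values in the usage note are hypothetical.

```python
def submaximal_target(mvic_forces, fraction=0.5):
    """Target grip force: a fraction of the greatest MVIC dynamometer reading."""
    return fraction * max(mvic_forces)


def within_target(force, mvic_forces, fraction=0.5, tolerance=0.05):
    """True if `force` lies within fraction +/- tolerance of the greatest MVIC."""
    ratio = force / max(mvic_forces)
    return abs(ratio - fraction) <= tolerance
```

For example, with two MVIC readings of 400 and 420 (in the dynamometer's units), the target would be 210, and a held force of 200 would fall inside the assumed band.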

The MVICs were recorded for 5 s for each muscle tested and were used as a reference for comparison of muscle activity during the bilateral front raise, squat and handgrip exercises (i.e., percentage of MVIC). Two 5 s MVICs were performed for each of the three muscles tested in the following positions: VL while the back was against the wall with 90° of knee flexion, AD while holding a 10 kg weight anterior to the body with the shoulder flexed at 90°, and ECRB while seated with the right arm firmly strapped into a previously validated rig (unpublished data). In accordance with Hashemi Oskouei et al. (2013), the rig held the elbow at approximately 120° during repeated recordings, and kept the posterior side of the forearm stationary. The MVICs were performed prior to the front raise, squat and handgrip exercises on both testing days and controlled with the motion analysis device as described above.

All sEMG data was sampled at 1,000 Hz. During the processing procedures, all sEMG data was digitally band-pass filtered (20–400 Hz) in order to reduce transients and instrumentation noise, and root mean squared (RMS) values were calculated. For MVIC recordings, the maximum 1 s value across the 2 MVIC recordings for all muscles was identified and selected in order to normalize the bilateral front raise, squat and handgrip exercises. For the dynamic contractions, an RMS time window of 50 ms was employed. For the bilateral front raise exercises, the total duration of the movement was averaged and analyzed for reproducibility between the three trials. The identical procedure was also carried out for analysis of the squat exercise. In order for the researcher to analyse the dynamic exercises, kinematic data was recorded through the Vicon Bonita Motion System (Oxford Metrics Ltd, United Kingdom), sampling at a rate of 250 Hz. For the sub-MVIC handgrip test, an RMS time window of 100 ms was used and the 3 s 50% contraction was averaged to determine reproducibility of the three trials.
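The RMS and normalization steps described above can be illustrated with a short sketch. It is not the processing used in the study: it assumes the signal has already been filtered to 20–400 Hz, uses consecutive non-overlapping windows (the text does not say whether windows overlapped), and interprets "the maximum 1 s value" as the largest 1 s RMS value. All function names are hypothetical.

```python
import math

FS_HZ = 1000  # sampling rate reported in the text


def windowed_rms(signal, window_ms, fs_hz=FS_HZ):
    """RMS of consecutive, non-overlapping windows; a trailing partial window is dropped."""
    n = int(round(window_ms * fs_hz / 1000))
    if n < 1 or len(signal) < n:
        raise ValueError("signal is shorter than one window")
    return [
        math.sqrt(sum(v * v for v in signal[i:i + n]) / n)
        for i in range(0, len(signal) - n + 1, n)
    ]


def mvic_reference(mvic_recordings, fs_hz=FS_HZ):
    """Largest 1 s RMS value found across the MVIC recordings of one muscle."""
    return max(max(windowed_rms(rec, 1000, fs_hz)) for rec in mvic_recordings)


def normalized_activation(trial, mvic_ref, window_ms=50, fs_hz=FS_HZ):
    """Mean RMS envelope of a trial, expressed as a percentage of the MVIC reference."""
    envelope = windowed_rms(trial, window_ms, fs_hz)
    return 100.0 * (sum(envelope) / len(envelope)) / mvic_ref
```

For a dynamic trial the default 50 ms window applies; for the handgrip task `window_ms=100` would be passed, with `trial` restricted to the 3 s hold at 50% MVIC.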

connected to the extensor carpi radialis brevis muscle; and (D) Myon receiver box with transmitters sitting in cradle.

A two-way random effects model with single and average ICC measures, with a 95% confidence interval, was used to measure the repeatability of the average normalized RMS sEMG signal during the intra-session testing. Inter-session reliability (ICC 2, 1) was determined by comparing the average normalized RMS sEMG muscle activity for the three trials for each exercise of both testing sessions. ICC, CV and SEM were obtained using the Statistical Package for the Social Sciences (SPSS V 22.0). ICC was categorized as follows: good reliability: 0.80–1.00; fair reliability: 0.60–0.79; poor reliability: <0.60 (Sleivert and Wenger, 1994). Atkinson et al. (1999) also suggest that a measurement tool is reliable if the ICC is above 0.800 and the CV is below 10%. SEM was used to express absolute reliability of the measure. The CV and the SEM were calculated as follows:

$$\text{CV} = \frac{\text{SD}}{\text{Mean}} \times 100\% \qquad \text{SEM} = \text{SD}\sqrt{1 - r}$$

Calculation acronyms: Coefficient of variation (CV), Standard deviation (SD), Reliability (r), Standard error of the measurement (SEM).
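The reliability statistics defined above can be sketched in a few lines. The study used SPSS, so the code below is only an illustration under stated assumptions: the ICC follows the standard two-way random effects ANOVA formulas of Shrout and Fleiss for ICC(2,1) and ICC(2,k), the reliability `r` in the SEM is taken to be the ICC, and which SD enters the CV and SEM (within or between participants) is not specified in the text. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)); scores[i][j] is participant i on trial or session j."""
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_r = ss_rows / (n - 1)                                      # between participants
    ms_c = ss_cols / (k - 1)                                      # between trials
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual
    single = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
    average = (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)
    return single, average


def coefficient_of_variation(values):
    """CV = SD / Mean x 100 %."""
    return stdev(values) / mean(values) * 100.0


def standard_error_of_measurement(sd, r):
    """SEM = SD * sqrt(1 - r), with r the reliability coefficient."""
    return sd * math.sqrt(1.0 - r)


def classify_icc(icc):
    """Categories of Sleivert and Wenger (1994) as quoted in the text."""
    if icc >= 0.80:
        return "good"
    if icc >= 0.60:
        return "fair"
    return "poor"
```

Each row of `scores` holds one participant's normalized RMS values, with one column per trial (intra-session) or per session (inter-day).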

#### RESULTS

All participants successfully completed the required movements during the dynamic bilateral front raise and squat exercises. During the isometric handgrip task, all participants achieved 50% (±5%) of their MVIC value.

The average normalized RMS sEMG data between participants from the AD muscle over the three sub-MVC trials of the bilateral front raise exercise displayed good within-day reliability [ICC (2, 1) = 0.97] and an acceptable CV of 4.73% (95% CI = 1.35–9.79). The average muscle activation between participants was 66.05% ± 20.15 for the sub-MVC bilateral front raise exercise. SEM between participants was 2.06. Inter-day reliability for the average normalized RMS sEMG for the AD during the bilateral front raise exercise produced good reliability [ICC (2, 5) = 0.94] and an acceptable CV of 3.86% (95% CI = 0.82–7.46). The average muscle activation for inter-day testing between participants was 65.85% ± 18.51 for the sub-MVC front raise exercise. The SEM between participants during inter-day testing was 1.49.

For the squat exercise, the average normalized RMS sEMG data from the VL muscle over the three sub-MVC trials displayed good within-day reliability [ICC (2, 1) = 0.95] and an acceptable CV of 5.73% (95% CI = 1.48–8.94). The average muscle activation during intra-day testing between participants was 67.87% ± 21.25 for the sub-MVC squat exercise. SEM between participants was 2.32. Inter-day reliability for the average normalized RMS sEMG from the squat exercise produced good reliability [ICC (2, 5) = 0.93] and an acceptable CV of 4.77% (95% CI = 1.62–7.52). The average muscle activation for inter-day testing between participants was 67.10% ± 20.63 for the sub-MVC squat exercise. The SEM between participants during inter-day testing was 1.84.

For the isometric handgrip test the average normalized RMS sEMG data from the ECRB forearm muscle over the three trials displayed good within-day reliability [ICC (2, 1) = 0.87] and an acceptable CV of 5.89% (95% CI = 0.36–12.36). The average muscle activation between participants was 45.98% ± 8.82 for the handgrip test. SEM between participants was 1.57. On the other hand, inter-day relative reliability was fair during single isometric contractions [ICC (2, 5) = 0.63]. CV also increased to 7.18% (95% CI = 3.40–12.71). The average muscle activation for inter-day testing between participants was 45.91% ± 8.09 for the sub-MVIC handgrip test. The SEM between participants during inter-day testing for the isometric contraction was 1.93.

### DISCUSSION

This is the first study to assess the reliability of the Myon 320 sEMG system during low velocity controlled movements, such as those routinely used in rehabilitation. The researchers investigated intra-session and inter-day reliability during submaximal dynamic and isometric contractions while recording sEMG measurements using the Myon 320 sEMG System. The main findings were that the Myon 320 sEMG System displayed good reliability associated with normalized RMS sEMG measures (ICC > 0.80) for intra-session and inter-day testing during dynamic sub-MVC. During 50% MVIC contractions the Myon 320 sEMG System produced good intra-session repeated measures (ICC > 0.80) and fair inter-day measures (ICC 0.60–0.79). The mean normalized RMS sEMG across participants was close to the 50% MVIC target during both intra-session (45.98%) and inter-day (45.91%) testing.

The high intra-session ICC for the normalized RMS sEMG signal during the bilateral front raise and squat exercises presented in the current study is consistent with previously published literature (Worrell et al., 1998; Larsson et al., 1999; Jobson et al., 2013). Larsson et al. (1999) reported that reproducibility of the RMS sEMG signal was good and clinically acceptable during dynamic forward flexion exercises when recording muscle activity from the deltoid muscle. Similar to the current study, Worrell et al. (1998) used normalized RMS sEMG and reported good reliability when recording sEMG from the VL muscle during an unweighted lateral step exercise (LSU) (ICC = 0.91). During the LSU the VL muscle had an activation percentage of 63% ± 24 MVIC. These reported reliability and muscle activation results are similar to the current study's results (ICC = 0.95; 68% ± 21 MVIC). However, even with these good ICC reliability measures during dynamic contractions, two participants displayed high variability between the three trials performed on each of the testing days. The researchers suggest these inconsistencies are a result of increased perspiration levels from the participants. This increased perspiration caused the AMBU surface electrodes to move or detach, leading to artifacts within the sEMG signal. The movement of the surface electrodes was more noticeable during the dynamic contractions than the isometric contractions. These views are supported by Rashid and colleagues, who also documented problems with perspiration when testing with the Myon 320 sEMG System (Rashid et al., 2015). In addition, signal artifacts were also displayed within one participant's data set when testing the VL during the squat exercise, when the cable connection (length: 13 cm) between the transmitter box and surface electrode came in contact with the participant's shorts. This problem was solved by taping the shorts above the VL muscle. The taping in no way restricted the participants' movements during the squat exercise.

When comparing intra-session to inter-day testing for dynamic exercises, the present study reported reduced ICC measures; however, these were still within the suggested range for good reliability (ICC > 0.80). The literature on inter-session reliability contrasts somewhat with the findings of the current study. Worrell et al. (1998) reported poor ICCs during a dynamic lateral step task. The results of Jobson et al. (2013) also displayed poor ICC measures during cycling. One explanation for the contrasting results could be the highly standardized range of motion (ROM) of each of the dynamic exercises performed in this study, which could have resulted in more consistent measures. It could also be suggested that the step (Worrell et al., 1998) and cycling (Jobson et al., 2013) reliability tests were performed at a higher velocity than the squat and bilateral front raise tests performed in this study, which could have resulted in the contrasting findings. In addition, differences in findings could be attributed to the repeatability of surface electrode placement on the specified muscles rather than to the exercises performed within the different protocols.

With regard to isometric contractions, the good ICC value (0.87) for the normalized sEMG RMS data during intra-session testing in the current study is consistent with previously published research (Dankaerts et al., 2004; Hashemi Oskouei et al., 2013). Hashemi Oskouei et al. (2013) reported a good intra-session ICC of 0.90 when recording muscle activity from the forearm flexor muscles during gripping tasks. Good within-day reliability (ICC = 0.91) has also been reported during MVIC trunk exercises (Dankaerts et al., 2004).

With regard to inter-session reliability during isometric contractions in this study, it would appear that reapplying the electrodes on a subsequent day reduces the repeatability of the normalized RMS sEMG signal. These findings are in agreement with previously published literature (Hashemi Oskouei et al., 2013) in which the removal and replacement of the surface electrodes on the flexor muscles of the forearm resulted in fair to poor inter-day reliability of the sEMG signal. A possible explanation for the reduction in ICC results during the isometric contractions within the two studies is the size and proximity of the flexor and extensor muscles of the forearm (Hägg and Milerad, 1997). The forearm area comprises many adjacent small muscles, which increases the possibility of EMG crosstalk. When measuring muscle activity for the ECRB muscle during the current study, an inter-electrode distance of 2 cm was selected, which is in accordance with previous literature (Hägg and Milerad, 1997; Sorbie et al., 2017); however, a reduced inter-electrode distance should be considered in future reliability research in order to reduce potential crosstalk. The potential for surface electrodes to record signals from multiple extensor forearm muscles is a concern (Gallina and Botter, 2013). These suggestions are supported by Dankaerts et al. (2004), who reported good ICC values for inter-day reliability when testing muscles with a larger belly circumference (trunk muscles) than that of the forearm muscles. It could also be suggested that these contrasting findings could be the result of difficulty in controlling fatigue in the smaller forearm muscles. As a result of these concerns, isometric contractions from larger muscle groups are preferred when using the Myon 320 sEMG System. In addition, the current study did not measure dynamic contractions from the forearm muscles. As a result of this limitation, the reliability of dynamic contractions from forearm muscles when using the Myon 320 sEMG System should be considered in future research.

#### CONCLUSION

When using the Myon 320 sEMG System, the present study shows that it is possible to obtain good reliability for normalized RMS sEMG during intra-session and inter-day testing during dynamic sub-MVC, when exercises are performed at low velocities. This study also highlights the fair reproducibility of the normalized RMS sEMG from the extensor muscles of the forearm during a handgrip task during inter-session testing, which is in agreement with previously published literature. Therefore, the current study demonstrates that the Myon 320 sEMG System is a reliable sEMG measurement tool for low velocity controlled movements.

#### AUTHOR CONTRIBUTIONS

All authors contributed to the development of this manuscript. UU, GS, and CE were involved in the experimental design. GS, MW, DB, and AG were involved in the data collection. UU, GS, MW, DB, and AG were involved with the data processing and analyses. UU, GS, CE, MW, JB, JSB, and NG were involved in the writing and proofreading of the manuscript.

#### ACKNOWLEDGMENTS

The authors wish to thank Myon AG for their technical assistance during the early stages of the project. The authors also thank Henry Hunter for assisting with participant recruitment.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sorbie, Williams, Boyle, Gray, Brouner, Gibson, Baker, Easton and Ugbolue. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Validity of the Catapult ClearSky T6 Local Positioning System for Team Sports Specific Drills, in Indoor Conditions

Live S. Luteberget\*, Matt Spencer and Matthias Gilgien

Department of Physical Performance, Norwegian School of Sport Sciences, Oslo, Norway

Aim: The aim of the present study was to determine the validity of position, distance traveled and instantaneous speed of team sport players as measured by a commercially available local positioning system (LPS) during indoor use. In addition, the study investigated how the placement of the field of play relative to the anchor nodes and walls of the building affected the validity of the system.

#### Edited by:

Billy Sperlich, University of Würzburg, Germany

#### Reviewed by:

Alessandro Moura Zagatto, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil

Grant Malcolm Duthie, Australian Catholic University, Australia

#### \*Correspondence:

Live S. Luteberget livesteinnes@gmail.com

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 November 2017 Accepted: 05 February 2018 Published: 04 April 2018

#### Citation:

Luteberget LS, Spencer M and Gilgien M (2018) Validity of the Catapult ClearSky T6 Local Positioning System for Team Sports Specific Drills, in Indoor Conditions. Front. Physiol. 9:115. doi: 10.3389/fphys.2018.00115

Method: The LPS (Catapult ClearSky T6, Catapult Sports, Australia) and the reference system (Qualisys Oqus infra-red camera system, Qualisys AB, Sweden) were installed around the field of play to capture the athletes' motion. Athletes completed five tasks, all designed to imitate team-sports movements. The same protocol was completed in two sessions, once with an assumed optimal geometrical setup of the LPS (optimal condition), and once with a sub-optimal geometrical setup of the LPS (sub-optimal condition). Raw two-dimensional position data were extracted from both the LPS and the reference system for accuracy assessment. Position, distance and speed were compared.

Results: The mean difference between the LPS and reference system for all position estimations was 0.21 ± 0.13 m (n = 30,166) in the optimal setup, and 1.79 ± 7.61 m (n = 22,799) in the sub-optimal setup. The average difference in distance was below 2% for all tasks in the optimal condition, while it was below 30% in the sub-optimal condition. Instantaneous speed showed the largest differences between the LPS and reference system of all variables, both in the optimal (≥35%) and sub-optimal condition (≥74%). The differences between the LPS and reference system in instantaneous speed were speed dependent, showing increased differences with increasing speed.

Discussion: Measures of position, distance, and average speed from the LPS show low errors, and can be used confidently in time-motion analyses for indoor team sports. The calculation of instantaneous speed from LPS raw data is not valid. The application of appropriate filtering techniques to improve the validity of instantaneous speed data should be investigated. For all measures, the placement of anchor nodes and the field of play relative to the walls of the building influence LPS output to a large degree.

Keywords: kinematics, position, instantaneous speed, accuracy, performance analyses, physical demands

## INTRODUCTION

Analyses of physical demands can improve the understanding of physical performance and injury risk in sports. Such analyses are therefore conducted in many individual and team sports (Bangsbo et al., 2006; Montgomery et al., 2010; Gabbett, 2013; Gilgien et al., 2013; Luteberget and Spencer, 2017). In investigations of physical demands in team sports, the overall workload is often reported as a measure of athletes' total effort. Overall workload is dependent on the intensity and duration of the tasks, and is often reported using parameters such as total distance covered and distance covered in different speed zones. Sometimes high intensity events are also measured, which are characterized by inertia-based measures (Bangsbo et al., 2006; Michalsik et al., 2013; Luteberget and Spencer, 2017). High intensity events are reported using variables such as number of sprints, number of accelerations, or distances covered above a predefined speed threshold (Bangsbo et al., 2006; Michalsik et al., 2013; Luteberget and Spencer, 2017). To measure the parameters that describe these physical demands, Global Navigation Satellite Systems [GNSS; e.g., Global Positioning System (GPS)], inertial measurement units, a combination of the two, or video-based analysis systems are used. In outdoor team sports, GNSS is one of the most frequently used methods for obtaining kinematic metrics (Malone et al., 2016). Total distance traveled, speed (e.g., time and distance in different speed zones), and number of sprints are calculated from position data, which can be obtained using GNSS technology (sometimes integrated with inertial measurement units). The main drawback of GNSS is its restriction to outdoor facilities; therefore, indoor sports cannot use GNSS for tracking of players in competition and training.
In indoor sports such as team handball, video-based analysis has been the main method used to analyze position-related variables (Sibila et al., 2004; Chelly et al., 2011; Michalsik et al., 2012, 2013; Póvoas et al., 2012, 2014; Karpan et al., 2015). However, in the past decade local positioning systems (LPSs) have been developed, which complement the role of hand-operated and semi-automatic video-based analysis systems in team sports (Leser et al., 2011). Most LPSs used in team sports are radio-frequency based (Muthukrishnan, 2009; Frencken et al., 2010; Ogris et al., 2012; Sathyan et al., 2012; Leser et al., 2014; Rhodes et al., 2014; Stevens et al., 2014), in which radio-frequency signals are used to measure the distance between several base stations (anchor nodes) at known locations distributed around the field of play, and mobile nodes worn by the athletes (Muthukrishnan, 2009; Hedley et al., 2010).
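
How measured distances to anchor nodes at known locations can yield a position may be illustrated with a minimal sketch. It assumes a textbook linearized least-squares multilateration in two dimensions; it is not the algorithm of any commercial LPS, whose processing is generally undisclosed. The function name `multilaterate_2d`, the anchor layout and the test position are hypothetical.

```python
import numpy as np

def multilaterate_2d(anchors, ranges):
    """Estimate a 2D position from ranges to anchors at known positions.

    Each range gives a circle equation (x - xi)^2 + (y - yi)^2 = ri^2.
    Subtracting the first anchor's equation from the others removes the
    quadratic terms and leaves a linear system solved by least squares.
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    first = anchors[0]
    # Row i: 2 (xi - x0) x + 2 (yi - y0) y = r0^2 - ri^2 + |pi|^2 - |p0|^2
    A = 2.0 * (anchors[1:] - first)
    b = (ranges[0] ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(first ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Hypothetical example: four anchors at the corners of a 20 x 40 m court.
anchors = np.array([[0.0, 0.0], [20.0, 0.0], [20.0, 40.0], [0.0, 40.0]])
true_position = np.array([7.5, 12.0])
ranges = np.linalg.norm(anchors - true_position, axis=1)  # noise-free ranges
estimate = multilaterate_2d(anchors, ranges)
```

With noise-free ranges the estimate reproduces the true position; with real, noisy ranges the quality of the solution depends on the anchor geometry, which is the effect examined in this study.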

To allow meaningful analysis in sports, internal and external validity (Atkinson and Nevill, 2001) of systems used for data collection (e.g., LPS or GNSS) are important. External validity is related to the degree the data acquisition setting reflects the real sport setting. To maximize external validity, data acquisition should be conducted in a real-life sport setting, with minimal obstruction of the execution of the sport. Internal validity relates to the accuracy and repeatability of the measurements, and should be of a quality that allows quantification of small changes of practical importance within and between athlete activity profiles (Jennings et al., 2010). If the validity of a system is not sufficient, the implementation of training or competition results based on the measurement system may cause harm to athletes in terms of prescription of inadequate training, leading to decreased performance and/or increased health risks (Foster, 1998; Gabbett, 2004). In turn, this can result in reduced team performance, thus affecting a team's structure and economic situation. Compared with investigating athletes in a laboratory setting, external validity has been improved to a large degree by systems such as GNSS and LPS, as these facilitate data acquisition in real-life training and competition. However, optimization of external validity can have a negative impact on internal validity (Atkinson and Nevill, 2001). Thus, investigations of the accuracy and repeatability of systems are important in order to be confident about the validity of data.

The accuracy of GNSS has been quantified for use in individual sports (Waegli and Skaloud, 2009; Gilgien et al., 2013, 2014, 2015; Supej and Cuk, 2014; Boffi et al., 2016; Fasel et al., 2016; Specht and Szot, 2016) and for team sports over a wide range of courses and velocities (Coutts and Duffield, 2010; Jennings et al., 2010; Cummins et al., 2013; Johnston et al., 2013, 2014; Scott et al., 2016). However, to our knowledge, only a small number of studies have investigated the accuracy of LPS for team sports (Frencken et al., 2010; Ogris et al., 2012; Sathyan et al., 2012; Leser et al., 2014; Rhodes et al., 2014; Stevens et al., 2014). The accuracy of LPS is mainly dependent on the signal type; environmental conditions, such as obstructions and materials in the surroundings of the field of play; the geometry between signal anchor nodes and the units on the athletes (Muthukrishnan, 2009; Malone et al., 2016); and the signal analysis and parameter calculation process. Indoor venues have been shown to elicit greater errors in LPS compared to outdoor venues, probably as a consequence of an increased multipath propagation compared to outdoor conditions (Sathyan et al., 2012). Thus, validation of a positioning system should be executed in the typical conditions in which it is used. In GNSS, the geometrical setup of the satellites (anchor nodes) is outside the user's control. In LPS, on the other hand, the geometry of the anchor nodes can be altered by the user in the installation process. To our knowledge, no studies have assessed the effect of the anchor node setup and the positioning of the field of play relative to the building's walls (signal multipath problem) on the accuracy of LPS.

In commercial positioning systems, data processing, such as derivation of kinematic metrics from position data, may vary between different LPS and GNSS systems, and even between different software in the same service product (Gilgien et al., 2014; Malone et al., 2016). However, the derivation of metrics is often not elucidated in the manufacturer's documentation, which complicates comparisons between different systems and software (Malone et al., 2016; Specht and Szot, 2016). Currently multiple LPS systems are commercially available, which differ in data acquisition technology, sampling rates and data processing steps; this affects the validity of the data output (Malone et al., 2016; Varley et al., 2017). Thus, the validity of one system does not apply to other systems, and individual validation of each system is required.

The aim of the present study was to (1) determine the validity of position, distance traveled and instantaneous speed of a commercially available LPS (Catapult ClearSky T6, Catapult Sports, Australia) for indoor use; and (2) to investigate how the placement of the field of play relative to the anchor nodes and walls of the building affects the validity of the system. The study investigated these two questions in a typical indoor sport application, comparing the raw data from the LPS with a gold standard reference system (infrared light-based camera system).

## METHOD

In the present study, we investigated the validity of an LPS system for monitoring movements in indoor team-sport athletes. Two male and two female active team handball players [age, 23.0 ± 2.2 years; body mass, 76.6 ± 11.4 kg; height, 172.3 ± 10.1 cm; mean ± standard deviation (SD)] participated in the study. All participants received verbal and written information about the procedures of the study, and gave signed consent to participate in the study. The Norwegian Social Science Data Services approved the study.

### Data Acquisition

The study was conducted in a sports hall measuring 50 × 70 × 11 m, on an indoor surface (Pulastic SP Combi, Gulv og Takteknikk AS, Norway). The participants completed a total of five tasks, all designed to imitate team-sports movements, as shown in **Figure 1**. Task 1: a straight-line sprint and deceleration to a stop. Task 2: two diagonal movements, forward and back to the left and the right, with the paths separated by an angle of ∼75°. Task 3: a straight-line sprint, a 90° turn, and then deceleration to a stop. Task 4: a zig-zag (angle of turns ≈ 60°) course executed with sideways movements, and a 360° turn. Task 5: five continuous laps of the same course as in task 4, without the 360° turn. All tasks were commenced from a standing position. Each task was executed 5 times, with the exception of task 1, which was executed 9 times. Thus, a total of 116 trials (29 for each of the four participants) were captured for each of the test conditions. Participants completed an individually selected warm-up before commencement of the tasks. All tasks were practiced during the warm-up. Participants were instructed to give maximal effort in all tasks. Subjects were tested on two separate days. The same protocol was completed in both sessions, on one day with an assumed optimal setup of the LPS (Optimal; **Figure 1**, field B), and on the other day with a sub-optimal setup of the LPS (Sub-optimal; **Figure 1**, field A). In the optimal setup, the LPS was arranged symmetrically, with a larger distance between the nodes and the testing area. In the sub-optimal setup, the LPS was asymmetrical, and the distance between the nodes and the testing area was small (**Figure 2**). This was done to replicate the effect of short distances between LPS anchor nodes and the field of play.

The LPS (Catapult ClearSky T6, Catapult Sports, Australia) and the reference system (Qualisys Oqus, Qualisys AB, Sweden) were installed around the field of play to capture the athletes' motion with both systems. During each trial 16 anchor nodes that were fixed around the handball court (**Figure 2**) collected LPS data, with a reported capturing frequency of 20 Hz. The LPS was set up to cover a field size of 20 × 40 m, the dimensions of an official team handball court. Each participant was instrumented with a lightweight (≈28 g) mobile node (firmware version: 1.40), measuring L: 40 mm × H: 52 mm × D: 14 mm. The mobile node was positioned between the shoulder blades, in the manufacturer-supplied vest (Catapult Sports, Australia). At all times during the data acquisition, 14 mobile nodes were turned on to simulate the usual data load on the system. The spatial calibration of the LPS was conducted using a tachymeter (Leica Builder 509 Total Station, Leica Geosystems AG, Switzerland), according to the manufacturer's recommendations preceding the testing sessions. Reference data were collected using eight infra-red cameras mounted on tripods around the testing area (**Figure 2**), using a capture frequency of 100 Hz. The capture volume was 10 × 14 m. A reflective marker, 12 mm in diameter, was mounted on the mobile node's center to obtain a three-dimensional position. The reference system was spatially calibrated according to the manufacturer's recommendations prior to the testing sessions. Infra-red camera systems, such as the reference system in this study, can provide accuracy with errors on the order of millimeters (Chiari et al., 2005; Windolf et al., 2008; Jensenius et al., 2012). The accuracy is dependent on the number of cameras used, capturing volume, technical specifications and settings of system parameters (Windolf et al., 2008; Jensenius et al., 2012). In the current study, the calibration was carried out using a calibration wand with an exact length of 749.2 mm. The calibration resulted in a 6.14 mm and 6.85 mm SD of the wand length, for the optimal and sub-optimal condition, respectively.

### Data Processing

To compare the LPS-based data with the reference system, the coordinate system of the reference system was transformed into the LPS's coordinate system using a Helmert transformation (Sheynin, 1995). The transformation between the coordinate systems was based on four reference points (12 mm reflective markers, positioned 1 m above floor level, in the four corners of the testing area). The positions of the reference points were measured with the reference system in all trials, and with a tachymeter (Leica Builder 509 Total Station, Leica Geosystems AG, Switzerland) in the LPS coordinate system. The Helmert transformation resulted in a mean position residual per calibration point of 2.3 cm for the optimal condition and 0.4 cm for the sub-optimal condition.
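
The coordinate transformation described above can be sketched as follows. This is a minimal illustration assuming a two-dimensional, four-parameter Helmert (similarity) transformation fitted by least squares; the text does not state which Helmert variant or which software routine was used, and the function names and example coordinates are hypothetical.

```python
import numpy as np

def fit_helmert_2d(src, dst):
    """Least-squares fit of X = a*x - b*y + tx, Y = b*x + a*y + ty.

    a = s*cos(theta) and b = s*sin(theta) encode scale and rotation,
    (tx, ty) is the translation. Returns (a, b, tx, ty).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    # Even rows hold the X equations, odd rows hold the Y equations.
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params

def apply_helmert_2d(params, points):
    """Transform points (n x 2) with the fitted parameters."""
    a, b, tx, ty = params
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([a * x - b * y + tx, b * x + a * y + ty])

# Hypothetical example: four corner markers of a 10 x 14 m capture area,
# seen in the reference frame (src) and in the LPS frame (dst).
src = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 14.0], [0.0, 14.0]])
theta = np.deg2rad(30.0)
true_params = np.array([np.cos(theta), np.sin(theta), 5.0, -3.0])
dst = apply_helmert_2d(true_params, src)

params = fit_helmert_2d(src, dst)
# Mean position residual per calibration point, as reported in the text.
residual = np.linalg.norm(apply_helmert_2d(params, src) - dst, axis=1).mean()
```

With four reference points the system is overdetermined (eight equations, four unknowns), so the residual per calibration point gives an indication of how well the two coordinate systems agree.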

Raw position data (X and Y coordinates) were extracted, both from the LPS and from the reference system, using their respective software (LPS: OpenField, Catapult Sports, Australia. Reference system: Qualisys Track Manager, Qualisys AB, Sweden). All data analyses were conducted in MATLAB (The MathWorks Inc., USA). Due to incomplete LPS raw data (resulting from loss of signal during parts of the trials), 22 (sub-optimal condition) and 1 (optimal condition) trials were excluded from further data analyses. The capture frequency of the LPS system was not constant. The mean capture frequency was calculated to be 17.5 Hz. To overcome the issue of a variable capture frequency, the position data, from both the LPS and reference system, were resampled at the mean capture frequency of the LPS using a second order natural spline function. Trials including data gaps >1 s were excluded from the analyses. This resulted in the exclusion of 30 (sub-optimal condition) and 12 (optimal condition) trials from analysis. Thus, 64 (55%) trials (sub-optimal condition) and 103 (89%) trials (optimal condition) were available for analysis in this study. LPS and reference system data were time synchronized using cross-correlation of speed data. For that purpose the following steps were undertaken: (1) Position data in the horizontal plane (X and Y coordinates) were differentiated to obtain horizontal plane speed, for both LPS and reference system, using a four-point finite central difference formula (Gilat and Subramaniam, 2011). (2) LPS and reference system data were time synchronized using cross-correlation (Buck et al., 2002) of horizontal plane speed data. After time synchronization, data were trimmed to reflect only the time athletes were performing the trials, by using a speed threshold of 0.5 m·s⁻¹ (determined from the reference system). Two-dimensional position data at 17.5 Hz were used to calculate distance and speed. Distance traveled per trial was calculated as the sum of the Euclidean distances between consecutive points. Speed in the horizontal plane (hereafter called speed) was calculated from position data, using a four-point finite central difference formula (Gilat and Subramaniam, 2011).
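
The differentiation, synchronization and distance steps can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the study. It assumes the standard four-point central difference stencil, leaves the two samples at each end undefined, and removes the mean before cross-correlating; these details and all function names are assumptions.

```python
import numpy as np

FS = 17.5  # Hz; mean LPS capture frequency reported in the text

def central_diff_4pt(x, dt):
    """First derivative by the four-point central difference formula.

    f'(i) = (f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)) / (12 dt).
    The first and last two samples have no full stencil and stay NaN.
    """
    x = np.asarray(x, dtype=float)
    d = np.full(x.shape, np.nan)
    d[2:-2] = (x[:-4] - 8.0 * x[1:-3] + 8.0 * x[3:-1] - x[4:]) / (12.0 * dt)
    return d

def horizontal_speed(xy, fs=FS):
    """Speed in the horizontal plane from X and Y coordinates (n x 2)."""
    xy = np.asarray(xy, dtype=float)
    vx = central_diff_4pt(xy[:, 0], 1.0 / fs)
    vy = central_diff_4pt(xy[:, 1], 1.0 / fs)
    return np.hypot(vx, vy)

def lag_in_samples(speed_a, speed_b):
    """Lag of speed_a relative to speed_b (positive: a is delayed)."""
    a = np.nan_to_num(speed_a - np.nanmean(speed_a))
    b = np.nan_to_num(speed_b - np.nanmean(speed_b))
    xcorr = np.correlate(a, b, mode="full")
    # Index 0 of the full cross-correlation is lag -(len(b) - 1).
    return int(np.argmax(xcorr)) - (len(b) - 1)

def distance_travelled(xy):
    """Sum of Euclidean distances between consecutive positions."""
    steps = np.diff(np.asarray(xy, dtype=float), axis=0)
    return float(np.sum(np.linalg.norm(steps, axis=1)))
```

Because speed is a derivative of position, noise in the raw position data is amplified in the speed signal, whereas distance is a simple sum of position steps.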

### Method Comparison

The variables of position, distance and speed were compared for each task, using the norm of the differences between the LPS and the reference system. Mean difference, SD, and maximal difference in position were calculated. To express the results for position, the difference for each task from the reference system was assigned to bin limits in a histogram, and expressed as a percentage of the total number of raw data points, thus excluding the effect of duration of the task on the results. For distance, instantaneous and mean speed, the differences were characterized by mean, SD and maximal difference.
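
A minimal sketch of this comparison for position is given below. It assumes time-synchronized LPS and reference samples of equal length; the use of the sample SD (`ddof=1`), the bin limits and the function names are assumptions, as the text does not specify them.

```python
import numpy as np

def position_differences(lps_xy, ref_xy):
    """Norm of the difference between LPS and reference positions (n x 2)."""
    delta = np.asarray(lps_xy, dtype=float) - np.asarray(ref_xy, dtype=float)
    return np.linalg.norm(delta, axis=1)

def summarize(diff):
    """Mean, SD and maximal difference (sample SD is an assumption)."""
    diff = np.asarray(diff, dtype=float)
    return {
        "mean": float(diff.mean()),
        "sd": float(diff.std(ddof=1)),
        "max": float(diff.max()),
    }

def histogram_percent(diff, bin_edges):
    """Share of samples per bin, in percent of all raw data points.

    Expressing counts as percentages removes the effect of task duration.
    Differences outside the bin limits are not counted in any bin.
    """
    diff = np.asarray(diff, dtype=float)
    counts, _ = np.histogram(diff, bins=bin_edges)
    return 100.0 * counts / diff.size

# Hypothetical example with four synchronized samples.
ref = np.zeros((4, 2))
lps = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.0], [0.0, 0.6]])
diff = position_differences(lps, ref)
stats = summarize(diff)
shares = histogram_percent(diff, bin_edges=[0.0, 0.25, 0.5, 1.0])
```

The same summary function can be applied to per-trial differences in distance, instantaneous speed and mean speed.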

## RESULTS

The mean difference between the LPS and reference system for all position estimations was 0.21 ± 0.13 m (n = 30,166) in the optimal setup, and 1.79 ± 7.61 m (n = 22,799) in the sub-optimal setup. Task 2 and task 5 showed the lowest mean (<0.20 m) and maximal differences (<1 m) in the optimal setup. In the sub-optimal condition, task 3 showed the lowest mean and maximal differences, but all differences in the sub-optimal condition were greater than in the optimal condition. Mean and maximum position differences for all tasks are displayed in **Table 1**. **Figure 3** presents the difference distribution in position in the five tasks, for both the optimal and sub-optimal condition.

With respect to distance, the mean differences between systems were 0.31 ± 0.40 m and 11.42 ± 26.21 m in the optimal and sub-optimal condition, respectively, for all tasks combined. The mean difference was below 2% in the optimal condition, for all tasks (**Table 2**). Task 5 showed the lowest difference in the optimal condition. In the sub-optimal condition, all tasks showed higher differences (≥15%). The LPS overestimated the distance compared to the reference system for both the optimal and sub-optimal condition.

Instantaneous speed showed mean differences of ≥33% for both the optimal and sub-optimal condition (**Table 3**). **Figure 4** displays all instantaneous speed measurements and reveals a direct association between speed and mean error. For mean speed, the mean difference was below 3% for all tasks (**Table 4**) in the optimal condition. The sub-optimal condition showed higher values across all tasks (≈15–30%).

## DISCUSSION

The aim of the current study was to investigate the validity of a commercially available LPS designed to track indoor team sports. The mean difference in position between the LPS and the reference system was below 0.35 m in all tasks in the optimal condition, while in the sub-optimal condition the difference was above 8 m in all tasks. Mean difference in distance was below 2% in the optimal condition, while it was below 30% in the sub-optimal condition for all tasks. Instantaneous speed showed the largest differences between the LPS and reference systems of all measures tested, both in the optimal (≥35%) and sub-optimal condition (≥74%). Further, the difference between instantaneous speed measurement in the LPS and the reference system was dependent on the reference speed, with a higher speed yielding a higher difference.

TABLE 1 | Difference between the LPS and reference system for position, for optimal and sub-optimal condition respectively.

The position error of LPS is often investigated with static measurements due to the lack of a reference system that allows instantaneous position comparisons in motion. Static measurements of the validity of LPS have shown an error range of ∼1 to 32 cm (Frencken et al., 2010; Sathyan et al., 2012; Rhodes et al., 2014). This large range can partly be attributed to the different methodological setups and LPS technologies used. The largest error was found in an indoor environment (Rhodes et al., 2014), while the smallest error was found in an outdoor environment (Frencken et al., 2010). Only one previous study reported errors in position using LPS measurements in dynamic tasks, with a mean error of 0.23 m (Ogris et al., 2012). Although the previously reported value was from an outdoor environment, the results showed approximately the same error in position as in the optimal condition in the current study (0.21 m in the current study vs. 0.23 m in Ogris et al., 2012). Position measurements are mainly used for time-motion analyses in sports, and thus our results seem acceptable for this purpose. However, for other applications, such as tactical analyses, the lack of information regarding the accuracy level needed makes it difficult to confidently state whether the LPS is acceptable or not. The similarity in error between the outdoor study by Ogris et al. (2012) and the current indoor study could indicate that measurements in large halls with no obstructions may create measurement conditions that are not much different from outdoor conditions. However, the current study also seems to indicate that small distances to walls and corners of halls, along with the anchor node setup, have a major impact on position accuracy.

Previous studies on LPS in indoor conditions show mean errors in distance ranging from 2.0 to 3.5% (Sathyan et al., 2012; Leser et al., 2014), while studies in outdoor conditions have shown errors ranging from 0.2 to 3.9% (Frencken et al., 2010; Sathyan et al., 2012; Stevens et al., 2014). Presumably, previous studies optimized the setup of the LPS when investigating the accuracy of the systems, resembling the optimal condition in the current study. The results of the current study showed a mean difference in distance from the reference system of between 0.5 and 1.8% in the optimal condition, which is lower than previously reported for indoor conditions. Some previous studies showed an underestimation of distance with LPS systems (Frencken et al., 2010; Leser et al., 2014; Stevens et al., 2014), while others overestimated distance (Sathyan et al., 2012; Rhodes et al., 2014). The studies that showed an overestimation of distance were conducted indoors, as was the current study, leading to the speculation that indoor conditions may be a contributing factor to the overestimation. However, the differences could also be caused by differences in the filtering techniques applied in different studies (Sathyan et al., 2012). In the current study, no filters were applied to the data, in order to investigate the raw output from the LPS. Further investigations of the effect of filtering techniques on the validity of the current data could be interesting, as filtering techniques can affect the estimated distance and speed (Sathyan et al., 2012; Malone et al., 2016). Distance traveled might be less vulnerable to position error than speed, since its calculation involves no differentiation of position and therefore no amplification of error. However, error in distance traveled in sub-optimal conditions was of a critically large magnitude, and not useful for quantifying the distance covered for training load purposes. Hence, for quantification of distance, only data from the optimal condition can be used with confidence. In addition, it might be reasonable to investigate whether filtering techniques could reduce the error in distance for sub-optimal conditions.

To our knowledge, very few studies have investigated the validity of instantaneous speed measurements in team sports (Varley et al., 2012). However, in match and training analyses, distance data are often categorized into speed zones in order to provide a more comprehensive metric for "intensity distribution" of the athletes' external loading (Malone et al., 2016). Such categorization relies on instantaneous speed measurements. It has been previously shown that peak speeds in LPS are less accurate than mean speeds (Ogris et al., 2012; Rhodes et al., 2014; Stevens et al., 2014); however, no previous study has assessed the accuracy of instantaneous speed as determined with an LPS over the whole range of dynamic tasks in team sports. The current study shows that instantaneous speed differed substantially between LPS and the reference system in both the optimal and sub-optimal condition (**Table 3**), and that the differences were speed-dependent (**Figure 4**). Our study shows considerably higher errors than those previously shown in a GNSS study (Varley et al., 2012). However, the GNSS-based study investigated straight line running only, which could contribute to these results. In addition, time synchronization and filtering of raw data could play a significant role in error reduction for instantaneous speed (Ogris et al., 2012; Stevens et al., 2014), and the filtering techniques and time synchronization method used in the aforementioned study (Varley et al., 2012) were not disclosed. Mean speed has been investigated in several studies (Frencken et al., 2010; Ogris et al., 2012; Rhodes et al., 2014; Stevens et al., 2014), and is often used as an overall indicator of the intensity of an activity. Compared to previous studies, the current study shows similar results (**Table 4**) in terms of mean speed errors (Frencken et al., 2010; Ogris et al., 2012; Rhodes et al., 2014; Stevens et al., 2014); thus, the LPS can give an overall indication of the intensity of the activity.

TABLE 2 | Difference between the LPS and reference system for distance traveled, for optimal and sub-optimal condition respectively.

TABLE 3 | Difference between the LPS and reference system for instantaneous speed, for optimal and sub-optimal condition respectively.

TABLE 4 | Difference between the LPS and reference system for average speed, for optimal and sub-optimal condition respectively.

In the current study, the same measurement system was applied with the same measurement setting, but in two different conditions (optimal and sub-optimal condition). The factors that changed between the two conditions were the anchor node positions relative to the field of play and the distance between the side walls and corners of the hall to the field of play. The current study shows that changes in the placement of anchor node positions relative to the field of play and the distance between the side walls and corners of the hall to the field of play can affect the accuracy of data. Placement of nodes has an effect on the geometry of the anchor nodes relative to each other and the mobile node. In addition to changes in geometry, close proximity of the edge of the field and the walls may cause the mobile nodes to go undetected by multiple anchor nodes, thus producing a higher error rate. Close proximity between the edge of the field and the walls may also increase multipath propagation (Muthukrishnan, 2009), which will reduce the accuracy of data. The current study was not designed to isolate the different contributors (geometry, undetected nodes, and multipath propagation), thus the results of this study show the sum of errors accumulated from all sources. Further investigations are needed to understand the impact of the different contributors and how this could contribute to the optimization of anchor node placement.

#### LIMITATIONS

The method used in this study resulted in a position difference of 2.3 and 0.4 cm between the LPS and the reference system in the optimal and sub-optimal conditions, respectively. This is sufficient to detect the differences between the systems.

The effect of anchor node placement is especially important in smaller sports halls, where all distances to the walls are small. In the current study, both conditions were tested in a large sports hall, in order to keep variables such as distance to ceiling and material of walls and floors constant. The current results for the sub-optimal setup cannot be assumed to be true for smaller sports halls, since small sports halls will have shorter distances between the field of play and the walls on all four sides of the field, while in the current study only two side walls were close to the field of play. In small sports halls we might therefore expect even higher errors than in the sub-optimal condition of the current study. However, the study showed that changing the anchor node positions relative to the field of play and the distance between the side walls and corners of the hall to the field of play does affect the accuracy of the system. To optimize the measurement setup in small sports halls, future investigations should include tilting of nodes in the vertical direction to the field of play, and optimization of the geometry of anchor node positions relative to the field of play. Special attention should be given to multipath minimization, and to preventing mobile nodes close to corners from going undetected by multiple anchor nodes, by adjusting the tilting and positioning of the anchor nodes close to corners.

In the current study the raw positional data were examined. However, not all systems provide unfiltered raw positioning data for the user. In addition, practitioners will most likely not process data in independent software. Hence, validation of software-derived metrics is still needed, and should also be undertaken in future for the system investigated in this study. The current study provides insight into the raw positional data and the errors in the acquisition technology, without the possible influence of the manufacturer's software, which is important for researchers who want to process data using independent software. The export of raw positioning data from the systems allows filtering and processing of metrics independent of the manufacturer's software. Using manufacturer-independent software for raw data treatment and metric calculation may not only increase control of the process (Malone et al., 2016), but also avoid inaccuracies when collecting longitudinal data, which will be affected by software updates and other changes in the capture system. In addition, independent processing allows the user to provide details on the data processing in publications to facilitate appropriate interpretations and ease replication by other investigators. The positioning data (granted that they are not subjected to any filtering) are not affected by software updates, and thus could be used as a more stable measure of validity than software-derived metrics. In addition, raw position might be the most unaffected variable and should be used as the primary variable to compare measurements between different positioning systems' acquisition technology.

### CONCLUSIONS AND PRACTICAL APPLICATIONS

The accuracy of LPS output is highly sensitive to relative positioning between field of play and walls/corners and anchor nodes. Measures of position, distance, and mean speed from the LPS can be used confidently in time-motion analyses for indoor team sports, in conditions similar to the optimal condition in this study. In small sport halls or in conditions when walls, and especially the corners of the room are close to the field of play, accuracy is relatively poor and caution is indicated.

The LPS is not valid for calculating instantaneous speed from raw data. Therefore, the use of an LPS for quantifying distance covered in different velocity bands is not recommended. The application of appropriate filtering techniques to enhance the validity of such data should be investigated.

Future studies should assess the relative contribution to total error of (1) signal multipath effects, which occur to a larger extent in close proximity to walls and corners; and (2) the positioning and orientation of anchor nodes relative to the field of play. The inclusion of a dilution of precision measure would enhance the optimization of anchor node positions.

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Regional Committees for Medical and Health Research Ethics, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The study's data storage methods were approved by the Norwegian Social Science Data Service.

### AUTHOR CONTRIBUTIONS

LL, MS and MG conceptualized the study design. LL and MG conducted the data acquisition. LL and MG contributed to the analysis of data, and all authors contributed to the interpretation of the data. LL drafted the manuscript, and all other authors revised it critically. All authors approved the final version and agreed to be accountable for all aspects of this work.

### FUNDING

This study was funded partly by The Norwegian School of Sport Sciences, partly by Catapult Sports, and partly by the Norwegian Olympic Sports Centre. The funding sources had no involvement in the study design, the data acquisition, analysis and interpretation of the data, the writing of the manuscript or the decision to submit this article for publication.

#### ACKNOWLEDGMENTS

The authors would like to thank the participants for their time and effort. In addition, the authors would like to thank Vidar Jakobsen, Herman Haernes, Håvard Wiig and Jan Fredrik Baevre for assisting during data acquisition.

#### REFERENCES

ski racing using differential GNSS and inertial sensors. Rem. Sens. 8:671. doi: 10.3390/rs8080671


Gabbett, T. J. (2004). Influence of training and match intensity on injuries in rugby league. J. Sports Sci. 22, 409–417. doi: 10.1080/02640410310001641638


within wheelchair court sports. J. Sports Sci. 32, 1639–1647. doi: 10.1080/02640414.2014.910608


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Luteberget, Spencer and Gilgien. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Muscle Performance Investigated With a Novel Smart Compression Garment Based on Pressure Sensor Force Myography and Its Validation Against EMG

#### Aaron Belbasis<sup>1</sup> and Franz Konstantin Fuss<sup>2</sup> \*

<sup>1</sup> School of Engineering, RMIT University, Melbourne, VIC, Australia, <sup>2</sup> Smart Equipment Engineering and Wearable Technology Program, Centre for Design Innovation, Swinburne University of Technology, Melbourne, VIC, Australia

Muscle activity and fatigue performance parameters were obtained and compared between a smart compression garment and the gold standard, a surface electromyography (EMG) system, during high-speed cycling in seven participants. The smart compression garment, based on force myography (FMG), comprised integrated pressure sensors that were sandwiched between skin and garment, located on five thigh muscles. The muscle activity was assessed by means of crank cycle diagrams (polar plots) that displayed the muscle activity relative to the crank cycle. The fatigue was assessed by means of the median frequency of the power spectrum of the EMG signal; the fractal dimension (FD) of the EMG signal; and the FD of the pressure signal. The smart compression garment returned performance parameters (muscle activity and fatigue) comparable to the surface EMG. The major difference was that the EMG measured the electrical activity, whereas the pressure sensor measured the mechanical activity. As such, there was a phase shift between electrical and mechanical signals, with the electrical signals preceding the mechanical counterparts in most cases. This is particularly pronounced in high-speed cycling. The fatigue trend over the duration of the cycling exercise was clearly reflected in the fatigue parameters (FDs and median frequency) obtained from pressure and EMG signals. The fatigue parameter of the pressure signal (FD) showed a higher time dependency (R<sup>2</sup> = 0.84) compared to the EMG signal. This reflects that the pressure signal puts more emphasis on the fatigue as a function of time rather than on the origin of fatigue (e.g., peripheral or central fatigue). In light of the high-speed activity results, caution should be exercised when using data obtained from EMG for biomechanical models. In contrast to EMG data, activity data obtained from FMG are considered more appropriate and accurate as an input for biomechanical modeling as they truly reflect the mechanical muscle activity. In summary, the smart compression garment based on FMG is a valid alternative to EMG-garments, provides more accurate results at high-speed activity (avoiding the electro-mechanical delay), and clearly measures the progress of muscle fatigue over time.

Keywords: smart compression garment, force myography, pressure sensors, EMG, cycling, crank polar diagram, muscle fatigue, fractal dimension

#### Edited by:

Kamiar Aminian, École Polytechnique Fédérale de Lausanne, Switzerland

#### Reviewed by:

Marco Alessandro Minetto, Università degli Studi di Torino, Italy Giovanni Messina, University of Foggia, Italy

> \*Correspondence: Franz Konstantin Fuss fkfuss@swin.edu.au

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 November 2017 Accepted: 04 April 2018 Published: 19 April 2018

#### Citation:

Belbasis A and Fuss FK (2018) Muscle Performance Investigated With a Novel Smart Compression Garment Based on Pressure Sensor Force Myography and Its Validation Against EMG. Front. Physiol. 9:408. doi: 10.3389/fphys.2018.00408

### INTRODUCTION


The European Parliament Scientific and Technology Options Assessment Panel . . . identified wearables as one of the ten technologies which will change our lives. Market prospects for wearables are very promising: wearables shipments are forecasted to increase to \$150 billion by 2026 from the estimated level of \$30 billion in 2016 (European Commission, 2016).

Wearable technologies were the most popular and leading fitness trend in 2016 for the first time, and continued to be so in 2017 (Thompson, 2015, 2016). The major drawback of smart wearables, in contrast to non-wearable laboratory equipment, is that their technology is not very accurate yet, mainly due to too many unvalidated products in the market (Düking et al., 2016).

This research deals with smart wearables for muscle performance assessment, the gold standard of which is undeniably electromyography (EMG). There are several problems associated with EMG, clearly pointed out by De Luca (1997), which make it difficult to use EMG in wearables:


Furthermore, gel-/salt-based electrodes are required to reduce the skin resistance, although special design of embroidered electrodes can overcome this problem (Taelman et al., 2007; Shafti et al., 2017).

In spite of the issues pointed out above, two companies are selling EMG-based garments for performance analysis: Athos (Mad Apparel Inc., Redwood City, CA, United States) and Myontec (Myontec Ltd., Kuopio, Finland). A third company, Leo (GestureLogic Inc., Ottawa, Canada), developed an EMG thigh-sleeve but never sold the product (Early, 2016). B10nix<sup>1</sup> (B10NIX Ltd., Milano, Italy) has announced an EMG-based shirt that is not commercially available yet. Athos<sup>2</sup>, for example, assesses right-left muscle imbalance. Given the fact that precise electrode placement is crucial for accurate results, equal activity levels of muscle groups on the right and left side of the body would generate different signals if the electrodes were not placed on the same spot on both right and left muscle groups. To the best knowledge of the authors, there is not a single research paper available on validation of the Athos garments, in contrast to Myontec garments (e.g., Finni et al., 2007).

There are some research papers available that investigate prototypes of EMG-based garments for activity analysis (Taelman et al., 2007; Finni et al., 2007; Ribas Manero et al., 2016; Shafti et al., 2017). Finni et al. (2007) used traditional EMG electrodes incorporated in a garment, whereas Shafti et al. (2017) utilized customized, embroidered electrodes, validated with traditional gel-electrodes. Taelman et al. (2007) investigated the effect of electrode misalignment in a smart shirt, in the same way as Belbasis et al. (2015a) did (cf. Figure 1 of Belbasis et al., 2015a). Ribas Manero et al. (2016), however, did not validate their leggings prototype.

De Luca (1984) was the first to develop the concept of myoelectrical manifestations of localized muscle fatigue (Merletti et al., 1990). Fatigue is expressed in the EMG signal as an increase in EMG amplitude (increase of motor unit recruitment or synchronization by the central nervous system to maintain the required force level, related to central fatigue) and a shift to the lower frequencies of the EMG frequency spectrum (decrease of the conduction velocity of motor unit action potentials over the muscle, related to peripheral fatigue) (Mesin et al., 2009; Crozara et al., 2015).

The Myontec garment measures the muscle fatigue threshold (EMGFT2 according to Crozara et al., 2015), i.e., the breakpoint in the linear relationship between EMG amplitude and exercise intensity (Lucia et al., 1999). The muscle fatigue threshold, however, is not suitable for measuring the increasing fatigue over time. Ribas Manero et al. (2016) were the first to attempt to measure fatigue with an EMG garment prototype, by using the instantaneous Average Rectified Value (iARV) signal. However, they did not validate the fatigue data they obtained. For example, although their iARV signal is supposed to increase with fatigue, their initial data at the beginning of the exercise are also very high. Another limitation of this technique is that sweat increases the iARV signal (Ribas Manero et al., 2016).

There are several methods available for the assessment of fatigue with EMG, such as FFT-based, time-based, amplitude-based, and wavelet-analysis-based methods. Details can be found in the comprehensive reviews by Cifrek et al. (2009) and Gonzalez-Izal et al. (2012). Both papers mention fractal dimension (FD) methods without going into detail. The most common method for assessment of fatigue (gold-standard method) is FFT-based, and the onset of fatigue is characterized by a shift of the median frequency to smaller frequencies (De Luca, 1997). Basmajian and De Luca (1985) conducted an isometric experiment that shows the difference between mechanical fatigue and metabolic fatigue (measured with EMG and FFT method): the muscle force decreased at the failure point, whereas the preceding fatigue point was only detectable with EMG through the decreasing median frequency (see Figure 8.1 in Basmajian and De Luca, 1985).

The FD methods for assessing muscle fatigue have increased in importance over the last 10 years, with researchers using different methods, such as the box-counting method (Troiano et al., 2008; Beretta-Piccoli et al., 2015; Boccia et al., 2016) to understand the fractal behavior. Marri and Swaminathan (2016) used several methods [e.g., Higuchi (1988), Katz, Sevcik, box counting; multifractal analysis]. In most cases, Marri and Swaminathan's (2016) monofractal algorithms delivered smaller FDs for fatigued muscles compared to non-fatigued muscles, while the opposite was true for multifractal algorithms, where the FD was mostly smaller than 1. In general, a signal's FD ranges between a value of 1 and 2, i.e., between a straight line or smooth curve, and a maximally noisy signal filling up an area (Fuss, 2013).

<sup>1</sup>http://wise.b10nix.com/

<sup>2</sup>https://www.liveathos.com/athletes

Mesin et al. (2009) compared the FD of EMG signals to other muscle fatigue indexes, indicating that EMG FD was least affected by changes in conduction velocity and most related to the level of motor unit synchronization, and suggesting that the FD is an index of central rather than peripheral fatigue.

Furthermore, Mesin et al. (2009) found that in a power-trained subject, FD does not have a clear trend, indicating that the level of motor unit synchronization does not change, whereas the rate of change of the median frequency is high. In an endurance-trained subject, the rate of change of the median frequency is lower than in the power-trained subject, whereas the rate of change of FD is high. These results suggest that power-trained athletes are affected more by peripheral fatigue, whereas endurance-trained athletes suffer more from central fatigue. Consequently, EMG-FD seems to be more sensitive in endurance-trained muscles, and EMG-FFT more sensitive in power-trained muscles regarding fatigue.

An alternative method to EMG is mechanomyography (MMG; Islam et al., 2013). In contrast to surface EMG, the quality of the MMG signal is not affected by electrical interference and changes of skin conditions as MMG measures the mechanical action of a muscle. MMG offers two methodological options:

(1) Vibromyography or acoustic-myogram (phono-myography) using accelerometers and/or microphones.

The method assesses the low amplitude sound of lateral oscillations generated by volumetric changes in active muscle fibers at frequencies between 5 and 100 Hz with microphones or low mass accelerometers (Fang et al., 2015). However, the signals are affected by limb movements and ambient noise, such that the method is not suitable for sports applications (Islam et al., 2013).

(2) Pressure sensors used for force myography (FMG).

The sensors measure the pressure exerted by the muscles against the skin by volumetric changes of the active muscles (Castellini et al., 2014; Connan et al., 2016). Muscle bulging increases the pressure non-linearly with respect to the increase in muscle force (Belbasis et al., 2015a). The most common sensors used for FMG purposes are off-the-shelf FSR (force sensing resistive) sensors, either as single sensors, several sensors (Connan et al., 2016) or sensor matrix arrays (Zhou et al., 2017), that are preloaded, compressed either by tight fitting garments or by elastic bands to the surface of the relevant muscles (Lukowicz et al., 2006; McLaren et al., 2010; Zhou et al., 2017), Velcro bracelets (Connan et al., 2016), integrated in a textile sleeve (Ogris et al., 2007), equipped with mechanical preload adjustments (Li et al., 2012), or placed inside a forearm orthosis (Wininger et al., 2008). Belbasis and Fuss (2015) and Belbasis et al. (2015a,b) used several customized piezoresistive polymer sensors sandwiched between compression garment and skin. Meyer et al. (2006) applied a capacitance pressure sensor array embedded in textiles. Alternatively, Cheng et al. (2010) did not use any sensors but instead measured the body capacitance and its changes with movement.

The FMG or pressure sensor-based garments are a typical example of lateral innovation, i.e., achieving the same goal with other or alternative means, a common precursor of a disruptive technology. Lateral innovation is characterized by, e.g., lower costs, higher accuracy, better user-friendliness, smaller hardware, simpler solution, simpler implementation, less affected by error and method, better wearability, providing additional information, or improved manufacturability (Fuss, 2017). However, none of these FMG solutions are commercially available yet.

The aim of this paper was to explore an existing prototype of a pressure sensor-based garment (Belbasis and Fuss, 2015; Belbasis et al., 2015a,b) for opportunities in performance analysis, specifically muscle activation and fatigue, and to validate the prototype against EMG, used as the gold standard for muscle performance assessment.

The method selected for this task had to comprise a standardized repeatable activity and a defined fatigue protocol. We used cycling on a stationary power-controlled bicycle as the method of choice. Fatigue was assessed through the Fast Fourier Transform (FFT, gold standard) of the EMG signal, as well as with FD signal processing. For the latter, Higuchi's (1988) method is considered the gold standard. However, a new customizable FD method (Fuss' method; Fuss, 2013) was selected, as it offers advantages over Higuchi's (1988) method.

### MATERIALS AND METHODS

#### Participants

Seven male participants (age: 28 ± 3.6 years; body height: 1.751 ± 0.059 m; body mass: 78.7 ± 7.9 kg) were involved in the experiments. This study was granted Ethics approval by the RMIT University Human Ethics Committee (approval no. ASEHAPP 45-15) and adhered to the Declaration of Helsinki. An informed consent form was filled in by all the participants before the start of the experiment.

All participants were deemed healthy volunteers, passing RMIT University Ethics Committee approval for health requirements to sustain the level of exertion required during the tests. The participants were all of above-average levels of fitness participating in various sports such as running (participant 1 and 5), soccer (2 and 4), and cycling (3, 6, and 7) at least three times a week. The overall cycling skill range was from Amateur (participant 2) through to Semi-elite (participants 3 and 7).

#### Data Collection

A motion capture system (9 Camera – Qualisys Oqus System, Göteborg, Sweden) was utilized to capture the limb segment angles of the participants and to track the rotational crank angle of the bicycle (**Figure 1**). The data sampling frequency for motion tracking was set at 100 Hz; the marker positions are shown in **Figure 1**.

A previously developed smart compression prototype garment (Belbasis and Fuss, 2015; Belbasis et al., 2015a,b) was utilized for the testing of each athlete. The garment provided the capability for measuring and mapping changes in the surface pressure above a muscle (**Figure 1**), where the active movement of the muscle under the compression fabric was detected by a distributed network of pressure sensors. The low-pressure sensors were manufactured from two layers of a conductive piezoresistive polymer, with an almost linear calibration curve described by the average equation P = 97282000 σ<sup>1.184335</sup> for two layers, where P is the pressure in pascals and σ is the conductivity in siemens (Fuss et al., 2016).
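The calibration equation above can be applied directly to a measured sensor value. The following is a minimal sketch, assuming σ has already been obtained in siemens from the sensor circuit; the function and parameter names are hypothetical, and only the coefficient and exponent are taken from the text.

```python
def conductance_to_pressure(sigma_s, coeff=97282000.0, exponent=1.184335):
    """Convert sensor conductivity sigma (siemens) to pressure (pascals).

    Implements P = coeff * sigma ** exponent with the two-layer average
    calibration constants quoted in the text (Fuss et al., 2016).
    """
    if sigma_s < 0:
        raise ValueError("sigma must be non-negative")
    return coeff * sigma_s ** exponent
```

Because the exponent is only slightly above 1, doubling σ slightly more than doubles the pressure, which is what "almost linear" refers to.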

The sensors were positioned over five of the thigh muscles [rectus femoris (RF), vastus medialis (VM) and vastus lateralis (VL), biceps femoris (BF), and semitendinosus (ST)] of the participant's right leg. In addition to the utilization of pressure sensors, a 16-channel wireless EMG system (Wave Plus Wireless EMG, Cometa Systems, Bareggio, Italy) was used for recording the electrical signal (**Figure 1**) of the same muscles. The general placement of the electrodes followed the recommendations of SENIAM [Surface Emg for NonInvasive Assessment of Muscles] (1999) and the optimum placement of the electrodes was achieved by using the method of Belbasis et al. (2015a). To ensure accurate capture of the muscle behavior throughout the tests a data sampling frequency of 2000 Hz was utilized for both the pressure and EMG sensors.

#### Experimental Method

A fatigue-inducing regimen based upon work by Dorel et al. (2009) was developed to quantify the effects of fatigue during cycling. The test protocol deliberately introduced fatigue to the active muscles, allowing for the analysis of muscle activity and performance under two known definitive conditions, namely, a non-fatigued and a fatigued state. To allow for sufficient muscle recovery, participants were asked to complete the following testing procedures over two testing sessions, which were separated by at least 4 recovery days.

The tests were performed on the participant's own bicycle mounted on the stationary ergometer (Wahoo Kickr, Wahoo Fitness, United States).

To ensure that muscles were activated during the upstroke of the pedal phase (180–360° of the crank cycle), clip-in shoe/pedal combinations or caged pedals were utilized to prevent separation of the foot and pedal.

The participants performed a cycling exercise at a constant power output equal to 80% of their functional threshold power (FTP) for as long as possible, while maintaining a constant pedaling rate (cadence). The test continued until the cyclists were no longer able to maintain their initial test cadence (±5 rpm).

#### Session One: FTP Ramp Test

Each participant was tasked with completing an incremental cycling exercise (Ramp test). This involved the incremental ramp-up of generated power to determine the exercise limitations of the participant. Other than a heart-rate strap, no instrumentation of the participant's body was necessary for this session. All testing began at a target power output of 120 W, with workload increments of 20 W/min, until the target power could no longer be satisfactorily sustained.

To ensure consistent power output during the test, the ERG-mode setting of the Wahoo Kickr ergometer was utilized. This setting constantly monitors the generated power and cadence (angular velocity), and enforces a consistent target power output through automatic adjustments to the cycling resistance level (torque) through a magnetic actuator.

To prevent artificially enforcing an earlier end to the test, participants were permitted reasonable changes in both cadence and gearing to find their comfort zone to complete the task. The FTP, defined as the last stage that was completed in its entirety, was used to calculate the appropriate workload imposed by the cycle ergometer during the second test session.
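The ramp protocol and the FTP definition can be summarized in a short sketch. It assumes that the 20 W/min increment is applied as discrete 1-min stages, which the text implies by referring to the last completed stage but does not state outright. All function names are hypothetical.

```python
def ramp_target_power(elapsed_s, start_w=120.0, step_w=20.0, stage_s=60.0):
    """Target power (W) at a given elapsed time of the ramp test."""
    stage_index = int(elapsed_s // stage_s)  # 0 for the first minute
    return start_w + step_w * stage_index

def ftp_from_ramp(time_to_failure_s, start_w=120.0, step_w=20.0, stage_s=60.0):
    """FTP as the power of the last stage completed in its entirety.

    Returns None if not even the first stage was completed.
    """
    completed_stages = int(time_to_failure_s // stage_s)
    if completed_stages == 0:
        return None
    return start_w + step_w * (completed_stages - 1)

def fatigue_test_power(ftp_w, fraction=0.8):
    """Constant workload for the second session (80% of FTP)."""
    return fraction * ftp_w
```

For example, a participant who stops 5 s into the third stage has completed the 120 W and 140 W stages, so the FTP under this reading is 140 W and the second-session workload is 112 W.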

FIGURE 1 | Experimental set-up, motion capture, EMG signal, and muscle pressure signals; the latter three subfigures are screen shots of the software; the unit of the EMG signal on the screen shot is mV·10<sup>−2</sup> and the unit of the pressure signal on the screen shot is V.

#### Session Two: 80% FTP Fatigue Test

The second session, notably the primary data collection session, involved the complete instrumentation of the participant's right upper leg with EMG, motion capture, and pressure sensor equipment. Participants performed a self-directed warm-up routine consisting of at least 3 min of cycling at a lower power output than the test condition, ensuring sufficient preparation of the participant for the test. Following the warm-up, subjects performed a cycling exercise at a constant power output equating to 80% of their measured FTP for as long as physically maintainable. The ergometer was set at a fixed resistance setting and the participant was instructed to maintain the two target parameters displayed to them: the target power output (80% FTP), and a constant cadence freely adopted from the end of the warm-up session. Surface muscle pressure, EMG and angular parameters were recorded continuously throughout the session.

To enforce repetitive muscle activation, participants were asked to maintain a single cycling position, where shifting along the saddle or handlebars was not allowed. The test continued until the cyclists voluntarily chose to stop the exercise (fatigue-induced exhaustion) or until they were no longer able to maintain their initial test cadence (± 5 rpm), which was considered as a failure to maintain the required task (the target power output at a constant cadence).

### Data Analysis and Statistics

The raw data of both pressure and EMG signals were recorded in volts and millivolts, respectively, at a frequency of 2000 Hz, simultaneously and synchronized with the motion capture data utilizing a centralized trigger device. From the pedal marker, the top dead center of the crank (highest marker position) was set to 0°, with increases in crank angle in the clockwise direction (as viewed from the right-hand side of the bicycle).
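The crank angle convention can be sketched as follows. This is an illustration under stated assumptions, not the authors' processing code: the pedal marker is assumed to be given in a sagittal plane with x pointing forward and z pointing up, and the crank axis is estimated as the midpoint of the marker's path. All names are hypothetical.

```python
import math

def crank_centre(xs, zs):
    """Estimate the crank axis as the midpoint of the marker's circular path."""
    return (min(xs) + max(xs)) / 2.0, (min(zs) + max(zs)) / 2.0

def crank_angle_deg(x, z, centre_x, centre_z):
    """Crank angle in degrees: 0 at top dead center, increasing clockwise
    when viewed from the right-hand side (top, front, bottom, back)."""
    dx = x - centre_x  # forward offset of the pedal marker
    dz = z - centre_z  # upward offset of the pedal marker
    # atan2(dx, dz) measures the angle from the upward axis toward forward.
    return math.degrees(math.atan2(dx, dz)) % 360.0
```

With this convention the downstroke spans 0 to 180° and the upstroke 180 to 360°, matching the crank cycle ranges used elsewhere in the text.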

For the muscle activity analysis, the signal amplitude (of pressure signal and EMG) within ± 1.5 SD (removal of outliers) was assigned to the crank angle. The average amplitude was calculated with a running median filter with a window width of 7.5°.
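One possible reading of these two steps is sketched below. It assumes that the ± 1.5 SD criterion is applied around the mean of all samples of a signal, and that the running median is evaluated on a 1° grid with a window that wraps around at 360°; neither detail is specified in the text. The function names are hypothetical.

```python
import statistics

def reject_outliers(values, n_sd=1.5):
    """Return a keep-mask for samples within mean +/- n_sd standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mean) <= n_sd * sd for v in values]

def circular_running_median(angles_deg, values, window_deg=7.5, step_deg=1.0):
    """Median amplitude in a sliding crank-angle window that wraps at 360 degrees."""
    half = window_deg / 2.0
    n_steps = int(round(360.0 / step_deg))
    profile = []
    for i in range(n_steps):
        centre = i * step_deg
        in_window = [
            v for a, v in zip(angles_deg, values)
            # Shortest angular distance, so that 359 and 1 degrees are 2 degrees apart.
            if abs((a - centre + 180.0) % 360.0 - 180.0) <= half
        ]
        profile.append(statistics.median(in_window) if in_window else float("nan"))
    return profile
```

The resulting profile holds one amplitude value per degree of crank angle and can be normalized and drawn as a polar plot. The loop is written for clarity rather than speed.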

Subsequently, the average crank cycle data were normalized. In order to calculate the average signal of each muscle across all seven participants, the data of all participants were averaged, squared (thereby assigning a greater weight to higher values), and normalized again. The average crank angle of each muscle was determined as the angle that divides the area under the signal into two equal parts (integration window = 180°). The average crank angle represents the position of the activated muscle on the crank diagram as a single number for comparative purposes.
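
The half-area definition of the average crank angle can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the angles are assumed to be increasing and unwrapped across the integration window, the values non-negative, and the area is approximated with the trapezoidal rule and linear interpolation.

```python
def half_area_angle(angles, values):
    """Angle that splits the area under a non-negative curve into two equal parts.

    `angles` must be strictly increasing (unwrapped), for example spanning a
    180-degree integration window. The result is approximate because the
    cumulative area is interpolated linearly inside each segment.
    """
    cumulative = [0.0]
    for i in range(1, len(angles)):
        segment = 0.5 * (values[i] + values[i - 1]) * (angles[i] - angles[i - 1])
        cumulative.append(cumulative[-1] + segment)
    total = cumulative[-1]
    if total <= 0.0:
        raise ValueError("area under the curve must be positive")
    half = total / 2.0
    for i in range(1, len(cumulative)):
        if cumulative[i] >= half:
            frac = (half - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return angles[i - 1] + frac * (angles[i] - angles[i - 1])
    return angles[-1]


# Symmetric triangular activation centred on 90 degrees within a 180-degree window.
window = [float(a) for a in range(0, 181, 10)]
triangle = [90.0 - abs(a - 90.0) for a in window]
centre = half_area_angle(window, triangle)
```

For a symmetric activation profile the half-area angle coincides with the peak; for skewed profiles it moves toward the heavier side, which is why it serves as a single summary number.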

For the fatigue analysis, the raw signal amplitude was expressed as a time series, with a fourth-order Butterworth bandpass filter (10–350 Hz) applied to the EMG data to remove noise. Raw pressure values were used with no further filtering; however, owing to the smoothness of the pressure signals, the original sampling frequency was reduced to 80 Hz in postprocessing. Each of the muscle signals was subjected to FFT (EMG only) and fractal dimensional analysis (EMG and pressure signal). De Luca (1997) established that the median frequency of an EMG signal over a set time period shifts toward lower frequencies as a result of increasing muscular fatigue. The negative trend of the median frequency over time thus indicates the performance decrease in the muscle under investigation. More specifically to cycling and this research, the analysis builds on the approach taken by Dingwell et al. (2008) by utilizing a Short-Time Fourier Transform (STFT) technique, whereby the calculation of the power spectrum, and the resultant median frequency, is performed over individual time segments attributed to each crank cycle revolution. All calculations were made using the FFT function within MATLAB (The MathWorks, Inc., Natick, MA, United States) and a sliding average window of 1 min width to define the averaged trend of the data.
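
The median frequency of a single segment can be illustrated with a short sketch. The band-pass filtering step is left out, and a plain discrete Fourier transform written with the standard library stands in for the MATLAB FFT function named in the text; the synthetic two-tone signals and all function names are assumptions made for illustration only.

```python
import cmath
import math


def power_spectrum(segment, fs):
    """One-sided power spectrum of a mean-removed segment via a naive DFT."""
    n = len(segment)
    mean = sum(segment) / n
    x = [v - mean for v in segment]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def median_frequency(segment, fs):
    """Lowest frequency at which the cumulative power reaches half the total."""
    freqs, power = power_spectrum(segment, fs)
    half = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


fs = 2000.0  # sampling frequency stated in the text
n = 200      # hypothetical segment length (0.1 s)


def two_tone(a_low, a_high):
    """Synthetic signal with components at 50 Hz and 150 Hz."""
    return [a_low * math.sin(2 * math.pi * 50 * t / fs)
            + a_high * math.sin(2 * math.pi * 150 * t / fs) for t in range(n)]


mf_high = median_frequency(two_tone(1.0, 2.0), fs)  # power dominated by 150 Hz
mf_low = median_frequency(two_tone(2.0, 1.0), fs)   # power dominated by 50 Hz
```

In an analysis of the kind described, this calculation would be repeated for each crank revolution and the resulting series smoothed with a sliding window; shifting power from the higher to the lower component moves the median frequency downward, which is the fatigue signature referred to above.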

The FD of EMG and pressure signals was calculated with the method developed by Fuss (2013). This method allows for maximal separation of two conditions (e.g., fresh and fatigued muscle states) by means of adjusting and optimizing the signal amplitude multiplier. If this multiplier is set to high values (infinity in theory), then Fuss' method is identical to Higuchi's (1988) method. In order to identify the optimal amplitude multiplier, the FDs of the EMG and pressure signals were calculated for the second (fresh muscle) and second-to-last (fatigued muscle) full minute of the tests at different multipliers. The differential of the FDs of fatigued and fresh states (**Figure 2**) was plotted against the decadic logarithm of the multiplier (Fuss, 2013) and the optimal multiplier was identified at the maximum differential. This amplitude multiplier was then used to calculate the FDs continuously through the signals with a running window width of 1 min.
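
As a point of reference, Higuchi's method, which the text describes as the limiting case of Fuss' method at very high multipliers, can be sketched as below. The amplitude-multiplier optimization of Fuss (2013) is not reproduced here; the parameter `k_max` and the test signals are assumptions chosen for illustration.

```python
import math
import random


def higuchi_fd(x, k_max=8):
    """Fractal dimension of a 1-D signal by Higuchi's method.

    For each delay k the normalized curve length L(k) is averaged over the k
    possible starting offsets; the FD is the least-squares slope of
    log L(k) against log(1/k). The signal must not be constant.
    """
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, steps + 1))
            lengths.append(dist * (n - 1) / (steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_len) / len(log_len)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


# A straight line has FD 1; white noise approaches FD 2.
line = [float(i) for i in range(500)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
fd_line = higuchi_fd(line)
fd_noise = higuchi_fd(noise)
```

In a running-window analysis such as the one described, the function would be applied to successive 1 min windows of the signal.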

Both median frequency data and FD data were normalized. For comparing the fatigue development across all participants, the time was normalized as well (due to different experiment durations; cf. **Table 2**). The median frequency data and FD data were linearly correlated with the normalized time to assess the percentage of time dependence by means of R<sup>2</sup>. The R<sup>2</sup> values were tested for significant differences with Fisher's Z-test for comparing correlations from independent samples.
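
Fisher's Z-test for two independent correlations can be written in a few lines. The sketch below uses the standard textbook form of the test on correlation coefficients r (the square root of R<sup>2</sup>, with sign); the sample sizes and coefficients in the example are hypothetical and are not taken from the study.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided Fisher Z-test for the difference of two independent correlations.

    Returns the z statistic and the two-sided p-value under the normal
    approximation. Requires |r| < 1 and n > 3 for both samples.
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical example values, for illustration only.
z_stat, p_value = fisher_z_test(0.9, 100, 0.5, 100)
z_same, p_same = fisher_z_test(0.7, 50, 0.7, 50)
```

Because the standard error shrinks with the number of samples, long time series yield very small p-values even for moderate differences between correlations.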

# RESULTS

### Power Data

The primary objective of the first testing session was to determine each participant's achievable FTP wattage level, allowing for the normalized testing FTP target during the second test. Outputs from the cycling trainer pertaining to the participant's performance data were collected and are shown in **Table 1**. The ramp test specifically assessed an individual's ability to deliver increasingly higher power output over time; as such, we expect a distribution in the resultant efforts throughout the sample group because of differences in physical ability and familiarization with the task. Due to the similarity in skill set and fitness between the participants, five participants fell within the bounds of one SD from the mean of the duration and achieved FTP level. The other two participants, namely, the least experienced cyclist (participant 2; **Table 1**) and the most experienced (participant 3), were within two SDs.

FIGURE 2 | Fractal dimension (FD) optimization procedure (Fuss, 2013); (Left) EMG; (Right) pressure; top row: raw data and data segments used for calculating the FD differential of fresh (blue) and fatigued (red) muscle; bottom row: FDs and FD differential against multiplier of signal amplitude; blue curve: FD of fresh muscle; dashed red curve: FD of fatigued muscle; bold orange curve: FD differential (FD of fatigued muscle – FD of fresh muscle); the optimal multiplier of signal amplitude is found at the maximum (peak) of the bold orange curve.

TABLE 1 | Session one activity summary.

FTP = functional threshold power, i.e., the maximal power achieved in the incremental ramp-up of generated power (stepwise increase of power starting at 120 W).

Following the determination of each participant's FTP level, individual 80% FTP calculations were made and utilized for the second session to ascertain fatigue behavior. The inclusion of the additional biomechanical measurement systems (pressure, EMG, and motion capture) within the second test session allowed for greater insight into the onset and continued fatigue behavior of the muscles in the lower limbs.

A summary of key test data relating to each test is shown in **Table 2**. The target of 80% FTP was met within a satisfactory range (5%) for each participant, with the mean accuracy within 1% of the group target.

A noticeable deviation in the results was the duration of the test for participant 2 (least experienced). While all other participants concluded the test within one SD of the test mean (9:33 min of exercise), the fatigue tolerance of participant 2 forced an end to the test after only 3:28 min. This result aligns with the experience level of the participant in comparison with that of the other participants, as the duration of the test is largely driven by the physiological and psychological conditioning of the muscle and the participant to operate under increasingly fatigue-limiting conditions. The experience level also correlated with the mean power and torque (**Table 2**) such that the least (participant 2) and most experienced (participants 3 and 7) participants exhibited the lowest and highest values, respectively.

#### Muscle Activation

Through the motion capture of the pedal stroke movement, the muscle activity was resolved to the corresponding angle of the crank where each individual muscle was utilized, shown on polar diagrams.

The polar diagrams of three representative participants are shown in **Figure 3**. The EMG graphs of the extensors (RF, VM, and VL) exhibited overlapping activity in the same sector of the diagram, with individual differences: in **Figure 3** (top and bottom rows) at 330–360°, whereas in **Figure 3** (middle row) at 30°. The pressure-based activity deviated from the EMG-based activity in general by a clockwise phase shift. For example, in **Figure 3** (bottom row), the extensors still overlap, although not as perfectly as in the EMG plot, but the peak activities are shifted by 30–60° clockwise. In **Figure 3** (middle row), RF shows pressure and EMG activity in the same sector, whereas for VM, the pressure signal is shifted counter-clockwise by approximately 30° with respect to the EMG signal, and VL is shifted clockwise by more than 60°. In **Figure 3** (top row), RF and VM are shifted clockwise by 30° and 70°, respectively, and VL by almost 180°.

TABLE 2 | Session two activity summary.

Target power = 80% of FTP shown in Table 1; target accuracy = target power/mean power × 100.

Comparing the three pressure plots, the activity of RF ranges from 20 to 30°, VM from −10 to 50°, and VL from 40 to 150°.

The flexor muscles (BF and ST) showed less consistent EMG activation patterns than the extensors: ST at 90°, 90°, and 180°; and BF at 100°, 110°, and 340°. The pressure activation patterns are, in general, shifted clockwise as already seen in the extensor muscles, namely the BF by 70°, 70°, and 200°; and the ST by −30°, 70°, and 150°.

Comparing the three pressure plots, the peak activity of BF occurs around 170–180°, whereas that of ST ranges from 150 to 240°.

**Figure 3** (top row) shows a co-contraction of the three extensors and the BF on the EMG plot, whereas the pressure plot confines the co-contraction to VL and BF. The same is true for both the hamstrings and the VL on the pressure plot [**Figure 3** (middle row)], whereas the EMG plot appears to be free of co-contractions. The latter is true for both pressure and EMG plots in **Figure 3** (bottom row).

**Figure 4** shows the average muscle activation patterns of all seven participants combined, thereby highlighting the sectors used by most participants.

In general, while the muscle activities, measured with EMG or pressure, are relatively consistent across athletes, they do not coincide when the two different methods are compared directly (**Figures 3**, **4**).

The average angles of the EMG signal are: RF, 8°; VM, 24°; VL, 23°; BF, 110°; and ST, 122°. The average angles of the pressure signal of the five muscles are: RF, 24° (phase shift +16°); VM, 8° (phase shift −16°); VL, 124° (phase shift +101°); BF, 143° (phase shift +33°); and ST, 156° (phase shift +34°).

The pressure plots of all but one muscle are characterized by a clockwise phase shift of 16–101° with respect to the EMG plots. Only VM is shifted counter-clockwise, by 16°. This phase shift phenomenon is attributed to the electromechanical delay of the muscle signal, which will be explained in detail in the section "Discussion."
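
The phase shifts quoted above follow directly from the reported average angles. The short sketch below recomputes them as signed circular differences; the function name is illustrative, and the sign convention (positive = clockwise) matches the text.

```python
def phase_shift_deg(emg_angle, pressure_angle):
    """Signed circular difference in degrees, in (-180, 180]; positive = clockwise."""
    d = (pressure_angle - emg_angle) % 360.0
    return d - 360.0 if d > 180.0 else d


# Average crank angles reported in the text (degrees).
emg = {"RF": 8, "VM": 24, "VL": 23, "BF": 110, "ST": 122}
pressure = {"RF": 24, "VM": 8, "VL": 124, "BF": 143, "ST": 156}

shifts = {muscle: phase_shift_deg(emg[muscle], pressure[muscle]) for muscle in emg}
```

Wrapping the difference into the interval from −180° to 180° keeps the calculation valid for angles on either side of top dead center.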

#### Muscle Fatigue

Fatigue over the entirety of the second test was assessed with two measurement methods and two algorithms, so that three fatigue signals had to be compared by correlation: the FFT median frequency (FFT-EMG) and the FD (FD-EMG and FD-Pressure).

In general, when considering the overall behavior of each participant (**Figure 5**), the overall fatigue trend is clearly seen in all signals, with increasing (fractals) and decreasing (FFT) trends.

The normalized pressure fractals correlate with the normalized cycling time with R<sup>2</sup> = 0.8405 (linear fit), i.e., 84% of the fatigue level is explained by the time progression of the exercise. The normalized EMG fractals and median frequencies correlate with the normalized cycling time with R<sup>2</sup> = 0.5081 (51%) and R<sup>2</sup> = 0.7092 (71%), respectively. All three R<sup>2</sup> values are significantly different (p = 0).

The R<sup>2</sup> value expresses merely that, for the FFT method, 71% of the fatigue level is time-dependent whereas 29% is not. Time-independent fatigue would occur if a fatigue level or the average fatigue were kept relatively constant over a longer time. Furthermore, the different performance levels of the subjects could also contribute to the time-independent fatigue; for example, more experienced athletes are more skilled in fatigue management over time. FD-EMG reflects more time-independent fatigue (49%) than FD-Pressure (16%), i.e., approximately three times as much. This phenomenon will be discussed in more detail in the section "Discussion."
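
The time-independent shares quoted above are simply the complements of the reported R<sup>2</sup> values, which the following short calculation makes explicit. The dictionary keys are labels chosen for this sketch.

```python
# R-squared values reported in the text for the linear fits against normalized time.
r_squared = {"FD-Pressure": 0.8405, "FD-EMG": 0.5081, "FFT-EMG": 0.7092}

# Share of the variance not explained by time (time-independent component).
time_independent = {name: 1.0 - value for name, value in r_squared.items()}

# Ratio behind the "approximately three times as much" statement.
ratio_emg_to_pressure = time_independent["FD-EMG"] / time_independent["FD-Pressure"]
```

With these inputs the time-independent shares are about 0.16, 0.49, and 0.29, and the FD-EMG to FD-Pressure ratio is slightly above 3.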

#### DISCUSSION

The purpose of this study was to explore the applicability of a smart compression garment based on FMG with pressure sensors (Belbasis and Fuss, 2015; Belbasis et al., 2015a,b), measuring muscle contraction, for assessment of muscle activity and fatigue, as an alternative to EMG.

First of all, it is worth noting that, while the pressure monitoring technique is still maturing, the majority of common experimental issues (such as restarts, corrupted data, and time-consuming instrumentation of athletes) were attributable to the installation and attachment of the EMG equipment and electrodes, and the motion capture markers. The simplicity and robustness of the wearable smart compression garment system limited the possibility of experimental failures.

The first question to address is whether muscle activity can be assessed and measured with the smart compression garment. The signals obtained, related to the contraction pattern when cycling, were highly comparable and consistent on the polar diagrams, with some individual differences between participants.

The second objective of this study was to validate the muscle activity pattern obtained from the smart compression garment with a gold standard, i.e., a laboratory-based EMG system. However, the muscle activation patterns obtained from EMG and the smart compression garment were, to some extent, not comparable (**Figures 3**, **4**). The reason for this is not an inferiority of the smart compression garment, as might easily be inferred from the data, but rather the choice of the gold standard. Undoubtedly, EMG is the gold standard (even if the only one) for assessment of muscle activity and fatigue. Yet, EMG measures the electrical activity of the muscle, whereas the smart compression garment detects the mechanical activity, i.e., muscle bulging that compresses the pressure sensors between skin and garment. The difference between EMG and pressure-sensor polar plots simply reflects the difference between electrical and mechanical activity. The electro-mechanical delay (De Luca, 1997) of the contraction force with respect to the electrical stimulation of a muscle is explained by the time difference between the onset of electrical activity and the increase in muscle force. This delay is also dependent on muscle fiber distribution, i.e., the percentage of fast- and slow-twitch fibers. For example, to reach a contraction level of 50% of the maximal muscle force, it takes a fast- and slow-twitch fiber approximately 0.15 and 0.25 s, respectively (De Luca, 1997). When cycling at a cadence of 73 rpm (average cadence from **Table 2**), i.e., 438°/s, these two delay times would cause, in theory, a phase shift of 66° and 110° on the polar diagram. The differences seen in the EMG and pressure sensor polar diagrams are therefore expected. According to EMG data of Jorge and Hull (1986) and Hug et al. (2010), the quadriceps is active from 300 to 130° and from 235 to 162°, respectively, and the hamstrings from 15 to 255° and from 324 to 288°, respectively (maximal ranges). The data seen in **Figure 4** fit into these ranges, with the exception of the VL, which exceeds 130°. Jorge and Hull (1986) also reference other papers, the results of which show considerable differences and fluctuations, suggesting that there is considerable variety in EMG results.
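
The theoretical phase shifts quoted in this paragraph can be reproduced with a one-line conversion from delay time to crank angle. The function name is illustrative; the cadence and delay values are those stated in the text.

```python
def phase_shift_from_delay(cadence_rpm, delay_s):
    """Crank angle in degrees travelled during a delay at a constant cadence."""
    degrees_per_second = cadence_rpm * 360.0 / 60.0
    return degrees_per_second * delay_s


cadence = 73.0  # average cadence in rpm, as stated in the text
shift_fast = phase_shift_from_delay(cadence, 0.15)  # fast-twitch delay from the text
shift_slow = phase_shift_from_delay(cadence, 0.25)  # slow-twitch delay from the text
```

At 73 rpm the crank turns 438° per second, so delays of 0.15 s and 0.25 s correspond to 65.7° and 109.5°, which round to the 66° and 110° given above. The same conversion shows that the expected shift grows in proportion to cadence.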

Nevertheless, EMG is still a gold standard for validating the smart garment, as there is no other system available. The gold standard therefore serves primarily for understanding the differences between the data, and the underlying principles of the different measurement systems. Validation is still possible, if differences are known in the first place or at least expected, and subsequently confirmed through a validation study. This issue poses a new challenge for wearable technology not experienced before, specifically when dealing with lateral innovation (Fuss, 2017). Finding a suitable gold standard could then become a problem.

The third objective of this study was to assess whether muscle fatigue can be measured from the pressure signals. The evaluation was based on the calculation of the FDs of pressure and EMG signals. For calculating these FDs, Fuss' method was used as it maximally separates the FDs of a normal and an abnormal signal, by finding the maximum differential of FD-abnormal − FD-normal, when subjecting both signals to the same amplitude multiplier. Normal and abnormal signals could be physiological/pathological ones, less/more chaotic ones, signals from fresh and fatigued states, low/high activity signals, etc. From common sense, the abnormal signal is expected to have a higher FD. Common sense is confirmed if there is a maximum differential, and the two asymptotic fractal differentials at multipliers close to 0 or to infinity (**Figure 2**) are smaller than the maximum. It has been seen on numerous occasions that Higuchi's (1988) method, corresponding to Fuss' method with an infinite multiplier, returns a higher FD for normal signals than for abnormal ones (Fuss, 2013, 2016). This problem is seen in **Figure 2** as well, although it is more pronounced in the EMG FD data. This behavior is not unexpected in the EMG signal, as the decreased amplitude of high frequencies in the power spectrum (typically seen in fatigued muscles) leads to a decrease of FD.

The increase in EMG amplitude, also typical for fatigue, increases the FD. If the cadence drops, so does the FD. Even if there are multiple influences that affect the FD, it would be more logical to assume that the FD of a fatigued muscle's signal is smaller than the one of a fresh muscle, if the principle of left-shift of the median frequency is known.

Irrespective of logical assumptions, all three methods applied, FD-Pressure, FD-EMG, and FFT-EMG, showed the same clear trend, namely, that fatigue increases with time, with some individual differences between participants.

The fourth objective of this study was to validate the muscle fatigue trend obtained from the smart compression garment with a gold standard, i.e., a laboratory-based EMG system. The same gold-standard problem as seen in the muscle activation patterns is also applicable to fatigue to some extent. When comparing FD-Pressure and FD-EMG to FFT-EMG, all three variables correlated with the normalized time of the experiments; FD-Pressure showed the highest time-dependent correlation (84%), and FD-EMG the highest time-independent component (49%). These differences come from the fact that FD-Pressure is more related to mechanical fatigue, whereas FD-EMG and FFT-EMG are related to central and peripheral fatigue, respectively.

There is indication (Mesin et al., 2009) that the shift of the median frequency of the EMG signal is related to peripheral muscle fatigue (decrease in conduction velocity) whereas the FD of the EMG signal is related to central fatigue (increase in motor unit synchronization). This seems illogical at first sight, as the higher the amplitude of higher frequencies is, the greater is the FD, and therefore any reduction of median frequencies is coupled to a smaller FD. This principle can be easily verified when using synthetic fractal signals, such as Knopp/Takagi function, Weierstrass cosine and Weierstrass-Mandelbrot functions, and stochastic Brownian Motion function (Fuss, 2013). However, EMG data are not based on functions that generate signals with predefined FDs. As such, low median frequencies and small FDs do not necessarily exhibit a parallel trend. This possibility is also affected by the method used for calculating FDs.

Furthermore, there is indication that a power-trained subject was more affected by peripheral fatigue whereas an endurance-trained subject was more prone to central fatigue (Mesin et al., 2009). It is therefore expected that the correlation of fatigue parameters that measure different components of fatigue is not necessarily high. This correlation is not just affected by the fatigue component, but also by the distribution of training type across the participants of a study. For example, participant 3 is a long-distance cyclist and therefore endurance-trained, whereas participant 4 is a soccer player and thus power-trained.

If FD-EMG and FFT-EMG are related to central and peripheral fatigue, respectively, then FD-Pressure could be related to mechanical fatigue. Mechanical fatigue is actually defined as the failure of the muscle system, i.e., that the force level cannot be maintained anymore (Basmajian and De Luca, 1985). Nevertheless, metabolic fatigue (measured with EMG) becomes apparent even before system failure (Basmajian and De Luca, 1985). As such, the term mechanical fatigue is probably not appropriate, and should be replaced by mechanical pre-fatigue.

#### CONCLUSION

The smart compression garment based on FMG with pressure sensors returned performance parameters (muscle activity and fatigue) comparable to the surface EMG, used as gold standard for validation. The major differences were that the EMG measured the electrical activity whereas the pressure sensor measured the mechanical activity. As such, there was a phase shift between electrical and mechanical signals, with the electrical ones preceding the mechanical ones in most cases. This is specifically important in high-speed cycling, the activity investigated in this study. Using the activity sectors on the polar diagrams, obtained from EMG, for biomechanical models, could result in incorrect outcomes, compared to using the activity data obtained from FMG. The latter are considered more appropriate as input for biomechanical modeling.

In terms of fatigue, apart from individual differences between the participants, the fatigue trend over the duration of the cycling exercise was clearly reflected in the fatigue parameters (FDs and median frequency) obtained from pressure and EMG signals. The fatigue parameter of the pressure signal (FD) showed a higher time dependency (R<sup>2</sup> = 0.84) compared to the EMG signal. This reflects that the pressure signal puts more emphasis on the fatigue as a function of time rather than on the origin of fatigue (peripheral or central).

#### AUTHOR CONTRIBUTIONS

AB and FF contributed equally to the design of the study, execution of the experiment, data analyses, and writing and editing the manuscript.

#### ACKNOWLEDGMENTS

The authors would like to thank the participants who volunteered in the cycling experiments. They also would like to thank the two reviewers for their valuable comments and suggestions to improve the quality of the paper.

#### REFERENCES

Basmajian, J. V., and De Luca, C. J. (1985). Muscles Alive: Their Functions Revealed by Electromyography, 5th Edn. Baltimore, MD: Williams & Wilkins.

Belbasis, A., and Fuss, F. K. (2015). Development of next-generation compression apparel. Procedia Technol. 20, 85–90. doi: 10.1016/j.protcy.2015.07.015

Belbasis, A., Fuss, F. K., and Sidhu, J. (2015a). Muscle activity analysis with a smart compression garment. Procedia Eng. 112, 163–168. doi: 10.1016/j.proeng.2015.07.193

Belbasis, A., Fuss, F. K., and Sidhu, J. (2015b). Estimation of cruciate ligament forces via smart compression garments. Procedia Eng. 112, 169–174. doi: 10.1016/j.proeng.2015.07.194

in Proceedings of the 29th Annual International Conference of the IEEE EMBS Cité Internationale, Lyon, 3966–3969. doi: 10.1109/IEMBS.2007.4353202


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Belbasis and Fuss. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Heart Rate Monitoring in Team Sports—A Conceptual Framework for Contextualizing Heart Rate Measures for Training and Recovery Prescription

Christoph Schneider<sup>1</sup>\*, Florian Hanakam<sup>1</sup>, Thimo Wiewelhove<sup>1</sup>, Alexander Döweling<sup>1</sup>, Michael Kellmann<sup>1,2</sup>, Tim Meyer<sup>3</sup>, Mark Pfeiffer<sup>4</sup> and Alexander Ferrauti<sup>1</sup>

<sup>1</sup> Faculty of Sport Science, Ruhr-University Bochum, Bochum, Germany, <sup>2</sup> School of Human Movement and Nutrition Sciences, The University of Queensland, St. Lucia, QLD, Australia, <sup>3</sup> Institute of Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany, <sup>4</sup> Institute of Sport Science, Johannes-Gutenberg University, Mainz, Germany

#### Edited by:

H.-C. Holmberg, Mid Sweden University, Sweden

#### Reviewed by:

Ferdinando Iellamo, Università degli Studi di Roma Tor Vergata, Italy

Giovanni Messina, University of Foggia, Italy

\*Correspondence: Christoph Schneider christoph.schneider-a5c@rub.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 05 March 2018 Accepted: 11 May 2018 Published: 31 May 2018

#### Citation:

Schneider C, Hanakam F, Wiewelhove T, Döweling A, Kellmann M, Meyer T, Pfeiffer M and Ferrauti A (2018) Heart Rate Monitoring in Team Sports—A Conceptual Framework for Contextualizing Heart Rate Measures for Training and Recovery Prescription. Front. Physiol. 9:639. doi: 10.3389/fphys.2018.00639

A comprehensive monitoring of fitness, fatigue, and performance is crucial for understanding an athlete's individual responses to training to optimize the scheduling of training and recovery strategies. Resting and exercise-related heart rate measures have received growing interest in recent decades and are considered potentially useful within multivariate response monitoring, as they provide non-invasive and time-efficient insights into the status of the autonomic nervous system (ANS) and aerobic fitness. In team sports, the practical implementation of athlete monitoring systems poses a particular challenge due to the complex and multidimensional structure of game demands and player and team performance, as well as logistic reasons, such as the typically large number of players and busy training and competition schedules. In this regard, exercise-related heart rate measures are likely the most applicable markers, as they can be routinely assessed during warm-ups using short (3–5 min) submaximal exercise protocols for an entire squad with common chest strap-based team monitoring devices. However, a comprehensive and meaningful monitoring of the training process requires the accurate separation of various types of responses, such as strain, recovery, and adaptation, which may all affect heart rate measures. Therefore, additional information on the training context (such as the training phase, training load, and intensity distribution) combined with multivariate analysis, which includes markers of (perceived) wellness and fatigue, should be considered when interpreting changes in heart rate indices. The aim of this article is to outline current limitations of heart rate monitoring, discuss methodological considerations of univariate and multivariate approaches, illustrate the influence of different analytical concepts on assessing meaningful changes in heart rate responses, and provide case examples for contextualizing heart rate measures using simple heuristics. To overcome current knowledge deficits and methodological inconsistencies, future investigations should systematically evaluate the validity and usefulness of the various approaches available to guide and improve the implementation of decision-support systems in (team) sports practice.

Keywords: player monitoring, cardiac autonomic nervous system, individual response, smallest worthwhile change, multivariate analysis, decision-making

## INTRODUCTION

Successful training and recovery management aims at optimizing adaptation and overall preparedness for enhanced competitive performance (Buchheit, 2014; Cardinale and Varley, 2017; Coutts et al., 2018; Kellmann et al., 2018). Monitoring the training dose and athletes' responses (e.g., fitness, fatigue, performance, and wellness) is crucial in making informed decisions on training and recovery prescriptions (Halson, 2014; Bourdon et al., 2017; McGuigan, 2017; Coutts et al., 2018; Kellmann et al., 2018). Current technological developments in the field of wearable sensors enable steady improvement in the quantification of internal- and external-load indicators during athletic activity and expand the variety of tools available to measure training responses (Cardinale and Varley, 2017). Ideally, a comprehensive monitoring system includes markers for all relevant physiological and psychological aspects of training and performance, combining them into a holistic approach (Heidari et al., 2018). Nevertheless, the handling of collected data poses a great challenge for researchers and practitioners, and available analytical strategies have rarely been systematically investigated (Thorpe et al., 2017). In this context, it is necessary to clarify how the individual longitudinal data can be analyzed on the one hand, and in which form the various parameters should be linked to one another, on the other hand.

Because team sport performance is a complex and multidimensional construct, comprehensive monitoring is crucial in understanding athletes' training response to modify training and recovery strategies (Halson, 2014; Bourdon et al., 2017; McGuigan, 2017; Coutts et al., 2018). Moreover, team sport coaches and practitioners usually deal with a large number of athletes. Another great challenge is, therefore, the implementation of a simple but effective monitoring system that involves at least some measures of training load, wellness, fitness, and readiness (Gabbett et al., 2017; McGuigan, 2017). The frequent assessment of various metrics could be difficult as compliance can be affected by the busy schedule and complex requirements of the team sport athlete.

In this regard, the use of heart rate (HR) and heart rate variability (HRV) measures in sports has been discussed for decades, as they represent an inexpensive, time-efficient, and non-invasive method to monitor the status of the autonomic nervous system (ANS) and cardiovascular fitness (Achten and Jeukendrup, 2003; Aubert et al., 2003; Borresen and Lambert, 2008; Alexandre et al., 2012; Daanen et al., 2012; Buchheit, 2014). Despite the large body of research and possible applications, monitoring athletes' training responses with HR measures is not widely implemented (Buchheit, 2014), which is due in part to contradictory findings (Alexandre et al., 2012; Bellenger et al., 2016), methodological inconsistencies (Plews et al., 2013), or partial misinterpretations (e.g., assuming that HR measures can reflect overall fatigue or fitness directly) (Achten and Jeukendrup, 2003; Buchheit, 2014). In any case, it is indisputable that HR data can measure only a limited number of aspects of performance or training response, and therefore must be combined with additional parameters.

In this technology report, we first briefly outline current applications and limitations of monitoring training response with HR and HRV in team sport athletes. Second, we present a conceptual framework for contextualizing HR measures and address methodological considerations of univariate and multivariate approaches to analyzing HR monitoring data. Finally, we illustrate how different analysis concepts may affect the evaluation of data, and provide two case examples for practical decision-making with a simple, multivariate heuristic approach.

### HR MONITORING IN ATHLETES

HR measures are used as surrogate markers of the cardiac ANS status (Aubert et al., 2003; Michael et al., 2017). As the ANS is interlinked with many physiological systems, HR measures might reflect (aerobic-based) adaptation and fatigue status (Buchheit, 2014; Hottenrott and Hoos, 2017; Thorpe et al., 2017). However, HR measures are determined by multiple influencing factors, such as environmental (e.g., noise, light, temperature), physiological (e.g., cardiac morphology, plasma volume, autonomic activity), pathological (e.g., cardiovascular disease), psychological (e.g., mood, emotions, stress) conditions, and non-modifiable factors (e.g., age, sex, ethnicity), as well as lifestyle (e.g., fitness, sleep, medication, tobacco, alcohol) and determinants of physical activity (e.g., intensity, duration, modality, economy, body position) (Sandercock et al., 2005; Buchheit, 2014; Fatisson et al., 2016; Sessa et al., 2018). Nevertheless, it is assumed that, in competitive sports, the influence of training plays a predominant role in ANS status changes and, therefore, HR measures might be able to represent the athlete's training status (Lamberts et al., 2010; Buchheit, 2014).

The large number of original and review articles on HR monitoring published in recent decades documents the high interest in exercise and sport science (Task Force, 1996; Achten and Jeukendrup, 2003; Aubert et al., 2003; Carter et al., 2003; Sandercock et al., 2005; Hottenrott et al., 2006; Borresen and Lambert, 2008; Bosquet et al., 2008; Alexandre et al., 2012; Daanen et al., 2012; Plews et al., 2013; Stanley et al., 2013; Buchheit, 2014; Hettinga et al., 2014; Bellenger et al., 2016; Kingsley and Figueroa, 2016; Berkelmans et al., 2017). The growing popularity of HR measures among practitioners (Akenhead and Nassis, 2016; Thorpe et al., 2017), combined with the increasing number of commercial products and software for HR recording and analysis (Naranjo et al., 2015; Flatt and Esco, 2016; Perrotta et al., 2017; Plews et al., 2017b) further highlights the practical significance of this research field. While relying on countless years of scientific and practical experience (Israel, 1982), no other physiological parameters are available that provide a non-invasive, time-efficient, cost-effective, and continuous insight into a human's physiological response in almost any environment or stress situation. Nevertheless, HR measures cannot address all aspects of performance, fatigue, and well-being, but are mainly reflective of ANS status and cardiovascular fitness (Buchheit, 2014).

**Abbreviations:** %HRmax, Percentage of maximum heart rate; ANS, Autonomic nervous system; CV, Coefficient of variation; HR, Heart rate; HRex, (Submaximal) exercise heart rate; HRmax, Maximum heart rate; HRR, Heart rate recovery following (submaximal) exercise; HRrest, Resting heart rate; HRV, Heart rate variability; HR(V), Heart rate and heart rate variability; HRVpost, Post-exercise heart rate variability; HRVrest, Resting heart rate variability; Ln rMSSD, Natural logarithm of the rMSSD; Ln rMSSD/RR, Ln rMSSD to R-R interval ratio; rMSSD, square root of the mean squared differences of successive normal R-R intervals; RPE, Rating of perceived exertion; SD, Standard deviation; SWC, Smallest worthwhile change; TE, Typical error.

#### HR Measures and Protocols

Heart activity (HR and stroke volume) is integrated into numerous feedback (e.g., muscle mechanoreceptors) and feedforward (e.g., "central command") loops, and is continuously modulated by ANS activity on a beat-to-beat basis (Michael et al., 2017). Thus, it is critical to consider standardized procedures when collecting, analyzing, and comparing HR and HRV [HR(V)] within or between athletes. All HR measures are somehow related to ANS activity, but differ in their physiological determinants and their time course of adaptation, and display different sensitivity to changes in fitness, performance and training load (Bosquet et al., 2008; Buchheit, 2014). In this chapter (HR Monitoring in Athletes), we refrain from a detailed survey of the literature, as many review articles have already described the relationships between HR measures, the ANS, and other influencing factors, and have further defined general methodological guidelines for data collection and preparation. For example, an excellent overview of monitoring training status with HR measures has been provided by Buchheit (2014). Nevertheless, we provide a brief and focused account of the application and limitations of HR monitoring in team sports.

#### Resting Measures

Supine or seated short-term (5–10 min, Task Force, 1996) resting HR measures (HRrest, HRVrest) are currently suggested as a best practice for monitoring an athlete's ANS status (Buchheit, 2014). Resting HR(V) can be directly influenced by short-term (e.g., blood/plasma volume changes, fatigue) and long-term training responses (e.g., cardiac morphology), which in turn may obscure the observation of changes in ANS activity (Fellmann, 1992; Zavorsky, 2000; Achten and Jeukendrup, 2003; Buchheit, 2014). Resting measurements (during nocturnal sleep or after awakening) are attractive since they are characterized by a high degree of standardization and, therefore, minimize many confounding factors (e.g., previous activity, time of day) (Achten and Jeukendrup, 2003; Fatisson et al., 2016). Additionally, these measurements can also be collected on resting days, in case of injury or sickness, and can further be used to modify individual training and recovery plans before the first daily session (Buchheit, 2014). Although some authors suggest that resting HRV might be more sensitive to training status than resting HR (Naranjo et al., 2015; Flatt and Esco, 2016), the superiority of HRVrest could be neither confirmed nor rejected (Billman et al., 2015). There are still large methodological inconsistencies in HRV assessment that impede the comparison and summary of findings (Task Force, 1996; Bellenger et al., 2016).

In team sports, daily morning assessments may prove useful, especially in short- to mid-term periods of increased stress, such as the evaluation of pronounced travel loads or training camps (Fowler et al., 2017; Malone et al., 2017). Under field conditions, time-domain HRV indices (e.g., Ln rMSSD: natural logarithm of the square root of the mean squared differences of successive normal R-R intervals) have become established to assess daily changes in ANS status, as they are more reliable (Al Haddad et al., 2011) and less affected by different breathing patterns (Penttilä et al., 2001; Saboul et al., 2013) compared to spectral analyses. When assessing long-term changes, it is suggested to analyze (rolling) weekly averages (≥3–4 measurements per week) to increase validity (Plews et al., 2014) and to express day-to-day fluctuations as a weekly coefficient of variation (CV; Plews et al., 2012; Flatt and Esco, 2016). However, it might be unrealistic in practice to implement frequent (≥3–4 times per week) home-based resting measures in an entire squad of elite or high-level players over a prolonged training period (Buchheit, 2014; Thorpe et al., 2017). An alternative approach could use pre-training recordings (Nakamura et al., 2016; Malone et al., 2017). Furthermore, the extended evaluation and application of ultra-short-term recordings (<5 min, often ≤1 min; Flatt and Esco, 2013; Esco and Flatt, 2014; Nakamura et al., 2015; Pereira et al., 2016; Esco et al., 2018) with commercial software, such as smartphone applications [e.g., Elite HRV (Perrotta et al., 2017), ithlete (Flatt and Esco, 2013), HRV4Training (Plews et al., 2017b)], enables feasible analysis of an entire team's data almost immediately after the assessment. These technological developments may improve compliance and increase the applicability of resting measurements in the future, at least in settings with high formal program commitment, as in junior, high school, and college athletes.
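
The resting indices named above can be sketched in a few lines of Python. This is a minimal illustration, not a validated implementation: it assumes that artifacts have already been removed from the R-R series, that intervals are given in milliseconds, and that the weekly CV is computed with the sample SD over the daily (log-transformed) values. The function names `ln_rmssd` and `weekly_summary` and the `min_days` threshold are hypothetical choices made for this example.

```python
import math
from statistics import mean, stdev


def ln_rmssd(rr_ms):
    """Natural log of rMSSD for a list of successive normal R-R intervals (ms).

    Assumes the series is already artifact-free (assumption of this sketch).
    """
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two R-R intervals")
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    if rmssd == 0:
        raise ValueError("rMSSD is zero; the logarithm is undefined")
    return math.log(rmssd)


def weekly_summary(daily_values, min_days=3):
    """Return (weekly mean, weekly CV in %) for one week of daily values.

    Missing days are passed as None. Returns None when fewer than
    `min_days` recordings are available (hypothetical threshold, chosen
    to mirror the >=3-4 measurements per week mentioned in the text).
    """
    values = [v for v in daily_values if v is not None]
    if len(values) < min_days:
        return None
    avg = mean(values)
    cv = 100.0 * stdev(values) / avg
    return avg, cv
```

A rolling 7-day average can be obtained by calling `weekly_summary` on a sliding window of the last seven daily values.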

#### Exercise Measures

Over a wide range of endurance exercise intensities, exercise HR (HRex) is linearly related to oxygen uptake and energy expenditure during continuous work and is therefore commonly used to monitor and prescribe exercise intensity and training load (Achten and Jeukendrup, 2003; Borresen and Lambert, 2009; Alexandre et al., 2012; Berkelmans et al., 2017). Furthermore, exercise HR has been traditionally evaluated under submaximal (HRex) and maximal efforts (HRmax) using incremental tests to assess cardiovascular fitness (Achten and Jeukendrup, 2003; Buchheit, 2014). As the relationship between common (vagal-related) HRV measures and exercise intensity is flawed (Buchheit, 2014; Michael et al., 2017; see also section Limitations of Univariate HR Monitoring) and beat-to-beat recordings during exercise are susceptible to artifacts (e.g., lost beats due to HR belt movement), only HRex at fixed external loads (not exercise HRV) averaged over the last 30–60 s can be recommended for longitudinal athlete monitoring (Buchheit, 2014). Whether exercise HR can depict fitness impairments sensitively is still unclear, as increased HRex does not indicate impaired performance per se (Buchheit, 2014; Thorpe et al., 2017) but likely occurs with prolonged detraining (Mujika and Padilla, 2000a,b). Moreover, similar to interpreting changes in resting HR(V), long-term fitness-related changes in HRex may also be skewed due to acute or short-term responses to training or environmental conditions.
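
As a rough illustration of the averaging step, HRex for one fixed-load bout could be computed as follows. The sketch assumes time-stamped HR samples (seconds, beats per minute) ordered by time, with the bout ending at the last sample; the names `hrex` and `percent_hrmax` are hypothetical, and the 60 s default is simply the upper end of the 30–60 s range mentioned above.

```python
def hrex(samples, window_s=60.0):
    """Mean HR (bpm) over the final `window_s` seconds of a fixed-load bout.

    `samples` is a list of (time_s, hr_bpm) tuples ordered by time; the
    last sample is assumed to mark the end of the bout.
    """
    if not samples:
        raise ValueError("no HR samples")
    end = samples[-1][0]
    tail = [hr for t, hr in samples if t > end - window_s]
    return sum(tail) / len(tail)


def percent_hrmax(hr_bpm, hr_max_bpm):
    """Express an HR value as a percentage of maximum HR (%HRmax)."""
    if hr_max_bpm <= 0:
        raise ValueError("HRmax must be positive")
    return 100.0 * hr_bpm / hr_max_bpm
```

Because only the external load is fixed, the same function can be applied to every weekly test so that changes in the returned value are comparable within an athlete.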

Since the repeated assessment of maximal physical performance is unsuitable in (team sport) athletes, submaximal, non-exhaustive tests have been more frequently adopted by researchers and practitioners during recent decades (Buchheit, 2014; Halson, 2014; Akenhead and Nassis, 2016; Capostagno et al., 2016; Thorpe et al., 2017). However, the protocols used vary greatly in modality [running (Malone et al., 2017) vs. cycling (Thorpe et al., 2015)], load characteristics [continuous (Buchheit et al., 2010) vs. intermittent (Brink et al., 2013), linear (Buchheit et al., 2010) vs. shuttle runs (Bradley et al., 2011), constant (Buchheit et al., 2010) vs. graded (Bradley et al., 2011)], test duration [5 min (Buchheit et al., 2010) to 16 min (Vesterinen et al., 2017)], intensity [low-intensity (Buchheit et al., 2013c) vs. high-intensity (Vesterinen et al., 2017)] and workload prescription [standardized (Bradley et al., 2011) vs. individualized (Buchheit et al., 2010), internal (Vesterinen et al., 2017) vs. external (Bradley et al., 2011)].

In team sports, standardized (rather than individualized) submaximal running tests seem to be most appropriate in a variety of settings (level of competition, team budget, squad size). Low-intensity exercise could be implemented in the first part of the warm-up for most athletes (fit, unfit, fatigued, early stage of return to activity after an injury or sickness) and scenarios (training camps, preparation and recovery periods, in-season) without adding substantial fatigue, whereas higher intensities might be associated more closely with sport-specific performance (Bangsbo et al., 2008; Lamberts et al., 2010, 2011; Bradley et al., 2011). In the absence of definite protocol recommendations in terms of test quality criteria (validity, reliability, signal-to-noise ratio), we suggest using either submaximal versions of established field tests [Multi-stage Fitness Test (Léger and Lambert, 1982), Yo-Yo Tests (Bangsbo and Mohr, 2012), 30-15 Intermittent Fitness Test (Buchheit, 2010)] or fixed-intensity runs on a specific shuttle length (or field size). **Figure 1** shows exemplary HR recordings of a semi-professional basketball player during submaximal and maximal shuttle runs, which display typical changes in HRex in response to a preparation period (see figure legend for details).

#### Post-exercise Measures

Following exercise cessation, HR decreases exponentially, and HRV indices start to increase. Post-exercise HR measures (HRR: HR recovery, HRVpost) reflect general hemodynamic adjustments and might be related to aerobic fitness, wellness, and readiness to perform (Buchheit, 2014). ANS activity following exercise cessation is influenced primarily by parasympathetic reactivation in the early stage of recovery [during the first minute(s)], followed by additional sympathetic withdrawal during mid- to long-term recovery (minutes to hours; Borresen and Lambert, 2008; Hottenrott and Hoos, 2017; Michael et al., 2017; Peçanha et al., 2017). However, post-exercise ANS activity and HR(V) recovery are influenced by the preceding (relative) intensity (Stanley et al., 2013; Michael et al., 2017), and may, therefore, be more indicative of fitness than ANS status (Buchheit, 2014). In general, HRR is more favorable than HRVpost. It requires shorter recording periods (HRR: 30–60 s vs. HRVpost: ≥3–5 min), is accessible with any HR device, and may have a superior signal-to-noise ratio (Buchheit, 2014). The easiest way to calculate HRR is by taking the difference of HR at exercise cessation and after, for example, 1 min recovery (Peçanha et al., 2017). However, it is recommended to average HR recordings over several seconds (typically 5–15 s) to increase objectivity and reduce (measurement) error (Daanen et al., 2012; Buchheit, 2014).
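
The HRR calculation described above can be sketched as follows. The placement of the averaging windows (ending at exercise cessation and at the chosen recovery time, respectively) is an assumption of this example, since the text only states that HR should be averaged over several seconds; `mean_hr` and `hrr` are hypothetical helper names.

```python
def mean_hr(samples, start_s, end_s):
    """Mean HR (bpm) of all samples with start_s <= time < end_s."""
    vals = [hr for t, hr in samples if start_s <= t < end_s]
    if not vals:
        raise ValueError("no samples in the requested window")
    return sum(vals) / len(vals)


def hrr(samples, stop_s, recovery_s=60.0, avg_s=10.0):
    """Heart rate recovery (bpm): HR at cessation minus HR after recovery.

    `samples` is a list of (time_s, hr_bpm) tuples, `stop_s` the time of
    exercise cessation. Both HR values are averaged over `avg_s` seconds
    (assumed window placement: the seconds immediately before each time point).
    """
    hr_end = mean_hr(samples, stop_s - avg_s, stop_s)
    hr_rec = mean_hr(samples, stop_s + recovery_s - avg_s, stop_s + recovery_s)
    return hr_end - hr_rec
```

A larger positive value indicates a faster drop in HR during the first minute of recovery.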

From a practical point of view, team sports practitioners should evaluate the additional effort and benefit of post-exercise measures critically in their own setting. While an additional (standing or seated) 30–60 s recording seems to be reasonable, it remains unclear whether HRR after submaximal exercise adds beneficial information (to HRex), especially when workloads are fixed rather than individualized in team sports (different relative intensities between players). Additionally, post-exercise measures could unnecessarily complicate data collection and interpretation in the worst-case scenario (see Buchheit, 2014 for discussion).

#### Monitoring Training Response With HR Measures

#### Acute Responses

Monitoring an athlete's acute changes in HR measures in response to training is a critical but, at the same time, debated topic in HR(V) research. A major component of the scientific discussion is centered around day-to-day fluctuations in (especially resting) HR measures and possible causes of these variations (Buchheit, 2014). The underlying mechanisms are not entirely clear yet. Some argue that daily changes reflect measurement noise (i.e., measurement error), which results in poor reliability of daily resting measures (Al Haddad et al., 2011) compared to exercise HR (Buchheit, 2014), and should, therefore, be interpreted as random error. Alternatively, day-to-day fluctuations might be interpreted as (physiological) signal, with changes being related to training load, stress, and fatigue (Stanley et al., 2013). In line with the latter assumption, several attempts have been made to guide training programs based on daily (resting) HRV as a marker of (cardiovascular) recovery, resulting in either larger adaptations or more efficient training compared to conventional predefined training programs (Kiviniemi et al., 2007, 2010; Vesterinen et al., 2016; da Silva et al., 2017; Nuuttila et al., 2017). However, it must be considered that HRV-guided training programs have always been exclusively based on endurance training and were subject to certain restrictions and training principles (for example, a maximum of two successive high-intensity training days).

In general, training intensity is a key determinant of cardiac autonomic activity alterations following aerobic-oriented exercise (e.g., the higher the intensity, the longer the homeostatic distraction) and might be more influential than duration (Stanley et al., 2013; Hottenrott and Hoos, 2017; Michael et al., 2017). Complete cardiac autonomic recovery requires up to 24 h following low-intensity, 24–48 h following threshold-intensity and at least 48 h following high-intensity endurance exercise (Stanley et al., 2013). Therefore, acute changes in training load can result in altered vagal-related HRV (Stanley et al., 2013; Malone et al., 2017; Michael et al., 2017), HRR (Borresen and Lambert, 2007; Daanen et al., 2012; Malone et al., 2017) and HRex (Buchheit et al., 2013a,c; Malone et al., 2017). Furthermore, stable (Plews et al., 2012) or reduced (Flatt and Esco, 2016) day-to-day variations (expressed as a weekly CV) in resting HRV have been observed together with positive adaptation, but also a large reduction in CV was reported before non-functional overreaching (Plews et al., 2012). However, as previously described, numerous circumstances are known to acutely affect HR indices, such as plasma volume changes [e.g., due to heat acclimatization, (intense) aerobic exercise (Fellmann, 1992)], hydration status (Achten and Jeukendrup, 2003; Buchheit, 2014), sickness (Buchheit et al., 2013c), or long-haul travel (Fowler et al., 2017), which must be considered when interpreting day-to-day changes. Typically, these acute effects are reversed within a few days.

FIGURE 1 | [...] starting at 50%HRmax (e.g., red bar: 90–100%HRmax). HRex: exercise HR; HRR: HR recovery over 60 s; Prep: preparation period.

#### Short-Term Responses

During short- to mid-term periods of increased stress or intensified training, such as long-haul flight travel (Fowler et al., 2017) and heat, altitude, or training camps with increased volume and/or intensity (Achten and Jeukendrup, 2003; Buchheit et al., 2011; Berkelmans et al., 2017), HR monitoring might enable practitioners to assess an athlete's ability to cope with, and recover from, the induced demands. In the context of training, all of the previously described HR measures have been shown to reflect overload-induced performance changes sensitively on several occasions (Pichot et al., 2000; Borresen and Lambert, 2007; Bosquet et al., 2008; Bellenger et al., 2016; Capostagno et al., 2016; Hammes et al., 2016; Flatt et al., 2017) and therefore are possibly reflective of short-term (i.e., cumulative) fatigue responses. For example, in unpublished studies, we observed substantially increased HRrest (decreased HRVrest) in supine position within 6-day overload microcycles of either high-intensity interval training or intensive whole-body strength training. While these changes in the supine recording position might be somewhat plausible due to the excessive overload, the standing HR(V) recordings displayed a large progressive reduction in HRrest (increased HRVrest) during the high-intensity interval training period. In the subsequent 4-day recovery phase, these alterations showed reverse trends. In summary, the changes in (supine) resting HR measures were parallel to the (stress- and fatigue-related) changes in training-specific performance (repeated sprint ability and maximal strength, respectively; see **Table 1** in section Training Context is Key for further details).

#### Long-Term Responses

Since an athlete's training status is influenced by acute, short-term, and long-term responses, it is of central importance to consider the (aerobic) fitness level, chronic training loads, and the current training phase of the athlete for correct interpretation and contextualization of HR measures. In general, HR measures correlate with aerobic fitness or performance markers, with resting and exercise HR being lower and resting HRV being higher in better-trained athletes (Achten and Jeukendrup, 2003; Aubert et al., 2003; Sandercock et al., 2005; Hottenrott et al., 2006; Messina et al., 2012; Plews et al., 2013; Hottenrott and Hoos, 2017; Proietti et al., 2017; Thorpe et al., 2017; Sessa et al., 2018). However, it must be considered that increased exercise or test performance is not necessarily reflective of positive adaptation since increased "readiness" or motivation at the same fitness level may cause higher performance outcomes (Plews et al., 2013; Coutts et al., 2018). This likely contributes to some of the contradictory findings in research (see section Contextualizing HR Measures). Overall, fewer data exist on the sensitivity of HR measures to detect negative training response or maladaptation (Buchheit, 2014; Bellenger et al., 2016).

In trained athletes, moderate training loads typically increase aerobic fitness and HRV, whereas high training loads reduce HRV (Iellamo et al., 2002; Manzi et al., 2009; Plews et al., 2013). HRR is typically accelerated with high training volume (Buchheit, 2014). It is generally assumed that increased training volume likely results in HR(V) changes reflecting increased parasympathetic activity (e.g., decreased HRrest and increased HRVrest), whereas increased training intensity with a concomitant decrease in training volume results in HR(V) changes reflecting increased sympathetic activity (increased HRrest and decreased HRVrest) (Israel, 1982; Fry and Kraemer, 1997; Lehmann et al., 1998; Armstrong and VanHeest, 2002; Plews et al., 2013; Buchheit, 2014; Hottenrott and Hoos, 2017).

In endurance athletes, a bell-shaped time course of resting HRV in the weeks leading up to a key race may reflect an optimal scenario for peak competitive performance (Manzi et al., 2009; Plews et al., 2013, 2017a; Buchheit, 2014). Vagal-related HRV likely increases during the building phase, which is characterized by high training volume at low intensities (Buchheit, 2014). During tapering, decreased HRVrest and increased performance are typically observed, which could be explained by a shift of training distribution toward high-intensity exercise, as well as pre-competition stress (Edmonds et al., 2013; Plews et al., 2013; Buchheit, 2014). We assume that some contradictory findings on the relationship between HR measures, performance, and fatigue are caused by these observations, since neither aspects of periodization nor delayed training effects have been adequately considered in the available meta-analyses (Bosquet et al., 2008; Bellenger et al., 2016), nor has the inter-individual time course of HR(V) response been properly assessed or reported, with the exception of several case studies (Plews et al., 2012, 2017a; Stanley et al., 2015). In summary, cumulative and long-term HR(V) responses during different training phases could be explained by a prolonged accumulation of intensity-related acute effects of single training sessions in the presence or absence of sufficient recovery to reach baseline levels (Stanley et al., 2013; Buchheit, 2014). An overview of acute, short-term and long-term training responses in HR measures is provided in **Table 1** (section Training Context is Key).

### Applications in Team Sports

In recent years, elite team sport athletes have become more exposed to high competitive loads due to the increased frequency and intensity of domestic and international competitions during both the domestic season and the off-season period (Thorpe et al., 2017). As increased player availability may lead to an increase in chances for success, fatigue management is crucial for injury and illness reduction (Bourdon et al., 2017; Thorpe et al., 2017). However, at moderate to high performance levels, there is usually a consistent and similar structure for each week during the competitive period, which may intuitively lead to weekly scheduling of training and testing relative to days until or after game-day (McGuigan, 2017; Thorpe et al., 2017). This weekly structure creates regular and comparable testing conditions (e.g., two days after competition), which may help to minimize acute "confounding" effects (e.g., fatigue) when interpreting long-term training changes in HR measures (e.g., fitness).

A large challenge in team sport monitoring is the complex and multifactorial nature of sports performance, training, and game demands, which includes technical, tactical, physiological, psychological, and social components (Coutts et al., 2018). To date, there is no uniform definition of player or team performance, which limits its quantitative description and the identification of possible influencing factors. Further, it remains speculative to what extent the previously described associations between changes in training volume and intensity and changes in HR measures in endurance athletes are transferable to team sports, since the appropriate quantification of training load, volume, and intensity over the variety of training modalities and biological systems stressed in team sport practice is challenging (Buchheit, 2014; Bourdon et al., 2017).

FIGURE 2 | Changes in HR measures in a semi-professional basketball player during a preseason preparation period and the first half of the competitive season. Resting HR measures (HRrest, Ln rMSSD) were assessed daily with 1-min ultra-short-term recordings upon awakening, in a seated position using commercial HR monitoring software (HRV4Training, Plews et al., 2017b). Values are displayed as daily values and rolling 7-day averages. Exercise HR (HRex) and HR recovery (HRR) were assessed weekly with a submaximal shuttle run (see Figure 1 for details) during the warm-up in the team's evening practice 2 days post game-day. Acute and chronic training loads were calculated over 1 and 4 weeks of training, respectively [training load (AU, arbitrary units) = session-RPE (0–10) × training duration (min), (Gabbett, 2016)]. The gray horizontal bars represent trivial changes based on the suggested smallest worthwhile change for each measure: 0.5 × SD during the first 2 weeks for HRrest and HRVrest (Ln rMSSD), 1% for HRex and 7% for HRR (Buchheit, 2014).

Despite these limitations, analyzing dose-response relationships is a central component of athlete management (Gabbett et al., 2017; McLaren et al., 2018), as it helps to assess injury risk (Gabbett, 2016; Bourdon et al., 2017) and thus may indirectly influence sports performance (i.e., success) through increased player availability (Thorpe et al., 2017). Since physical performance measures during sport-specific drills and match play are highly variable, external-internal load relationships are commonly assessed using submaximal tests (Buchheit, 2014; Thorpe et al., 2017). The protocols are typically based on continuous or intermittent aerobic-based exercise (Bradley et al., 2011; Brink et al., 2013; Buchheit et al., 2013a), which are well standardized but correspondingly less valid for overall physical performance (Thorpe et al., 2017). The use of sport-specific "closed-loop" drills might be an alternative approach, as sport-specific motion patterns and demands are simulated and performance output might be less variable than during an actual match (Buchheit et al., 2013a; Malone et al., 2017; Thorpe et al., 2017). Also, developments in wearable sensor technology will enable researchers and practitioners to assess integrated external and internal loads during any sport-specific training modalities in the future (see Lacome et al., 2018 for a practical example). These developments may, for example, allow (almost) real-time analysis of the effect of locomotor movement patterns on the physiological response, such as the effect of changes in running technique and, therefore, running economy on HR response. For illustrative purposes, **Figure 2** represents an overview of currently suggested applications of resting and exercise HR measures in a semi-professional team sport athlete during a preparatory phase and the first half of the competitive season.
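
The quantities named in the Figure 2 legend can be illustrated with a short Python sketch. The session load formula (session-RPE × duration) and the smallest worthwhile change thresholds (0.5 × SD of a baseline period, or a fixed percentage such as 1% for HRex) follow the legend. How daily loads are aggregated into acute and chronic loads is not specified in the text, so the aggregation used here (sum of the last 7 days, and mean weekly sum over the last 28 days) is an assumption of this example, as are all function names.

```python
from statistics import stdev


def session_load(session_rpe, duration_min):
    """Training load in arbitrary units: session-RPE (0-10) x duration (min)."""
    return session_rpe * duration_min


def acute_and_chronic(daily_loads):
    """Return (acute, chronic) load from a list of daily loads, oldest first.

    ASSUMPTION of this sketch: acute = sum of the last 7 days,
    chronic = mean weekly sum over the last 28 days.
    """
    if len(daily_loads) < 28:
        raise ValueError("need at least 28 days of load data")
    acute = sum(daily_loads[-7:])
    chronic = sum(daily_loads[-28:]) / 4
    return acute, chronic


def swc_from_baseline(baseline_values, factor=0.5):
    """Smallest worthwhile change as factor x SD of a baseline period."""
    return factor * stdev(baseline_values)


def classify_change(value, baseline, swc):
    """Label a value as 'trivial', 'increase', or 'decrease' relative to baseline."""
    delta = value - baseline
    if abs(delta) <= swc:
        return "trivial"
    return "increase" if delta > 0 else "decrease"
```

For percentage-based thresholds, the SWC can be passed directly, for example `classify_change(157, 160, 0.01 * 160)` for HRex with a 1% threshold.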

### CONTEXTUALIZING HR MEASURES

### Limitations of Univariate HR Monitoring

Although each of the previously described HR measures was sensitive to changes in fitness, fatigue, and performance in several instances, a recent meta-analysis found that the direction of change was the same for both increased and decreased performance (Bellenger et al., 2016). For example, vagal-related HRVrest increased parallel to both increased and decreased (aerobic) performance, representing either increased parasympathetic modulation or parasympathetic hyperactivity. Similarly, decreased HRex was observed in both concurrent performance increases (Buchheit, 2014) and overreaching-associated performance impairments (Bosquet et al., 2008). To date, the only promising approach for deciphering this dilemma lies in the contextualization of HR measures and the use of multivariate approaches (Bosquet et al., 2008; Lamberts, 2009; Plews et al., 2013; Buchheit, 2014; Bellenger et al., 2016; Capostagno et al., 2016; Bourdon et al., 2017; Hottenrott and Hoos, 2017; Thorpe et al., 2017; Coutts et al., 2018; Kellmann et al., 2018).

As previously described, a fundamental difficulty is that fatigue and performance are multifactorial constructs (Fry and Kraemer, 1997; Armstrong and VanHeest, 2002; Borresen and Lambert, 2008; Meeusen et al., 2013; Buchheit, 2014; Thorpe et al., 2017; Coutts et al., 2018; Kellmann et al., 2018), which, under certain circumstances, can be influenced measurably by changes in an athlete's ANS status (Israel, 1982; Lehmann et al., 1993) and vice versa. However, training elicits a variety of responses and adaptations on various levels (e.g., cardiovascular, hormonal, neuromuscular, psychological), any of which may result in performance or fatigue changes, either in isolation or combination. Conversely, it is unlikely that any single marker can accurately display changes in a multidimensional construct, such as performance or fatigue (Meeusen et al., 2013; Bourdon et al., 2017; Coutts et al., 2018; Kellmann et al., 2018). Therefore, HR(V) measures can only be used to assess ANS status (at rest, exercise onset, post-exercise) and overall cardiovascular function (during exercise; Buchheit, 2014) and should be considered as only one of the determinants influencing an athlete's training status.

Also, the (mathematical) relationship between ANS activity and HR(V) is indirect and is an often-overlooked limitation in research, which could cause partial misinterpretations (Plews et al., 2013; Buchheit, 2014). More precisely, this means that changes in ANS status (i.e., ANS activity) are not directly reflected in changes in HR measures, and direct associations cannot be assumed (Plews et al., 2013; Buchheit, 2014; White and Raven, 2014; Hottenrott and Hoos, 2017). For example, increasing vagal nerve activity generally increases vagal-related HRV. However, at low HR levels, HRV is often reduced rather than increased due to parasympathetic hyperactivity causing the so-called saturation phenomenon, which may be explained by saturation of acetylcholine receptors at the myocyte level (Plews et al., 2013; Buchheit, 2014). To overcome this issue, resting HR and HRV should be concomitantly assessed and interpreted using intraindividual historical data, representing vagal tone and modulation respectively, and normalizing HRV for the prevailing R-R interval (Plews et al., 2013; Sacha, 2013; Buchheit, 2014; Billman et al., 2015). During exercise, ANS balance continuously shifts from parasympathetic to sympathetic dominance as a function of intensity, whereas vagal-related HRV indices typically level off at moderate intensity (Buchheit, 2014; Michael et al., 2017) and therefore cannot measure ANS activity over the entire range of intensities. Furthermore, HRR and HRVpost, as possible indicators of ANS activity, might be biased by metaboreflex stimulation and should, therefore, be concomitantly interpreted with HRex (Buchheit, 2014).
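
One way to normalize HRV for the prevailing R-R interval is the Ln rMSSD to R-R interval ratio (Ln rMSSD/RR) listed among the abbreviations. The sketch below is illustrative only: it assumes artifact-free R-R intervals in milliseconds and divides Ln rMSSD by the mean R-R interval in the same units; any scaling convention used in the cited literature is not reproduced here, and the function name `resting_summary` is hypothetical.

```python
import math
from statistics import mean


def resting_summary(rr_ms):
    """Return resting HR, Ln rMSSD, and the Ln rMSSD/RR ratio for one recording.

    `rr_ms` is a list of successive normal R-R intervals in milliseconds
    (assumed to be artifact-free). The ratio uses the mean R-R interval
    in ms as denominator (assumption of this sketch).
    """
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    if rmssd == 0:
        raise ValueError("rMSSD is zero; the logarithm is undefined")
    mean_rr = mean(rr_ms)
    ln_rmssd = math.log(rmssd)
    return {
        "hr_rest_bpm": 60000.0 / mean_rr,   # mean HR derived from mean R-R
        "ln_rmssd": ln_rmssd,
        "ln_rmssd_rr_ratio": ln_rmssd / mean_rr,
    }
```

Two recordings with the same Ln rMSSD but different mean R-R intervals yield different ratios, which is why the ratio is interpreted alongside resting HR and the athlete's own historical data rather than in isolation.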

### Training Context Is Key

The most relevant information for contextualizing HR measures includes training phase, training load, and intensity distribution (Buchheit, 2014). Also, it seems necessary to consider the specific time course of training schedules and training responses and further examine (subjective) measures of well-being and recovery/fatigue state, or rating of perceived exertion (RPE) when using exercise measures. To get a more holistic impression of an athlete's training status, practitioners must combine these measures with additional markers of sport-specific performance (Bosquet et al., 2008; Lamberts, 2009; Plews et al., 2013; Buchheit, 2014; Bellenger et al., 2016; Capostagno et al., 2016; Hottenrott and Hoos, 2017; Thorpe et al., 2017). **Table 1** provides an overview of changes in HR and context measures within different training settings. Particular emphasis was placed on structuring the information regarding the time course of training responses as well as the respective training context. The summarized and schematized changes reflect overall group-based effects. Typically, these observed group effects are accompanied by large inter-individual variation, which might display contrary behavior on the individual level and highlights the necessity for individualized analysis in sports practice (Plews et al., 2013; Buchheit, 2014; Volterrani and Iellamo, 2016; Hottenrott and Hoos, 2017). However, referring to group-based suggestions of expectable changes might be an appropriate starting point if practitioners are aware of the common between-athlete variations in response and try to identify individual response patterns to consider them for future analysis.

TABLE 1 | Overview and schematic representation of suggested overall effects in different HR and context measures in various (team) sports-related scenarios [data derived from reviews (R), original articles (O), monographs (M), book chapters (C) in scientific collections, and PhD theses (T)].

[...] exertion; su, supine recording; st, standing recording; RSA, repeated sprint ability; vol, training volume; int, training intensity.

### Methodological Considerations

Using appropriate analysis strategies to interpret individual monitoring data is an essential component of successfully implementing athlete monitoring systems in professional and elite settings (Akenhead and Nassis, 2016). However, there is a considerable research deficit in the area of single-case analysis in sport science and, accordingly, there is a lack of systematic methodological comparisons and recommendations (Buchheit et al., 2014). On the one hand, there is a need for theory-driven and evidence-based methods for data processing and making sense of time series in each measure, while on the other hand, several measures must be combined within a theoretical framework and with multivariate analysis techniques (Kellmann et al., 2018). From a scientific perspective, the ideal overall decision-making process incorporates formalized and validated analysis approaches with high prognostic precision. Furthermore, practitioners need to be able to make quick decisions to modify training and recovery strategies when deemed necessary (Starling and Lambert, 2017). Therefore, analysis concepts and methods that enable informative and intuitive visualization are crucial to inform and impact the coaching process (Bourdon et al., 2017; Buchheit, 2017; McGuigan, 2017; Robertson et al., 2017; Thorpe et al., 2017; Heidari et al., 2018). In this regard, the work of Will G. Hopkins on interpreting changes in athlete monitoring (Hopkins, 2004) has had a significant impact on current analysis approaches and recommendations in sports research and practice (Akenhead and Nassis, 2016; Buchheit, 2016; McGuigan, 2017; Robertson et al., 2017; Thorpe et al., 2017; Coutts et al., 2018; Kellmann et al., 2018). However, critical evaluation and comparison of the proposed approaches is still pending.
In this section, we briefly discuss some of the available analysis concepts, methodological approaches based on univariate data, and possible multivariate strategies to evaluate HR monitoring data.

#### Assessing Meaningful Change

The overall objective of monitoring training response is to identify meaningful changes to adjust training and recovery prescription, when necessary. To evaluate the importance of an observed change, the measurement accuracy or uncertainty of the observed response, as well as the magnitude of the response, must be considered (Hopkins, 2004; Buchheit, 2014; Thorpe et al., 2017). The minimal detectable change refers to changes that are larger than the typical within-subject variation in a measurement, which includes technical error as well as biological variation, and which is usually estimated by measures of reliability (McGuigan, 2017; Thorpe et al., 2017; Hecksteden et al., 2018). However, establishing this threshold requires a normative, and therefore to some degree subjective, determination of "acceptable" error rates (see Hecksteden et al., 2018 for discussion). In this regard, monitoring parameters are commonly rated as useful or sensitive based on providing high reliability and, therefore, low (random or unavoidable) test-retest variation (i.e., noise), which is typically measured as the standard error of measurement (i.e., typical error, TE) and often expressed as CV in %. Although a low measurement error is required to identify small observed changes as true changes (e.g., changes that are larger than the TE), the magnitude of change that can be expected or elicited by an intervention (i.e., signal) is of equal importance. Therefore, it is preferable to judge the sensitivity in a measure by evaluating the signal-to-noise ratio (Buchheit, 2014).
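The reliability quantities discussed above can be illustrated with a short calculation. The following Python sketch is illustrative only: it assumes the common convention of estimating the TE as the SD of test-retest difference scores divided by √2, and the HRex values and the expected training effect are hypothetical.

```python
import math
import statistics


def typical_error(test, retest):
    """Typical error (standard error of measurement) from paired test-retest data.

    Assumption: TE = SD of the difference scores / sqrt(2).
    """
    diffs = [b - a for a, b in zip(test, retest)]
    return statistics.stdev(diffs) / math.sqrt(2)


def cv_percent(te, test, retest):
    """Typical error expressed relative to the grand mean (CV in %)."""
    grand_mean = statistics.mean(list(test) + list(retest))
    return 100.0 * te / grand_mean


def signal_to_noise(expected_change, te):
    """Ratio of the expected change (signal) to the typical error (noise)."""
    return abs(expected_change) / te


# Hypothetical submaximal HRex values (bpm) of five athletes on two test days.
test = [150.0, 162.0, 158.0, 171.0, 166.0]
retest = [152.0, 160.0, 161.0, 169.0, 167.0]

te = typical_error(test, retest)
cv = cv_percent(te, test, retest)
# Hypothetical expected training effect of -6 bpm.
snr = signal_to_noise(-6.0, te)
print(f"TE = {te:.2f} bpm, CV = {cv:.2f} %, signal-to-noise = {snr:.1f}")
```

A ratio well above 1 would suggest that the expected training effect clearly exceeds the measurement noise, whereas a ratio near or below 1 would make single observed changes hard to interpret.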

Furthermore, the smallest worthwhile change [SWC, also minimum (clinically) important difference] describes the minimal change in a measurement that results in a practically meaningful enhancement in sport-specific or competitive performance (Hopkins, 2004) (e.g., a change larger than 1/3 of between-competition CV in individual sports to substantially increase chances of winning a medal, or ∼0.03 s for 20-m sprint time in soccer to be ahead of the opponent to win a ball; Buchheit, 2018). Two main concepts may be distinguished when determining the SWC: distributional and anchor-based approaches (Thorpe et al., 2017).

In distributional approaches, monitoring data are evaluated in reference to within-group and/or within-athlete variation, which is commonly done by data-transformation (i.e., Z-Scores) and defining (usually arbitrary) thresholds for trivial vs. substantial variation (e.g., Z-Score >1; Akenhead and Nassis, 2016; McGuigan, 2017). In the former case, an athlete's score or response is compared to the reference group (Julian et al., 2017) and therefore strongly dependent on the group's level and heterogeneity in performance. The latter could be described as a within-athlete distributional approach, typically rating observed values/changes as meaningful when located outside the "normal" fluctuation around the individual mean (Akenhead and Nassis, 2016; McGuigan, 2017). Also, week-to-week changes may be expressed as standardized differences [e.g., week-to-week change divided by weekly standard deviation (SD); (Stanley et al., 2015)].
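A within-athlete distributional rating can be sketched in a few lines of Python. The example below is a minimal illustration, not the authors' implementation: the weekly HRex values are hypothetical, the threshold of |Z| > 1 follows the example given in the text, and the minimum number of observations for the expanding window (`min_n`) is an arbitrary assumption.

```python
import statistics


def z_scores_full(values):
    """Z-scores relative to the mean and SD of the complete data set (retrospective)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def z_scores_expanding(values, min_n=4):
    """Z-scores based only on the data available at each point in time.

    The window grows with every new observation and includes the current value;
    None is returned until min_n observations exist (min_n is an assumption).
    """
    out = []
    for i in range(len(values)):
        window = values[: i + 1]
        if len(window) < min_n:
            out.append(None)
            continue
        mean = statistics.mean(window)
        sd = statistics.stdev(window)
        out.append((values[i] - mean) / sd if sd > 0 else 0.0)
    return out


def flag_meaningful(z_values, threshold=1.0):
    """True where the absolute Z-score exceeds the (arbitrary) threshold."""
    return [z is not None and abs(z) > threshold for z in z_values]


# Hypothetical weekly submaximal HRex values (bpm) of one athlete.
hrex = [160.0, 159.0, 161.0, 160.0, 163.0, 152.0, 158.0, 166.0]

flags_full = flag_meaningful(z_scores_full(hrex))
flags_expanding = flag_meaningful(z_scores_expanding(hrex))
print(flags_full)
print(flags_expanding)
```

With these hypothetical data, the two variants disagree for the fifth observation, which is flagged only when the reference distribution is built from the data available at that time.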

In contrast to distributional approaches, anchor-based approaches rely on the association between the observed measure and an external (criterion) measure of interest, for instance, a certain amount of (change in) training load that is associated with increased injury risk (Soligard et al., 2016). Ideally, the assessment of training response incorporates an estimation of an individual confidence interval (or remaining uncertainty) in relation to the SWC (Hopkins, 2004; Hecksteden et al., 2018). For example, practitioners can use an online spreadsheet<sup>1</sup> to analyze individual changes considering the TE and a (normative) SWC (Hopkins, 2000).
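The idea of relating an individual confidence interval to the SWC can be sketched as follows. This is a simplified illustration under stated assumptions and not a reproduction of the cited spreadsheet: the standard error of a change between two single measurements is taken as TE × √2, the interval uses a normal approximation, and the four rating labels are our own simplification. The TE of 3% and SWC of 1% are the HRex values attributed to Buchheit (2014) in Figure 3.

```python
import math
from statistics import NormalDist


def rate_change(change, te, swc, level=0.90):
    """Rate an individual change against the SWC using a confidence interval.

    Assumptions: SE of a change score = TE * sqrt(2); normal approximation.
    Returns (lower limit, upper limit, rating).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * te * math.sqrt(2.0)
    lower, upper = change - half_width, change + half_width
    if lower > swc:
        rating = "increase"
    elif upper < -swc:
        rating = "decrease"
    elif lower > -swc and upper < swc:
        rating = "trivial"
    else:
        rating = "unclear"
    return lower, upper, rating


# Changes in HRex expressed in % of baseline; TE = 3 %, SWC = 1 %.
for observed in (-9.0, -4.0):
    low, high, rating = rate_change(observed, te=3.0, swc=1.0)
    print(f"{observed:+.1f} % [{low:+.1f}, {high:+.1f}] -> {rating}")
```

Under these assumptions, a change must be several times larger than the SWC before its interval clears the trivial zone, which reflects the unfavorable ratio of SWC to TE for single measurements.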

In the absence of a sound theory or corresponding empirical observations, changes in resting HR measures are commonly evaluated in reference to the individual within-athlete variation (i.e., SD: standard deviation) in a period of "normal" training (Buchheit, 2014; Plews, 2014), as they have no direct link to (aerobic) performance (Buchheit, 2017). However, the choice of the threshold value, which in this case is a fraction or a multiple of the SD, is highly arbitrary and subjective, and thus depends on the individual response profile and how conservative the coaching or decision-making should be (Buchheit, 2017). In contrast, the relationship between exercise HR and (aerobic) performance is quite strong, and an empirical SWC of 1% in submaximal HRex was suggested, as it may correspond to a meaningful change in (aerobic) performance (Buchheit, 2014, 2017).

In athlete monitoring, there are also other analysis methods that cannot be clearly assigned to the concepts of minimal detectable change or SWC. In training load management, it has become best practice to evaluate short-term (acute, usually ∼5–10 days) and long-term (chronic, usually ∼4–6 weeks) accumulated loads using (exponentially weighted) rolling averages and acute-to-chronic ratios (Bourdon et al., 2017). Also, mid- to long-term changes and trends could be evaluated with (linear) trend analysis (i.e., the slope of the regression; Plews et al., 2012; Hopkins, 2017; Sands et al., 2017). Moreover, a more advanced approach was recently introduced by Hecksteden et al. (2017), using Bayesian statistics to compile individualized reference ranges to differentiate between two states of muscle recovery. Group-based reference ranges (i.e., prior distribution) were combined with repeated individual measures to generate individual posterior distributions for each recovery state (Hecksteden et al., 2017; a spreadsheet is provided online by the authors). In summary, although a variety of analysis concepts and methods have been described, there is only a negligible number of studies that systematically compare different analysis approaches (Buchheit et al., 2014; Hecksteden et al., 2018). Moreover, it remains unclear whether and how reference values (e.g., baseline mean and SD or TE) need to be adjusted over time since, among other elements, measurement variability and error are likely training-phase dependent (Taylor et al., 2016). For example, we are only aware of one study that (arbitrarily) updated the individual HR(V) reference values after 4 weeks of training (Vesterinen et al., 2016).
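The rolling-average and trend concepts mentioned above can be sketched in Python. The example is illustrative only: it assumes a decay factor of 2 / (N + 1) for the exponentially weighted average, windows of 7 and 28 days (values within the ranges given in the text), seeding with the first observation, and hypothetical daily loads in arbitrary units.

```python
def ewma(loads, span_days):
    """Exponentially weighted moving average with decay 2 / (span + 1)."""
    lam = 2.0 / (span_days + 1.0)
    smoothed = []
    previous = loads[0]  # seeding with the first observation is a simplification
    for load in loads:
        previous = lam * load + (1.0 - lam) * previous
        smoothed.append(previous)
    return smoothed


def acute_chronic_ratio(loads, acute_days=7, chronic_days=28):
    """Acute-to-chronic load ratio for every day of the series."""
    acute = ewma(loads, acute_days)
    chronic = ewma(loads, chronic_days)
    return [a / c if c > 0 else None for a, c in zip(acute, chronic)]


def trend_slope(values):
    """Least-squares slope of the values against their index (change per test)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical daily loads: four constant weeks followed by one doubled week.
daily_load = [300.0] * 28 + [600.0] * 7
ratio = acute_chronic_ratio(daily_load)
print(f"ratio before the spike: {ratio[27]:.2f}, after one week: {ratio[-1]:.2f}")

# Hypothetical weekly HRex values (bpm) with a steady decline of 1 bpm per week.
print(trend_slope([150.0, 149.0, 148.0, 147.0]))
```

The window lengths and the seeding strategy both influence the ratio, so they should be reported alongside any threshold that is applied to it.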

**Figure 3** visualizes different analysis concepts and methods and their effects on rating observed changes as meaningful. This example highlights the necessity of a systematic evaluation of the suggested analysis methods and concepts since there is considerable disagreement between approaches (see also Hecksteden et al., 2018 for a detailed discussion).

#### Multivariate Approaches

A common multivariate approach in HR monitoring is a parallel inspection of several markers in combination with simple decision rules. For example, if RPE during and HRR following submaximal exercise are (clearly) elevated, the athlete is likely fatigued (Lamberts et al., 2011). Typically, either each marker, or a minimum number of markers (e.g., at least 2 out of 3), are required to change beyond predefined cut-off values to be interpreted as substantially deviated (Lamberts, 2009). Rather than analyzing markers in a dichotomous fashion (above- or below-threshold), a continuous combination of different markers as ratios (e.g., HR/RPE, Ln rMSSD/RR) is also often proposed (Buchheit, 2014; Halson, 2014; Bourdon et al., 2017). Moreover, visualizing individual response (pattern) with spider diagrams illustrates another valuable and more insightful alternative to ratios since they display the magnitude of change in every single measure and allow the assessment of changes relative to each other when data are appropriately scaled (Julian et al., 2017).
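The simple decision rule described above (a minimum number of markers beyond predefined cut-offs) can be sketched as follows. All marker names, changes, and cut-off values in this Python example are hypothetical and serve only to illustrate the logic.

```python
def substantially_deviated(changes, cutoffs, min_markers=2):
    """Apply a 'k out of n' rule to several monitoring markers.

    changes: observed change per marker; cutoffs: absolute cut-off per marker.
    Returns (decision, list of markers beyond their cut-off).
    """
    beyond = [name for name, value in changes.items() if abs(value) > cutoffs[name]]
    return len(beyond) >= min_markers, beyond


# Hypothetical changes from baseline and hypothetical cut-off values.
changes = {"HRex_bpm": -6.0, "HRR_bpm": 2.0, "RPE_au": 2.5}
cutoffs = {"HRex_bpm": 3.0, "HRR_bpm": 5.0, "RPE_au": 1.0}

deviated, markers = substantially_deviated(changes, cutoffs, min_markers=2)
print(deviated, markers)
```

A dichotomous rule of this kind discards the magnitude of each change, which is one reason why continuous combinations or scaled visualizations are proposed as alternatives.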

However, the gradual or hierarchical evaluation of variables in the structure of flow charts (Plews, 2014) or closed-loop models (Kiviniemi et al., 2007; Gabbett et al., 2017) appears somewhat advanced. In this context, the so-called (fast-and-frugal) heuristics approach (Raab and Gigerenzer, 2015) provides an attractive opportunity to organize several markers, both structurally and content-wise (i.e., decision trees). At the same time, such heuristics represent an intuitive and simplistic strategy, which reflects fast and practical decision-making in (sports) practice in situations with high uncertainty since only data on a limited number of relevant influencing factors are available (Raab and Gigerenzer, 2015; Jovanovic, 2017). They emerge in the form of (fast-and-frugal) decision trees and consist of three main factors: search rules (where to look for information), stopping rules (when to end search) and decision rules (how to make a decision, Raab and Gigerenzer, 2015). However, although "heuristical" interpretation and decision-making appears appealing in general, the application of fast-and-frugal decision trees in HR monitoring is still largely limited by the previously discussed research deficits (e.g., inconclusive association between HR measures and training load, fatigue, and fitness or performance; see sections Limitations of Univariate HR Monitoring and Training Context is Key).

Obviously, more advanced and complex multivariate analysis methods than the previously mentioned simple approaches are available. For example, the current training research also suggests the use of multiple (logistic) regressions (Weiss et al., 2017), generalized estimating equations, neural networks (Pfeiffer and Hohmann, 2012; Bartlett et al., 2017), or modeling techniques based on the original systems-theory model by Banister, developed in 1975 (Perl and Pfeiffer, 2011). Although these advanced concepts are scientifically promising and probably superior to simple or linear concepts, a more detailed discussion is beyond the scope of this report as we are only aware of one investigation using such an advanced multivariate approach to analyze athletes' training response with HR measures (Lacome et al., 2018). Therefore, a broad implementation in sports practice in the near future seems difficult to achieve (Bourdon et al., 2017).

<sup>1</sup> sportsci.org/resource/stats/xprecisionsubject.xls (Accessed February 07, 2018).

FIGURE 3 | Example of visualization and comparison of different analysis concepts and methods for assessing meaningful change in weekly exercise heart rate (HRex) in a semi-professional basketball player over an entire season. HRex was assessed on a weekly basis using a submaximal shuttle run during the warm-up (see Figure 1). In (A), changes from baseline level (average of first 4 weeks of the preparation period) are rated and highlighted as meaningful with three different methods: First, when changes are larger than the smallest worthwhile change (SWC, gray horizontal bar, s), second, when changes are larger than the typical error (TE, error bars, t), or third, when changes are larger than both (SWC+TE, circle). The values for the SWC (>1%) and the TE (>3%) are derived from Buchheit (2014). In (B), changes are analyzed with two within-athlete distributional approaches [Z-Scores: individual mean ± standard deviation (SD)]. The values are rated and highlighted as being meaningfully deviated when Z-Scores are >1. In the first approach, Z-Scores are calculated based on the entire data set (solid horizontal lines, \*), which represents a retrospective analysis after the data collection was completed. In the second approach, Z-Scores are calculated on a "rolling" and additive basis and with all data available at each point in time (dashed lines, #). This likely represents a more realistic approach in sports practice, as monitoring data are analyzed as soon as available and therefore based on a steadily increasing data set. The analysis concepts and methods visualized illustrate a considerable disagreement between methods and concepts. Symbols: ↓: below baseline, ↑: above baseline, –: 1 × SD below the mean, +: 1 × SD above the mean.

### PRACTICAL DECISION-MAKING WITH HR MONITORING—CASE EXAMPLES

This section aims to provide two case studies that illustrate how short- and long-term responses in HR measures could be contextualized and analyzed in a multivariate fashion, using a heuristics approach to guide training and recovery prescription. For this purpose, we first differentiate between the analysis of short- and long-term changes and further define the training context. For simplicity, we distinguish between training and recovery periods. Training periods are defined as constant or increasing training loads, whereas recovery is characterized by training load reductions or rest. These initial determinations specify how observed changes are interpreted and, therefore, how decisions are made (i.e., decision rules). Based on the previously presented research (**Table 1**), a multivariate analysis of HRex in combination with the rating of perceived exertion (RPE) might provide adequate information to interpret an athlete's training status (i.e., search rules and stopping rules) in the following case examples.

FIGURE 4 | Short-term changes in exercise heart rate (HRex) and rating of perceived exertion (RPE) in an elite, male badminton player (20-year-old) throughout a preparatory period. HRex (circles) and RPE (bars) were assessed on Mondays (post Rec., gray symbols) following 2 days of pronounced recovery, and on Fridays (post Train., blue symbols) following four consecutive days of training (with two sessions on several days) using a submaximal shuttle run (∼1, 1, and 3 min at 8.2, 9.6, and 11.0 km/h, respectively; 12.8 m shuttle length) during the warm-up of the morning sessions. HRex was consistently reduced on Fridays (mean ± SD, −7 ± 1 bpm) and increased on Mondays (+5 ± 2 bpm), which may be interpreted as a result of short-term changes in training load between tests. Similarly, RPE during the shuttle runs was typically increased on Fridays and decreased on Mondays. When applying the presented heuristical logic to decision-making, in most cases the obvious conclusions are drawn corresponding to the general training plan: After several consecutive (intensive) training days, the training load should be reduced in the following days to encourage recovery, as the reduced HRex and the increased RPE indicate acute fatigue. Likewise, the increased HRex and reduced RPE on Mondays indicate recovery, which supports a resumption of (intense) training. However, according to the presented logic, one could have deviated from the training plan at two points in time: On day 24, the relatively high RPE indicates an incomplete recovery, and consequently further facilitation of recovery strategies or at least a reduction in planned workload seemed appropriate. In contrast, the low RPE and the somewhat less severe decline in HRex on day 35 point to the possibility of continuing to tolerate high training loads at least for another training session. Furthermore, the overall decline in HRex over the training weeks, while maintaining a constant or slightly decreasing RPE, indicates positive adaptation and appropriate training periodization.

In the first example (**Figure 4**), an elite, male badminton player was monitored twice per week using a submaximal shuttle run throughout a preparatory period. Although the player is specialized in the (mixed) Doubles discipline, badminton is typically classified as a racket sport, not as a team sport. There are, however, great similarities in the training structure and training demands to those in team sports, since different domains, such as endurance, strength, power, speed, and technical and tactical elements are concurrently trained. Accordingly, we are convinced that the observed short-term responses in exercise HR (HRex) and their underlying physiological mechanisms justify transferability to team sport settings. During the training period, we observed a noticeable and consistent pattern in changes in HRex and RPE during a submaximal run in response to the typical weekly training schedules (see **Figure 4**'s text legend for details). In this case, accumulated training loads within the training weeks resulted in reduced HRex and increased RPE, whereas the relief period over the weekend resulted in an increase in HRex and a decrease in RPE. In addition to the short-term fluctuations, an overall decrease in HRex was observed throughout the training period that, taking into account the RPE scores, can be interpreted as a positive adaptation [increased (aerobic) fitness], and thus as an appropriate training periodization. When this observation is transferred to team sports, it highlights the importance of consistent scheduling of testing sessions (e.g., 2 days post game-day), as acute or short-term changes in load can significantly affect HRex response. Furthermore, it may be necessary to consider short-term and long-term changes at the same time when evaluating training programs. 
Otherwise, in the absence of continuous data, it might be challenging to separate the different types of response (i.e., strain, fatigue, recovery and adaptation) for the interpretation of long-term training responses.

In the second example, a semi-professional basketball player was monitored on a weekly basis using a submaximal shuttle run throughout 1.5 competitive seasons (**Figure 5**). During the preseason training periods, HRex was markedly reduced both times, likely reflecting positive adaptation. In contrast, in periods of reduced training loads (winter break during weeks 22–23 and off-season), increased HRex in combination with increased RPE indicated (partial) detraining and a loss of (aerobic) fitness. The time course of HRex and RPE response, during the first preparatory period and the beginning of the first season, highlights the importance of training context and multivariate analysis when interpreting long-term changes (see **Figure 5** text legend for details). Accordingly, we question some of the conclusions in the HR monitoring literature that show a so-called "counterintuitive" response in overreached athletes (reduced, rather than increased, HRex in fatigued or overreached athletes; Siegl et al., 2017) or "disagreement between studies" (similar changes in HR measures following endurance training periods leading to increased or decreased performance; Bellenger et al., 2016). Using this second example, we suggest that changes in HR measures should be interpreted primarily against the training context, rather than directly projected onto the constructs of fatigue or performance. Therefore, a (sustained) reduction of HRex due to a training period leading to overreaching (likely reduced performance due to fatigue) followed by an adequate relief period should be interpreted as a "typical" training response in the sense of a (positive) adaptation to increased training load. It should not be seen as an "inconsistent" or "conflicting" finding because a performance outcome measured at different times was increased or decreased. 
This interpretation goes in line with the fitness-fatigue model, as a performance outcome is a result of fitness and fatigue effects (Coutts et al., 2018). Accordingly, HRex should be interpreted as a fitness indicator rather than a marker of fatigue or performance.
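The heuristic logic applied in the two case examples can be written down as a small decision tree. The Python sketch below is our simplified reading of the interpretations given in the text and figure legends, not a validated decision-support tool; the function name, the trend labels ('down', 'stable', 'up'), and the fallback for patterns not covered by the examples are assumptions.

```python
def interpret_hr_response(context, hrex, rpe):
    """Interpret HRex and RPE trends against the training context.

    context: 'training' (constant or increasing load) or 'recovery' (reduced load).
    hrex, rpe: 'down', 'stable' or 'up' relative to the previous reference.
    Returns (interpretation, suggested action).
    """
    unclear = ("no clear response pattern", "inspect further markers")
    # Cue 1: the training context determines how changes are read.
    if context == "training":
        # Cue 2: reduced HRex is the expected response to training load.
        if hrex != "down":
            return unclear
        # Cue 3: RPE separates fatigue from adaptation.
        if rpe == "up":
            return ("acute fatigue or overreaching", "reduce training load")
        return ("positive adaptation", "continue planned training")
    if context == "recovery":
        if hrex != "up":
            return unclear
        if rpe == "up":
            return ("detraining", "resume or intensify training")
        if rpe == "down":
            return ("recovery", "resume (intense) training")
        return unclear
    raise ValueError("context must be 'training' or 'recovery'")


print(interpret_hr_response("training", "down", "up"))
print(interpret_hr_response("recovery", "up", "down"))
print(interpret_hr_response("recovery", "up", "up"))
```

The tree stops as soon as a cue does not match the expected pattern, which mirrors the stopping rules described for fast-and-frugal heuristics.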

FIGURE 5 | Long-term changes in exercise heart rate (HRex), rating of perceived exertion (RPE) and training load in a semi-professional basketball player (26-year-old, 3rd highest German basketball league) throughout 1.5 competitive seasons. HRex and RPE were assessed on a weekly basis, using a submaximal shuttle run during the warm-up (see Figure 1). Acute and chronic internal training loads were calculated over 1 and 4 weeks of training, respectively (Gabbett, 2016). The gray horizontal bar represents trivial changes from the baseline HRex (average of first four weeks during the first preseason) based on the smallest worthwhile change (SWC; Buchheit, 2014). During the first preseason, HRex displayed a continuously decreasing trend with a concomitantly increasing trend in RPE in response to consecutive weeks of high training load. Since this probably indicates overreaching (Table 1), a (sustained) reduction in training load seems reasonable. As HRex remains substantially reduced during the following months and RPE scores have fallen below the initial values, it can be assumed that the initially reduced load at the beginning of the competitive season allowed sufficient recovery and the training routine at moderate to high training loads can be resumed. In periods of pronounced relief, such as the 2-week winter break (weeks 22–23) and the offseason, there was a significant increase in HR and RPE in both cases. This likely indicates a loss of (aerobic) fitness through detraining, and calls for intensification or resumption of training.

### CONCLUSION

As previously suggested (Buchheit, 2014), in team sports, exercise-related measures (HRex, HRR) are probably superior to those under resting conditions (HRrest, HRVrest) as the former have more favorable signal-to-noise and cost-benefit ratios. Moreover, HRex is more reflective of (aerobic) fitness-related training responses than a surrogate marker of performance or fatigue. Therefore, a comprehensive (team sport) athlete monitoring system must incorporate multivariate approaches that further examine training context, fatigue, and sport-specific performance (Kellmann et al., 2018). When athlete monitoring is integrated into a decision-support system, numerous methodological considerations must be addressed throughout the decision-making process. It is necessary to interpret individual training responses by considering the measurement accuracy as well as the smallest worthwhile change. As outlined in this technology report, future studies should examine the usefulness of different analytical concepts and methods, as this represents a significant research deficit. Finally, the most appropriate analytical approaches must be implemented in software solutions by wearable manufacturers or software providers to comprehensively improve the decision-making process in sports practice. To provide a starting point, we have developed a conceptual framework to contextualize HR measures, focusing on the time course of training responses as well as training context, and illustrated its application for multivariate interpretation and decision-making using a heuristics approach.

### ETHICS STATEMENT

The investigations from which the case studies were selected were carried out in accordance with the guidelines of the Declaration of Helsinki. The protocols were approved by the local ethics committees of the Faculty of Sport Science of the Ruhr-University Bochum, Germany or the Ärztekammer des Saarlandes, Saarbrücken, Germany. All subjects gave written informed consent.

### AUTHOR CONTRIBUTIONS

CS prepared the original manuscript, figures and tables. FH, TW, AD, and AF assisted with writing and editing the manuscript, figures and tables. CS, TW, MK, TM, MP, and AF conceived and designed the original observational investigations, from which the case-examples were drafted.

### FUNDING

The current study was funded by the German Federal Institute of Sport Science. The research was realized in the project REGman—Optimization of Training and Competition: Management of Regeneration in Elite Sports. We acknowledge support by the DFG Open Access Publication Funds of the Ruhr-Universität Bochum.

### ACKNOWLEDGMENTS

The authors thank Dr. Anne Hecksteden for her constructive comments during the preparation of the manuscript. We would also like to thank all colleagues and students who participated in the data collection, which provided the basis for the analyses presented, as well as all athletes participating in our investigations.

### REFERENCES


Fellmann, N. (1992). Hormonal and plasma volume alterations following endurance exercise. Sports Med. 13, 37–49. doi: 10.2165/00007256-199213010-00004


recovery in athletes. Int. J. Sports Physiol. Perform. 12, 1137–1142. doi: 10.1123/ijspp.2016-0120


Israel, S. (1982). Sport und Herzschlagfrequenz. Leipzig: Johann Ambrosius Barth.


vagal outflow. Effects of various respiratory patterns. Clin. Physiol. 21, 365–376. doi: 10.1046/j.1365-2281.2001.00337.x


weeks in elite soccer players. Int. J. Sports Physiol. Perform. 11, 947–952. doi: 10.1123/ijspp.2015-0490


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Schneider, Hanakam, Wiewelhove, Döweling, Kellmann, Meyer, Pfeiffer and Ferrauti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Accurate Estimation of Running Temporal Parameters Using Foot-Worn Inertial Sensors

Mathieu Falbriard<sup>1</sup> \*, Frédéric Meyer<sup>2</sup> , Benoit Mariani<sup>3</sup> , Grégoire P. Millet<sup>2</sup> and Kamiar Aminian<sup>1</sup>

<sup>1</sup> Laboratory of Movement Analysis and Measurement, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, <sup>2</sup> Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland, <sup>3</sup> Gait Up S.A., Lausanne, Switzerland

The aim of this study was to assess the performance of different kinematic features measured by foot-worn inertial sensors for detecting running gait temporal events (e.g., initial contact, terminal contact) in order to estimate inner-stride phase durations (e.g., contact time, flight time, swing time, step time). Forty-one healthy adults ran multiple trials on an instrumented treadmill while wearing one inertial measurement unit on the dorsum of each foot. Different algorithms for the detection of initial contact and terminal contact were proposed, evaluated and compared with a reference threshold on the vertical ground reaction force. The minima of the pitch angular velocity within the first and second half of a mid-swing to mid-swing cycle were identified as the most precise features for initial and terminal contact detection, with an inter-trial median ± IQR precision of 2 ± 1 ms and 4 ± 2 ms, respectively. Using these initial and terminal contact features, this study showed that the ground contact time, flight time, step time and swing time can be estimated with an inter-trial median ± IQR bias of less than 12 ± 10 ms and a precision of less than 4 ± 3 ms. Finally, this study showed that the running speed can significantly affect the biases of the estimations, suggesting that a speed-dependent correction should be applied to improve the system's accuracy.

#### Edited by:

Robert Aughey, Victoria University, Australia

#### Reviewed by:

Alessandro Tonacci, Istituto di Fisiologia Clinica (IFC), Italy

Theodore Francis Towse, Grand Valley State University, United States

\*Correspondence: Mathieu Falbriard, mathieu.falbriard@epfl.ch

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 15 November 2017 Accepted: 04 May 2018 Published: 12 June 2018

#### Citation:

Falbriard M, Meyer F, Mariani B, Millet GP and Aminian K (2018) Accurate Estimation of Running Temporal Parameters Using Foot-Worn Inertial Sensors. Front. Physiol. 9:610. doi: 10.3389/fphys.2018.00610

Keywords: running, inertial measurement unit (IMU), validation study, temporal parameters, contact time

## INTRODUCTION

In running, two temporal events (initial contact or touchdown and terminal contact or toe-off) need to be detected in order to extract the main temporal parameters of each step: cadence, contact time, flight phase duration, and swing phase duration. Initial contact (IC) is defined as the time instant when the foot initiates contact with the ground at landing. Terminal contact (TC) corresponds to the end of the pushing phase, when the foot ends contact with the ground. The intrinsic relationships between the different inner-stride temporal parameters and running speed, shoe configuration, running economy, running performance, and injury risk have been widely investigated. Therefore, accurate detection of IC and TC is paramount.
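Once IC and TC are known for both feet, the temporal parameters follow from simple differences between event times. The Python sketch below uses commonly used definitions, which are our assumption here since they are not spelled out in the text: contact time as TC minus IC of the same foot, flight time as the next contralateral IC minus TC, step time as the next contralateral IC minus IC, and swing time as the next ipsilateral IC minus TC. The event times are hypothetical.

```python
def temporal_parameters(ic, tc, ic_contra):
    """Inner-stride temporal parameters (ms) for consecutive strides of one foot.

    ic, tc: initial and terminal contact times of the analyzed foot.
    ic_contra: initial contact of the other foot following each entry of ic.
    """
    params = []
    for k in range(len(ic) - 1):
        step_time = ic_contra[k] - ic[k]
        params.append({
            "contact_time": tc[k] - ic[k],
            "flight_time": ic_contra[k] - tc[k],
            "step_time": step_time,
            "swing_time": ic[k + 1] - tc[k],
            "cadence_steps_per_min": 60000.0 / step_time,
        })
    return params


# Hypothetical event times in milliseconds.
ic_left = [0, 700, 1400]
tc_left = [250, 950, 1650]
ic_right = [350, 1050, 1750]

strides = temporal_parameters(ic_left, tc_left, ic_right)
print(strides[0])
```

Because every parameter is a difference of two detected events, any timing error in IC or TC propagates directly into the estimated durations.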

In the literature, the majority of studies that investigated temporal parameters in running have used force plates, contact mats or high-speed cameras as the reference measurement system (Viitasalo et al., 1997; Garcia-Lopez et al., 2005; Leitch et al., 2011; Ogueta-Alday et al., 2013; Handsaker et al., 2016). Although force plates are accepted as state-of-the-art systems for temporal events detection in running, they suffer from several limitations. In fact, the detection timing of IC and TC on the vertical ground reaction force depends on the filtering method and on the detection threshold used (Cronin and Rumpf, 2014). Moreover, their lack of portability and their setup complexity restrict their use to in-laboratory experiments, which is a major drawback given the in-field nature of the running activity.
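The dependence of event timing on the detection threshold can be demonstrated with a simple threshold-crossing detector. The Python sketch below is illustrative only: the half-sine force profile, the sampling frequency, and both threshold values are hypothetical and do not correspond to the reference method of any cited study.

```python
import math


def detect_contacts(force, fs, threshold):
    """Detect (IC, TC) pairs in ms from a vertical ground reaction force signal.

    IC: first sample at or above the threshold; TC: first sample below it again.
    """
    events = []
    ic = None
    for i in range(1, len(force)):
        if force[i - 1] < threshold <= force[i]:
            ic = i
        elif force[i - 1] >= threshold > force[i] and ic is not None:
            events.append((1000.0 * ic / fs, 1000.0 * i / fs))
            ic = None
    return events


# Hypothetical signal: one 250 ms half-sine contact (peak 1500 N) sampled at 1000 Hz.
fs = 1000
force = [
    1500.0 * math.sin(math.pi * (i - 100) / 250) if 100 <= i <= 350 else 0.0
    for i in range(700)
]

events_20 = detect_contacts(force, fs, threshold=20.0)
events_50 = detect_contacts(force, fs, threshold=50.0)
print(events_20, events_50)
```

In this synthetic example, raising the threshold from 20 to 50 N delays IC and advances TC, so the resulting contact time shortens by 2 ms without any change in the underlying signal.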

Thanks to the recent improvements in MEMS inertial sensors, their low production cost, their decrease in weight and size and their ability to measure kinematics over long periods of time, inertial sensors are now widely accepted systems to analyze human locomotion. In fact, studies on gait analysis have shown that inertial measurement units (IMUs), when used with state-of-the-art algorithms, can reliably fill the gap between subjective observational analysis and bulky in-laboratory installations (Mariani et al., 2012, 2013). In running, inertial sensors have predominantly been used to detect inner-stride temporal events and derive temporal parameter estimations from them. Some studies have used IMUs on the upper body (Bergamini et al., 2012; Norris et al., 2014), others focused on the shank/tibia segments (Mercer et al., 2003; Crowell et al., 2010; McGrath et al., 2012) and some used foot-worn IMUs (Strohrmann et al., 2011; Chapman et al., 2012; Lee et al., 2015; Reenalda et al., 2016; Brahms, 2017). However, to the authors' knowledge, only a few studies have reported on the validity of their algorithms when compared with a state-of-the-art reference system. In Ammann et al. (2016), CT estimations were compared between shoelace-worn IMUs and a high-speed video camera for 132 steps of 12 athletes at running speeds of 22.3 ± 5.8 km/h. Because data processing was done by proprietary software, the algorithms used to estimate CT were not described in the methods. In Weyand et al. (2001), the authors used acceleration peaks from a foot-worn accelerometer to detect IC and TC and compared their estimation of CT with a treadmill-mounted force plate. The exact method used to detect IC and TC is not documented in this study and only the bias (mean ± STD) of the 165 trials is provided in the results. There is, therefore, no information about the precision of the proposed system.
For all other methods, where no validation was reported, there is no evidence that the parameters measured are within an acceptable error range and that this error range does not change with the running conditions.

Therefore, the aim of the present study was to investigate different algorithms to detect IC and TC from different features of foot-worn IMU kinematic signals, and to estimate the main inner-stride temporal parameters. The performance metrics (bias and precision) of each algorithm were assessed in comparison with a reference system (instrumented force plate treadmill), which allowed a validation of inner-stride temporal parameters over a high number of steps and a large range of running speeds.

### MATERIALS AND METHODS

### Measurement Protocol

In total, 41 healthy adults (13 females and 28 males, age 29 ± 6 years, weight 70 ± 10 kg, height 174 ± 8 cm, running weekly 2.1 ± 1 h, 11 being affiliated to a running club) running at least once a week and without any symptomatic musculoskeletal injuries volunteered to participate in this study. The study was approved by the local ethics committee (CCER-VD 2015-00006) and was conducted according to the Declaration of Helsinki; written informed consent was obtained from all the participants prior to the measurements. Each participant was asked to run multiple trials of 30 s each, wearing their usual shoes, on an instrumented treadmill, starting at 8 km/h and increasing by 2 km/h up to their maximum speed. A 6 min familiarization period (Lavcanska et al., 2005) was carried out on the treadmill and served as warm-up for the participants. The participants were free to decide on the rest duration in-between the trials.

## Wearable Device and Temporal Features Estimation

#### IMU Based System

One inertial measurement unit (IMU) (Physilog 4, Gait Up, Switzerland, weight: 19 g, size: 50 × 37 × 9.2 mm) was worn on the dorsum of each foot and measured both 3D acceleration and 3D angular velocity at 500 Hz. Each IMU was affixed to the foot using an adhesive strap around the shoe. The range was set to ±16 g for the accelerometer and ±2000 °/s for the gyroscope.

#### Functional Calibration

FIGURE 1 | Technical frame of the foot-worn IMU (XT, YT, ZT) and functional frame of the foot (XF, YF, ZF). The 3 by 3 rotation matrix R aligns the IMU's technical frame with the functional frame of the foot.

In order to use single axes of the inertial sensors in a meaningful and reproducible manner, we designed a functional calibration method to automatically align the technical frame of the foot-worn IMUs with the functional frame of the foot. The functional frame of the foot was defined as in **Figure 1**: the origin is at the base of the second metatarsal bone, Y<sup>F</sup> is orthogonal to the horizontal plane defined by the ground surface, X<sup>F</sup> lies on the horizontal plane projection of the line joining the center of the calcaneus bone and the head of the second metatarsal bone, pointing distally, and Z<sup>F</sup> is orthogonal to the X<sup>F</sup>Y<sup>F</sup> plane, pointing to the right-hand side of the subject. The functional calibration process requires static standing periods in order to align Y<sup>T</sup> with Y<sup>F</sup> using the gravitational acceleration measured by the IMU. Then, using the hypothesis that most of the foot's angular rotations occur along the Z<sup>F</sup> axis while running, we used Principal Component Analysis to find the rotation angle around the Z<sup>T</sup> axis which aligns Z<sup>T</sup> with Z<sup>F</sup>. Finally, X<sup>T</sup> is the result of the cross-product of Y<sup>T</sup> and Z<sup>T</sup>.
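The first step of the calibration (aligning the vertical axis with gravity during static standing) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rotation_to_vertical` and `rotate` are hypothetical, the static acceleration is assumed to point upward along the vertical axis when the sensor is at rest, and the Rodrigues rotation formula is used as one possible way to build the rotation matrix. The subsequent PCA step is not shown.

```python
import math

def rotation_to_vertical(acc_static):
    """Return a 3x3 rotation matrix that maps the mean static
    acceleration (list of [ax, ay, az] samples) onto the +Y axis."""
    n = len(acc_static)
    g = [sum(sample[k] for sample in acc_static) / n for k in range(3)]
    norm = math.sqrt(sum(c * c for c in g))
    ux, uy, uz = (c / norm for c in g)
    # Rotation axis u x (0, 1, 0) = (-uz, 0, ux); its length is sin(theta).
    sin_t = math.hypot(uz, ux)
    cos_t = uy
    if sin_t < 1e-12:
        # Already vertical, or exactly upside down (rotate by pi about X).
        sign = 1.0 if cos_t > 0 else -1.0
        return [[1.0, 0.0, 0.0], [0.0, sign, 0.0], [0.0, 0.0, sign]]
    kx, kz = -uz / sin_t, ux / sin_t
    # Rodrigues formula: R = I + sin(t) K + (1 - cos(t)) K^2,
    # where K is the skew-symmetric matrix of the unit axis (kx, 0, kz).
    K = [[0.0, -kz, 0.0], [kz, 0.0, -kx], [0.0, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + sin_t * K[i][j]
             + (1.0 - cos_t) * K2[i][j] for j in range(3)]
            for i in range(3)]

def rotate(R, v):
    """Apply the 3x3 matrix R to the 3-vector v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# Illustrative static recording (in g): the sensor is tilted in its XY plane.
static_samples = [[0.6, 0.8, 0.0]] * 10
R_gravity = rotation_to_vertical(static_samples)
aligned = rotate(R_gravity, [0.6, 0.8, 0.0])
```

After this step, the only remaining unknown is a rotation that leaves the vertical axis unchanged, which is what the PCA on the running data is meant to resolve.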

#### Gait Cycle Detection

Using the cyclic nature of the running movement, an algorithm was designed to segment a complete trial into mid-swing to mid-swing cycles. Following previous work on gait analysis (Aminian et al., 2002; Sabatini et al., 2005), we hypothesized that the pitch angular velocity (Ωp) of the foot is maximum at mid-swing. To enhance and detect the mid-swing peak, a 2nd-order Butterworth low-pass filter was designed with an adaptive cut-off frequency. The cut-off frequency was set at 60% of the stride frequency estimated using an auto-correlation method over a 5 s sliding window. This adaptive filtering method was used to cope with the range of running speeds used in this study. The length of the sliding window (5 s) was selected empirically, based on our observations of the signals.
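The adaptive cut-off described above can be sketched as follows. This is a simplified illustration under several assumptions that do not come from the study: the function name `stride_frequency` is hypothetical, the search is restricted to an assumed stride frequency range of 1 to 2 Hz, a biased auto-correlation estimate is used, and the example signal is a synthetic sinusoid sampled at 200 Hz (rather than 500 Hz) to keep the pure-Python loop short.

```python
import math

def stride_frequency(omega_p, fs, f_min=1.0, f_max=2.0):
    """Estimate the stride frequency (Hz) of one window of pitch angular
    velocity from the highest auto-correlation peak in [f_min, f_max]."""
    n = len(omega_p)
    mean = sum(omega_p) / n
    x = [v - mean for v in omega_p]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Biased estimate (division by n): later repetitions of the
        # period get a lower weight than the first one.
        val = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

# Synthetic 5 s window with a 1.6 Hz stride frequency (illustrative only).
fs = 200.0
window = [math.sin(2.0 * math.pi * 1.6 * i / fs) for i in range(int(5 * fs))]
f_stride = stride_frequency(window, fs)
cutoff_hz = 0.6 * f_stride  # cut-off at 60% of the stride frequency
```

In practice the low-pass filter itself could then be designed with a signal-processing library, for example `scipy.signal.butter(2, cutoff_hz, btype="low", fs=fs)`, and the window would slide along the trial so that the cut-off follows changes in cadence.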

#### Temporal Features Detection

The estimation of inner-stride phases relies on two main temporal events: initial and terminal contact. The initial contact (IC) event corresponds to the time instant when the foot initiates contact with the ground at landing. The terminal contact (TC) event, also known as toe-off, corresponds to the end of the pushing phase when the toes terminate contact with the ground. For each cycle, we identified kinematic features that seemed to be valid candidates to detect IC and TC. Such features included global maximum (MAX), local maximum (MAXloc), global minimum (MIN), local minimum (MINloc) and zero-crossing (ZeroX) time samples and were detected on the following signals: the pitch angular velocity (Ωp: angular velocity around Z<sup>F</sup>), the pitch angular acceleration (Ω′p), the pitch angular jerk or first derivative of the pitch angular acceleration (Ω″p), the roll angular velocity (Ωr: angular velocity around X<sup>F</sup>), the norm of the angular velocity (||Ω||), the vertical axis acceleration (Avert: acceleration along Y<sup>F</sup>), the longitudinal axis acceleration (Along: acceleration along X<sup>F</sup>), the coronal axis acceleration (Acoro: acceleration along Z<sup>F</sup>), the norm of the acceleration (||A||) and the first derivative of the acceleration norm or jerk (||A||′). In some cases, an empirically chosen threshold was also used to improve the feature detection (e.g., < −100 °/s). All these detection rules are detailed in **Table 1** and illustrated in **Figure 2**. Prior to the detection, the acceleration and angular velocity signals were filtered using a 2nd-order low-pass Butterworth filter (fc = 30 Hz) to minimize the influence of the IMU fixation artifacts, and a temporal estimation of mid-stance was carried out for each gait cycle in order to separate the detection zones for IC and TC. The detection zone for IC was set as the period between the first zero-crossing of the pitch angular velocity (Ωp) and mid-stance. For TC, the detection zone was set as the period between mid-stance and the last zero-crossing of the pitch angular velocity. Mid-stance was set as the time instant when the angular velocity norm (||Ω||) is minimum within the 30–45% time-range of each mid-swing to mid-swing cycle. Finally, the IC and TC events of left and right foot steps were combined in order to estimate, for each step i, the ground contact time (CT), the flight time (FLT), the swing time (SWT) and the step time (SPT) using the following relations:

$$\text{CT}_i = \text{TC}_i - \text{IC}_i \tag{1}$$

$$\text{FLT}_i = \text{IC}_{i+1} - \text{TC}_i \tag{2}$$

$$\text{SWT}_i = \text{IC}_{i+2} - \text{TC}_i \tag{3}$$

$$\text{SPT}_i = \text{IC}_{i+1} - \text{IC}_i \tag{4}$$
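As an illustration of Equations 1–4, the following sketch computes the four temporal parameters from lists of event times. The function name `stride_parameters` and the example values are hypothetical. It assumes that left and right steps are interleaved in chronological order, so that step i + 1 belongs to the contralateral foot and step i + 2 to the same foot as step i; under this assumption the swing time of step i ends at the initial contact of step i + 2.

```python
from typing import Dict, List

def stride_parameters(ic: List[float], tc: List[float]) -> Dict[str, List[float]]:
    """Compute CT, FLT, SWT and SPT (same unit as the inputs).

    ic[i] and tc[i] are the initial and terminal contact times of step i,
    with left and right steps interleaved in chronological order.
    """
    if len(ic) != len(tc):
        raise ValueError("ic and tc must have the same length")
    n = len(ic)
    ct = [tc[i] - ic[i] for i in range(n)]            # contact time
    flt = [ic[i + 1] - tc[i] for i in range(n - 1)]   # flight time
    swt = [ic[i + 2] - tc[i] for i in range(n - 2)]   # swing time, same foot
    spt = [ic[i + 1] - ic[i] for i in range(n - 1)]   # step time
    return {"CT": ct, "FLT": flt, "SWT": swt, "SPT": spt}

# Four consecutive steps (seconds), illustrative values only.
ic_times = [0.00, 0.35, 0.70, 1.05]
tc_times = [0.25, 0.60, 0.95, 1.30]
params = stride_parameters(ic_times, tc_times)
```

Note that the last steps of a trial yield fewer FLT, SWT and SPT values than CT values, because these parameters need the initial contact of a following step.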

## Reference System and Temporal Features

#### Force Plate

This study used an instrumented treadmill (T-170-FMT, Arsalis, Belgium) sampling at 1000 Hz as reference system for the validation. The force plate system and the inertial sensors were electronically synchronized using a 5 V pulse triggered manually and recorded on each system, while the IMUs were synchronized with each other using radio frequencies. To reduce the noise inherent to the treadmill's vibrations, we first applied, on the vertical ground reaction force (GRF) signal, a 2nd-order band-stop Butterworth filter with edge frequencies set to 25 and 65 Hz. The filter configuration was chosen empirically to obtain a satisfactory reduction of the oscillations observed during flight phases (i.e., subject not in contact with the treadmill) while minimizing its widening effect on the ground contact time.

#### Temporal Features Detection

IC and TC events were detected using a threshold on the filtered vertical GRF signal, setting the first threshold-crossing occurrence as IC and the second as TC for each step. As previous studies (Weyand et al., 2001; Cronin and Rumpf, 2014) used different reference thresholds, we decided to investigate the effect of eight reference thresholds on the validation results. Four thresholds were set to 20, 30, 40, and 50 N, independently of the subjects' body weight (BW), and four others were set to 3, 5, 7, and 9 %BW. Finally, we combined IC and TC events to find the reference inner-stride phase durations (CT, FLT, SWT, and SPT) as in Equations 1–4.
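The threshold-based detection on the reference signal can be sketched as below. The names `bw_threshold` and `detect_contacts`, the half-sine GRF model and the 70 kg body mass are assumptions made for illustration only; the sketch also assumes that the GRF has already been filtered and that body weight is converted to newtons with g = 9.81 m/s².

```python
import math

G = 9.81  # gravitational acceleration (m/s^2), assumed for the conversion

def bw_threshold(percent_bw, mass_kg):
    """Convert a threshold expressed in %BW into newtons."""
    return percent_bw / 100.0 * mass_kg * G

def detect_contacts(grf, fs, threshold):
    """Return a list of (IC, TC) times in seconds.

    IC is the first sample at or above the threshold (rising crossing),
    TC the first sample back below it (falling crossing).
    """
    events = []
    ic_idx = None
    for n in range(1, len(grf)):
        if grf[n - 1] < threshold <= grf[n]:
            ic_idx = n
        elif grf[n - 1] >= threshold > grf[n] and ic_idx is not None:
            events.append((ic_idx / fs, n / fs))
            ic_idx = None
    return events

# Synthetic vertical GRF at 1000 Hz: 250 ms half-sine contacts every 350 ms.
fs = 1000.0
grf = [1700.0 * math.sin(math.pi * (n % 350) / 250.0) if n % 350 < 250 else 0.0
       for n in range(1050)]
events = detect_contacts(grf, fs, bw_threshold(7.0, 70.0))
contact_times = [tc - ic for ic, tc in events]
```

With this toy signal, each detected contact is a few milliseconds shorter than the true 250 ms contact, which illustrates why the choice of reference threshold matters for the validation results.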

#### Statistical Analysis and Error Estimation

In order to avoid developing algorithms that over-fit our data set and would therefore bias the results, 10 subjects were first randomly selected and dedicated to the development set, while the remaining subjects were used only as the validation set. The design of the algorithms described in Section "Wearable Device and Temporal Features Estimation" was conducted using solely the data from the development set. No algorithm debugging was done on signals from the validation set.

TABLE 1 | Summary of the features used on the inertial sensors signals to detect initial contact (IC) and terminal contact (TC).

IC candidates are identified by kj with j ∈ {1.. 12} and TC candidates are identified by tj with j ∈ {1.. 9}. The features presented in this table were used in the respective detection zone of IC and TC.

FIGURE 2 | Features used on the kinematic signals recorded by the foot-worn inertial sensors. IC candidates are identified by kj with j ∈ {1 . . . 12} and TC candidates are identified by tj with j ∈ {1 . . . 9}. The vertical gray dashed lines show the limits of the detection zones for IC and TC candidates. The signals shown in this figure belong to the same step and are represented during one mid-swing to mid-swing cycle.

To evaluate the error of the proposed system against the reference force plate, we computed, for each temporal feature, the bias (intra-trial mean) and precision (intra-trial STD) for all steps within a trial. We then combined the results from each trial and computed the median and IQR of both the bias and precision over all trials. These two steps resulted in four inter-trial statistics per temporal feature for both sets (development and validation sets): bµ is the inter-trials median bias, bσ is the inter-trials IQR of the bias, σµ is the inter-trials median precision and σσ is the inter-trials IQR of the precision. Note that we used the median and IQR functions for the inter-trial statistics as the intra-trial bias and precision were not normally distributed.
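The two-level statistics (intra-trial mean and STD, then inter-trial median and IQR) can be sketched with the Python standard library. The function names are hypothetical, and two details are assumptions because the text does not specify them: the STD is computed as the sample standard deviation (n − 1 denominator) and the quartiles use the "inclusive" interpolation method.

```python
import statistics

def iqr(values):
    """Interquartile range using the 'inclusive' quartile definition."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def trial_stats(errors):
    """Intra-trial bias (mean) and precision (STD) of the step errors."""
    return statistics.mean(errors), statistics.stdev(errors)

def inter_trial_stats(trials):
    """Median and IQR of the intra-trial bias and precision over trials."""
    biases, precisions = zip(*(trial_stats(t) for t in trials))
    return {
        "b_mu": statistics.median(biases),
        "b_sigma": iqr(biases),
        "sigma_mu": statistics.median(precisions),
        "sigma_sigma": iqr(precisions),
    }

# Step-wise errors (ms) for three short illustrative trials.
example_trials = [[1, 2, 3], [2, 4, 6], [0, 0, 3]]
summary = inter_trial_stats(example_trials)
```

Keeping the two levels separate makes it possible to report how consistent the system is within a trial (precision) independently of how much its offset changes from one trial to another (IQR of the bias).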

A similar method was used for the inner-stride phases. However, to avoid having a large number of candidates for each parameter (12 IC candidates × 9 TC candidates = 108 possible pairs of candidates for each phase estimation), we decided to keep only the three most precise candidates for IC and the three most precise candidates for TC, and to combine them into 9 pairs of estimates for CT, FLT, SWT, and SPT. Then, similarly, the inter-trials bias (bµ, bσ) and the inter-trials precision (σµ, σσ) were evaluated. Precision (i.e., intra-trial STD) was chosen as the selection criterion for IC and TC candidates as it informs about the range of random errors made by the system among the steps of a trial. The bias, however, can potentially be decreased using an appropriate model of the errors.

To investigate whether the speed affects the intra-trial bias of the IC and TC candidates, we used the Kruskal–Wallis test with a significance level of 0.05. We preferred this non-parametric test to the one-way ANOVA because the Lilliefors test rejected, in most cases, the hypothesis that the intra-trial biases were normally distributed among the running speeds. Consequently, in this study, the null hypothesis was accepted only if the ranks of the biases were equal among the running speeds. The same hypothesis was also tested on the precision. Note that this test was applied on the complete data set (development and validation sets) as there was no speed-dependent adaptation of our detection algorithms.
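For readers unfamiliar with the Kruskal–Wallis test, the sketch below computes its H statistic from biases grouped by running speed. It is a didactic stdlib-only version, not the software used in the study: the function name `kruskal_h` and the toy data are hypothetical, ties receive average ranks with the usual tie correction, and the p-value (obtained from a chi-square distribution with k − 1 degrees of freedom, k being the number of groups) is not computed here.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction

# Intra-trial biases (ms) grouped by three running speeds, toy values.
biases_by_speed = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_h(biases_by_speed)
```

For three groups, H is compared with the 0.05 critical value of the chi-square distribution with 2 degrees of freedom (about 5.99); the toy data above exceed it, so the hypothesis of equal ranks across speeds would be rejected. In practice a library routine such as `scipy.stats.kruskal` returns both the statistic and the p-value.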

Finally, we used Bland-Altman plots and the best linear fit, in the least squares sense, to show the trend in the CT estimation errors on the development set. Finding the best linear fit on the development set allows the linear coefficients to be used to correct the inter-steps errors in the validation set. The inter-steps errors refer to the errors of all steps within a group, independently of the trial they belong to. The inter-steps bias is defined as the mean error of all steps and the inter-steps precision as the STD of the error of all steps.
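The correction principle can be sketched as follows. The function names `fit_line` and `correct_ct` and all numerical values are hypothetical. The sketch assumes that the error is regressed on the IMU estimate of CT itself, so that the correction can be applied to new data without access to the reference; the text does not state which quantity was used on the horizontal axis of the Bland-Altman plots.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def correct_ct(ct_imu, slope, intercept):
    """Subtract the error predicted by the linear model from each estimate."""
    return [c - (slope * c + intercept) for c in ct_imu]

# Development set (ms): IMU estimates and force plate references, toy values.
ct_imu_dev = [200.0, 250.0, 300.0]
ct_ref_dev = [190.0, 245.0, 300.0]
errors_dev = [e - r for e, r in zip(ct_imu_dev, ct_ref_dev)]
slope, intercept = fit_line(ct_imu_dev, errors_dev)

# Validation set: only the IMU estimates are needed to apply the correction.
ct_corrected = correct_ct([220.0], slope, intercept)
```

Because the coefficients are fitted on the development set only, the corrected validation errors remain an independent measure of performance.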

### RESULTS

### Temporal Events Detection

Out of the 41 participants, 35 were kept for the evaluation of the proposed system. Of the 6 participants removed, 2 were removed because the data loss rate was above 20% and 4 were removed because of calibration errors of the systems. The results for the development set and the validation set were computed from 10 subjects with 59 trials (4836 steps) and 25 subjects with 146 trials (12092 steps), respectively. Trials with a running speed of 8 km/h were removed due to the presence of steps with double support for some subjects, which makes the detection of IC and TC impossible with the GRF of the reference system. The minimum number of steps per trial was 67 and the maximum number of steps per trial was 105, given that the running speeds recorded ranged from 10 to 20 km/h. **Figure 2** illustrates the features used to detect IC and TC with the vertical gray dashed lines showing the limits of the detection zones for IC and TC candidates. The signals shown in **Figure 2** belong to the same step and are represented during one mid-swing to mid-swing cycle.

**Table 2** summarizes the IC and TC event detection errors for the development and validation sets, and for each kinematic feature candidate (kj and tj) extracted by applying the specific detection rule on the kinematic signal. The results are obtained by using the reference value estimated with a threshold at 7 %BW on the vertical GRF. The differences shown in the table were computed such that a positive difference indicates that the event was detected later in the signal than the reference. The three most precise IC candidates (median ± IQR) with respect to the results from the validation set are: k1 (2 ± 1 ms), k3 (2 ± 1 ms) and k8 (3 ± 2 ms). The three most precise TC candidates (median ± IQR) with respect to the results from the validation set are: t1 (4 ± 2 ms), t4 (4 ± 2 ms) and t5 (4 ± 2 ms). One TC candidate shows a noticeably lower inter-trial bias IQR: t5 with bσ = 7 ms.

**Figure 3** shows the influence of the running speed on the IC and TC inter-trials bias for the features (k1, k3, k8) and (t1, t4, t5). The graph was generated using the complete data set (development and validation sets) as it is solely used for visualization purposes. When the trials are grouped according to the running speed, the Kruskal–Wallis test applied on the biases shows that the running speed significantly affects the biases in k8 (p = 0.001), t1 (p < 0.001), t4 (p < 0.001), t5 (p < 0.001) and the precision in t1 (p < 0.001), t4 (p = 0.014) and t5 (p < 0.001).

#### Inner-Stride Phases Estimation

**Table 3** lists absolute and relative errors obtained for the estimations of CT, on the validation set, when compared with the force plate estimation found using the reference threshold at 7 %BW. The bias and precision obtained when comparing the other force plate thresholds with the 7%BW reference threshold are also listed at the end of **Table 3**.

The most precise pair of IC and TC candidates for CT was (k1, t1) with an inter-trial median ± IQR precision of 4 ± 2 ms or 1.8 ± 0.9%. CT estimators (k1, t5) and (k3, t5) both have the lowest absolute inter-trial IQR of the biases (bσ = 12 ms) while (k1, t5) has the lowest IQR in relative values (bσ = 5.0%). The reference values observed in this study ranged from 132 to 354 ms for CT, from 29 to 238 ms for FLT, from 367 to 613 ms for SWT and from 254 to 435 ms for SPT. **Table 4** shows the relative and absolute errors for FLT, SWT, and SPT estimations for the (k1, t1), (k1, t5) and (k3, t5) pairs.

Finally, **Figure 4** shows the Bland-Altman plots for the CT estimation of the (k1, t1) and (k1, t5) estimators. The orange dashed lines represent the best linear fit according to the least squares method. These graphs were computed using all the steps in the development set (N = 4836), independently of the trials.

### DISCUSSION

In this study we proposed, evaluated and compared how different algorithms based on foot-worn IMU kinematic features performed in detecting IC and TC during running and in estimating the main inner-stride temporal parameters: CT, FLT, SWT, and SPT. The errors (displayed in **Table 2**) show that the bias and precision for IC and TC could reach very low values depending on the kinematic features used. Therefore, by considering the best-performing kinematic features, an accurate and precise estimation of inner-stride temporal parameters was proposed and validated against a force plate as reference system.

TABLE 2 | List of time differences for all the IC and TC candidates, computed over 4836 and 12092 steps for the development set and the validation set, respectively.

Time differences are expressed in milliseconds (ms). The reference system used in this table is the vertical GRF with a threshold set at 7 %BW. IC candidates are identified by kj with j ∈ {1.. 12} and TC candidates are identified by tj with j ∈ {1.. 9}. "b" and "σ" are the abbreviations for bias (intra-trial mean error) and precision (intra-trial STD of the error), respectively, while subscript characters µ and σ represent the median and the IQR over all the trials.

FIGURE 3 | Initial contact (left graph) and terminal contact (right graph) inter-trials bias for the features (k1, k3, k8) and (t1, t4, t5), respectively. The graph was computed using the complete data set (development set and validation set) and using the reference threshold on the vertical GRF at 7 %BW. Each group of speed contains N = 35 trials except the 20 km/h group where N = 30.

TABLE 3 | List of the duration differences for CT estimation in the validation set (N = 146 trials, 12092 steps) when compared to the force plate estimation using the reference threshold set at 7 %BW.

The first nine rows show the estimation errors of the three most precise candidates for IC and TC detection arranged as pairs while the last seven rows show the difference observed when using other reference thresholds on the vertical GRF signal. "b" and "σ" are the abbreviations for bias (intra-trial mean error) and precision (intra-trial STD of the error), respectively, while subscript characters µ and σ represent the median and the IQR over all the trials in the validation set.

TABLE 4 | Flight phase duration (FLT), swing phase duration (SWT) and step time duration (SPT) estimation errors for the (k1, t1), (k1, t5) and (k3, t5) candidates when a reference threshold at 7 %BW is used on the vertical GRF.

The results were computed from the data in the validation set (N = 146 trials, 12092 steps). "b" and "σ" are the abbreviations for bias (intra-trial mean error) and precision (intra-trial STD of the error), respectively, while subscript characters µ and σ represent the median and the IQR over all the trials in the validation set.

**Table 3** shows that the three most precise IC candidates (k1, k3, and k8) and TC candidates (t1, t4, and t5) can be combined to provide a precise estimation of ground contact time (CT). The most precise pair of features, obtained from the two minima of the pitch angular velocity in the IC and TC detection zones (k1, t1), had an inter-trials median ± IQR precision of 4 ± 2 ms (1.8 ± 0.9%). However, the accuracy of the t1 candidate is speed dependent (p < 0.001). This explains the relatively high inter-trial IQR of the biases (bσ = 17 ms) of CT for the (k1, t1) candidate. In **Figure 3**, the median of the biases for t1 (as well as for t4 and t5) seems to decrease linearly as the speed increases. However, even though the Kruskal–Wallis test shows that speed also affects t5 (p < 0.001), the range of the median biases is approximately two times smaller for t5 (10 ms) than for t1 (21 ms).

To reduce the effect of the running speed on the bias, the minimum of the pitch angular velocity in the IC zone and the maximum of the vertical acceleration in the TC zone, i.e., the (k1, t5) candidate, can be used. Although this pair is slightly less precise in the estimation of CT, **Table 4** shows better accuracy and precision in the estimation of FLT. Given that the CT decreases as speed increases, a measure of the CT itself already contains information about the running speed. Therefore, using the coefficients from the best linear fit (development set data) shown on the Bland-Altman plots in **Figure 4**, the validation set inter-trials median ± IQR bias decreased to −2 ± 14 ms (−1 ± 6.2%) and 1 ± 10 ms (0.3 ± 4.9%) for the (k1, t1) and the (k1, t5) pairs, respectively. For both the (k1, t1) and the (k1, t5) candidates, the precision did not change after the aforementioned correction. Note that the outliers observed on the top graph of **Figure 4** correspond to the detection errors of the t1 feature due to a second minimum happening later in the pitch angular velocity signal.

Moreover, **Table 2** reveals that the most precise features for IC detection were found on the measurements from a single axis of the IMUs (k1, k3, and k8). This observation emphasizes the importance of the functional calibration, which aligns the technical frame of the inertial sensors with the biomechanically meaningful axes of the foot.

**Table 2** also shows that, in general, the kinematic features used in this study tend to detect IC better than TC. Considering that the IC event comes with a landing impact, while no abrupt variation in the foot's motion occurs at TC, the odds of missing the exact instant of TC are higher. Moreover, the vertical force applied by the foot on the ground decreases drastically at the end of the CT although the foot is still in contact with the ground, leading to a potentially early detection of TC. Similar observations were reported by Weyand et al. (2001). In fact, we observed that the 3 %BW detection threshold showed a bias (bµ ± bσ) of −2 ± 2 ms and 7 ± 4 ms for IC and TC when compared to the 7 %BW reference threshold. For both IC and TC, the bias was the highest when compared to a force threshold set at 20 N. These results show that the detection accuracy of the force plate is more sensitive to variations in the reference threshold for TC than for IC.

Lastly, the inter-step errors of the k1 feature seem to follow a bimodal distribution when including all steps of the validation set, independently of the trials (N = 12092 steps). This implies that there might be an additional source of variance other than running speed that affects the detection of IC. Because the k1 feature is based on the angular velocity of the foot at landing, we assume that the type of foot-strike employed (fore-foot strike or rear-foot strike) could also introduce an error in the detection of IC. Further study would be required to evaluate how foot-strike angle influences detection accuracy and precision of temporal events during running. In addition, determining the applicability of the algorithms developed for level running in this study to uphill or downhill running would also need further study.

This study used a different method to express the CT errors than in Ammann et al. (2016). In the aforementioned study, the authors reported an inter-steps bias (N = 132 steps) of −1.9 ms (−1.3%) and a random error (95% confidence interval) of 17.4 ms (6.1%) for CT. The inter-steps bias and precision for the (k1, t1) pair showed comparable results. In fact, the validation set inter-steps bias (N = 12092 steps) was −2 ms (−0.5%) for CT, after applying the linear fit correction shown in the Bland-Altman plots in **Figure 4**. However, the inter-steps random error (95% confidence interval) was slightly higher (23 ms) for the (k1, t1) pair than in Ammann et al. (2016). This can be explained by the fact that t1 precision is affected by speed (p < 0.001) and that the range of speed in this study (10–20 km/h) is larger than in Ammann et al. (2016) (22.3 ± 5.8 km/h). In Weyand et al. (2001), the authors reported a bias (mean ± STD) of 14.6 ± 0.5% when computed over 165 trials. These results are in accordance with the biases shown in **Table 3**.

To the authors' knowledge, this study is the first to quantitatively demonstrate how, when using foot-worn IMUs in running, the choice of kinematic features affects the detection accuracy and precision of IC, TC and the inner-stride parameters derived from these two events. Consequently, it is important that researchers report on the methods applied to detect IC and TC events as it provides some information about the confidence interval of the measurements.

#### CONCLUSION

This study aimed to validate, against a gold standard reference system, the performance of several algorithms using foot-worn inertial sensors to detect running gait temporal events and estimate inner-stride phase durations. The results highlighted the importance of suitable kinematic signals and features to avoid large errors in detecting initial and terminal contact. The two minimum values of the pitch angular velocity in the first half and second half of a mid-swing to mid-swing cycle provide the best estimation of IC and TC. Also, the maximum value of vertical acceleration during the second half of the mid-swing to mid-swing cycle provides a good estimation of TC which is less dependent on running speed. Using these initial and terminal contact features, we showed that the ground contact time, flight time, step and swing time can be estimated with an inter-trial median ± IQR bias of less than 15 ± 12 ms and an inter-trial median ± IQR precision of less than 4 ± 3 ms. Running speed could have a significant impact on the biases of the estimations and therefore knowledge about the speed could improve the results. Further studies should investigate the effect of the foot-strike angle on the errors made by the features during initial contact.

#### AUTHOR CONTRIBUTIONS

MF, FM, BM, GM, and KA conceptualized the study design. MF and FM conducted the data collection. MF designed the algorithms and KA supervised the study. MF, FM, BM, GM, and KA contributed to the analysis and interpretation of the data. MF drafted the manuscript, all other authors revised it critically. All authors approved the final version, and agreed to be accountable for all aspects of this work.

### FUNDING

This study was supported by the Swiss CTI grant no. 17664.1 PFNM-NM.



**Conflict of Interest Statement:** BM was employed by company Gait Up.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Falbriard, Meyer, Mariani, Millet and Aminian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Measurement, Prediction, and Control of Individual Heart Rate Responses to Exercise—Basics and Options for Wearable Devices

#### Melanie Ludwig<sup>1</sup> \*, Katrin Hoffmann<sup>2</sup> , Stefan Endler <sup>3</sup> , Alexander Asteroth<sup>1</sup> and Josef Wiemeyer <sup>2</sup>

<sup>1</sup> Department of Computer Sciences, Institute of Technology, Resource and Energy-Efficient Engineering, Bonn-Rhein-Sieg University of Applied Sciences, St. Augustin, Germany, <sup>2</sup> Department of Human Sciences, Institute of Sport Science, Technical University of Darmstadt, Darmstadt, Germany, <sup>3</sup> Department of Computer Science in Sports, Institute of Computer Science, Johannes Gutenberg University of Mainz, Mainz, Germany

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Can Ozan Tan, Harvard Medical School, United States Fabien Andre Basset, Memorial University of Newfoundland, Canada

> \*Correspondence: Melanie Ludwig melanie.ludwig@h-brs.de

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 25 November 2017 Accepted: 04 June 2018 Published: 25 June 2018

#### Citation:

Ludwig M, Hoffmann K, Endler S, Asteroth A and Wiemeyer J (2018) Measurement, Prediction, and Control of Individual Heart Rate Responses to Exercise—Basics and Options for Wearable Devices. Front. Physiol. 9:778. doi: 10.3389/fphys.2018.00778

The use of wearable devices or "wearables" in the physical activity domain has been increasing in recent years. These devices are used as training tools providing the user with detailed information about individual physiological responses and feedback to the physical training process. Advances in sensor technology, miniaturization, energy consumption and processing power have increased the usability of these wearables. Furthermore, available sensor technologies must be reliable, valid, and usable. Considering the variety of the existing sensors, not all of them are suitable to be integrated in wearables. The application and development of wearables has to consider the characteristics of the physical training process to improve their effectiveness and efficiency as training tools. During physical training, it is essential to elicit individual optimal strain to evoke the desired adjustments to training. One important goal is to neither overstrain nor underchallenge the user. Many wearables use heart rate as an indicator for this individual strain. However, due to a variety of internal and external influencing factors, heart rate kinetics are highly variable, making it difficult to control the stress eliciting individually optimal strain. For optimal training control it is essential to model and predict individual responses and adapt the external stress if necessary. The basis for this modeling is the valid and reliable recording of these individual responses. Depending on the heart rate kinetics and the obtained physiological data, different models and techniques are available that can be used for strain or training control. The aim of this review is to give an overview of measurement, prediction, and control of individual heart rate responses. Therefore, available sensor technologies measuring the individual heart rate responses are analyzed and approaches to model and predict these individual responses are discussed. Additionally, the feasibility for wearables is analyzed.

Keywords: wearable sensors, heart rate modeling, heart rate control, heart rate prediction, phenomenological approaches, training monitoring, load control


## 1. INTRODUCTION

The use of wearable devices ("wearables") as tools for training or activity tracking has increased considerably. More precise and accurate data acquisition due to improved sensor technology, advanced usability, and portability due to miniaturization and more powerful data analysis due to increased processing power allows the industry to introduce new and improved wearables (Chan et al., 2012; Mukhopadhyay, 2015). Therefore, wearables can be used as "every day" devices providing the user with detailed and individual information about physical activity (PA), fitness level, and physiological responses. Especially for nonathletes, wearables are claimed to be effective and efficient tools for physical training. "Find your own Fit" (Fitbit.com), "beat yesterday" (garmin.com), "listen to your body" (POLAR), or "Eat. Sleep. Move. Better" (Jawbone) are some of the slogans of well-known distributors of those wearables. In this context, especially the heart rate (HR) has become an often used indicator for individual cardiovascular strain during training. Exercise according to defined HR zones is already well established in professional and recreational endurance training. Several wearable devices do not only measure a person's heart rate, but might even give visual, acoustic, or vibro-tactile feedback if HR is outside a specified area. Most apps and devices are connected to web portals that provide a visualization of a subject's training data as well as more or less detailed recommendations for training.

The wide-spread use of HR is not surprising since the pumping action of the human heart is the driving force of blood circulation in the cardiovascular system. The main tasks of this system are to supply the cells with oxygen and nutrients, to remove carbon dioxide and metabolites, and to transport hormones, vitamins, and enzymes (Weiss and Jelkmann, 1989). This is especially apparent in the physical training process, when a defined external stimulus (e.g., load, pedal rate, velocity) is applied to the human body. The increased energy demand of the working muscles causes an increase in cardiovascular functions. Depending on the extent of individual strain (e.g., sleep or activity conditions) the heart has to sensitively adjust the ejection of blood to fulfill different demands of the human body. In contrast to other indicators of cardiovascular strain (e.g., stroke volume (SV), oxygen uptake (VO2), release of carbon dioxide (VCO2), metabolites such as lactate or urea, and hormones), HR can be recorded non-invasively, with minimal technical effort, and without the constraints of laboratory conditions.

However, HR responds individually to physical stress or training load. Due to a large number of internal (e.g., training status, genetics, mood) and external (e.g., environmental conditions, nutrition, water supply) influencing factors, the HR response can even fluctuate in the same individual during a single training session (Bunc et al., 1988; Ewing et al., 1991; Boushel et al., 2001; Achten and Jeukendrup, 2003; Bouchard and Hoffman, 2011; Hoffmann et al., 2016). By recording every single heartbeat, a high variation of longer and shorter heart cycles can be observed. This heart rate variability (HRV) is to a large extent modulated by the stimulating sympathetic and inhibiting parasympathetic influences of the autonomic nervous system (ANS) (Lacey, 1956; Stauss, 2003). Integrated in a variety of complexly nested regulatory mechanisms and reflexes, the antagonistic influences of the ANS are modulated according to afferent signals from sensors that are situated throughout the human body. These sensors measure, e.g., changes in blood pressure, blood volume, or partial pressure of CO<sub>2</sub> or O<sub>2</sub> in the blood.

To evoke training responses corresponding to defined training goals, it is necessary to elicit individually optimal cardiovascular responses that neither overstrain nor underchallenge the trainee. Therefore, it is essential to model and predict these individual responses. This is the prerequisite for effective and efficient training.

Although the complex influence of reflexes and mechanisms on heart performance has been studied for centuries (e.g., Starling, 1918; Brandfonbrener et al., 1955), modeling and predicting every single heartbeat is not yet possible. In particular, the unpredictability of HRV must be considered as a source of error in modeling.

Therefore, the following HR kinetics need to be considered for modeling acute responses to stress:


This review aims at giving an overview of measurement, prediction, and control of individual HR responses. Therefore, different sensor technologies measuring HR and their feasibility for wearables are analyzed. Afterwards, current models of acute, individual HR responses are addressed, and the implementation and use cases of these models are discussed.

## 2. MEASURING CARDIAC OUTPUT VIA HR

HR kinetics can provide valuable information about the individual responses and therefore the individual strain of the human body. However, valid and reliable measurement of HR is essential to convey the required information and to enable a valid modeling and prediction of these responses. The following chapter analyzes the reliability of different sensor technologies currently available. Additionally, their feasibility for wearables is discussed.

The exclusive measurement of HR as a body's physiological response to exercise is widely used in several areas and applications. For example, HR is used to estimate a person's exhaustion or degree of fatigue (Vautier et al., 1994; She et al., 2013), to indicate individual cardiovascular function (Carter et al., 2003; Borresen and Lambert, 2008), to monitor exercise parameters (e.g., condition, intensity, exercise duration) of single persons or whole groups (Sornanathan and Khalil, 2010; Lee et al., 2015), or to control the individual training (Weghorn, 2013; Hunt and Hunt, 2016).

Due to the central location of the heart inside the torso and the vulnerability of the cardio-respiratory system, heart functions are often measured indirectly by acquiring signals that are caused by these functions. One possibility to measure cardiac output is by assessing SV. Although available measurement technologies (i.e., echocardiography, thermodilution, or the direct Fick method; Smyth et al., 1984) show high reliability and validity and provide detailed information about the individual performance of the heart, none of them is suitable to be used during physical training. All described methods and techniques require a clinical setting and preferably a stationary participant.

An alternative way to measure cardiac output is by registering the individual HR or the electric and mechanical effects caused by the heartbeat. Due to the technological progress, new sensors and technologies for reliable and valid measurement of HR are available. Additionally, the sensors available so far still improve in quality and feasibility and allow for a more exact representation of the HR signal. At present, the following measuring technologies are used (see **Table 1**):


The gold standard technique for measuring HR is to quantify the changes of potentials that are caused by the conduction of excitation along a myocardial pathway. This conduction produces electrical potentials that can be registered on the skin using an electrocardiograph. In general, 12 electrodes are arranged at defined sites on the body. However, the obtained electrocardiogram (ECG) is only an indicator for the process of excitation. It does not provide information about the actual contraction work of the heart. The application procedure is time consuming and complicated. Therefore, complex knowledge about medical procedures and a clinical setting are essential to obtain valid information. An appropriate and reliable integration in wearables is not feasible so far. A more common application of electrocardiography is the HR chest belt, which also registers varying electrical potentials. In contrast to the ECG, only two electrodes are used. The belt is attached to the thorax. The recorded RR intervals are used to calculate HR. Applied correctly, these belts show high correlations of 0.85–0.99 with the ECG (Weippert et al., 2010). As the sensors need direct skin contact, participants might feel uncomfortable undressing for application. Another approach is the capacitive electrocardiogram (cECG). The electrodes of the cECG do not need any conductive electrical contact with the participant but can bridge a distance, for example through at least two layers of clothing. Thus, they can be placed in chairs, car seats, and bath tubs. Czaplik et al. (2010) obtained high correlations with conventional ECG at rest in supine position. However, the correlation varied between 0.10 and 0.85 depending on the body position, (breathing) movements, type of clothing, and sweat production of the participants. Additionally, the technological challenges are still high due to motion artifacts and possible filter effects (Teichmann et al., 2012). Therefore, cECG sensors are not feasible to be used in wearables for physical training.
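The conversion from recorded RR intervals to HR mentioned above is simple arithmetic. The following Python sketch is illustrative only: the function names, the millisecond unit, and the sample values are assumptions and are not taken from any device or from the cited studies.

```python
def rr_to_hr(rr_intervals_ms):
    """Convert RR intervals (milliseconds) to instantaneous HR (beats per minute)."""
    if any(rr <= 0 for rr in rr_intervals_ms):
        raise ValueError("RR intervals must be positive")
    return [60000.0 / rr for rr in rr_intervals_ms]


def mean_hr(rr_intervals_ms):
    """Mean HR over a window: number of beats divided by the elapsed time."""
    total_ms = sum(rr_intervals_ms)
    return 60000.0 * len(rr_intervals_ms) / total_ms


# Hypothetical RR series (ms), as a chest belt might report it.
rr = [1000, 800, 750, 600]
instantaneous = rr_to_hr(rr)
average = mean_hr(rr)
```

Note that the mean HR of a window is computed from the total elapsed time rather than by averaging the instantaneous values, because short intervals would otherwise be overweighted.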

All electrocardiographic measurements can show measuring errors caused by electromagnetic waves of electrical devices and potentials that are caused by muscular activity.

Optical sensors have also become more and more popular. Whereas transmissive photoplethysmographic imaging is widely used in clinical settings, reflective photoplethysmographic imaging is already applied in smartwatches or activity trackers. Both technologies use a light source and a detector. In transmissive photoplethysmographic sensors, the light source is placed opposite the detector, whereas light source and detector are placed on one side of the captured area in reflective photoplethysmographic imaging. While the pulse wave is running through the captured area, the amount of arterial blood is slightly increased. The red blood cells absorb the red light, leading to different reflections that can be detected. The registered pulse wave therefore represents HR. Although evidence shows a close correspondence of pulse wave and HR (Drinnan et al., 2001; Opalka, 2009), measuring errors can occur due to the latency of the pulse wave and varying vascular resistance (Selvaraj et al., 2008). Therefore, inconsistent findings regarding the reliability can be found depending on location of sensor, experimental condition and performed exercise (0.11–0.99; Schäfer and Vagedes, 2013). Whereas the sensors show high reliability in clinical settings, at rest, and during sleep, the accuracy becomes considerably lower during movements. Weghorn (2016) found measurements of 118 bpm, while the ECG reference measure was at 65 bpm. Similar results were obtained by Gillinov et al. (2017). Parak and Korhonen (2014) evaluated two photoplethysmography-based HR monitors, whose HR measurement lay within a 10 bpm interval about 87% of the time compared to the ECG reference heart rate. This incongruence is mainly caused by the signal processing of the pulse wave. In contrast to the sharp increase of the R-spike in the ECG, the pulse wave shows a slow increase and decrease, leading to different detection depending on the analyzing algorithm. Additionally, skin color and external light sources might lead to artifacts.
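Agreement figures such as those reported above can be computed from paired samples of device HR and ECG reference HR. The sketch below is a minimal illustration. It assumes that "within a 10 bpm interval" means an absolute difference of at most 10 bpm, which is only one possible reading; the first sample pair mirrors the 118 vs. 65 bpm example, and all other values are invented.

```python
def mean_absolute_error(device_bpm, reference_bpm):
    """Mean absolute difference between device HR and reference HR (bpm)."""
    if len(device_bpm) != len(reference_bpm) or not device_bpm:
        raise ValueError("need two non-empty series of equal length")
    total = sum(abs(d - r) for d, r in zip(device_bpm, reference_bpm))
    return total / len(device_bpm)


def share_within(device_bpm, reference_bpm, tolerance_bpm=10.0):
    """Fraction of samples whose absolute error is at most tolerance_bpm."""
    if len(device_bpm) != len(reference_bpm) or not device_bpm:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(1 for d, r in zip(device_bpm, reference_bpm)
               if abs(d - r) <= tolerance_bpm)
    return hits / len(device_bpm)


# Hypothetical paired samples: ECG reference and optical sensor (bpm).
ecg = [65, 90, 120, 150]
ppg = [118, 92, 119, 141]
mae = mean_absolute_error(ppg, ecg)
share = share_within(ppg, ecg)
```

A single large outlier dominates the mean absolute error while barely changing the share within tolerance, which is one reason why studies often report both kinds of measures.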

Due to the comfortable handling and application in different locations at the upper and lower extremities, optical sensors have a high potential to be applied in wearables. However, their reliability needs to improve substantially.

Measuring the alternating magnetic field at distinct areas (e.g., wrist) is another measuring approach that has already been implemented in wearable technologies. This technology registers the pulse wave by measuring the regional changes of tissue connectivity and corresponding changes of impedance. It has the advantage that no contact between sensor and measuring site is needed. At rest, the assessment of heart rate variability (HRV) shows very high correlations (0.99–1.00) compared to 3 channel ECG (Kristiansen et al., 2005). However, the interference caused by movements and muscular activity is still very high; reliable values were only achieved under laboratory conditions and at rest (Teichmann et al., 2012). Currently, the technology is not feasible to be used in wearables for physical training.

Infrasonic cardiac vibration sensors (i.e., ballistocardiographic or seismocardiographic sensors) measure the vibration of the human body that is caused by the heart function and the blood flow through the body (Teichmann et al., 2012; Inan et al., 2015).

#### TABLE 1 | Feasibility of measurement techniques used in wearables.


−, not feasible; o, limited feasibility; +, feasible; n.a., no data available for exercise.

These sensors do not require direct skin contact. Therefore, they can be integrated into devices of daily life (i.e., beds, wheel chair). Shin et al. (2011) obtained a strong correlation (0.97–0.98) on a weighing scale type sensor at rest. However, muscular activity, movements, and floor vibrations may cause measurement errors. Therefore, these sensors do not provide reliable information during physical activity.

Phonocardiographic sensors measure the noise that is produced by the heart function or the blood wave. Modern technology has replaced the stethoscope by a more reliable microphone sensor. However, the reliability of the sensor is not sufficient due to a high amount of interference caused by noise from the environment (Torres-Pereira et al., 1997).

Sphygmographical and sphygmomanometrical sensors measure the differences of blood pressure elicited by systole and diastole. The sphygmographical sensor formerly used an inconvenient device attached to the arm and is therefore not feasible for use in wearables. Sphygmomanometrical sensors nowadays measure the variance of blood pressure using air pressure cuffs. However, these sensors must be applied by a skilled physician and measurements are non-continuous (Kugler et al., 1997). Therefore, sphygmomanometrical sensors are not feasible for wearables.

Several recent studies showed that the accuracy and precision of HR measurement depend not only on the measurement technique but also strongly on the wearable device used and the activity performed. El-Amrawy and Nounou (2015) compared nine smartwatches and eight fitness trackers. Accuracy of HR measurement (compared to an ECG reference heart rate signal) ranged from 92.8 to 99.9% depending on the device, and precision ranged from 5.9 to 20.6%. One way to overcome the deficiency of single measurement technologies is to combine sensors into multi-input systems. The developed systems show high reliability and validity (0.993; Brage et al., 2005; Peter et al., 2005).

## 3. MODELING AND PREDICTION OF HEART RATE

In the previous section we discussed many difficulties and sources of errors regarding the feasibility of HR measurement approaches for wearables.

While usage of wearables has rapidly increased over the last few years, modeling aspects of health and health care are also helpful in numerous applications, as stated in Fone et al. (2003). This applies especially to HR. Numerous models have been discussed with regard to HR modeling within the last decades. Physiological models are usually built to simulate a specific behavior of a biological system with high accuracy. These simulations of the human cardiovascular system encompass a wide range of different purposes and cover wide variations in complexity. For example, Grodins (1959) described the cardiovascular system as "a feedback regulator" and emphasized the importance of identifying the relevant components in a system with inputs and outputs and the connection between both. Therefore, he identified input and output parameters for the right and the left heart and the open pulmonary circuit, amongst others, before formalizing and modeling the cardiovascular system. Similar kinds of models of specific parts of the cardiovascular system can be found in, e.g., Ursino (1998); McLeod (1966); Hotehama et al. (2003); Whittam et al. (1998); Asteroth (2000). A detailed review with focus on the dynamics of the cardiovascular system and physiological models can be found in Lim et al. (2012).

Following a specific purpose, e.g., providing scientific explanations, such physiological "white box" models try to represent special parts of the physiological functions of the human body. Additionally, there are many techniques which model phenomenological observations. To set up a phenomenological model, the phenomena that can (or should) be covered by the model have to be defined. HR response under different load conditions, especially in an endurance-specific context, can be described by the following four phenomena: delayed exponential attenuated response, S-shaped response, cardiovascular drift, and complete exhaustion.


Additionally, other aspects like a pre-exercise HR or a person's maximum HR can be considered directly or implicitly in a model.

In the remainder of this section, we will focus on phenomenological models because they seem to be more applicable in wearables. We will first define different aspects of modeling and differentiate between approximation and prediction. Additionally, we will present different types of models and briefly summarize results of the corresponding studies. This section ends with a discussion of the use of the presented models with regard to the modeled physiological phenomena.

### 3.1. Overview of Phenomenological Models

Phenomenological models and black box models are more applicable than physiological models in terms of approximation and prediction of HR under stress, even if they cannot accurately mirror all effects which occur in a human's body. However, they are used to observe and model essential effects during the training process. Particularly since possibilities of measurement are restricted during training (see section 2), an accurate model which depicts too many different physiological aspects is not applicable.

In this paper, we will focus on modeling acute HR responses under stress. As stated in section 1, these responses can be subdivided as follows: short-term responses expressed by HR kinetics in response to a change of load, and mid-term responses expressed by the individual relationship between stress intensity and HR. These acute responses of human HR under stress are part of numerous phenomenological models.

We can define four different aspects which are relevant when considering HR models from a modeling perspective: we have to discriminate between approximation, short term prediction, session prediction, and controlling, which are explained in more detail in the following.

As defined in Ludwig et al. (press), many (non-black box) models $M$ can be defined as functions mapping all parameters $\vec{\alpha}$ required by the model, and a stress curve $u$, to an artificially computed HR curve $y$. Both input (i.e., stress curve) and output (i.e., HR curve) are real-valued time series. The estimated HR at time $t$ is denoted by $y(t)$, where $y = M(\vec{\alpha}, u)$, $\vec{\alpha} \in P$ is the parameter setting<sup>1</sup>, and $u = u\_1, \ldots, u\_t \in (\mathbb{R}^+)^\*$ serves as the model input.

Mathematically, approximation is just a curve fitting problem, which is a specific type of optimization problem. The goal of curve fitting is to find the best solution to a specific problem by finding the maximum (or the minimum) of a fitness (or error) function which corresponds to the problem. There are several methods for finding local optima; variants of the least squares method are most common. In terms of HR modeling, optimization is used to find parameters $\vec{\alpha}$ that are as close to optimal as possible, such that the error between the measured HR curve and the modeled HR curve is as small as possible.
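To make the notion of approximation concrete, the following Python sketch fits a model to an HR curve by minimizing the sum of squared errors. It is a minimal illustration under stated assumptions: the first-order model `simulate_hr` is hypothetical and is not one of the reviewed models, the "measured" curve is synthetic, and an exhaustive grid search stands in for a proper least squares optimizer.

```python
from itertools import product


def simulate_hr(params, stress, hr_start):
    """Hypothetical model M(alpha, u): params = (hr_rest, gain, tau).

    At each step, HR moves a fraction 1/tau toward hr_rest + gain * u.
    """
    hr_rest, gain, tau = params
    hr = hr_start
    curve = []
    for u in stress:
        hr += (hr_rest + gain * u - hr) / tau
        curve.append(hr)
    return curve


def sum_squared_error(params, stress, measured, hr_start):
    """Error function: squared distance between measured and modeled HR."""
    modeled = simulate_hr(params, stress, hr_start)
    return sum((m - y) ** 2 for m, y in zip(measured, modeled))


def fit_by_grid_search(stress, measured, hr_start, grid):
    """Return the parameter tuple on the grid with the smallest error."""
    return min(product(*grid),
               key=lambda p: sum_squared_error(p, stress, measured, hr_start))


# Synthetic "measurement" generated from known parameters, then recovered.
stress = [0.0] * 30 + [150.0] * 120 + [0.0] * 60
true_params = (70.0, 0.4, 25.0)
measured = simulate_hr(true_params, stress, hr_start=70.0)
grid = ([60.0, 70.0, 80.0], [0.2, 0.4, 0.6], [15.0, 25.0, 35.0])
best = fit_by_grid_search(stress, measured, 70.0, grid)
```

With real, noisy data the minimum error is not zero, and the quality of the fit on the training data says nothing yet about prediction on unseen data.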

Going further, the term prediction<sup>2</sup> can be used for forecasting HR, i.e., computing HR values which were not known by the model beforehand and not used for optimizing the model's parameter space. A prediction is dependent on the model parameters previously identified in approximation on different data sets (i.e., approximation is performed on training data and prediction on test data). In short term prediction, we are interested in predicting HR responses to the change of load based on current input data over a certain time horizon. This type of prediction is often used to properly control the stress applied to a subject to prevent unwanted physical effects. If instead the task is to develop a sensitive training plan for a subject over a whole workout session beforehand or to plan a competition, then the input-output relation between imposed stress and resulting HR needs to be predicted over a longer period of time. We use the term session prediction to refer to this capability of a model. This means that session prediction is used for predicting a whole time series, such that mid-term HR effects can be modeled as well.

<sup>1</sup> Since HR response is delayed, HR increases after a certain time of physical activity, and regeneration during relaxation, for example, is delayed as well. The speed of these adaptation processes is highly individual, and therefore the models should be parametrized for such individual model components.

<sup>2</sup> In estimation theory, estimating the value of a function at a given point in time based on the observations made up to this point is denoted as filtering rather than predicting.

Controlling is a special case of HR prediction in this context. It is usually based on short term prediction since the model is used to control the stress to which a subject is exposed, e.g., by an ergometer. In contrast to short term prediction, input and output are interchanged in the control application, since the power of an ergometer is changed depending on a subject's HR. HR models used for control are often some kind of short term prediction models.
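The interchange of input and output in control can be sketched with a simple feedback loop in Python. This is a minimal illustration and not one of the controller designs from the literature: the simulated "subject" is a hypothetical first-order HR response, and the integral gain `ki`, the power limit, and all numeric values are assumptions chosen only for the example.

```python
def simulate_hr_control(target_hr, steps=900, ki=0.05):
    """Integral controller that adjusts ergometer power from the measured HR.

    The subject is simulated by a hypothetical first-order response: each
    second, HR moves 1/tau of the way toward hr_rest + gain * power.
    """
    hr_rest, gain, tau = 70.0, 0.5, 30.0  # illustrative values only
    max_power = 400.0                      # assumed ergometer limit (W)
    hr, power = hr_rest, 0.0
    history = []
    for _ in range(steps):
        error = target_hr - hr  # measured HR (model output) drives the load
        power = min(max(power + ki * error, 0.0), max_power)
        hr += (hr_rest + gain * power - hr) / tau  # simulated HR response
        history.append((power, hr))
    return history


history = simulate_hr_control(target_hr=130.0)
final_power, final_hr = history[-1]
```

In this sketch the load settles at the power whose steady-state HR equals the target; with a real subject, drift and delayed responses make the required power change over time, which is why the controller has to keep adapting.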

Adjustment of short term prediction models for use in session prediction is mathematically possible, but can lead to a lack of accuracy, as shown in Ludwig et al. (2015) and Hoffmann and Wiemeyer (2017a). If a short term prediction model makes use of previous HR values, the respective previously computed HR values could be used in the corresponding session prediction model. In doing so, prediction errors may accumulate quite fast. Conversely, models for session prediction can be transformed into models for short term prediction by using the model stepwise.
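The accumulation of errors when a short term model is reused for session prediction can be illustrated with a small Python experiment. The sketch is hypothetical: it uses an invented first-order update rule rather than any of the reviewed models, synthetic data, and a deliberately mis-estimated time constant.

```python
def one_step(hr_prev, load, params):
    """Hypothetical update: HR moves 1/tau toward hr_rest + gain * load."""
    hr_rest, gain, tau = params
    return hr_prev + (hr_rest + gain * load - hr_prev) / tau


def short_term_prediction(measured, loads, params):
    """Predict each next value from the last measured HR."""
    return [one_step(measured[t], loads[t], params) for t in range(len(loads))]


def session_prediction(hr_start, loads, params):
    """Predict the whole session, feeding back the model's own output."""
    hr, curve = hr_start, []
    for load in loads:
        hr = one_step(hr, load, params)
        curve.append(hr)
    return curve


def mean_abs_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


loads = [150.0] * 120
true_params = (70.0, 0.4, 25.0)   # generates the synthetic "measurement"
model_params = (70.0, 0.4, 35.0)  # time constant deliberately mis-estimated
actual = session_prediction(70.0, loads, true_params)
measured = [70.0] + actual[:-1]   # HR known before each step
short_mae = mean_abs_error(
    short_term_prediction(measured, loads, model_params), actual)
session_mae = mean_abs_error(
    session_prediction(70.0, loads, model_params), actual)
```

Because the short term predictor is corrected by a fresh measurement at every step, its error stays small, whereas the session predictor carries each error forward into the next step.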

In general, all HR models have the potential to be used for any application which requires HR modeling, with varying accuracy. Some effects might be modeled only indirectly and thus less accurately than in models that treat them as phenomena to be modeled directly. Thus phenomenological models cannot represent all possible HR behaviors, but best describe the effects they are built for. For example, Paradiso et al. (2013) stated that they avoid workloads inducing cardiovascular drift and therefore do not need to include the drift effect in their model. On the other hand, models used for indoor control purposes, like ergometer or treadmill control, do not need to predict future HR values for more than a few seconds.

**Table 2** gives an overview of common HR models and summarizes their properties. Each model is first classified as a black box model, a regression analysis model, or a white box model. Most properties are marked with an "x" if applicable, are further specified, or are marked with "ø" for clarification if a certain property is not specified within the corresponding paper; if the model is used for prediction, the type of prediction is further specified. The number of parameters which need to be optimized is stated where possible; in the case of Artificial Neural Networks (ANNs), the number results from multiplying the number of hidden nodes by the sum of the numbers of input and output nodes (plus a bias if used), since the networks here are built with one hidden layer. The number of parameters is not specified if a model is not explicitly given and the number of parameters necessary for optimization is not specified in the corresponding paper. The focus for the effects covered by a model is set to the four effects identified as main effects at the beginning of this section, namely delayed exponential attenuated response, S-shaped response, cardiovascular drift, and complete exhaustion. The inclusion of a pre-exercise HR or a person's maximum HR in the model and the way stress is included as input are stated here, too. Additionally, some models contain a component for recovery different from the HR response to increasing stress. In this case, the function used for recovery is stated in the table.
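The counting rule for ANN parameters described above can be written as a one-line function. The Python sketch below encodes one possible reading of that rule (a single bias term per hidden node if a bias is used); the function name and the example network size of two inputs, one output, and 50 hidden nodes are assumptions for illustration, and the resulting numbers are not taken from Table 2.

```python
def hidden_layer_parameter_count(n_inputs, n_outputs, n_hidden, use_bias=False):
    """Number of parameters of a network with one hidden layer, counted as
    hidden nodes times (inputs + outputs, plus one bias term if used)."""
    per_hidden_node = n_inputs + n_outputs + (1 if use_bias else 0)
    return n_hidden * per_hidden_node


# Hypothetical example: inputs are current HR and stress, output is next HR.
without_bias = hidden_layer_parameter_count(2, 1, 50)
with_bias = hidden_layer_parameter_count(2, 1, 50, use_bias=True)
```

Even for this small network the count is far higher than the handful of parameters used by the differential equation models, which illustrates why overfitting is a concern for black box models.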

It can clearly be seen that most phenomenological models discussed in this paper are modeled and evaluated for control purposes or for analyzing correlations between HR and specific other measurements or influences. Prediction of complete training sessions beforehand ("C")—which corresponds to a proper evaluation with a test set independent of training sets used for parameter estimation—is not yet evaluated very well. Regarding the effects, it is noticeable that most models include both, an exponential response to stress and the S-shaped HR response. Many models use some initial or pre-exercise HR, and all other effects are considered more sparsely. Additionally, while only few models incorporate stress linearly, most authors seem to assume a polynomial influence.

Although black box (or gray box) models (e.g., Hammerstein and Wiener models, ANNs) usually do not have a physiological correspondence, simulating an existing HR curve or predicting the next few seconds works very well. But when it comes to planning of training or competition, HR approximation of existing training sessions and prediction of only some seconds into the future is no longer sufficient. For planning a whole training session or simulating a person's capabilities in a competition, HR needs to be predicted over a complete training session. However, black box models tend to overfit in HR response prediction of a complete training session. This is caused by the high number of parameters, which are also often used in non-black-box phenomenological HR models (Ludwig et al., press). In particular, interpretability of a model's parameters is favorable in HR prediction: modeling real factors influencing the HR rather than artifacts significantly improves the accuracy of prediction. Ludwig et al. (2015) give a comparison of different types of phenomenological models and present their accuracy in approximation and prediction of HR over different time horizons. The results illustrate that good accuracy in approximation or prediction of a few seconds does not transfer to prediction accuracy in session prediction.

In the following, all considered models are assigned to subsections according to the underlying type of model. Results cited there are always results of approximation (i.e., evaluation on the training data set) if not specified otherwise.

#### 3.1.1. Artificial Neural Networks

Yuchi and Jo (2008) implemented a feedforward ANN to predict HR for the next second based on physical activity (obtained as 3-D acceleration signals), while Mutijarsa et al. (2016) did the same based on cycling cadence. In both networks, the current HR and the respective stress value (physical activity or cadence, respectively) were used as input variables. HR for the directly following second was set as output. Yuchi and Jo (2008) found a mean absolute error of 3.31 bpm in their test set and found 50 neurons in the hidden layer to be suitable. Mutijarsa et al. (2016) found a mean absolute error of 3.02 bpm in their test set and identified a number of 333 neurons in the hidden layer via trial and error. The test set is specified as a 30 s prediction interval.

Xiao et al. (2009, 2010, 2011) presented different optimization methods based on evolutionary algorithms to train neural networks for HR prediction from physical activity, building on the network described by Yuchi and Jo (2008). HR values were predicted every 30 s for one subject with a short term prediction accuracy of 4.38 bpm (test set) in terms of the mean absolute error.

#### 3.1.2. Differential Equation (DE) Models

To allow a closer look at the differences between the following three DE models, we note that they share the following general structure:

$$\begin{aligned} \dot{x}\_1(t) &= -a\_1 \cdot x\_1(t) + a\_2 \cdot x\_2(t) + f(u(t)) \\ \dot{x}\_2(t) &= -a\_4 \cdot x\_2(t) + g(x\_1(t), x\_2(t)) \\ y(t) &= x\_1(t) \end{aligned} \tag{1}$$

Here, $a\_i$, $i \in \mathbb{N}^+$, are the parameters, $u$ serves as model input (stress), and $y$ serves as model output (computed HR). The functions $f$ and $g$ will be specified in the model descriptions to clarify the differences between the models.

Cheng et al. (2007) proposed a DE model, which was originally used for treadmill walking and is stated to describe HR behavior during even longer lasting exercises as well as for the recovery phase. One year later, Cheng et al. (2008) published a slightly different DE model used to control speed of a treadmill for regulation of HR in walking at different speeds. In both DE models, the authors formulate two short-term components for different responses in HR changes: One component (x1) is stated to describe changes in HR based on parasympathetic and sympathetic neural effects as a central response to exercise stress, the second component (x2) is stated to describe changes in HR based on effects from the hormonal system, increase in body temperature or other slowly-acting effects from the peripheral local metabolism. The output in both models describes the changes in HR from resting HR, while the input signal is set to the walking velocity during the training (and set to 0 for recovery). Velocity is supposed to have a quadratic influence on changes of HR in both models: regarding Equation 1, Cheng et al. (2007) defined:

$$f(u(t)) = \frac{a\_2 \cdot u^2(t)}{1 + \exp(-u(t) + a\_3)}, \ a\_2 = 1, 2$$

where the exponential function is used to depict further nonlinear effects of the HR; and Cheng et al. (2008) reduced this part of the model to:

$$f(u(t)) = a\_2 \cdot u^2(t).$$

Furthermore, Cheng et al. (2007) model slow recovery of HR after exercise again with a hyperbolic tangent function within:

$$g(x\_1(t), x\_2(t)) = a\_4 \cdot \tanh(x\_2(t)) + a\_5 \cdot x\_1(t).$$

Only changes of the first component were dependent on input velocity within a sigmoidal function. The five parameters used in this model were estimated using the Levenberg-Marquardt algorithm. Approximation accuracy is analyzed only visually. The model proposed in Cheng et al. (2008) has no such explicit component to cover slow recovery. While input velocity in this model still only affects changes of the first component, the sigmoidal function here covers changes of the second component, but dependent on the first component, using:

$$g(x\_1(t), x\_2(t)) = \frac{a\_4 \cdot x\_1(t)}{1 + \exp(-(x\_1(t) - a\_5))}.$$

The possibility to individualize the model using the set of five parameters is retained for this DE model, but the authors estimated fixed parameters based on data of all their subjects to identify a model with no free parameters for their controller design. Approximation accuracy is analyzed only visually, since the focus of the presented work was on controller design and parameter stability. While Scalzi et al. (2012) used the model by Cheng et al. (2008) to describe a new controller design, Paradiso et al. (2013) slightly adapted this model for usage in ergometer cycling. Compared to the original model, they used a new scaling parameter for multiplication with the quadratic input term, i.e.,

$$f(u(t)) = a\_6 \cdot u^2(t).$$

The authors stated that the model can be used for cycling ergometer control.
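The two-component structure described above can be illustrated with a minimal numerical sketch. It assumes that the components follow first-order dynamics of the form dx1/dt = -a1·x1 + a2·x2 + f(u) and dx2/dt = -a3·x2 + g(x1, x2), with the quadratic f and the sigmoidal g of the 2008 variant. This state equation form, the explicit Euler scheme, the function name, and all parameter values are illustrative assumptions and not the fitted values reported by the authors.

```python
import math

def simulate_two_state_hr(speeds, dt, a1, a2, a3, a4, a5):
    """Euler integration of an assumed two-component HR model.

    speeds: input u(t), one sample per time step dt.
    Returns x1(t), the HR change relative to resting HR.
    """
    x1, x2 = 0.0, 0.0  # at rest: no HR change, no slow peripheral effects
    delta_hr = []
    for u in speeds:
        f = a2 * u * u                              # quadratic input term
        g = a4 * x1 / (1.0 + math.exp(-(x1 - a5)))  # sigmoidal coupling
        dx1 = -a1 * x1 + a2 * x2 + f                # fast (central) component
        dx2 = -a3 * x2 + g                          # slow (peripheral) component
        x1 += dt * dx1
        x2 += dt * dx2
        delta_hr.append(x1)
    return delta_hr

# Hypothetical parameters; time in minutes, speed in m/s.
params = dict(a1=0.5, a2=6.0, a3=0.05, a4=0.001, a5=10.0)
dt = 0.01
exercise = [1.5] * 3000  # 30 min of walking at 1.5 m/s
recovery = [0.0] * 3000  # 30 min of rest (input set to 0)
trace = simulate_two_state_hr(exercise + recovery, dt, **params)
peak = max(trace)
```

With these values the fast component settles within a few minutes, while the slow component keeps raising HR during exercise and delays the return to resting HR afterwards.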

A different DE model was proposed by Stirling et al. (2008). Here, changes of HR are modeled as a function dependent on speed (or other intensity measures) and time. Their model is based on two basic components: changes in HR and the exercise demand, which are both dependent on speed and time and constrained by the minimum and maximum HR values of a subject. Three differences are modeled, each scaled with its own parameter and then multiplied together: the difference between current HR and minimal HR, between maximum HR and current HR, and between actual exercise demand and current HR. The parameters are used for scaling and to control how quickly HR approaches or diverges from maximum/minimum HR. Parameters do not change during a certain period of training. Changes in parameters over different training seasons are stated to give information about the subject's cardiovascular condition. Approximation accuracy is analyzed only visually. Improved versions of this model with fewer parameters were presented by Zakynthinaki (2015) and Mazzoleni et al. (2016); we will describe their work in section 3.1.5.
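The product of three differences can be sketched as follows. The sketch assumes that each difference is scaled by an exponent and that the product is multiplied by one overall rate constant; the function names, the sign-preserving power, the clamping, and all numeric values are hypothetical choices for illustration, not the published parameterization.

```python
import math

def signed_power(value, exponent):
    """Raise |value| to the exponent but keep the sign, so HR can also fall."""
    return math.copysign(abs(value) ** exponent, value)

def simulate_demand_model(demand, hr0, hr_min, hr_max, dt, scale, b, c, e):
    """Euler integration of dHR/dt = scale * (HR-HRmin)^b * (HRmax-HR)^c * (D-HR)^e."""
    hr = hr0
    trace = []
    for d in demand:
        rate = (scale
                * (hr - hr_min) ** b          # distance from minimum HR
                * (hr_max - hr) ** c          # distance from maximum HR
                * signed_power(d - hr, e))    # distance from exercise demand
        hr += dt * rate
        hr = min(max(hr, hr_min), hr_max)     # keep HR inside its physiological bounds
        trace.append(hr)
    return trace

# Hypothetical protocol: 10 min at a demand of 150 bpm, then 10 min at 60 bpm.
demand = [150.0] * 600 + [60.0] * 600
trace = simulate_demand_model(demand, hr0=70.0, hr_min=50.0, hr_max=190.0,
                              dt=1.0, scale=2.5e-6, b=1.0, c=1.0, e=1.0)
```

Because the rate vanishes at the minimum HR, the maximum HR, and the demand level, the simulated HR approaches the demand without leaving the interval between minimum and maximum HR.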

#### 3.1.3. Regression Models

When HR is analyzed using probabilistic approaches such as multiple regression, a frequent goal is to test certain correlations between HR and other parameters<sup>3</sup>. Hoffmann and Wiemeyer (2017b) used multiple regression methods to find factors that may have a significant effect on changes in HR in addition to training effort. They analyzed 19 variables (such as restfulness of sleep, nutrition, and current mood) in terms of their impact on three different parameters of the Bunc equation (Bunc et al., 1988) of HR, i.e., HR at start of the exercise, steady state HR, and a factor used in a basic underlying HR model for the slope of the HR curve. The authors found that influences on HR response are very individual, but that physical health, negative mood, the number of intervals in training, as well as time of the day seem to generally influence HR changes. Jang et al. (2016) aimed to find a relationship between running speed and HR using statistical regression methods. In 217 subjects performing incremental step tests, they analyzed a regression for linear and non-linear HR components; the latter are important because of metabolic demands and cardiac drift effects. In both inter- and intra-subject analyses, they found a strong correlation between HR and running speed. The smallest errors were achieved with higher regression orders. The regression model of fourth order yielded a correlation of 0.997 and a mean error in HR difference of 2.04 bpm. Similarly, Fairbarn et al. (1994) found linear relationships between HR and oxygen uptake for different age groups of men and women by analyzing data of 231 subjects during incremental cycle ergometer tests with random effects regression. Richards (1980) provides a good overview comprising (amongst other topics) HR analysis with statistical measures, multivariate statistical methods, and time series analysis of HR with autoregression. A short workflow for choosing the appropriate statistical method when working with HR data is also given for the analysis of raw data.
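The effect of the regression order on the fit between running speed and HR can be sketched with a small least-squares example. The step-test values below are invented for illustration and are not data from any of the cited studies; the solver via normal equations is one simple choice among several.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k

def rmse(x, y, coeffs):
    """Root mean squared error of the polynomial on the given samples."""
    residuals = [yi - sum(c * xi ** k for k, c in enumerate(coeffs))
                 for xi, yi in zip(x, y)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5

# Hypothetical incremental step test: speed in km/h, steady HR per step in bpm.
speeds = list(range(6, 17))
hr = [112, 121, 131, 140, 149, 157, 165, 172, 178, 183, 186]
x = [(v - 11) / 5 for v in speeds]  # rescale speeds to [-1, 1] for conditioning

rmse_linear = rmse(x, hr, fit_polynomial(x, hr, 1))
rmse_quartic = rmse(x, hr, fit_polynomial(x, hr, 4))
```

On the fitted data a higher order can never increase the error, which is why prediction quality should be judged on data that were not used for fitting.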

Bennett et al. (1993) discussed four different autoregressive methods to fit and predict HR time series based on past HR values and noise. They found that the bilinear autoregressive model describes HR dynamics best in comparison to autoregression with and without moving average and polynomial autoregression, but performs poorly in prediction. A similar analysis by Christini et al. (1995) confirms these results. Both concluded that control of HR dynamics should be non-linear.
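As a point of reference for the four methods, the simplest case, a linear autoregression of order one, can be sketched in a few lines. The model form hr[t] = c + phi · hr[t-1], the function names, and the noise-free synthetic series are assumptions made for illustration; the bilinear and polynomial variants add further terms that are not shown here.

```python
def fit_ar1(series):
    """Least-squares fit of hr[t] = c + phi * hr[t-1]."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi

def predict_next(series, c, phi):
    """One-step-ahead prediction from the last observed value."""
    return c + phi * series[-1]

# Synthetic, noise-free series generated with c = 15 and phi = 0.9.
series = [100.0]
for _ in range(29):
    series.append(15.0 + 0.9 * series[-1])

c, phi = fit_ar1(series)
next_hr = predict_next(series, c, phi)
```

Because the synthetic series contains no noise, the fit recovers the generating coefficients; with measured HR the residual noise term would remain.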

Wang et al. (2008, 2009) used linear regression and support vector regression (SVR) to examine the relationship between oxygen uptake and other cardiovascular variables like HR. The regression here was focused between oxygen uptake and other cardiovascular factors. Hence, no conclusions were drawn for correlations between HR and other cardiovascular factors. Ludwig et al. (2015) showed that support vector regression can also be used to simulate and predict HR dynamics based upon earlier HR measurements. Esmaeili and Ibeas (2016) applied a particle swarm optimization method for the SVR model proposed by Wang et al. (2008) and claimed to reach better model parameters compared to other studies. Girard et al. (2016) used this model to successfully regulate HR response during treadmill exercise with a PID-controller for treadmill speeds lower than 8 km/h.

<sup>3</sup> In this specific context, parameters mean measures or effects.

#### 3.1.4. Hammerstein and Wiener Models

Su et al. (2007a,b, 2010) identified a Hammerstein model for HR modeling. Model identification was done separately for the linear and non-linear part of the model by decoupling these parts using pseudorandom binary sequences, which were found to be helpful in this task. Both model parts were identified by machine learning algorithms (e.g., SVR) based on collected experimental treadmill data. The model was used for PID control of the treadmill, which is the focus of the respective work. Based on these Hammerstein model approaches, a modified Hammerstein model is presented and tested by Mohammad et al. (2011). Here, the non-linear part is approximated by a polynomial function.

Gonzalez et al. (2016) focused on approximation and prediction of V̇O<sub>2</sub> but showed that their identified model can also be applied to HR modeling and prediction. In their work, they analyzed different types of models like autoregressive models with and without a moving average, state-space models, and Hammerstein-Wiener models and stated that a Hammerstein-Wiener model showed the best results in their experiments. Optimization finally leads to a pure Wiener model. In an analysis of five subjects each performing four different bicycle ergometer protocols, average approximation accuracy (training set) of HR was 4.55 bpm, and average session prediction accuracy (test set) was 7.46 bpm.

The model proposed by Ludwig et al. (press) can be illustrated as Wiener model, but has a strong focus on reduction of parameters and thus is presented in section 3.1.5.
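The difference between the two block structures named in this section can be sketched briefly: a Hammerstein model applies a static non-linearity before a linear dynamic block, whereas a Wiener model applies it afterwards. The first-order lag, the polynomial non-linearity, and all numeric values below are hypothetical stand-ins, not the blocks identified in the cited studies.

```python
def first_order_lag(signal, dt, tau, y0=0.0):
    """Discrete first-order low-pass filter: the linear dynamic block."""
    y = y0
    out = []
    for s in signal:
        y += dt / tau * (s - y)
        out.append(y)
    return out

def static_nonlinearity(v):
    """Hypothetical static block: a polynomial in its input."""
    return 4.0 * v + 0.5 * v * v

def hammerstein(u, dt, tau):
    """Static non-linearity first, linear dynamics second."""
    return first_order_lag([static_nonlinearity(v) for v in u], dt, tau)

def wiener(u, dt, tau):
    """Linear dynamics first, static non-linearity second."""
    return [static_nonlinearity(v) for v in first_order_lag(u, dt, tau)]

# Step input of 8 km/h for 10 min, sampled once per second.
u = [8.0] * 600
h = hammerstein(u, dt=1.0, tau=40.0)
w = wiener(u, dt=1.0, tau=40.0)
```

Both structures reach the same steady-state value for a constant input, but their transients differ, which is one reason why the block order matters during identification.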

#### 3.1.5. Parameter-Reduced HR Models

Zakynthinaki (2015) stated that HR dynamics in response to movement should be dependent on one parameter describing the cardiovascular condition only. They built their model upon the DE model by Stirling et al. (2008), but added, e.g., different HR phases and time delays and simultaneously reduced parameters to only one global parameter, which represents the cardiovascular condition. The basic structure of their model is still a DE model with difference between current HR and minimal HR, maximum HR, or actual exercise demand. For example, the difference between actual and maximum HR is now part of a sigmoidal function similar to Cheng et al. (2008) instead of scaling this difference by one exponent as before (i.e., (HR − HRmax)<sup>A</sup> with parameter A). The number of parameters was reduced via trial-and-error such that all parameters except one could be fixed. The author states that the model is able to predict complete training sessions. The published evaluation is performed visually without numeric values and based on a single protocol for two subjects. In Zakynthinaki (2016), the same model is used to predict different stress courses for synthetic data. Transferability to real training data has not yet been fully demonstrated.

Mazzoleni et al. (2016) also built their model based on the DE model by Stirling et al. (2008) for HR modeling in cycling exercises. Additionally, they included a term considering torque and cadence, which they stated to be crucial in cycling. They ended up with 14 parameters, but with a stability analysis using eigenvalues they were able to reduce the number of free parameters to 11 and to restrict the ranges of at least two parameters. Parameters were computed based on synthetic data, resulting in a coefficient of determination of r<sup>2</sup> = 0.90, when both cadence and power output are used as model input values.

Koenig et al. (2009) aimed to identify the main effect of changes in treadmill speed and human energy expenditure on HR in order to predict HR during Lokomat walking. To this end, they calculated the average HR increase for different walking velocities after subtracting a pre-exercise HR value and built a model presented as relay block chart with 11 parameters to scale the effects of the input values including, e.g., fatigue of the subject, and were able to reduce the number of free parameters to four.

Ludwig et al. (2016, press) proposed a model which can be described as Wiener model. The basic model has four parameters, which can be reduced to one free parameter. Similar to the idea in Zakynthinaki (2015), this parameter is meant to represent the cardiovascular condition of a person. Furthermore, this model is intended to be as simple as possible without loss of accuracy. The model was compared to different other models and yielded lower errors in a complete session prediction. In one study, the average prediction error (test set) was 7.08 bpm in a leave-one-out cross validation of altogether 17 tests of three subjects (Ludwig et al., 2016). In a second study, an average approximation error (training set) of 4.95 bpm and an average prediction error (test set) of 7.34 bpm in altogether 20 tests of five subjects were observed (Ludwig et al., press).

#### 3.1.6. Further Types of Models

Some further model types are occasionally used for HR modeling; to give a short impression of the variety, these models are briefly mentioned in this section.

Dur-e Zehra Baig et al. (2010) compared a linear time invariant (LTI) model with a linear time varying (LTV) model for HR approximation during walking, cycling, and rowing, each at three different intensities, i.e., nine different tasks per subject. The model using parameters varying in time performed better than the LTI model in all analyzed cases with an average mean squared error of 0.158 bpm<sup>2</sup> for the LTI and 0.071 bpm<sup>2</sup> for the LTV model over both subjects and all performed tasks.

Le et al. (2009), Sinclair et al. (2009), and Yang et al. (2012) all defined HR as the sum of an initial HR value before the start of the exercise and changes due to stress at every point in time. The changes in HR are subdivided into a phase in which HR increases and a phase in which cardiac drift occurs. While Le et al. (2009) differentiated between moderate and exhaustive intensities for the phase of increase, Sinclair et al. (2009) defined a steady-state HR phase including the cardiac drift and used accumulated work instead of plain stress values. Le et al. (2009) and Yang et al. (2012) additionally defined a recovery phase, defined by an exponential function in Yang et al. (2012), and by a sum of the HR at anaerobic threshold minus calculated HR values up to exhaustion in Le et al. (2009), which is basically the counterpart to their implementation of HR exhaustion. The phase of increase (or HR at moderate intensity, respectively) is modeled as a single parameter in Sinclair et al. (2009); Le et al. (2009) summed up workload and change in HR at the preceding point in time, each scaled by a parameter; and Yang et al. (2012) additionally added noise. The drift is again modeled as a single parameter in Sinclair et al. (2009), while Le et al. (2009) and Yang et al. (2012) used a scaled exponential function depending on the current or last workload, respectively.

Endler (2013) adapted a model by Perl (2004) to running, which was initially developed for modeling training processes. PerPot-Run uses speed as input, which is divided antagonistically into a positive and a negative potential. The model determines HR as output by flow equations, where positive and negative potentials affect the HR with different delays. To use the model for prediction, it has to be calibrated to an individual subject by a graded incremental test. PerPot-Run can be used to calculate the individual anaerobic threshold (Endler et al., 2017). Furthermore, it is used to optimize endurance running competitions and training. Endler and Friedrich (2016) presented an extension of PerPot-Run, including incline and decline of tracks.

### 3.2. Usage of HR Models and Applicability in Wearables

A common application of HR models is control of HR on a treadmill (Mazenc et al., 2010; Nguyen et al., 2011; Pătraşcu et al., 2014; Hunt and Fankhauser, 2016; Hunt and Liu, 2017), on a bicycle ergometer (Mohammad et al., 2012; Paradiso et al., 2013; Argha et al., 2014, 2015a,b; Leitner et al., 2014), for gait training (Koenig et al., 2011) or to control strain in exergames (Sinclair et al., 2009). Even apart from strain or stress control, use of HR models is conceivable for many other areas such as training planning (Brzostowski et al., 2013; Schäfer et al., 2015), generating individualized training zones based on past training sessions, keeping track of performance development and adjusting HR training zones, potentially enhancing accuracy by predicting the HR after a model is individualized and adjusting the displayed HR according to measurement and model prediction, compensating for missing or incorrectly detected HR values [see Jang et al. (2016)], and more.

A simple way to control the individual HR response is by using the closed loop principles of regulatory circuits. Wagner et al. (1993) used the approach of a PD controller for HR control that is solely influenced by the applied load on a bicycle ergometer (u). Thus, the load is adapted proportionally and differentially according to the adaptation course of the HR. Since HR response is delayed the load is adapted at distinct time points. The proportional part analyzes the deviation of the desired target (HRtarget) to the actual measured HR (HRcurrent). The differential part analyzes the increase of HR represented by the deviation of HRcurrent and the starting HR (HRstart) within these intervals. The following formula was used:

$$\begin{aligned} u(t) &= K\_p \cdot (\text{HR}\_{\text{target}} - \text{HR}\_{\text{current}}(t)) \\ &+ K\_d \cdot (\text{HR}\_{\text{current}}(t) - \text{HR}\_{\text{start}}(t)) \end{aligned}$$

Wagner et al. (1993) obtained sufficient results by adapting the parameters K<sub>p</sub> and K<sub>d</sub> individually.
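A minimal closed-loop sketch of this PD principle is given below. It rests on several assumptions that are not taken from the original study: the controller output is applied as an adjustment to the current ergometer load, the differential gain is negative so that a rising HR dampens further load increases, and the HR response is a toy first-order system. All names and numeric values are hypothetical.

```python
def pd_load_adjustment(hr_target, hr_current, hr_start, kp, kd):
    """Controller output u: proportional part plus differential part."""
    return kp * (hr_target - hr_current) + kd * (hr_current - hr_start)

def simulate_ergometer(hr_target, duration_s, interval_s, kp, kd,
                       hr_rest=70.0, gain=0.5, tau=40.0):
    """Toy closed loop: load (W) is adapted only at distinct time points."""
    hr, load = hr_rest, 0.0
    hr_start = hr
    trace = []
    for t in range(duration_s):
        if t % interval_s == 0:
            u = pd_load_adjustment(hr_target, hr, hr_start, kp, kd)
            load = max(0.0, load + u)      # assumption: u adjusts the current load
            hr_start = hr                  # HR at the start of the new interval
        hr_steady = hr_rest + gain * load  # assumed steady-state HR for this load
        hr += (hr_steady - hr) / tau       # 1 s Euler step of a first-order response
        trace.append(hr)
    return trace

# 30 min session, load adapted every 30 s, target HR of 140 bpm.
trace = simulate_ergometer(hr_target=140.0, duration_s=1800, interval_s=30,
                           kp=0.4, kd=-0.2)
```

Adapting the load only every 30 s reflects the delayed HR response mentioned above; with the chosen gains the simulated HR approaches the target without overshoot.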

Stirling and Zakynthinaki (2003) provide additional examples of how modeling can be used for different processes in sport with a focus on modeling physiological responses to exercise.

In addition to these use cases, applicability of phenomenological models to wearables is an interesting issue. But how can wearables benefit from integration of models? Since several wearables already provide some general training information on a computer based platform, inclusion of HR models could be used to already inform the user during the training about, e.g., the training progress or provide suggestions according to a training plan. Even more, it could help to control the strain a person summons up during a competition [similar to the idea of PerPot-Run by Endler (2013)] by providing useful information about an expected HR or performance progress based on current HR data. Independent of concrete activities or goals, information based on model predictions provided by wearables could help to avoid overstrain, enhance training progresses, and altogether motivate the user to train in an expedient and suitable way. In addition, a well-individualized model could improve the accuracy of wearables by comparing current measurements to predicted HR values.

Some limitations of wearables such as a small screen size and moderate computing performance have to be considered. To provide predictive information during training, either the stress has to be known beforehand, which might be the case only for very specific applications, or the model predictions have to be updated regularly during training based on current strain or stress. Since most HR models use only one input (or input curve), which can be power, velocity, physical activity values, and so forth, the kind of stress considered has to be chosen carefully. For example, in running it might be beneficial to include both running velocity and slope, which would need to be combined into one stress value for usage in most HR models. While a stress value can be well defined in, e.g., walking, running, and cycling, finding an appropriate measure might be much more difficult in other sports. Here, the use of machine learning algorithms (like ANNs, SVR, or Hammerstein or Wiener models) could be beneficial, since they easily allow any desired number of different inputs to be included. However, machine learning algorithms need a large amount of data to be appropriately trained, and training or updating a model sometimes requires high computational power and a corresponding computing time depending on the underlying system. For ANNs in particular, a small network with up to 10 neurons should be sufficient for HR prediction. Larger numbers of neurons in the hidden layer can quickly lead to overfitting, resulting in poor prediction accuracy. On the other hand, simply using an already trained ANN does not require much time and can easily be executed in real time even on wearables. Therefore, an ANN would be feasible to use on demand, but should be trained beforehand and not on a wearable.
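How little computation the evaluation of a small, already trained network requires can be sketched with a forward pass through a single hidden layer. The weights, the three normalized inputs (speed, slope, previous HR change), and the output scaling are invented for illustration and do not come from a trained model.

```python
import math

def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network with tanh activation."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical "pretrained" weights: 3 inputs, 3 hidden neurons, 1 output.
HIDDEN_W = [[1.2, 0.8, 0.5],
            [0.4, 1.5, 0.2],
            [0.9, 0.1, 1.1]]
HIDDEN_B = [0.0, 0.0, 0.0]
OUT_W = [40.0, 25.0, 30.0]
OUT_B = 0.0

# Inputs: normalized speed, slope, and previous HR change (all assumed scalings).
slow = mlp_forward([0.4, 0.02, 0.3], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
fast = mlp_forward([0.6, 0.02, 0.3], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)

# Cost of one prediction: one multiply-add per weight.
multiply_adds = len(HIDDEN_W) * len(HIDDEN_W[0]) + len(OUT_W)
```

Even with 10 hidden neurons the cost per prediction remains a few dozen multiply-adds, whereas training requires repeated passes over a whole data set.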

A potential workability of a model on a wearable is strongly dependent on the specific implementation of this model. Models used for control purposes are often feasible in predicting a few seconds of HR which could also be applicable to wearables. Predicting longer time horizons of HR or controlling a complete training session can also be implemented with models, which are able to accurately predict complete training sessions. Using a suitable implementation, most models will be efficient in just computing current HR values based on a given stress value, while parameter optimization can be time expensive.

In general, individualization of a given model always requires optimization of model parameters, which needs data to be trained on and can hardly be performed online during training. Statistical models and results from statistical analysis can help to identify important parameters affecting HR (like gender, age, body mass index, or similar). With this additional information, HR models could be improved such that fewer parameters have to be optimized. Adjusting model parameters can certainly be performed faster for fewer parameters, such that a less complex model with only a few parameters could possibly be optimized and adjusted online on a wearable during training. The HR models by Zakynthinaki (2015) and Ludwig et al. (press) are reduced to one parameter and might be good candidates for this purpose. Additionally, results obtained in regression analysis as in Hoffmann and Wiemeyer (2017b) can help to reduce the number of necessary parameters in other models. The actual applicability of particular models to wearables has to be analyzed and compared in more detail in the future.
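Why a single free parameter makes individualization cheap can be sketched with a one-dimensional search. The first-order lag used here is only a hypothetical stand-in for a parameter-reduced HR model, its time constant plays the role of the single free parameter, and the "measured" data are generated synthetically; golden-section search is one possible derivative-free method, not the procedure used in the cited work.

```python
def lag_model(stress, tau, dt=1.0, gain=0.5, hr_rest=60.0):
    """Stand-in HR model with a single free parameter tau (in seconds)."""
    hr = hr_rest
    out = []
    for s in stress:
        hr += dt / tau * (hr_rest + gain * s - hr)
        out.append(hr)
    return out

def rmse(predicted, measured):
    """Root mean squared error between two equally long series."""
    return (sum((p - m) ** 2 for p, m in zip(predicted, measured))
            / len(measured)) ** 0.5

def golden_section_minimize(cost, low, high, tol=1e-3):
    """Derivative-free minimization of a unimodal function on [low, high]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = cost(c), cost(d)
    while b - a > tol:
        if fc < fd:                    # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c)
        else:                          # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d)
    return (a + b) / 2

# Synthetic session: 5 min at a constant 150 W, "measured" with tau = 35 s.
stress = [150.0] * 300
measured = lag_model(stress, tau=35.0)
tau_hat = golden_section_minimize(
    lambda tau: rmse(lag_model(stress, tau), measured), 5.0, 120.0)
```

Each cost evaluation is a single pass over the session, and the search needs only a few dozen evaluations, which is the kind of workload that could plausibly run on a wearable.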

## 4. SUMMARY

Wearables controlling individual strain via HR have the potential to be used as effective and efficient tools for the physical training process. As the HR is integrated in a variety of nested regulatory mechanisms and reflexes, different and highly individual HR kinetics can be observed.

Currently, different sensor technologies measuring HR are available: electrographic sensors, optical sensors, infrasonic vibration sensors, magnetic induction monitoring sensors, phonocardiographic sensors, and sphygmographic sensors. Whereas the electrocardiogram is the "gold standard" for measuring HR, most sensors show high reliability and validity in clinical settings as well. HR breast belts are considered an acceptable compromise of reliability, validity, and usability. Optical sensors in particular have high potential due to their high usability and acceptability. However, signal processing, i.e., analysis of the pulse wave representing the heartbeat, has to be improved. The integration of HR sensors operating on different principles (e.g., photoplethysmography) in wearables for training control is not (yet) feasible due to a variety of possible error sources. Modeling individual responses can be performed using biological and phenomenological models. As biological models are very complex and are more appropriate for offline analysis, they are not feasible to be integrated in wearables for physical training. Phenomenological models, in contrast, focus specifically on HR response, integrating many relevant aspects such as cardiac drift or maximum HR. Among other classifications, modeling approaches can be divided into ANN, DE models, regression models, Hammerstein and Wiener models, parameter-reduced HR models, and further models that are occasionally used. The described models can be integrated into wearables for controlling HR on a treadmill, a bike ergometer, for gait training, or strain control within exergames. Additionally, some models can be applied to provide information regarding the long-term training process. The feasibility of model implementation in wearables depends on the reliability of the model, the required processing power, and the output of the model. Currently, pretrained ANNs, models with individually pre-adapted parameters, or parameter-reduced models seem to be most appropriate for integration into wearables. However, most models were optimized and tested on specific samples. A comparison of the models based on independent data sets is required for objective and reliable evaluation.

### AUTHOR CONTRIBUTIONS

Conceived and designed the manuscript idea: KH and JW; provided substantial contributions to the conception and design of the manuscript, and substantially supported the typesetting: SE; Responsible for sections 1, 2, and 4: KH (writing) and JW (supervising); Responsible for section 3: ML (writing) and AA (supervising). Final supervision of the document: JW. All authors read and approved the final manuscript.

## FUNDING

The work was supported by the Ministry for Culture and Science (MKW) of the North Rhine-Westphalia state within the program FH-STRUKTUR 2017 (AZ: 322-8.03.04.02-FH-STRUKTUR 2017/07) (to ML) and by the Equal Opportunity Commission at Bonn-Rhein-Sieg University o.a.S. (to ML) and by the Forum for Interdisciplinary Research (Forum Interdisziplinäre Forschung) of the Technical University Darmstadt (to KH).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ludwig, Hoffmann, Endler, Asteroth and Wiemeyer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations

#### Jonathan M. Peake<sup>1,2</sup>\*, Graham Kerr<sup>3</sup> and John P. Sullivan<sup>4</sup>

<sup>1</sup> Tissue Repair and Translational Physiology Research Program, School of Biomedical Sciences and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD, Australia, <sup>2</sup> Sport Performance Innovation and Knowledge Excellence, Queensland Academy of Sport, Brisbane, QLD, Australia, <sup>3</sup> Movement Neuroscience and Injury Prevention Program, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD, Australia, <sup>4</sup> Clinical and Sports Consulting Services, Providence, RI, United States

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal

Nicola Cellini, Università degli Studi di Padova, Italy

\*Correspondence:

Jonathan M. Peake jonathan.peake@qut.edu.au

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 13 November 2017 Accepted: 28 May 2018 Published: 28 June 2018

#### Citation:

Peake JM, Kerr G and Sullivan JP (2018) A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations. Front. Physiol. 9:743. doi: 10.3389/fphys.2018.00743

The commercial market for technologies to monitor and improve personal health and sports performance is ever expanding. A wide range of smart watches, bands, garments, and patches with embedded sensors, small portable devices and mobile applications now exist to record and provide users with feedback on many different physical performance variables. These variables include cardiorespiratory function, movement patterns, sweat analysis, tissue oxygenation, sleep, emotional state, and changes in cognitive function following concussion. In this review, we have summarized the features and evaluated the characteristics of a cross-section of technologies for health and sports performance according to what the technology is claimed to do, whether it has been validated and is reliable, and if it is suitable for general consumer use. Consumers who are choosing new technology should consider whether it (1) produces desirable (or non-desirable) outcomes, (2) has been developed based on real-world need, and (3) has been tested and proven effective in applied studies in different settings. Among the technologies included in this review, more than half have not been validated through independent research. Only 5% of the technologies have been formally validated. Around 10% of technologies have been developed for and used in research. The value of such technologies for consumer use is debatable, however, because they may require extra time to set up and interpret the data they produce. Looking to the future, the rapidly expanding market of health and sports performance technology has much to offer consumers. To create a competitive advantage, companies producing health and performance technologies should consult with consumers to identify real-world need, and invest in research to prove the effectiveness of their products. To get the best value, consumers should carefully select such products, not only based on their personal needs, but also according to the strength of supporting evidence and effectiveness of the products.

Keywords: health, performance, stress, emotion, sleep, cognitive function, concussion

## INTRODUCTION

The number and availability of consumer technologies for evaluating physical and psychological health, training emotional awareness, monitoring sleep quality, and assessing cognitive function has increased dramatically in recent years. This technology is at various stages of development: some has been independently tested to determine its reliability and validity, whereas other technology has not been properly tested. Consumer technology is moving beyond basic measurement and telemetry of standard vital signs, and predictive algorithms based on static population-based information. Health and performance technology is now moving toward miniaturized sensors, integrated computing, and artificial intelligence. In this way, technology is becoming "smarter," more personalized with the possibility of providing real-time feedback to users (Sawka and Friedl, 2018). Technology development has typically been driven by bioengineers. However, effective validation of technology for the "real world" and development of effective methods for processing data requires collaboration with mathematicians and physiologists (Sawka and Friedl, 2018).

Although there is some overlap between certain technologies, there are also some differences, strengths and weaknesses between related technologies. Various academic reviews have summarized existing technologies (Duking et al., 2016; Halson et al., 2016; Piwek et al., 2016; Baron et al., 2017). However, the number and diversity of portable devices, wearable sensors and mobile applications is ever increasing and evolving. For this reason, regular technology updates are warranted. In this review, we describe and evaluate emerging technologies that may be of potential benefit for dedicated athletes, so-called "weekend warriors," and others with a general interest in tracking their own health. To undertake this task, we compiled a list of known technologies for monitoring physiology, performance and health, including concussion. Devices for inclusion in the review were identified by searching the internet and databases of scientific literature (e.g., PubMed) using key terms such as "technology," "hydration," "sweat analysis," "heart rate," "respiration," "biofeedback," "muscle oxygenation," "sleep," "cognitive function," and "concussion." We examined the websites for commercial technologies for links to research, and where applicable, we sourced published research literature. We broadly divided the technologies into the following categories (**Figure 1**):


Our review investigates the following key issues: (a) what the technology is claimed to do; (b) whether the technology has been independently validated against some accepted standard(s); (c) whether the technology is reliable and whether any calibration is needed; and (d) whether it is commercially available or still under development. Based on this information we have evaluated a range of technologies and provided some unbiased critical comments. The list of products in this review is not exhaustive; it is intended to provide a cross-sectional summary of what is available in different technology categories.

#### DEVICES FOR MONITORING HYDRATION STATUS AND METABOLISM

Several wearable and portable hardware devices have been developed to assess hydration status and metabolism, as described below and in **Table 1**. Very few of the devices have been independently validated to determine their accuracy and reliability. The Moxy device measures oxygen saturation levels in skeletal muscle. The PortaMon device measures oxy-, deoxy-, and total hemoglobin in skeletal muscle. These devices are based on principles of near infrared spectroscopy. The PortaMon device has been validated against phosphorus magnetic resonance spectroscopy (31P-MRS) (Ryan et al., 2013). A similar device (Oxymon) produced by the same company has been proven to produce reliable and reproducible measurements of muscle oxygen consumption both at rest (coefficient of variation 2.4%) and after exercise (coefficient of variation 10%) (Ryan et al., 2012). Another study using the Oxymon device to measure resting cerebral oxygenation reported good reliability in the short term (coefficient of variation 12.5%) and long term (coefficient of variation 15%) (Claassen et al., 2006). The main limitation of these devices is that some expertise is required to interpret the data that they produce. Also, although these devices are based on the same scientific principles, they do vary in terms of the data that they produce (McManus et al., 2018).
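The coefficient of variation quoted for these devices expresses the spread of repeated measurements relative to their mean. A minimal sketch is shown below, assuming the common definition of sample standard deviation divided by mean; the cited studies may have used a different variant (for example, a within-subject typical error), and the readings are invented.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical repeated readings of the same quantity on one participant.
readings = [9.0, 10.0, 11.0]
cv_percent = coefficient_of_variation(readings)
```

A lower percentage indicates more reproducible readings, which is why the resting values reported above compare favorably with the post-exercise values.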

The BSX Insight wearable sleeve has been tested independently (Borges and Driller, 2016). Compared with blood lactate measurements during a graded exercise test, this device has high to very high agreement (intraclass correlation coefficient >0.80). It also has very good reliability (intraclass correlation coefficient 0.97; coefficient of variation 1.2%) (Borges and Driller, 2016). This device likely offers some useful features for monitoring muscle oxygenation and lactate non-invasively during exercise. However, one limitation is that the sleeve that houses the device is currently designed only for placement on the calf, and may therefore not be usable for measuring muscle oxygenation in other muscle groups. The Humon Hex is a similar device for monitoring muscle oxygenation that is touted for its benefits in guiding warm-ups, monitoring exercise thresholds and recovery. For these devices, it is unclear how reference limits are set, or established for such functions.

Other non-wearable devices for monitoring metabolism, such as Breezing and the LEVL device, only provide static measurements, and are therefore unlikely to be useful for measuring metabolism in athletes while they exercise. Sweat pads/patches have been developed at academic institutions for measuring skin temperature, pH, electrolytes, glucose, and cortisol (Gao et al., 2016; Koh et al., 2016; Kinnamon et al., 2017). These devices have potential applications for measuring heat stress, dehydration and metabolism in athletes, soldiers, firefighters, and industrial laborers who exercise or work in hot environments. Although these products are not yet commercially available, they likely offer greater validity than existing commercial devices because they have passed through the rigorous academic peer review process for publication. Sweat may be used for more detailed metabolomic profiling, but there are many technical and practical issues to consider before this mode of bioanalysis can be adopted routinely (Hussain et al., 2017).

### TECHNOLOGIES FOR MONITORING TRAINING LOADS, MOVEMENT PATTERNS, AND INJURY RISKS

A wide range of small attachable devices, garments, shoe insoles, equipment, and mobile applications have been developed to monitor biomechanical variables and training loads (**Table 2**). Among biomechanical sensors, many are based around accelerometer and gyroscope technology. Some of the devices that attach to the body provide basic information about body position, movement velocity, jump height, force, power, work, and rotational movement. This data can be used by biomechanists and ergonomists to evaluate movement patterns, assess musculoskeletal fatigue profiles, identify potential risk factors for injury and adjust techniques while walking, running, jumping, throwing, and lifting. Thus, these devices have application in sporting, military and occupational settings.

Among the devices listed in **Table 2**, the I Measure U device is lightweight, compact and offers the greatest versatility. Other devices and garments provide information about muscle activation and basic training metrics (e.g., steps, speed, distance, cadence, strokes, repetitions). The mPower is a pod placed on the skin that measures EMG. It provides a simple, wireless alternative to more complex EMG equipment. Likewise, the Athos garments contain EMG sensors, but the garments have not been properly validated. It is debatable whether the Sensoria and DynaFeed garments offer any more benefits than other devices. The Mettis Trainer insoles (and Arion insoles in development) could provide some useful feedback on running biomechanics in the field. None of these devices have been independently tested to determine their validity or reliability. Until such validity and reliability data become available, these devices should (arguably) be used in combination with more detailed motion-capture video analysis.

Various mobile applications have been developed for recording and analyzing training loads and injury records (**Table 2**). These applications include a wide range of metrics that incorporate aspects of both physical and psychological load. The Metrifit application provides users with links to related unpublished research on evaluating training stress. Many of the applications record and analyze similar metrics, so it is difficult to differentiate between them. The choice of one particular application will most likely be dictated by individual preferences. With such a variety of metrics, which are generally recorded indirectly, it is difficult to perform rigorous validation studies on these products. Another limitation of some of these applications is the large amount of data that they record, which can be difficult to interpret.

#### TABLE 1 | Devices for monitoring hydration status and metabolism.

Device is considered the gold standard in its class i.e., no comparison with other technology is possible.

#### TABLE 2 | Devices and garments for monitoring training loads, movement patterns, and injury risks.

### TECHNOLOGIES FOR MONITORING HEART RATE, HEART RATE VARIABILITY, AND BREATHING PATTERNS

Various devices and mobile applications have been developed for monitoring physiological stress and workloads during exercise (**Table 3**). The devices offer some potential advantages and functionality over traditional heart rate monitors to assess demands on the autonomic nervous system and the cardiovascular system during and after exercise. They can therefore be used by athletes, soldiers and workers involved in physically demanding jobs (e.g., firefighters) to monitor physical strain while they exercise/work, and to assess when they have recovered sufficiently.

Among the devices listed in **Table 3**, the OmegaWave offers the advantage that it directly records objective physiological data such as the electrocardiogram (ECG) as a measure of cardiac stress and direct current (DC) potential as a measure of the activity of functional systems in the central nervous system. However, one limitation of the OmegaWave is that some of the data it provides (e.g., energy supply, hormonal function, and detoxification) are not measured directly. Accordingly, the validity and meaningfulness of such data are uncertain.

The Zephyr sensor, E4 wristband and Reign Active Recovery Band offer a range of physiological and biomechanical data, but these devices have not been validated independently. The E4 wristband is also very expensive for what it offers. The Mio SLICE™ wristband integrates heart rate and physical activity data with an algorithm to calculate the user's Personal Activity Intelligence score. Over time, the user can employ this score to evaluate their long-term health status. Although this device itself has not been validated, the Personal Activity Intelligence algorithm has been tested in a clinical study (Nes et al., 2017). The results of this study demonstrated that individuals with a Personal Activity Intelligence score ≥100 had a 17–23% lower risk of death from cardiovascular diseases.

The HELO smart watch measures heart rate, blood pressure, and breathing rate. It also claims to have some more dubious health benefits, none of which are supported by published or peer-reviewed clinical studies. One benefit of the HELO smart watch is that it can be programmed to deliver an emergency message to others if the user is ill or injured.

The Biostrap smart watch measures heart rate. Although it has apparently not been validated, the company provides a link to research opportunities using their products, which suggests confidence in their products and a willingness to engage in research. The Lief patch measures stress levels through heart rate variability (HRV) and breathing rate, and provides haptic feedback to the user in the form of vibrations to adjust their emotional state. The option of real-time feedback without connection to other technology may provide some advantages. If worn continuously, it is uncertain if or how this device (and others) distinguishes between changes in breathing rate and HRV associated with "resting" stress, as opposed to exercise stress (Dupré et al., 2018). But it is probably safe to assume that users will be aware of what they are doing (i.e., resting or exercising) during monitoring periods. Other non-wearable equipment is available for monitoring biosignals relating to autonomic function and breathing patterns. MyCalmBeat is a pulse meter that attaches to a finger to assess and train breathing rate, with the goal of improving emotional control. The CorSense HRV device will be available in the future, and will be tailored for athletes by providing a guide to training readiness and fatigue through measurements of HRV. It is unclear how data from these devices compare with applications such as OmegaWave, which measures ECG directly rather than by photoplethysmography.

A range of garments with integrated biosensor technology have been developed. The Hexoskin garment measures cardiorespiratory function and physical activity levels. It has been independently validated (Villar et al., 2015). The device demonstrates very high agreement with heart rate measured by ECG (intraclass correlation coefficient >0.95; coefficient of variation <0.8%), very high agreement with respiration rate measured by turbine respirometer (intraclass correlation coefficient >0.95; coefficient of variation <1.4%), and moderate to very high agreement with hip motion intensity measured using a separate accelerometer placed on the hip (intraclass correlation coefficient 0.80 to 0.96; coefficient of variation <6.4%). This device therefore offers value for money. Other garments including Athos and DynaFeed appear to perform similar functions and are integrated with smart textiles, but have not been validated.
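Agreement statistics of the kind quoted above can be reproduced from paired measurements. The sketch below computes one common form of the intraclass correlation coefficient, ICC(2,1) (two-way random effects, absolute agreement, single measure), from a subjects-by-methods table. The choice of ICC(2,1) is an assumption made for illustration; the text does not state which form the validation study used, and the heart rate values are invented.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one per subject; each row holds one value per
    measurement method (e.g., [garment_hr, ecg_hr]).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of methods
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical heart rates (beats/min): [garment, ECG] for five participants
paired_hr = [[62, 61], [75, 76], [88, 88], [101, 99], [120, 121]]
print(f"ICC(2,1) = {icc_2_1(paired_hr):.3f}")
```

Because ICC(2,1) penalizes systematic offsets between methods as well as random error, it is a stricter measure of agreement than a simple correlation coefficient.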

### TECHNOLOGIES FOR MONITORING AND PROMOTING BETTER SLEEP

Many devices have been designed to monitor and/or promote sleep (**Table 4**). Baron et al. (2017) have previously published an excellent review on these devices. Sleep technologies offer benefits for anyone suffering sleep problems arising from chronic disease (e.g., sleep apnea), anxiety, depression, medication, travel/work schedules, and environmental factors (e.g., noise, light, ambient temperature). The gold standard for sleep measurement is polysomnography. However, polysomnography typically requires expensive equipment and technical expertise to set up, and is therefore not appropriate for regular use in a home environment.

The Advanced Brain Monitoring Sleep Profiler and Zmachine Synergy have been approved by the US Food and Drug Administration. Both devices monitor various clinical metrics related to sleep architecture, but both are also quite expensive for consumers to purchase. The disposable sensor pads required to measure electroencephalogram (EEG) signals add an extra ongoing cost. The Somté PSG device offers the advantage of Bluetooth wireless technology for recording EEG during sleep, without the need for cables.

#### TABLE 3 | Devices and garments for monitoring cardiorespiratory functions.

A large number of wearable devices are available that measure various aspects of sleep. Several of these devices have been validated against gold-standard polysomnography. The UP™ and Fitbit Flex™ devices are wristbands connected to a mobile application. One study reported that compared with polysomnography, the UP device has high sensitivity for detecting sleep (0.97) and low specificity for detecting wake (0.37); it overestimates total sleep time (26.6 ± 35.3 min) and sleep onset latency (5.2 ± 9.6 min), and underestimates wake after sleep onset (31.2 ± 32.3 min) (de Zambotti et al., 2015). Another study reported that measurements obtained using the UP device correlated with total sleep time (r = 0.63) and time in bed (r = 0.79), but did not correlate with measurements of deep sleep, light sleep or sleep efficiency (Gruwez et al., 2017). Several studies have reported similar findings for the Fitbit Flex™ device (Montgomery-Downs et al., 2012; Mantua et al., 2016; Kang et al., 2017). In a validation study of the OURA ring against polysomnography, the ring recorded similar total sleep time, sleep onset latency and wake after sleep onset, and had high sensitivity for detecting sleep (0.96). However, it had lower sensitivity for detecting light sleep (0.65), deep sleep (0.51) and rapid eye movement sleep (0.61), and relatively poor specificity for detecting wake (0.48). It also underestimated deep sleep by about 20 min, and overestimated rapid eye movement sleep by about 17 min (de Zambotti et al., 2017b). Similar results were recently reported for the Fitbit Charge2™ device (de Zambotti et al., 2017a). These devices therefore offer benefits for monitoring some aspects of sleep, but they also have some technical deficiencies.
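The sensitivity, specificity and total sleep time bias reported in these validation studies are typically derived from an epoch-by-epoch comparison with polysomnography. The following is a minimal sketch of that calculation, assuming both recordings are already aligned and scored as sleep ("S") or wake ("W") in 30-s epochs. The function name, epoch length and sample sequences are illustrative assumptions, not details of the cited studies.

```python
def sleep_wake_agreement(device, psg, epoch_seconds=30):
    """Compare device sleep/wake scoring against polysomnography (PSG).

    device, psg: equal-length sequences of 'S' (sleep) or 'W' (wake).
    Returns (sensitivity, specificity, total_sleep_time_bias_minutes).
    Assumes the PSG record contains at least one sleep and one wake epoch.
    """
    if len(device) != len(psg):
        raise ValueError("recordings must contain the same number of epochs")

    pairs = list(zip(device, psg))
    true_sleep = sum(d == "S" and p == "S" for d, p in pairs)
    missed_sleep = sum(d == "W" and p == "S" for d, p in pairs)
    true_wake = sum(d == "W" and p == "W" for d, p in pairs)
    missed_wake = sum(d == "S" and p == "W" for d, p in pairs)

    sensitivity = true_sleep / (true_sleep + missed_sleep)  # sleep detection
    specificity = true_wake / (true_wake + missed_wake)     # wake detection
    # Positive bias means the device overestimates total sleep time
    bias_min = (missed_wake - missed_sleep) * epoch_seconds / 60
    return sensitivity, specificity, bias_min

# Hypothetical ten-epoch recordings
psg_epochs = "SSSSWWSSWW"
device_epochs = "SSSSSWSSSW"
sens, spec, bias = sleep_wake_agreement(device_epochs, psg_epochs)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, TST bias={bias:.1f} min")
```

In this toy example the device scores every true sleep epoch correctly but labels half of the wake epochs as sleep, which is the same pattern (high sensitivity, low specificity, overestimated total sleep time) described for the wristbands above.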

#### TABLE 4 | Wearable devices and equipment for monitoring and promoting better sleep.



Various other devices are available that play soft music or emit light of certain colors to promote sleep or wakefulness. Some similar devices are currently in commercial development. Although devices such as the Withings Aura and REM Sleep Tracker, Re-Timer and AYO have not been independently validated, other scientific research supports the benefits of applying blue light to improve sleep quality (Viola et al., 2008; Gabel et al., 2013; Geerdink et al., 2016). The NightWave Sleep Assistant is appealing based on its relatively low price, whereas the Withings Aura and REM Sleep Tracker records sleep patterns. The Re-Timer device is useful based on its portability.

Some devices also monitor temperature, noise and light in the ambient environment to identify potential impediments to restful sleep. The Beddit3 Sleep Tracker does not require the user to wear any equipment. The ResMed S+ and Circadia devices are entirely non-contact, but it is unclear how they measure sleep and breathing patterns remotely.

### TECHNOLOGIES FOR MONITORING PSYCHOLOGICAL STRESS AND EVALUATING COGNITIVE FUNCTION

The nexus between physiological and psychological stress is attracting increasing interest. Biofeedback on emotional state can assist in modifying personal appraisal of situations, understanding motivation to perform, and informing emotional development. This technology has application for monitoring the health of people who work in mentally stressful situations, such as military personnel in combat, medical doctors, emergency service personnel (e.g., police, paramedics, fire fighters) and traffic controllers. Considering the strong connection between physiology and psychology in the context of competitive sport, this technology may also provide new explanations for athletic "underperformance" (Dupré et al., 2018).

Technology such as the SYNC application designed by Sensum measures emotions by combining biometric data from third-party smartwatches/wristbands, medical devices for measuring skin conductance and HR and other equipment (e.g., cameras, microphones) (Dupré et al., 2018). The Spire device is a clip that attaches to clothing to measure breathing rate and provide feedback on emotional state through a mobile application. Although this device has not been formally validated in the scientific literature, it was developed through an extended period of university research. The Feel wristband monitors emotion and provides real-time coaching about emotional control.

In addition to the mobile applications and devices that record and evaluate psychological stress, various applications and devices have also been developed to measure EEG activity and cognitive function (**Table 5**). Much of this technology has been extensively engineered, making it highly functional. Although the technology has not been validated against gold standards, there is support from the broader scientific literature for the benefits of biofeedback technology for reducing stress and anxiety (Brandmeyer and Delorme, 2013). The Muse™ device produced by InterAxon is a standalone EEG-biofeedback device, but it has also been coupled with other biofeedback devices and mobile applications (e.g., Lowdown Focus, Opti Brain™). The integration of these technologies highlights the central value of measuring EEG and the versatility of the Muse™ device. The NeuroTracker application is based around the concept of multiple object tracking, which was established 30 years ago as a research tool (Pylyshyn and Storm, 1988). NeuroTracker has since been developed as a training tool to improve cognitive functions including attention, working memory, and visual processing speed (Parsons et al., 2016). This technology has potential application for testing and training cognitive function in athletes (Martin et al., 2017) and individuals with concussion (Corbin-Berrigan et al., 2018), and improving perception of biological motion in the elderly (Legault and Faubert, 2012). The NeuroTracker application has not been validated.

In the fields of human factors and ergonomics, there is increasing interest in methods to assess cognitive load. Understanding cognitive load has important implications for concentration, attention, task performance, and safety (Mandrick et al., 2016). The temporal association between neuronal activity and regional cerebral blood flow (so-called "neurovascular coupling") is recognized as fundamental to evaluating cognitive load. This assessment is possible by combining ambulatory functional neuroimaging techniques such as EEG and functional near infrared spectroscopy (fNIRS) (Mandrick et al., 2016). Research exists on cognitive load while walking in healthy young and older adults (Mirelman et al., 2014; Beurskens et al., 2016; Fraser et al., 2016), but there does not appear to be any research to date evaluating cognitive load in athletes. A number of portable devices listed in **Table 5** measure fNIRS, and some also measure EEG and EMG. These integrated platforms for measuring/assessing multiple physiological systems present significant value for various applications. These devices all measure physiological signals directly from the brain and other parts of the body. Research using these devices has demonstrated agreement between measurements obtained from fNIRS and the gold standard of functional magnetic resonance imaging (Mehagnoul-Schipper et al., 2002; Huppert et al., 2006; Sato et al., 2013; Moriguchi et al., 2017). However, these devices require some expertise and specialist training to use.

#### TABLE 5 | Wearable devices and mobile applications for monitoring psychological stress, brain activity, and cognitive function.

Concussion is a common occurrence in sport, combat situations, the workplace, and in vehicular accidents. There is an ever-growing need for simple, valid, reliable, and objective methods to evaluate the severity of concussion, and to monitor recovery. A number of mobile applications and wearable devices have been designed to meet this need. These devices are of potential value for team doctors, physical trainers, individual athletes, and parents of junior athletes.

The King-Devick Test® is a mobile application based on monitoring oculomotor activity, contrast sensitivity, and eye movement to assess concussion. It has been tested extensively in various clinical settings, and proven to be easy to use, reliable, valid, sensitive, and accurate (Galetta et al., 2011; King et al., 2015; Seidman et al., 2015; Walsh et al., 2016). Galetta et al. (2011) examined the value of the King-Devick Test® for assessing concussion in boxers. They discovered that worsening scores for the King-Devick Test® were restricted to boxers with head trauma. These scores also correlated (ρ = 0.90; p = 0.0001) with scores from the Military Acute Concussion Evaluation, and showed high test–retest reliability (intraclass correlation coefficient 0.97 [95% confidence interval 0.90–1.0]). Other studies have reported a very similar level of reliability (King et al., 2015). Performance in the King-Devick Test® is significantly impaired in American football players (Seidman et al., 2015), rugby league players (King et al., 2015), and combat soldiers (Walsh et al., 2016) experiencing concussion. Because the King-Devick Test® is simple to use, it does not require any medical training, and is therefore suitable for use in the field by anyone.

The EyeSync® device employs a simple test that records eye movement during a 15-s circular visual stimulus, and provides data on prediction variability within 60 s. It is not yet commercially available, and has therefore not been validated. The BrainCheck Sport™ mobile application employs the Flanker and Stroop Interference test to assess reaction time, the Digit Symbol Substitution test to evaluate general cognitive performance, the Trail Making test to measure visual attention and task switching, and the Coordination test. It has not been independently validated, but is quick and uses an array of common cognitive assessment tools.

The Sway mobile application tests balance and reaction. Its balance measurements have been validated in small scale studies (Patterson et al., 2014a,b). Performance in the Sway test was inversely correlated (r = −0.77; p < 0.01) with performance in the Balance Error Scoring System test (Patterson et al., 2014a) and positively correlated (r = 0.63; p < 0.01) with performance in the Biodex Balance System SD (Patterson et al., 2014b). Further testing is needed to confirm these results. One limitation of this test is the risk of bias that may occur if individuals intentionally underperform during baseline testing to create lower scores than they may attain following a concussion (so as to avoid time out of competition after concussion).
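The validation results in this section are expressed as correlation coefficients: Pearson's r for the Sway comparisons and Spearman's ρ for the King-Devick comparison. As a minimal sketch, both can be computed as follows. The helper names and the paired scores are hypothetical and are not data from the cited studies.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired scores: a balance app score (higher is better) and an
# error count from a reference balance test (lower is better)
app_scores = [92, 85, 78, 88, 70, 95]
error_counts = [8, 12, 17, 11, 21, 9]
print(f"r = {pearson_r(app_scores, error_counts):.2f}")
print(f"rho = {spearman_rho(app_scores, error_counts):.2f}")
```

An inverse correlation is expected whenever one test is scored so that higher is better and the other counts errors, as with the Sway and Balance Error Scoring System comparison.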

Various microsensors have been developed for measuring impact forces associated with concussion (**Table 5**). Some of these microsensors attach to the skin, whereas others are built into helmets, headwear or mouth guards. The X-Patch Pro is a device that attaches behind the ear. Although it has not been scientifically validated against any gold standard, it has been used in published concussion research projects (Swartz et al., 2015; Reynolds et al., 2016), which supports its sensitivity for assessing head impact forces. The Prevent™ mouth guard is a new device for measuring the impact of head collisions. Its benefits include objective and quantitative data on the external force applied to the head. Many of the sensors vary in accuracy, and only record linear and rotational acceleration. Because many sports involve constant changes of direction, sensors that record movement in multiple planes will provide the most accurate data. A study by Siegmund et al. (2016) reported that the Head Impact Telemetry System (HITS) sensors detected 861 out of 896 impacts (96.1%). A sensor that detects more than 95% of impacts can be considered to have good reliability. However, fewer options offering such accuracy and actionable data are available for helmetless sports.
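The detection rate quoted for the HITS sensors follows directly from the two counts given in the text. The sketch below reproduces that percentage and, as an addition that is not reported in the source, attaches a Wilson score interval to indicate the statistical uncertainty around it. The function name is hypothetical.

```python
import math

def detection_rate_with_ci(detected, total, z=1.96):
    """Proportion detected and its Wilson score interval (z=1.96 ~ 95%)."""
    p = detected / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half

# Counts from the text: 861 of 896 impacts detected
rate, low, high = detection_rate_with_ci(861, 896)
print(f"{100 * rate:.1f}% detected "
      f"(95% interval {100 * low:.1f} to {100 * high:.1f}%)")
```

Comparing the lower bound of the interval, rather than the point estimate alone, against a threshold such as 95% gives a more conservative view of whether a sensor meets that threshold.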

### CONSIDERATIONS AND RECOMMENDATIONS

In a brief, yet thought-provoking commentary on mobile applications and wearable devices for monitoring sleep, Van den Bulck makes some salient observations and remarks that are applicable to all forms of consumer health technologies (Van den Bulck, 2015). Most of these technologies are not labeled as medical devices, yet they do convey explicit or implicit value statements about our standard of health. There is a need to determine if and how using technology influences people's knowledge and attitude about their own health. The ever-expanding public interest in health technologies raises several ethical issues (Van den Bulck, 2015). First, self-diagnosis based on self-gathered data could be inconsistent with clinical diagnoses provided by medical professionals. Second, although self-monitoring may reveal undiagnosed health problems, such monitoring on a large population level is likely to result in many false positives. Last, the use of technologies may create an unhealthy (or even harmful) obsession with personal health for individuals or their family members who use such technologies (Van den Bulck, 2015). Increasing public awareness of the limitations of technology and advocating health technologies that are both specific and sensitive to certain aspects of health may alleviate these issues to some extent, but not entirely.
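The concern about false positives in population-level self-monitoring can be made concrete with the positive predictive value, which follows from Bayes' rule. The sketch below uses invented figures (90% sensitivity, 90% specificity and 2% prevalence) purely for illustration; they do not describe any device reviewed here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result reflects a true case."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical device and population (assumed values, not from the source)
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90,
                                prevalence=0.02)
print(f"Positive predictive value = {100 * ppv:.0f}%")
```

Under these assumed values, only about 16% of positive results would correspond to a real health problem, because the condition is rare relative to the false positive rate. This is why technologies that are both specific and sensitive matter most when they are used by large, mostly healthy populations.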

For consumers who want to evaluate technologies for health and performance, we propose a matrix based around two dimensions: strength of evidence (weak to strong) and effectiveness (low to high) (**Figure 2**). This matrix is based on a continuum that was developed for use in a different context (Puddy and Wilkins, 2011), but is nonetheless appropriate for evaluating technology. When assessing the strength of evidence for any given technology, consumers should consider the following questions: (i) how rigorously has the device/technology been evaluated? (ii) how strong is the evidence in determining that the device/technology is producing the desired outcomes? (iii) how much evidence exists to determine that something other than this device/technology is responsible for producing the desired outcomes? When evaluating the effectiveness of technologies, consumers should consider whether the device/technology produces desirable or non-desirable outcomes. Applying the matrix in **Figure 2**, undetermined technologies would include those that have not been developed according to any real-world need and display no proven effect. Conversely, well-supported technologies would include those that have been used in applied studies in different settings, and proven to be effective.

Most of the health and performance technologies that we have reviewed have been developed based on real-world needs, yet only a small proportion has been proven effective through rigorous, independent validation (**Figure 3**). Many of the technologies described in this review should therefore be classified "emerging" or "promising." Independent scientific validation provides the strongest level of support for technology. However, it is not always possible to attain higher standards of validation. For example, cognitive function is underpinned by many different neurological processes. Accordingly, it is difficult to select a single neurological measurement to compare against. Some technologies included in this review have not been independently validated per se; but through regular use in academic research, it has become accepted that they provide reliable and specific data on measurement items of interest. Even without formal independent validation, it is unlikely (in most instances at least) that researchers would continue using such technologies if they did not offer reliable and specific data. In the absence of independent validation, we therefore propose that technologies that have not been validated against a gold standard (but are regularly used in research) should be considered as "well-supported." Other technical factors for users to consider include whether the devices require calibration or specialist training to set up and interpret data, the portability and physical range for signal transmission/recording, Bluetooth/ANT+ and real-time data transfer capabilities, and on-board or cloud data storage capacity and security.

From a research perspective, consumer health technologies can be categorized into those that have been used in validation studies, observational studies, screening of health disorders, and intervention studies (Baron et al., 2017). For effective screening of health disorders and to detect genuine changes in health outcomes after lifestyle interventions, it is critical that consumer health technologies provide valid, accurate and reliable data (Van den Bulck, 2015). Another key issue for research into consumer health technologies is the specificity of study populations with respect to the intended use of the technologies. If technologies have been designed to monitor particular health conditions (e.g., insomnia), then it is important for studies to include individuals from the target population (as well as healthy individuals for comparison). Scientific validation may be more achievable in healthy populations compared with populations who have certain health conditions (Baron et al., 2017). There is some potential value for commercial technology companies to create registries of people who use their devices. This approach would assist in collecting large amounts of data, which would in turn provide companies with helpful information about the frequency and setting (e.g., home vs. clinic) of device use, the typical demographics of regular users, and possible feedback from users about devices. Currently, very few companies have established such registries, and they are not consistently publishing data in scientific journals. Proprietary algorithms used for data processing, the lack of access to data by independent scientists, and non-random assignment of device use are also factors that are restricting open engagement between the technology industry and the public at the present time (Baron et al., 2017).

It would seem advisable for companies producing health and performance technologies to consult with consumers to identify real-world needs and to invest in research to prove the effectiveness of their products. However, this seems to be relatively rare. Budget constraints may prevent some companies from engaging in research. Alternatively, some companies may not want to have their products tested independently out of a desire to avoid public scrutiny about their validity. In the absence of rigorous testing, before purchasing health and performance technologies, consumers should therefore carefully consider whether such technologies are likely to be genuinely useful and effective.

### AUTHOR CONTRIBUTIONS

JP and JS conceived the concept for this review. JP, GK, and JS searched the literature and wrote the manuscript. JP designed the figures. JP, GK, and JS edited and approved the final version of the manuscript.

### FUNDING

This work was supported by funding from Sport Performance Innovation and Knowledge Excellence at the Queensland Academy of Sport, Brisbane, Australia.

### ACKNOWLEDGMENTS

We acknowledge assistance from Bianca Catellini in preparing the figures.

### REFERENCES

Baron, K. G., Duffecy, J., Berendsen, M. A., Cheung Mason, I., Lattie, E. G., and Manalo, N. C. (2017). Feeling validated yet? A scoping review of the use of consumer-targeted wearable and mobile technology to measure and improve sleep. Sleep Med. Rev. doi: 10.1016/j.smrv.2017.12.002. [Epub ahead of print].

Strength Cond. Res. 30, 2212–2218. doi: 10.1519/JSC.0000000000001307

decrease head impacts in football players. J. Athl. Train. 50, 1219–1222. doi: 10.4085/1062-6050-51.1.06


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Peake, Kerr and Sullivan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Quantified Soccer Using Positional Data: A Case Study

Svein A. Pettersen<sup>1</sup>\*, Håvard D. Johansen<sup>2</sup>, Ivan A. M. Baptista<sup>1</sup>, Pål Halvorsen<sup>3,4</sup> and Dag Johansen<sup>2</sup>

<sup>1</sup> School of Sport Sciences, UIT The Arctic University of Norway, Tromsø, Norway, <sup>2</sup> Department of Computer Science, UIT The Arctic University of Norway, Tromsø, Norway, <sup>3</sup> ForzaSys AS, Oslo, Norway, <sup>4</sup> Simula Research Laboratory, Oslo, Norway

Performance development in international soccer is undergoing a silent revolution fueled by the rapidly increasing availability of athlete quantification data and advanced analytics. Objective performance data from teams and individual players are increasingly being collected automatically during practices and more recently also in matches after FIFA's 2015 approval of wearables in electronic performance and tracking systems. Some clubs have even started collecting data from players outside of the sport arenas. Further algorithmic analysis of these data might provide vital insights for individual training personalization and injury prevention, and also provide a foundation for evidence-based decisions for team performance improvements. This paper presents our experiences from using a detailed radio-based wearable positioning data system in an elite soccer club. We demonstrate how such a system can detect and find anomalies, trends, and insights vital for individual athletic and soccer team performance development. As an example, during a normal microcycle (6 days) full backs only covered 26% of the sprint distance they covered in the next match. This indicates that practitioners must carefully consider proximity size and physical work pattern in microcycles to better resemble match performance. We also compare and discuss the accuracy of radio-based and GPS-based systems in sampling tracking data. Finally, we present how we are extending the radio-based positional system with a novel soccer analytics annotation system, and a real-time video processing system using a video camera array. This provides a novel toolkit for modern forward-looking soccer coaches that we hope to integrate in future studies.

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal Yehuda Weizman, Swinburne University of Technology, Australia

> \*Correspondence: Svein A. Pettersen svein.arne.pettersen@uit.no

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 24 November 2017 Accepted: 18 June 2018 Published: 06 July 2018

#### Citation:

Pettersen SA, Johansen HD, Baptista IAM, Halvorsen P and Johansen D (2018) Quantified Soccer Using Positional Data: A Case Study. Front. Physiol. 9:866. doi: 10.3389/fphys.2018.00866

Keywords: player load, athlete quantification, GPS tracking, LPM tracking, wearables, player monitoring

# 1. INTRODUCTION

Over the last decade, we have witnessed the emergence of a myriad of wearable devices and sensors for quantification of sport and physical activity. These are frequently touted as a game changer and a key for future development of many sports. Key sport governance organizations like Fédération Internationale de Football Association (FIFA), with its 265 million members in various local clubs world-wide (Kunz, 2007), have already approved use of wearables and Electronic Performance and Tracking Systems (EPTSs) in official matches. This has undoubtedly accelerated research and development of athlete quantification technology. Training and matches are already being impacted. For instance, it is believed that the German national soccer team used wearable technology to profile the players, and with these statistics, coach Joachim Löw made the crucial substitution of Mario Götze, who scored the winning goal in the World Cup final in Brazil in 2014.

Although such success stories certainly do exist, the general usefulness of athlete quantification technologies has several shortcomings. The aim of this paper is to highlight some of the challenges we encountered when using positional data as part of research and team development, and to suggest other promising data sources. Our main observation is that athlete quantification systems are often inhibited by questionable validity of acquired data. We argue that by combining data from multiple systems, some of the shortcomings of existing positional tracking systems can be detected and perhaps avoided. All data in this report was collected from autumn 2011 until spring 2017. All participants have given their written informed consent, and the project has been given institutional approval.

### 2. TRACKING USING LPM (RADIO SIGNALS) AND GPS IN A PROFESSIONAL FOOTBALL CLUB

Football is an open-loop sport, and it is important to emphasize the need for more research to develop our understanding of valid indications of physical match performance and competitive success (Carling, 2013). Toward that end, the athlete quantification technologies deployed in our research facilities at Alfheim Stadium are already generating important insight. At Alfheim Stadium, there has been substantial development and use of various tracking technologies, including multiple camera semi-automatic systems, Local Position Measurement (LPM) systems, and GPS systems, each capable of quickly recording and storing data about team players. We have to a large extent moved away from GPS-based technology, which has traditionally been the preferred choice by clubs to quantify training load of team-sport athletes, both during training and matches (Aughey, 2010).

An alternative to GPS-based systems are those based on LPM radio signals. Unlike GPS systems, where devices are passive receivers of signals from overhead satellites, LPM systems work by having the wearable emit signals to local receivers, which do the actual triangulation. Our experience is that LPM systems have better accuracy than GPS-based systems. In our case, we have several years of experience with positional tracking using the stationary LPM system: ZXY Sport Tracking System by ChyronHego (Trondheim, Norway). This system is based on using the 5.0 GHz Industrial, Scientific, and Medical (ISM) radio band for communication and signal transmissions. With ZXY, each player wears a belt with a transponder placed at the lumbar region (Pettersen et al., 2014), and there are six stationary sensors placed at the stadium perimeter. The stationary sensors compute the position data for each belt by advanced vector-based processing of the received radio signals. The processing system in each stationary sensor enables direct projection of the player's positions on the field without having to exchange data with other sensors. Multiple receivers are still required to cover the entire field and to avoid occlusions. The default resolution is fixed to 20 Hz for each belt. Data is stored in the system's internal database and can be exported as comma-separated values files.

To quantify the accuracy difference of GPS technology compared to LPM systems, we performed two studies, as will be described next.

### 2.1. Study 1 and Study 2: GPS vs. LPM-Tracking

In Study 1 (2011), we instrumented 6 high-level female players (weight 59.6 ± 6.8 kg, height 171.5 ± 4.2 cm) with both GPS and LPM tags and instructed them to perform the Copenhagen Soccer Test for Women (CSTw). Each player ran the CSTw course 18 times, simulating a match and accumulating a distance of 10,331 m (Bendiksen et al., 2013). Each player wore two GPS tags from the GPSport SPI-ProX1 5.0 Hz system in a vest on their upper body, and two ZXY tags placed in a small belt near the lumbar spine. Having multiple tags enables us to measure both the inter and the intra reliability of the systems.

The average distance covered was measured by SPI-ProX1 (12 tags on 6 players) to 11,668 ± 1,072 m with a CV value of 6%, while ZXY (14 tags on 7 players) measured the distance to 10,204 ± 103 m with a CV value of 1%. For High Intensity Runs (HIRs) (>16.0 km h−1), the values were 612 ± 433 m with a CV value of 37.4% and 1238 ± 38 m with a CV value of 3.1%, respectively.
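To make the two reported quantities concrete, the following minimal sketch shows one way total distance and HIR distance could be derived from exported positional samples. It assumes planar (x, y) coordinates in metres at the 20 Hz resolution mentioned earlier and uses the >16.0 km h−1 HIR threshold; the function name `distance_summary` and the synthetic track are hypothetical, and the vendors' own filtering and processing are not reproduced here.

```python
import math

SAMPLE_RATE_HZ = 20.0      # ZXY default resolution stated in the text
HIR_THRESHOLD_KMH = 16.0   # high-intensity run threshold used in Study 1


def distance_summary(positions, sample_rate_hz=SAMPLE_RATE_HZ,
                     hir_threshold_kmh=HIR_THRESHOLD_KMH):
    """Return (total_m, hir_m) for a list of (x, y) positions in metres.

    Speed is approximated per sample interval as step length / dt.
    Real tracking systems usually smooth the signal first; this
    sketch deliberately omits any filtering.
    """
    dt = 1.0 / sample_rate_hz
    total_m = 0.0
    hir_m = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step_m = math.hypot(x1 - x0, y1 - y0)
        speed_kmh = step_m / dt * 3.6
        total_m += step_m
        if speed_kmh > hir_threshold_kmh:
            hir_m += step_m
    return total_m, hir_m


# Hypothetical track: 2 s at 3 m/s (10.8 km/h), then 2 s at 5 m/s (18 km/h).
track = [(0.0, 0.0)]
for speed_ms in [3.0] * 40 + [5.0] * 40:
    x, y = track[-1]
    track.append((x + speed_ms / SAMPLE_RATE_HZ, y))

total_m, hir_m = distance_summary(track)
print(f"total: {total_m:.1f} m, HIR: {hir_m:.1f} m")
```

Because distance is accumulated step by step, positional noise inflates the total, which is one reason why two tags on the same player can disagree.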

In the intra reliability test, the measured discrepancy between the two tags placed on the same player ranged between 800 and 2,071 m using SPI-ProX1 and 25–290 m using ZXY. Our observation that the SPI-ProX1 system seems to measure higher values for total distance covered is further supported by an experiment where 19 players of two junior elite teams were equipped with both ZXY and SPI-ProX1. The average distance covered was measured by SPI-ProX1 to 10,805 ± 847 m, while ZXY measured the distance to 9,891 ± 974 m (Johansen et al., 2013).

In Study 2 (2016), 12 male youth elite players (weight 64.2 ± 8.2 kg, height 176.0 ± 6.7 cm) were instructed to jog clockwise around the pitch at Alfheim Stadium, following the side and end-lines of the pitch. All players wore both the Polar Team Pro 10 Hz GPS system (Kempele, Finland) and the ZXY system. The GPS tags were attached to the anterior part of the chest by an elastic chest strap. **Figure 1B** shows the recorded positional information for both Polar and ZXY. (The Polar system could not plot more than five players per figure.) As can be seen in the figures, the players were not capable of performing 90° turns in the corners, which is to be expected. The GPS tracks in **Figure 1B** can clearly be seen to deviate significantly from the actual trajectory of the players, while the tracks shown in **Figure 1A** follow the lines much more closely. A similar effect was also observed by Buchheit et al. (2014).

Next, seven of the twelve players were selected to complete a training session. With statistical significance levels obtained by paired t-test, sprint distance (>25.2 km h−1) was measured lower by ZXY (55.3 ± 7.3 m) compared to Polar Team Pro (70.0 ± 12.9 m) (P > 0.05). HIR and number of accelerations (≥2 m s−2) showed an inverse tendency, with higher values of 222.8 ± 77.8 m and 100.9 ± 19.9 counts vs. 164.4 ± 54.9 m and 81.0 ± 15.9 counts (ns). All raw tracking data were loaded into Microsoft Excel, where statistical procedures were executed.

It could be speculated that the GPS signal reception at Alfheim Stadium is poor. However, the stadium does not have an overhanging roof, nor are there any nearby high buildings that obscure the sky. A few 9 m high stands are located 9.3 m behind the sidelines, but we do not suspect these to interfere with the GPS signal. Measurement accuracy may still be reduced by atmospheric conditions such as clouds and fog. A more plausible explanation is perhaps the stadium's arctic location at 69.65° north. The inclination of GPS satellite orbits is approximately 55° (north or south), so no satellites have been directly overhead during our tracking sessions (Langley, 1999). High error rates have, however, been reported elsewhere for inter-unit reliability across different GPS models (Jennings et al., 2010; Castellano et al., 2011). A stationary reference GPS receiver can improve accuracy by averaging its position over time. As long as such a reference receiver detects the same satellite signals as the wearable GPS receiver, it can send correction data. In the northern areas, GPS-based solutions that also communicate with the Russian Global Navigation Satellite System (GLONASS) should also be considered, as these generally provide better precision here. Still, our findings and those of Stevens et al. (2014) indicate superior accuracy in Local Position Systems (LPS) compared to GPS. It remains unclear to what extent the inherent accuracy limitation in the GPS system limits its usefulness for athlete quantification.
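The geometric argument about latitude and orbital inclination can be illustrated with a small calculation. The sketch below estimates the highest elevation angle at which a satellite can appear for an observer located poleward of the orbital inclination. It assumes a spherical Earth, a circular orbit, and an approximate GPS orbital radius of 26,560 km; these values and the function name `max_elevation_deg` are illustrative assumptions and are not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius (spherical approximation)
GPS_ORBIT_RADIUS_KM = 26560.0   # approximate GPS orbital radius (assumption)


def max_elevation_deg(observer_lat_deg, inclination_deg,
                      earth_radius_km=EARTH_RADIUS_KM,
                      orbit_radius_km=GPS_ORBIT_RADIUS_KM):
    """Highest elevation angle (degrees) a satellite can reach in the sky.

    The satellite's ground track never exceeds the inclination in
    latitude, so the smallest Earth-central angle to an observer
    further from the equator is the latitude gap between them.
    """
    gap_deg = max(0.0, abs(observer_lat_deg) - inclination_deg)
    if gap_deg == 0.0:
        return 90.0  # a satellite can pass directly overhead
    gamma = math.radians(gap_deg)
    ratio = earth_radius_km / orbit_radius_km
    # Standard relation: tan(elevation) = (cos(gamma) - Re/r) / sin(gamma)
    return math.degrees(math.atan2(math.cos(gamma) - ratio, math.sin(gamma)))


alfheim_max_elevation = max_elevation_deg(69.65, 55.0)
print(f"maximum satellite elevation: {alfheim_max_elevation:.1f} degrees")
```

Under these assumptions the satellites stay roughly 19° or more away from the zenith at the stadium's latitude, which is consistent with the observation that none pass directly overhead. Whether this geometry explains the measured errors is not something the sketch can establish.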

Although the CSTw has a 10,331 m preset course that the players should follow, some discrepancies in the measured distance are to be expected. Even small deviations of the sensor device from the set trajectories of the test, like the player leaning in the turns of the course, will impact the measurements and add up throughout the test. However, the high meter values in relation to the course length and, in addition, the large CV between units of the SPI-ProX1 system suggest that the results should be interpreted with caution.

Using an absolute sprinting or high-velocity threshold for all athletes in a team does not account for individual genetic or physiological differences. The same external load calculated by an acceleration, HIR, or sprinting threshold for two athletes could represent a different internal load based on individual characteristics (Impellizzeri et al., 2004). Positive and negative accelerations are metabolically demanding and often do not elicit velocities defined as HIR or sprint (Osgnach et al., 2010). The starting velocity is critical when measuring accelerations or decelerations: the metabolic cost of changing speed by more than 2.0 m s−2 is much larger at a starting speed of 5.0 m s−1 than at 1.0 m s−1. In addition, quantification of these variables is dependent upon the validity and reliability of athlete tracking systems.

An alternative may be individual thresholds for external load expressed relative to maximum speed attained during sprint testing. An individualized approach of arbitrarily derived velocity thresholds may benefit the training prescription for players, but will limit comparisons with other teams and leagues. Limited research exists on how to individualize accelerations, which are energy demanding, and therefore, we will have limited information on total external load even with individualized speed zone limits (Sweeting et al., 2017).
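The idea of individualized thresholds can be sketched as follows. The example expresses the HIR and sprint boundaries as fractions of each player's maximum speed from sprint testing. The fractions (0.60 and 0.85), the function names, and the two players' maximum speeds are hypothetical values chosen for illustration only; the text does not prescribe specific percentages.

```python
def individual_speed_zones(max_speed_kmh, hir_fraction=0.60,
                           sprint_fraction=0.85):
    """Derive per-player speed thresholds from a measured maximum speed.

    The fractions are illustrative assumptions, not recommended values.
    """
    return {
        "hir_kmh": hir_fraction * max_speed_kmh,
        "sprint_kmh": sprint_fraction * max_speed_kmh,
    }


def classify_speed(speed_kmh, zones):
    """Label a speed sample relative to one player's individual zones."""
    if speed_kmh > zones["sprint_kmh"]:
        return "sprint"
    if speed_kmh > zones["hir_kmh"]:
        return "hir"
    return "low"


# Two hypothetical players with different maximum sprint speeds.
zones_a = individual_speed_zones(30.0)   # sprint threshold at 25.5 km/h
zones_b = individual_speed_zones(34.0)   # sprint threshold at 28.9 km/h

# The same absolute speed falls in different zones for the two players.
label_a = classify_speed(26.0, zones_a)
label_b = classify_speed(26.0, zones_b)
print(label_a, label_b)
```

The example shows the trade-off noted above: the same 26 km/h run is a sprint for one player and a high-intensity run for the other, which helps individual prescription but makes comparison across teams harder.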

### 2.2. Study 3: High Intensity Activity in Training vs. Match

In Study 3 (2017), 5 players (age 25.2 ± 4.0 years, height 178.4 ± 5.0 cm, weight 75.2 ± 6.6 kg) were randomly selected from 5 different playing positions: central back, full back, central midfielder, wide midfielder, and central forward. The players were tracked in 5 consecutive in-season training sessions (microcycle) and in one official home match. Distances and number of HIR and sprints were compared (**Table 1**). We observed large discrepancies in high-intensity activities between the training sessions in the microcycle and the match. As shown in **Table 1**, we recorded substantial underload in HIR and sprint for most players during the training week compared to the match. Following the principle of overload, this indicates that the format of the small-sided games does not elicit a sufficient amount of HIR and sprint, with the exception of the central forward position in the team's style of play. Practitioners should be aware of and take into consideration how different pitch size and number of players dictate the external and internal training load.

TABLE 1 | High-intensity actions (HIRs and Sprints) and number of appearances (counts) and/or meters for five training sessions, compared to an official match in five players in different positions.

CB, Center back; FB, Full back; CM, Center midfield; WM, Wide midfield; CF, Center forward.

The difference (% match) corresponds to the total value of the training week compared to the match. The value of the match is considered as 100%. Example from a normal microcycle (5 training sessions between two official matches).
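The "% match" comparison used for Table 1 is simple arithmetic: the weekly training total divided by the match value, with the match set to 100%. A minimal sketch is given below; the function name and the distances are hypothetical placeholders and are not the values from Table 1.

```python
def percent_of_match(training_sessions_m, match_m):
    """Express the summed training-week value as a percentage of the match.

    The match value is treated as 100%, as in the table notes.
    """
    if match_m <= 0:
        raise ValueError("match value must be positive")
    return 100.0 * sum(training_sessions_m) / match_m


# Hypothetical sprint distances (m) for five sessions and one match.
week_sprint_m = [10.0, 20.0, 30.0, 25.0, 15.0]
match_sprint_m = 400.0

share = percent_of_match(week_sprint_m, match_sprint_m)
print(f"training week covers {share:.0f}% of match sprint distance")
```

A value well below 100% for a whole week, as in this made-up case, is the kind of underload the paragraph above describes.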

From a training load perspective, the large intra/inter unit differences in tracked distance described in section 2 can also have significant practical implications for an athlete across a longitudinal period, which calls into question meaningful interpretation of the data. For within-athlete longitudinal monitoring, we therefore recommend that practitioners assign a specific device to each athlete. To appropriately detect changes in physical performance, researchers must also account for match-to-match variation and device reliability. Any possible interference between co-located devices has to our knowledge not yet been fully explored. Nevertheless, developing a device including algorithms describing position-specific match demands might be useful to control training load in relation to match demands. By integrating information about training content, load periodization, and fatigue status we can provide real-world insight into optimal approaches for player preparation.

#### 3. PERSPECTIVE

The studies described above indicate that existing positional technologies do not guarantee an accurate measurement of player locomotor activities. We are therefore experimenting with two specific supplemental data sources that we plan to integrate in future studies: one based on video and one based on self-reporting.

#### 3.1. Full-Stadium Video Coverage

Video of player actions is generally considered a useful tool for soccer analytics. Videos have traditionally been obtained from the following three sources: professional TV broadcasts, hand-held cameras, or fixed arena cameras. Unfortunately, these sources are either not available for practices, too personnel-demanding, or too costly. More importantly, none of these solutions provide sufficiently high-resolution coverage of all players throughout a session. Our solution was to develop the Bagadus (Stensland et al., 2014) video system.

Bagadus consists of multiple small shutter and exposure synchronized cameras that record a high-resolution video of the soccer field. The cameras are set in a circular pattern; pitched, yawed, and rolled to look directly through a point five cm in front of the lenses, minimizing the parallax effect. Combined, the cameras cover the full pitch with sufficient overlap to identify common features necessary for camera calibration and image stitching to generate a panorama video.

Bagadus video playback can switch between streams delivered from the different cameras, either manually by selecting a camera, or automatically following players based on sensor information. It can also play back a panorama video stitched from the different camera feeds. Using the panorama video, a virtual view can also be extracted (Gaddam et al., 2015), for instance to automatically follow one particular player (Gaddam et al., 2014).

### 3.2. Video Indexing With Rich Metadata

Many elite soccer clubs spend much time on manual, labor-intensive post-game analysis by carefully watching full-length recordings of the game. By enriching video archives with time-synchronized metadata from external sensors, Bagadus enables a much more efficient video retrieval and summarizing experience, reducing the time needed for coaches to locate relevant video segments. At Alfheim Stadium we found positional data from ZXY particularly useful as it enables Bagadus to track individual players and generate on-the-fly video summaries based on player or group formation and trajectories. Examples include a video summary of all situations where a particular player sprints toward his own goal, or all situations where the midfielder is in the mid-circle (Mortensen et al., 2014).
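As an illustration of how positional data can index video, the sketch below finds the time intervals during which a player is inside the centre circle; such intervals could then be used to cut clips. It is not the Bagadus implementation. The sample format `(t_seconds, x_m, y_m)`, the pitch-centre coordinates, and the minimum duration are assumptions made for the example; the 9.15 m radius is the standard centre-circle radius.

```python
import math

CENTRE_CIRCLE_RADIUS_M = 9.15   # standard centre-circle radius


def intervals_in_centre_circle(samples, centre_xy, min_duration_s=1.0):
    """Return (start_s, end_s) intervals where the player is in the circle.

    samples: time-ordered list of (t_seconds, x_m, y_m) tuples.
    centre_xy: pitch-centre coordinates in the same coordinate system.
    Intervals shorter than min_duration_s are discarded.
    """
    cx, cy = centre_xy
    intervals = []
    start = None
    last_inside_t = None
    for t, x, y in samples:
        inside = math.hypot(x - cx, y - cy) <= CENTRE_CIRCLE_RADIUS_M
        if inside:
            if start is None:
                start = t
            last_inside_t = t
        elif start is not None:
            if last_inside_t - start >= min_duration_s:
                intervals.append((start, last_inside_t))
            start = None
    # Close an interval that is still open at the end of the data.
    if start is not None and last_inside_t - start >= min_duration_s:
        intervals.append((start, last_inside_t))
    return intervals


# Hypothetical 1 Hz track crossing a 105 m x 68 m pitch along the halfway axis.
samples = [(float(t), 30.0 + 5.0 * t, 34.0) for t in range(10)]
intervals = intervals_in_centre_circle(samples, centre_xy=(52.5, 34.0))
print(intervals)
```

A query such as "sprints toward own goal" would follow the same pattern, with the inside-circle test replaced by a condition on speed and direction of travel.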

In addition to positional data, we have developed an annotation system (Johansen et al., 2012; Stensland et al., 2014) for use during matches to tag important events with metadata as they occur. A key design principle for this system was minimizing deployment effort and hardware investments. Mobile devices like smartphones and tablets are as such ideal platforms as they are highly available, mostly Internet connected, and provide sufficient computational resources. In combination with a tile-based interface optimized for fast input, the average annotation time was cut down to less than 3 seconds (Johansen et al., 2012) while operated on the field. The registered events are time-aligned with the video and stored in an analytic database, immediately available for use by the video retrieval system. This enables video-based team or individual feedback in the locker room during half time, or after practice.
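Time alignment between an annotated event and the recording can be sketched as a conversion from a wall-clock timestamp to an offset into the video, padded to form a clip. This is a minimal illustration only: the function name `clip_window`, the padding values, and the timestamps are hypothetical, and it assumes that the annotation device and the video system share a synchronized clock.

```python
from datetime import datetime, timedelta


def clip_window(event_time, video_start, before_s=10.0, after_s=5.0):
    """Map an annotated event to a (start_s, end_s) window in the video.

    before_s and after_s pad the clip so that the build-up and the
    outcome of the event are included; both are illustrative defaults.
    """
    offset_s = (event_time - video_start).total_seconds()
    if offset_s < 0:
        raise ValueError("event precedes the start of the recording")
    return max(0.0, offset_s - before_s), offset_s + after_s


# Hypothetical recording start and an event tagged 12 min 30 s later.
video_start = datetime(2017, 5, 1, 18, 0, 0)
event_time = video_start + timedelta(minutes=12, seconds=30)

window = clip_window(event_time, video_start)
print(window)
```

Padding before the event also compensates for the few seconds an operator needs to register the annotation.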

#### 3.3. Individual Subjective Reports

We have also implemented a player monitoring system PMSys: a self-reporting system<sup>1</sup> for mobile devices, which enables monitoring of individual phenotypic parameters through repeated questionnaires that the players answer on their own mobile phones.

Having regular reports from all team members is a key goal for PMSys. As such, a key design requirement was support on all smart-phone platforms (e.g., iOS and Android) in use by team members. To reduce the costs of multi-platform support, we opted to develop PMSys as a hybrid-mobile application based on the Ionic 2+ Framework<sup>2</sup>. Recent versions of the framework generate applications that look and feel similar to native ones, and earlier performance and appearance disadvantages are mostly mitigated. PMSys is currently deployed in Google Play for Android devices, and in Apple's iTunes store for iOS devices. The mobile application provides graphical visualization feedback, which gives the player a timeline overview.

In addition to the smart-phone app, we also constructed a web-portal that team coaches can use to analyze and present data. The portal is constructed with the coaches in mind, providing several tools and plots for teams and individual players. In combination with the web portal and mobile application, we have implemented our own communication service between the mobile phone and the web portal, allowing a coach to send push-messages directly to a player's mobile phone. A key feature of PMSys is the ability for coaches to schedule future and repeated push-messages.

<sup>1</sup>PMSys, http://forzasys.com/pmsys.html.

Our experience with PMSys Athlete Self-Report Measures (ASRM) at Alfheim is that education and feedback are of utmost importance to maintain daily usage. The scope of education should include why an ASRM should be used, the purpose of the questions asked, and who is analyzing the data. Education should emphasize that results are to be used for the player's benefit, and not to their detriment. Feedback should consist of daily interactions and reminders pushed directly to the user's device, showing what action is taken in response to reported data. During the season, the generated daily wellness reports may form the basis of the regular conversations between coaching staff and players. Engagement of staff, especially in the implementation process, is essential (Saw et al., 2015), with particular emphasis on the need for a key staff member to oversee the day-to-day responses and be able to analyze and interpret the ASRM.

By complementing GPS and LPM positional data, like those we have used in our previous studies, with data from video and self-reporting tools, we hope to better predict injury or reduced performance for a player. The extended data sources are particularly interesting when considered as additional input to modern machine learning algorithms.

#### ETHICS STATEMENT

The study is approved by the Norwegian Centre for Research Data and the players have given their written informed consent to participate.

#### AUTHOR CONTRIBUTIONS

SP: data collection, in charge of the writing process; HJ, IB, PH, and DJ: data collection, manuscript writing.

#### FUNDING

This work was supported in part by the Norwegian Research Council project numbers 250138 and 263248.

#### ACKNOWLEDGEMENTS

The publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway.


<sup>2</sup>https://ionicframework.com/


**Conflict of Interest Statement:** We hereby declare that PH is employed in a part-time position at ForzaSys AS.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pettersen, Johansen, Baptista, Halvorsen and Johansen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exercise Intensity During Cross-Country Skiing Described by Oxygen Demands in Flat and Uphill Terrain

Øyvind Karlsson<sup>1</sup> \*, Matthias Gilgien<sup>1,2</sup>, Øyvind N. Gløersen<sup>1,3</sup>, Bjarne Rud<sup>1</sup> and Thomas Losnegard<sup>1</sup> \*

<sup>1</sup> Department of Physical Performance, Norwegian School of Sport Sciences, Oslo, Norway, <sup>2</sup> Norwegian Ski Federation, Alpine Skiing, Oslo, Norway, <sup>3</sup> Condensed Matter Physics, Department of Physics, University of Oslo, Oslo, Norway

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Beat Knechtle, University Hospital Zurich, Switzerland Sabrina Skorski, Saarland University, Germany

#### \*Correspondence:

Øyvind Karlsson oyvind.karlsson@gmail.com Thomas Losnegard thomas.losnegard@nih.no

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 08 March 2018 Accepted: 14 June 2018 Published: 09 July 2018

#### Citation:

Karlsson Ø, Gilgien M, Gløersen ØN, Rud B and Losnegard T (2018) Exercise Intensity During Cross-Country Skiing Described by Oxygen Demands in Flat and Uphill Terrain. Front. Physiol. 9:846. doi: 10.3389/fphys.2018.00846

Purpose: In this study wearable global navigation satellite system units were used on athletes to investigate pacing patterns by describing exercise intensities in flat and uphill terrain during a simulated cross-country ski race.

Methods: Eight well-trained male skiers (age: 23.0 ± 4.8 years, height: 183.8 ± 6.8 cm, weight: 77.1 ± 6.1 kg, VO2peak: 73 ± 5 mL·kg<sup>−1</sup>·min<sup>−1</sup>) completed a 13.5-km individual time trial outdoors and a standardized indoor treadmill protocol on roller skis. Positional data were recorded during the time trial using a differential global navigation satellite system to calculate external workloads in flat and uphill terrain. From treadmill tests, the individual relationships between oxygen consumption and external workload in flat (1°) and uphill (8°) terrain were determined, in addition to VO2peak and the maximal accumulated O2-deficit. To estimate the exercise intensity in the time trial, the O2-demand in two different flat and five different uphill sections was calculated by extrapolation of individual O2-consumption/workload ratios.

Results: There was a significant interaction between section and average O2-demands, with higher O2-demands in the uphill sections (110–160% of VO2peak) than in the flat sections (≤100% of VO2peak) (p < 0.01). The maximal accumulated O2-deficit associated with uphill treadmill roller skiing was significantly higher compared to flat (6.2 ± 0.5 vs. 4.6 ± 0.5 L, p < 0.01), while no significant difference was found in VO2peak.

Conclusion: Cross-country (XC) skiers repeatedly applied exercise intensities exceeding their maximal aerobic power. ΣO2-deficits were higher during uphill skiing compared to flat, which has implications for the duration and magnitude of supramaximal work rates that can be applied in different types of terrain.

Keywords: cross-country skiing, exercise intensity, external power, global navigation satellite system, metabolic rate, pacing

# INTRODUCTION

Cross-country (XC) skiing is an endurance sport in which the goal is to cover a known distance in the shortest time possible. Unlike most other endurance sports such as track running, rowing, or swimming, a substantial variation in speed exists, since competition courses in XC skiing must consist of approximately one-third ascending, one-third flat and one-third descending terrain (FIS, 2017). The large fluctuations in speed, imposed by the topography of the course, challenge skiers' ability to control the exercise intensity, also described as the athlete's pacing (Abbiss and Laursen, 2008).

It is widely accepted that pacing patterns have a significant influence on performance in a variety of sports, including XC skiing (Abbiss and Laursen, 2008; Losnegard et al., 2017). Theoretically, an even pacing pattern is regarded as optimal for performance in endurance sports events with durations > 2 min, where athletes race against the clock over a known distance (Abbiss and Laursen, 2008). In contrast, studies of running (Tucker et al., 2006; Hanley, 2015), cycling (Thomas et al., 2012), mountain biking (Martin et al., 2012), and rowing (Garland, 2005) have shown that athletes, in fact, apply positive, J-shaped or variable pacing patterns. Furthermore, studies on pacing patterns in XC ski racing have consistently shown that, on a lap-by-lap basis, XC skiers apply a positive pacing pattern independent of both race distance and level of the skiers (Larsson and Henriksson-Larsen, 2005; Bolger et al., 2015; Formenti et al., 2015; Andersson et al., 2016; Losnegard et al., 2017). However, describing pacing patterns in terms of lap-by-lap comparisons in sports where course topography changes substantially is insufficient due to the non-constant relationship between speed, external work rate, and thereby metabolic energy demand. Therefore, describing such pacing patterns in a sport such as XC skiing demands alternative methods where the total energy turnover could be estimated.

Previous investigations of exercise intensity and pacing patterns in XC skiing have mainly focused on sprint skiing (≤1.8 km) (Andersson et al., 2010, 2016; Sandbakk et al., 2010, 2011). The pacing pattern in XC sprint skiing has been shown to be regulated according to the terrain, with skiers applying considerably higher metabolic rates during uphill compared to flat sections of the course (Andersson et al., 2016; Sandbakk and Holmberg, 2017). This is in line with computer modeling from XC skiing and road cycling, which suggests that increased exercise intensity in uphill terrain improves performance compared to maintaining an even exercise intensity (Swain, 1997; Atkinson et al., 2007; Sundstrom et al., 2013). Moreover, estimations of the work rate during single uphill sections of competitive skiing have revealed metabolic rates of approximately 110–160% of peak aerobic power (Norman et al., 1989; Sandbakk et al., 2011), implying a considerable anaerobic energy production. However, no study has investigated anaerobic energy turnover during competitions in XC skiing. In running, Olesen (1992) has shown that the anaerobic capacity (with the maximal accumulated oxygen deficit method) during uphill running is higher compared to running on flat terrain, which may also apply to XC skiing (Andersson et al., 2016). Consequently, this may have implications for the maximal metabolic power attainable in different terrains and adds to the complexity of pacing in XC skiing. However, except for the pioneering work by Norman et al. (1989) conducted nearly 30 years ago, little information is available on exercise intensity in various terrains in distance XC skiing (>10 and 15 km for female and male skiers, respectively). Moreover, to our knowledge, no study has investigated individual energy turnover rates in different terrains during XC skiing. 
Since specificity is a known principle in training, a more detailed evaluation of sport specific requirements may, therefore, contribute to optimizing training and competition strategies.

One challenge when estimating energy turnover, and thereby exercise intensity, in XC skiing on snow is to control the ski friction, and thereby the external load. An alternative approach is to use roller skis, which are also used in treadmill skiing. The relationship between external workload and metabolic cost can, therefore, be determined by testing skiers during treadmill roller skiing (Sandbakk et al., 2010). Recent advances in wearable sensor technology allow tracking of athletes as a point mass model for position, speed, and acceleration, by using a differential global navigational satellite system (dGNSS), during outdoor roller skiing (Larsson and Henriksson-Larsen, 2005; Andersson et al., 2010). Using this wearable technology, the external work load can be determined (Swaren and Eriksson, 2017) and the metabolic cost of roller ski racing can be estimated to illustrate the exercise intensity during a race.

The present study, therefore, investigated pacing patterns in a 13.5 km self-paced time trial (TT), performed on roller skis on an international race course, by describing exercise intensities in flat and uphill terrain. To determine exercise intensities in the TT, external workloads were derived from accurate positioning data collected with a dGNSS system worn by the skiers. The external workloads from the TT were converted into energy demands using individual relationships between energy cost and workload collected during treadmill roller skiing. We hypothesized that: (I) During a XC distance race, skiers apply a variable pacing pattern; (II) XC skiers repeatedly perform exercise intensities exceeding their peak aerobic power during a XC distance race.

### MATERIALS AND METHODS

### Subjects

Eight well-trained male XC skiers (mean ± SD: age, 23.0 ± 4.8 years; body mass, 77.1 ± 6.1 kg; height 1.84 ± 0.07 m) volunteered to participate in the study. The skiers were recruited via convenience sampling using the following criteria: (1) either active or former active competitor at a national level in Norway, (2) experienced with treadmill roller skiing and (3) familiar with the specific race course. The protocol was approved by the local ethics committee of the Norwegian School of Sport Sciences and the Norwegian Social Science Data Services (NSD). All subjects gave written informed consent in accordance with the Declaration of Helsinki. If younger than 18 years of age, parental written consent and assent from the skier were obtained.

### Experimental Overview

All skiers attended two separate sessions, separated by 11.0 ± 4.9 days. In the first session, the skiers completed a self-paced 13.5-km individual TT on an international race course to simulate a XC ski race. In the second session, individual relationships between external work rate and oxygen consumption (VO2), as well as peak oxygen consumption (VO2peak) and maximal accumulated O2-deficit in flat and uphill terrain, were determined in the laboratory on a roller ski treadmill. The O2-demand in seven sections of the TT course was estimated by extrapolating the individual linear relationships between VO2 and workload using individual positioning data collected in the TT. All tests were performed on roller skis using the skate technique. The same test leaders conducted all tests.

#### Time Trial

The individual TTs were carried out in the roller ski course in Holmenkollen (Oslo, Norway). The course profile resembled the actual profile of a XC-ski course used in the FIS World Cup and consisted of three identical laps of 4.5 km (height difference 51 m, maximum climb 32 m, total climb 166 m). The TTs were conducted on two separate days; six skiers completed the TT on the first day and two skiers on the second day. Before starting the TT, the skiers performed 20 min of individual warm-up, wearing the dGNSS equipment and the assigned test skis to familiarize themselves with the equipment. The skiers started the TT at 2-min intervals and were instructed to complete the TT as fast as possible. No instructions regarding pacing patterns were given. Continuous individual positioning data were recorded with a dGNSS system during the TT. Heart rate data were recorded with a separate HR monitor. In two preselected sections of the course, one uphill (S4) and one flat (S7), video recordings of the skiers were conducted to determine sub-techniques applied in these sections. In addition, the skiers verbally reported their rating of perceived exertion (RPE) on a category ratio scale (Foster et al., 2001). Air temperature during the outdoor TTs was between 8 and 16 °C, and air pressure was approximately 1005 hPa. Local wind direction was northeast and southeast on the first and second day, respectively. The asphalt was completely dry on both occasions.

### Laboratory Tests

The indoor tests were performed on a roller ski treadmill. Speeds, inclinations, and sub-techniques were chosen to resemble those from the outdoor TT to enable estimation of O2-demands during flat and uphill skiing. The indoor test protocol is illustrated in **Figure 1**. First, skiers completed a standardized 15 min warm-up at 3° and 3.0 m·s<sup>−1</sup> (∼60–75% of HRpeak). The skiers then performed six submaximal workloads divided into two subsets consisting of three flat and three uphill workloads, respectively. The flat subset was meant to resemble the flat sections of the TT course, and was carried out at 1° and 4.5, 5.5, and 6.5 m·s<sup>−1</sup> using the V2 technique (two pole plants for two ski pushes). The uphill subset was meant to resemble the uphill sections of the TT course (mean incline ≈ 8°), and was carried out at 8° and 1.5, 1.75, and 2.0 m·s<sup>−1</sup> in the V1 technique (one pole plant for two ski pushes). The duration of each submaximal workload was 5 min, and each workload was separated by 2 min. The two subsets were separated by 5 min of passive recovery.

After 8 min of active recovery (∼60–70% of HRpeak) the skiers completed two self-paced 3 min all-out performance tests separated by 20 min as described by Sandbakk et al. (2016). The 3 min all-out performance test has been shown to be a valid method to determine VO2peak during treadmill roller skiing (Losnegard et al., 2012a). The speed was constant for the first 30 s, starting at 2.5 m·s<sup>−1</sup> and 7.5 m·s<sup>−1</sup> in the uphill (8°) and flat (1°) performance test, respectively. Thereafter the subjects controlled the speed (uphill: 0.25 m·s<sup>−1</sup>; flat: 0.5 m·s<sup>−1</sup> increments or decrements) by adjusting their position on the treadmill relative to laser beams situated in front of and behind the skier. External power, steady state VO2, respiratory exchange ratio (RER), ventilation (VE) and breathing rate (BR) were measured continuously during all tests. Heart rate and rating of perceived exertion (RPE) were registered, and blood lactate concentration ([La−]) was measured immediately after the completion of each workload. The order in which the submaximal subsets and the performance tests were performed was counterbalanced. The temperature indoors was approximately 20 °C, and the total duration of the indoor session was approximately 1.5 h.

#### External Power

External power (Pext) on the treadmill was calculated as the sum of power against gravity (Pg) and power against rolling resistance (Prr), without dGNSS equipment, as previously described by Losnegard et al. (2013). External power outdoors (Pext\_out) was calculated as the sum of power against gravity (Pg), power against rolling resistance (Prr) and power against air drag resistance (Pd):

$$P_{\text{ext\_out}} = \sum P = P_g + P_{rr} + P_d$$

Power against gravity was calculated as the increase in potential energy per unit time:

$$P_g = m \cdot g \cdot \sin \alpha \cdot v$$

where m represents the total mass of the skier (incl. equipment), g the gravitational acceleration, α the inclination of the course in degrees and v the skier's speed along the track.

Power against rolling resistance (Prr) was calculated as work against rolling resistance forces per unit time:

$$P_{rr} = C_{rr} \cdot m \cdot g \cdot \cos \alpha \cdot v$$

where Crr represents the coefficient of rolling resistance of the roller skis, which was measured (Crr = 0.024) before and after the project using a towing test previously described by Losnegard et al. (2011). We used the same Crr for the treadmill and asphalt surface, as previous studies by our group did not find any differences using the same roller skis, asphalt track and treadmill belt (Myklebust, 2016).
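The two power terms defined so far can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB implementation: the function names, the value g = 9.81 m·s<sup>−2</sup> and the 78 kg example mass are assumptions made for the example, while Crr = 0.024 and the 8°, 2.0 m·s<sup>−1</sup> workload are taken from the text.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def power_gravity(mass_kg, incline_deg, speed_ms):
    """Power against gravity: P_g = m * g * sin(alpha) * v, in watts."""
    return mass_kg * G * math.sin(math.radians(incline_deg)) * speed_ms


def power_rolling(mass_kg, incline_deg, speed_ms, c_rr=0.024):
    """Power against rolling resistance: P_rr = C_rr * m * g * cos(alpha) * v."""
    return c_rr * mass_kg * G * math.cos(math.radians(incline_deg)) * speed_ms


def treadmill_external_power(mass_kg, incline_deg, speed_ms, c_rr=0.024):
    """External power on the treadmill: gravity plus rolling resistance."""
    return (power_gravity(mass_kg, incline_deg, speed_ms)
            + power_rolling(mass_kg, incline_deg, speed_ms, c_rr))


# Illustrative input: a hypothetical 78 kg skier (incl. equipment) at the
# steepest uphill submaximal workload (8 degrees, 2.0 m/s).
example_power_w = treadmill_external_power(78.0, 8.0, 2.0)
print(f"External power: {example_power_w:.0f} W")
```

Note that the inclination is converted from degrees to radians before the trigonometric functions are applied, since α is reported in degrees.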

Power against air drag resistance (Pd) was estimated as follows:

$$P_d = F_d \cdot v$$

where Fd represents the force from air drag acting on the skier. Fd was estimated assuming a turbulent air flow and no environmental wind (Sundstrom et al., 2013):

$$F_d = 0.5 \cdot C_D A \cdot \rho \cdot v_{air}^2$$

where CD represents the drag coefficient, A the projected frontal area of the skier, ρ the air density, and vair the speed of the skier relative to the air. Due to the assumption of no environmental wind, vair was set equal to v. The drag area (CDA) was determined by scaling, as described by Sundstrom et al. (2013).
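Because vair is set equal to v, the power against air drag grows with the cube of the skiing speed. The sketch below illustrates this under stated assumptions: the drag area of 0.6 m<sup>2</sup> and the air density of 1.2 kg·m<sup>−3</sup> are hypothetical placeholder values (the individually scaled CDA values are not reported in the text), and the function names are invented for the example.

```python
def drag_force(speed_air_ms, cda_m2, rho_kg_m3):
    """Air drag force: F_d = 0.5 * C_D*A * rho * v_air^2, in newtons."""
    return 0.5 * cda_m2 * rho_kg_m3 * speed_air_ms ** 2


def drag_power(speed_ms, cda_m2, rho_kg_m3):
    """Power against air drag, P_d = F_d * v, with no wind (v_air = v)."""
    return drag_force(speed_ms, cda_m2, rho_kg_m3) * speed_ms


CDA_EXAMPLE = 0.6  # m^2, hypothetical drag area, NOT a value from the study
RHO_EXAMPLE = 1.2  # kg/m^3, hypothetical air density

# Compare a typical uphill treadmill speed with a typical flat speed.
for speed in (2.0, 7.0):
    p_d = drag_power(speed, CDA_EXAMPLE, RHO_EXAMPLE)
    print(f"v = {speed:.1f} m/s -> P_d = {p_d:.1f} W")
```

With these placeholder inputs, increasing the speed from 2.0 to 7.0 m·s<sup>−1</sup> multiplies the drag power by (7/2)<sup>3</sup> ≈ 43, which is why air drag matters on flat terrain but is nearly negligible on steep climbs.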

Air density (ρ) was calculated from ambient temperature measurements on site on the test day. Air pressure (p) was obtained from the meteorological station at Blindern (Oslo, Norway; eklima.net). Air density ρ was calculated from the following equation, assuming dry air:

$$\rho = \frac{p}{R \cdot T}$$

where R is the specific gas constant of dry air (287.058 J·kg<sup>−1</sup>·K<sup>−1</sup>) and T the ambient temperature in kelvins.
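As a worked example of the dry-air relation, the short Python sketch below evaluates the air density at the reported pressure (approximately 1005 hPa) for the two ends of the reported temperature range (8 and 16 °C). The function name is an assumption for illustration; pressure must be given in pascals and temperature is converted from degrees Celsius to kelvins.

```python
R_DRY_AIR = 287.058  # specific gas constant of dry air, J/(kg*K)


def air_density(pressure_pa, temperature_c):
    """Dry-air density from the ideal gas law: rho = p / (R * T)."""
    temperature_k = temperature_c + 273.15
    return pressure_pa / (R_DRY_AIR * temperature_k)


# 1005 hPa = 100500 Pa; 8 and 16 degrees C bracket the reported conditions.
rho_cold = air_density(100500.0, 8.0)
rho_warm = air_density(100500.0, 16.0)
print(f"rho at 8 C: {rho_cold:.3f} kg/m^3, rho at 16 C: {rho_warm:.3f} kg/m^3")
```

Across this temperature range the density changes by only about 3%, so the drag estimate is far more sensitive to speed and drag area than to the ambient conditions.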

#### Definitions and Data Analysis

The pacing pattern was considered variable if there were statistically significant changes in exercise intensity, expressed as O2-demand, throughout the TT. Conversely, the pacing pattern was considered even if the changes in exercise intensity were statistically non-significant. VO2peak in uphill and flat terrain, respectively, was defined as the highest average 30-s epoch during each of the performance tests. Peak heart rate (HRpeak) was defined as the highest HR registered during the performance tests. Oxygen cost for each workload was defined as the average oxygen consumption between 3 and 4.5 min in each workload. ΣO2-deficit was calculated based on the method presented by Losnegard et al. (2012a). Gross efficiency (GE) in the submaximal workloads was defined as the ratio between external power output (W) and aerobic energy turnover rate (W) and was expressed as a percentage, as described by Losnegard et al. (2014).
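The two averaging definitions above (highest 30-s epoch for VO2peak, mean between 3 and 4.5 min for oxygen cost) can be sketched as follows. This is an illustration only: it assumes evenly spaced VO2 samples with the first sample at time zero, and the function names and sampling interval are hypothetical rather than taken from the ergospirometry software.

```python
def highest_epoch_mean(values, sample_interval_s, epoch_s=30.0):
    """Highest mean over any contiguous epoch (sliding window) of epoch_s seconds."""
    n = int(round(epoch_s / sample_interval_s))  # samples per epoch
    if n < 1 or len(values) < n:
        raise ValueError("not enough samples for one epoch")
    window_sum = sum(values[:n])
    best_sum = window_sum
    for i in range(n, len(values)):
        # Slide the window one sample forward.
        window_sum += values[i] - values[i - n]
        best_sum = max(best_sum, window_sum)
    return best_sum / n


def window_mean(values, sample_interval_s, start_s, end_s):
    """Mean of samples with start_s <= t < end_s, where t = index * interval."""
    selected = [v for i, v in enumerate(values)
                if start_s <= i * sample_interval_s < end_s]
    if not selected:
        raise ValueError("no samples inside the requested window")
    return sum(selected) / len(selected)


# Hypothetical VO2 trace (L/min), one value every 10 s.
vo2_trace = [3.0, 3.8, 4.4, 4.9, 5.1, 5.2, 5.1, 4.9]
print(f"VO2peak: {highest_epoch_mean(vo2_trace, 10.0):.2f} L/min")
```

For a 5 min submaximal workload, the oxygen cost would then be obtained as `window_mean(trace, interval, 180.0, 270.0)`.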

Two regression equations were computed for each athlete, one for flat and one for uphill skiing, assuming a linear relationship between external power and VO2. Individual positional data from each section were standardized according to section length (100 sample points in each section), and individual external work rate was calculated at each sample point. An estimate of the O2-demand was then made using the individual regression equations and external work rates. Individual section O2-demand was defined as the average O2-demand of the 100 sample points in each section.
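The estimation chain described above (individual linear fit, resampling to a fixed number of points per section, extrapolation to O2-demand) can be sketched in plain Python. All names and numbers below are hypothetical and chosen for illustration; in the study, one such fit was made per athlete and terrain type, and the resampling used 100 points per section.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def resample(distance_m, values, n_points=100):
    """Linearly interpolate values onto n_points evenly spaced distances.

    distance_m must be strictly increasing and contain at least two points.
    """
    d0, d1 = distance_m[0], distance_m[-1]
    out, j = [], 0
    for k in range(n_points):
        d = d0 + (d1 - d0) * k / (n_points - 1)
        # Advance to the segment [distance_m[j], distance_m[j + 1]] holding d.
        while j < len(distance_m) - 2 and distance_m[j + 1] < d:
            j += 1
        w = (d - distance_m[j]) / (distance_m[j + 1] - distance_m[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def section_o2_demand(power_w, slope, intercept):
    """Average O2-demand over all sample points in one section."""
    demands = [slope * p + intercept for p in power_w]
    return sum(demands) / len(demands)


# Hypothetical submaximal treadmill data: external power (W) and VO2 (L/min).
slope, intercept = fit_line([150.0, 200.0, 250.0], [3.0, 3.6, 4.2])
# Hypothetical external power along a 100 m section of the time trial.
power_points = resample([0.0, 50.0, 100.0], [300.0, 400.0, 500.0])
print(f"{section_o2_demand(power_points, slope, intercept):.2f} L/min")
```

Power values above the submaximal range are handled by straight-line extrapolation, which is the central assumption of the method.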

#### Instruments and Materials

All tests were performed on Swenor Skate Long roller skis (length: 630 mm, weight incl. binding: 795 g·ski<sup>−1</sup>, Swenor, Sarpsborg, Norway) equipped with wheel type 2 and Rottefella Xcelerator 2.0 bindings (Rottefella, Klokkarstua, Norway). The skiers used the same pair of roller skis and their personal ski boots and ski poles (90 ± 1% of body height) in both the outdoor TT and the indoor session. Before the indoor session, the ski poles were fitted with customized treadmill ferrules. Laboratory tests were performed on a roller ski treadmill with belt dimension 3 × 4.5 m (Rodby, Södertälje, Sweden).

The dGNSS system used in the TT has previously been described and validated for kinematics (Gilgien et al., 2015) and kinetics (Gilgien et al., 2013) in alpine skiing, and has an expected accuracy of < 5 cm when double difference ambiguities are fixed (Gilgien et al., 2014). The dGNSS system consisted of an antenna mounted on the skier's helmet (G5Ant-2AT1, Antcom, United States) connected to a GPS/GLONASS dual frequency (L1/L2) receiver (Alpha-G3T, Javad, United States) placed in a small backpack. Total weight of the dGNSS system was 940 g (receiver 430 g, backpack 350 g, antenna 160 g). A stationary base station was placed in a fixed position close to the course, to facilitate differential positioning. The base station consisted of an antenna (GrAnt-G3T, Javad, United States) and a receiver (Alpha-G3T, Javad, United States). The antenna was mounted on a tripod and raised approximately 2 m above ground level. The dGNSS measurements were determined in the global coordinate system WGS84 (Universal Transverse Mercator zone 32, northern hemisphere). The dGNSS position was calculated using kinematic carrier phase double difference solutions (Gilgien et al., 2014) at 50 Hz using geodetic post-processing software (Justin, Javad, United States), and filtered using smoothing splines (smoothing parameter p = 0.1) weighted by their fixed/float status (Skaloud and Limpach, 2003). The position measurements were mapped onto a common trajectory based on a kinematic position tracking of the race track sampled at 1 Hz (antenna and receiver: GrAnt-G3T and Alpha-G3T, Javad, United States). The skiers' speed v (Eq. 2) along the track was determined from the time derivative of the positions along the mapping trajectory.
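The last step, obtaining speed as the time derivative of position along the mapped trajectory, can be approximated numerically. The sketch below uses simple finite differences on already smoothed arc-length positions sampled at 50 Hz; this is a stand-in chosen for illustration and not the spline-based processing used in the study, and the function name is hypothetical.

```python
def speed_along_track(arc_length_m, sample_rate_hz=50.0):
    """Speed (m/s) from positions along the track, sampled at a fixed rate.

    Central differences are used for interior samples and one-sided
    differences at the two ends.
    """
    n = len(arc_length_m)
    if n < 2:
        raise ValueError("at least two position samples are required")
    dt = 1.0 / sample_rate_hz
    speeds = []
    for i in range(n):
        if i == 0:
            ds, span = arc_length_m[1] - arc_length_m[0], dt
        elif i == n - 1:
            ds, span = arc_length_m[-1] - arc_length_m[-2], dt
        else:
            ds, span = arc_length_m[i + 1] - arc_length_m[i - 1], 2.0 * dt
        speeds.append(ds / span)
    return speeds


# Hypothetical positions: 0.1 m travelled per 50 Hz sample, i.e. 5 m/s.
positions = [0.1 * i for i in range(6)]
print([round(v, 2) for v in speed_along_track(positions)])
```

Finite differences amplify measurement noise, which is one reason why the positions are smoothed before differentiation.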

During the laboratory tests, oxygen consumption was measured using an automatic ergospirometry system (Oxycon Pro, Jaeger GmbH, Hoechberg, Germany). Blood lactate concentration was measured in unhemolyzed blood from capillary fingertip samples (YSI 1500 Sport; Yellow Springs Instruments, Yellow Springs, OH, United States). The lactate analyzer and the Oxycon Pro Jaeger Instrument were calibrated according to the instruction manual as described in detail previously (Losnegard et al., 2011). Body mass and mass including equipment were measured before the TT and the treadmill test (Seca model 708; Seca, Hamburg, Germany).

Rating of perceived exertion was evaluated using a category ratio RPE scale (0–10) validated by Foster et al. (2001). Heart rate was recorded using the athletes' personal training computers. Video was recorded with two Canon HF100 video cameras (frame rate = 25 Hz, Canon Inc., Tokyo, Japan). Environmental temperature, air pressure, and wind data were retrieved from local weather stations (met.eklima.no, Meteorological Institute of Norway, Oslo, Norway).

#### Statistics

Data are presented as the mean ± standard deviation (SD) unless otherwise stated. Normality of the data was assessed using the Shapiro–Wilk test of normality (α = 0.05). Outliers were assessed by inspection of boxplots and by examination of studentized residuals for values greater than ± 3. Paired sample t-tests were used to detect statistical differences in average speed, VO2peak, ΣO2-deficit, HRpeak, [La−], VE and RPE between the 3-min all-out performance tests, and between GE during flat and uphill submaximal skiing. One-way repeated measures ANOVAs, with Bonferroni correction for multiple comparisons, were conducted to determine whether there were statistical differences in average lap speed, section O2-demand, section external power and section speed between laps, and in average section O2-demands between sections in the TT. Average HR between laps and RPE during the TT failed the assumption of normality and were analyzed with related samples Friedman's tests, with Bonferroni correction for multiple comparisons. Pearson's Product Moment Correlation Analysis was applied for correlation analysis between VO2 and external work rate on the treadmill. The strengths of correlation (r) were interpreted as follows: correlation coefficient (r) < 0.1, trivial; 0.1–0.3, small; 0.3–0.5, moderate; 0.5–0.7, strong; 0.7–0.9, very strong; and 0.9–1.0, almost perfect (Hopkins, 2002). An α level of p ≤ 0.05 was considered significant, and p ≤ 0.10 was considered a tendency. All calculations were performed in MATLAB R2016a (MathWorks, Inc., Natick, MA, United States), and statistical analyses were performed in SPSS Statistics (IBM Corp., Armonk, NY, United States).
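The interpretation scale for correlation strength can be written as a small lookup function. The thresholds are those listed above; treating each lower bound as inclusive is an assumption, since the text does not state how boundary values such as exactly 0.3 are assigned, and the function name is hypothetical.

```python
def correlation_strength(r):
    """Label a Pearson correlation coefficient using the thresholds above."""
    magnitude = abs(r)
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Upper bounds are exclusive; a value on a boundary gets the higher label.
    for upper_bound, label in ((0.1, "trivial"), (0.3, "small"),
                               (0.5, "moderate"), (0.7, "strong"),
                               (0.9, "very strong")):
        if magnitude < upper_bound:
            return label
    return "almost perfect"


for r_value in (0.05, -0.25, 0.75, 0.95):
    print(r_value, correlation_strength(r_value))
```

The sign of r is ignored, so a negative correlation receives the same strength label as a positive one of equal magnitude.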

## RESULTS

### Laboratory Tests

Oxygen consumption during the flat and uphill submaximal workloads corresponded to 57 ± 5%, 66 ± 6%, 78 ± 6%, and 62 ± 9%, 69 ± 8%, 76 ± 8.0% of VO2peak, respectively (**Figure 2**). Heart rate was 76 ± 7%, 83 ± 7%, 91 ± 5%, and 80 ± 7%, 86 ± 7%, and 90 ± 6% of HRpeak, respectively. Correlations between VO2 and external power were large to very large during flat [r(25) = 0.86, p = 0.001] and uphill [r(25) = 0.90, p = 0.001] roller skiing, respectively. There were no differences in GE between workloads at the same inclination, but GE was significantly different between flat [(14.4% ± 0.6%) = 0.9%] and uphill (17.8% ± 0.7%) with a mean difference of 3.4% [t(25) = −26.480, p < 0.001].

Physiological variables from the flat and uphill performance tests are presented in **Table 1**. There were no significant differences in VO2peak, HRpeak, [La−], RER or RPE between the two conditions. However, the ΣO2-deficit was 34.8% higher in the uphill compared to the flat performance test [t(7) = −5.676, p = 0.001].

### Time Trial Characteristics

Mean TT finishing time was 33:25 ± 01:38 mm:ss, corresponding to an average speed of 6.7 ± 0.3 m·s<sup>−1</sup>. Average lap speed changed significantly during the TT [F(2,12) = 7.371, p = 0.008], with a significant reduction in speed between lap 1 and lap 2 (−0.2 m·s<sup>−1</sup>, p = 0.044). There were no significant differences between lap 1 and lap 3 or between lap 2 and lap 3 (**Figure 3E**). Comparing speeds within each section revealed significant changes in S4 [F(2,12) = 14.765, p = 0.001], with a reduction in speed from lap 1 to lap 2 and from lap 1 to lap 3, with a mean difference of 6.1% (p = 0.048) and 7.1% (p = 0.003), respectively. In addition, there were significant changes in S7 [F(2,12) = 5.915, p = 0.016], with an increase in speed from lap 2 to lap 3, with a mean difference of 5.0% (p = 0.035). Continuous time loss and speed are presented in **Figures 3D,E**.

External power varied between ∼230 and 600 W (**Figure 3B**). Section O2-demands from each lap are presented in **Figure 3A**. Section O2-demand varied significantly in S4 [F(2,12) = 14.163, p = 0.001] and S7 [F(2,12) = 6.802, p = 0.011]. In S4, there was a tendency toward a reduction in O2-demand between lap 1 and lap 2 (7.4%, p < 0.051), and a significant reduction in O2-demand between lap 1 and lap 3 (8.3%, p = 0.005). In S7, there was an increase in O2-demand (8.9%, p = 0.034) between lap 2 and lap 3. There were no significant differences in O2-demands between laps in S1, S2, S3, S5, or S6. Comparing average O2-demands (over three laps) between sections showed that section significantly influenced O2-demands [F(6,36) = 65.816, p < 0.001, post hoc results presented in **Figure 3A**], which ranged from 89 to 157% of VO2peak.

The average HR during the TT was 94 ± 3% of HRpeak (**Figure 3C**). There were significant changes in average HR between laps [χ<sup>2</sup>(2) = 8.3, p = 0.016], with an increase from lap 1 (Mdn = 92% HRpeak, IQR = 4%) to lap 3 (Mdn = 95% HRpeak, IQR = 2%) (p = 0.016). There were, however, no statistical differences in average HR between the other laps.

TABLE 1 | Average speed and physiological responses during the flat (1°) and uphill (8°) 3 min all-out performance tests on the roller ski treadmill.

n = 8, <sup>a</sup>n = 6, <sup>b</sup>n = 7.

Rating of perceived exertion changed significantly between the different laps of the TT [χ<sup>2</sup>(2) = 26.393, p < 0.001], with differences between S4 lap 1 (Mdn = 6.0, IQR = 3.0) and S4 lap 3 (Mdn = 8.0, IQR = 1.0) (p = 0.009), between S4 lap 1 (Mdn = 6.0, IQR = 3.0) and S7 lap 3 (Mdn = 9.0, IQR = 1.5) (p = 0.002) and between S7 lap 1 (Mdn = 7.0, IQR = 2.0) and S7 lap 3 (Mdn = 9.0, IQR = 1.5) (p = 0.015). The preferred sub-techniques in S4 and S7 were V1 and V2, respectively.

## DISCUSSION

The present study investigated pacing patterns by describing exercise intensities in flat and uphill terrain during a self-paced roller ski time trial. The principal findings were that in a XC distance race: (I) the skiers frequently applied exercise intensities exceeding their peak aerobic power and exercise intensity was higher in uphill compared to flat terrain; (II) the skiers applied a variable pacing pattern, evidenced by significant changes in exercise intensity; (III) while peak aerobic power in flat and uphill skiing was similar, the ΣO2-deficit during uphill skiing was greater than during flat skiing.

#### Pacing Pattern and Exercise Intensity Distribution

To our knowledge, this is the first investigation to use positional data, combined with physiological measurements, to determine the individual oxygen demand in multiple sections of a self-paced XC ski race. As is evident from the considerable variations in O2-demand (**Figure 3A**), the skiers applied a variable pacing pattern throughout the TT, which is in accordance with earlier observations from competitive XC sprint skiing (Andersson et al., 2010, 2016). It has been suggested that athletes apply a variable pacing pattern in an attempt to maintain the same exercise intensity throughout a race (Abbiss and Laursen, 2008). In this study, however, even though the speed was substantially lower in the uphill sections of the course, O2-demands were considerably higher compared to the flat sections (**Figure 3A**). This implies that the skiers did not maintain an even exercise intensity, but rather repeatedly increased the intensity in the uphill sections. Moreover, the large variations in exercise intensity and the dissociation between exercise intensity and speed between different terrains emphasize that describing pacing patterns exclusively by inter-lap variations in speed is insufficient (Abbiss and Laursen, 2008). At least, this is true for endurance sports events where substantial variations in the topography exist.

Comparing the exercise intensity in S1 and S6, two sections of approximately the same length and inclination, the O2-demand was approximately 50% higher in S6 (**Figure 3A**). This difference in exercise intensity can be explained by the fact that prior to S6, the skiers performed approximately 1 km of downhill terrain (**Figure 3F**). This allowed the skiers to arrive at S6 in a partially recovered state, able to apply a greater amount of anaerobic work in this section. Further, S6 was followed by relatively even terrain, implying that the skiers had a less strenuous part of the course ahead of them after S6. It was also evident that when the length and inclination of the uphill section increased (S4 and S5), the skiers reduced the intensity (∼115% of VO2peak) compared to the shorter and less steep sections (S3 and S6, 140–160% of VO2peak). Furthermore, as the skiers approached the end of the race, they increased the exercise intensity (S6 and S7). This increase in exercise intensity has been described as the "endspurt phenomenon," and has been explained by a reduction in uncertainty regarding the remaining work (Tucker, 2009). Taken together, these observations imply that skiers modify their exercise intensity and hence their pacing according to the terrain and their current position in the course.

### Exercise Intensity in Uphill Terrain

The rationale for applying higher exercise intensities in uphill than in flat terrain, and thus a variable pacing pattern, involves at least three factors. First, the quadratic increase in air drag with speed implies that a substantial fraction of the increase in propulsive power is dissipated to overcome the increase in air drag resistance. In the flat sections, air drag resistance accounted for approximately 50% of the external work. In contrast, in the uphill sections where the speeds were low, the work against air drag was negligible (∼3% of Pext\_out). Since performance in uphill terrain is a determinant of overall performance in XC skiing (Andersson et al., 2010; Sandbakk et al., 2011), the most rational choice is to increase work rate in the uphill parts of the course.

Second, repeated periods of supramaximal intensities are possible because of the downhill sections, where the skiers are propelled mostly by gravity. Since the anaerobic capacity is limited, either cessation of work or a reduction in work rate to a level that can be met by aerobic metabolism must occur following a period of supramaximal work rates (Gastin, 2001). In the present study, we did not estimate the O2-demand of downhill skiing, but previous estimations suggest that it is approximately 40–60% of VO2max (Sandbakk and Holmberg, 2017). Further, VO2 values of approximately 65% of VO2max during downhill skiing have been reported during competition (Welde et al., 2003). Therefore, it seems reasonable to assume that there is a sufficient drop in O2-demand during downhill skiing to recover at least some of the O2-deficit attained. This is exemplified by the differences in O2-demands between S1 and S6, as described above.

Third, the possibility of repeatedly attaining an O2-deficit and at least partially recovering from it without a decrease in speed separates XC skiing from other endurance sports, such as running, track cycling and speed skating. In these endurance sports athletes can maintain an intensity relying on a high contribution of anaerobic metabolism for only a limited time, without reducing the speed (Gastin, 2001). To our knowledge, the present study is the first to directly compare the VO2peak and the ΣO2-deficit of XC skiers during both flat (1°) and uphill (8°) skiing. A novel finding was that the ΣO2-deficit was significantly greater in the uphill performance test compared to the flat. Such a difference has previously been reported in treadmill running and has been attributed to a greater amount of muscle mass being active when running uphill (Olesen, 1992). The difference in O2-deficit between flat and uphill terrain has implications for the duration and magnitude of supramaximal work rates in different types of terrain. Since the peak aerobic power was similar in flat and uphill terrain, the total metabolic power attainable in different terrains is determined by the anaerobic energy turnover. In addition, the ΣO2-deficit seems to be an important factor for training-induced seasonal changes and thereby performance in elite distance XC skiers (Losnegard et al., 2013). Taken together, even though the duration of a XC distance race is relatively long (>30 min), and the relative contribution from anaerobic metabolism is low, the ability to repeatedly apply work rates covered by a high anaerobic turnover seems to be a crucial factor for performance in elite XC skiing, which to date is not fully understood.

#### Heart Rate and Exercise Intensity

From a practical standpoint, HR is a widely used tool to describe exercise intensity in endurance sports (Achten and Jeukendrup, 2003). In the present study, HR remained high for most of the race (>90% of HRpeak) (**Figure 3C**), which is in accordance with previous observations in XC skiing (Mognoni et al., 2001; Formenti et al., 2015). Our results also show that HR to some extent reflects the exercise intensity in various parts of the course. However, the ability of the HR to reflect rapid intensity transients and supramaximal exercise intensities is limited due to the temporal dissociation between HR, VO2 and work rate during high-intensity exercise (Buchheit and Laursen, 2013; Bolger et al., 2015). This is supported by our observations when comparing HR and O2-demands in the different sections of the course (**Figures 3A,C**). While a considerable variation in O2-demand was evident, there were relatively small variations in HR. Hence, HR is not suitable for describing exercise intensity in XC skiing competitions.

#### Methodological Considerations

In the present study, we assumed that the linear relationship between external power and O2 cost established during the submaximal workloads (∼55–80% of VO2peak) also applied to maximal and supramaximal workloads observed in the TT (∼85–160% of VO2peak). This relationship is well established at workloads below the lactate threshold (Bassett and Howley, 2000; Noordhof et al., 2010). However, it is debated whether this linearity also applies to workloads above the lactate threshold (Noordhof et al., 2010). Furthermore, taking both the duration of the TT and the intensities applied into account, a VO2 slow component must be expected (Jones et al., 2011). This could result in a reduction in GE and an increase in the energy cost of maintaining the same external workload. We did not quantify alterations in the GE during the TT. Thus, we may potentially have underestimated the actual O2-demand, at least in the later parts of the TT. These considerations should be taken into account when interpreting the results.

The air drag resistance acting upon a XC skier is a complex mechanism, affected by changes in body position, sideways movement and clothing (Spring et al., 1988; Leirdal et al., 2006). In the present study, we assumed that the upright position described by Spring et al. (1988) represented the average body position of the skier. Because of the quadratic behavior of the air drag resistance, the body position of the skier potentially has a significant influence on the external power at high speeds. However, most of the sections in the current study were uphill (nuphill = 5). Hence, the speeds and the relative contributions of air drag resistance were small (∼3%) and should not have influenced the results. Moreover, environmental wind conditions may influence the relative air flow and affect the air drag resistance. In the present study, no environmental wind was assumed in the calculations as wind conditions on both test occasions were negligible.

Power against rolling resistance on the treadmill was quantified using the method described by Losnegard et al. (2012a). The limitations of this approach were discussed by the authors and can be attributed to changes in the normal force and the orientation angle of the roller ski in relation to the direction of travel. However, the effects were small, and would only have minor implications for the estimation of total external power.

The different ski-skating techniques applied in the flat and the uphill sections could potentially influence the VO2peak and O2-deficit. However, Losnegard et al. (2012b) reported no differences in performance, VO2max or O2-deficit between V1 and V2 at steep inclines (6–8°), and differences in applied sub-techniques were considered to have minimal impact on the results.

A strength of the present study is the use of individually estimated O2-demands, which to date have not been applied in simulated races outdoors. Thus, the presented method makes it possible to describe not only the external workload but also the physiological workload imposed on skiers in a XC ski race. However, it should be noted that the results are based on estimations and not direct measurement of the O2-demand. To our knowledge, direct methods to measure anaerobic turnover during field tests are at present not well developed. Moreover, the positional data were collected during a self-paced time trial conducted on a world-cup race course, implying that the results are of high relevance to determine performance in international competitions.

#### Future Studies

In this study, measurements of GE were restricted to steep inclines (8°) or flat (1°) terrain, which limits inferences about inclinations between these two conditions. Hence, further knowledge of how metabolic rate depends upon inclination and/or skiing speed would be useful for assessing exercise intensity throughout a ski race. Moreover, we did not measure aerobic energy consumption during the TT. Such measurements would enable calculation of the O2-deficits attained throughout the race, thereby providing a more detailed view of the rates of aerobic and anaerobic energy production.

#### Practical Application

Even though a high aerobic energy turnover always has been mandatory for performance in elite XC skiing, our results also suggest that the ability to repeatedly utilize a high anaerobic energy turnover is of great importance in distance XC skiing. Furthermore, we revealed that the O2-deficit is terrain specific, implying the importance to develop this capacity specifically for different terrains. Further, the observation that HR did not accurately reflect exercise intensity during the TT suggests that athletes and coaches should consider other methods to quantify exercise intensities during competitions and high intensity training sessions with rapid changes in terrain. Finally, the wearable GNSS units used in the current study provides researchers a valuable tool for detailed analysis of performance in XC

### REFERENCES


skiing. As the technology improves and smaller units become available, these units could also become a valuable tool for coaches and athletes when evaluating training and competition strategies.

## CONCLUSION

The present study investigated energy demands flat and uphill terrain during a self-paced roller ski time trial. This was accomplished by applying a novel approach combining accurate positioning data collected with a wearable dGNSS system, with individual physiological data collected during treadmill roller skiing. Our findings revealed that XC skiers repeatedly applied exercise intensities exceeding their maximal aerobic power during a XC distance race. Hence they applied a variable pacing pattern. Furthermore, the 6O2-deficit was considerably higher during uphill skiing compared to flat which has implications for the duration and magnitude of supramaximal work rates that can be applied in different types of terrain.

### AUTHOR CONTRIBUTIONS

ØK, MG, and ØG analyzed the data. All authors contributed to the design of the study, colletion of data, and writing.

### FUNDING

The study was internally financed by the Department of Physical Performance at the Norwegian School of Sport Sciences.

## ACKNOWLEDGMENTS

This paper was based on the master's thesis "Pacing strategy and exercise intensity in cross-country skiing" from the Norwegian School of Sport Sciences (Karlsson, 2017). This thesis is the only medium in which this content has previously appeared, and its publication is in line with the policies of the Norwegian School of Sport Sciences. The authors would like to express their gratitude to all the skiers for their participation and cooperation. The authors would also like to acknowledge the contribution of Camilla Høivik Carlsen, Erik Trøen, and Ola Kristoffer Tosterud during the data collection.






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Karlsson, Gilgien, Gløersen, Rud and Losnegard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dynamics of Recovery of Physiological Parameters After a Small-Sided Game in Women Soccer Players

Rafaela B. Mascarin<sup>1</sup>, Vitor L. De Andrade<sup>2</sup>, Ricardo A. Barbieri<sup>3</sup>, João P. Loures<sup>1</sup>, Carlos A. Kalva-Filho<sup>1</sup> and Marcelo Papoti<sup>1,2,3</sup>\*

<sup>1</sup> Post Graduate in Rehabilitation and Functional Performance, Physiotherapy Department, Faculdade de Medicina de Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil, <sup>2</sup> Post Graduate Program in Movement Sciences, Bioscience Institute, Physical Education Department, São Paulo State University "Júlio de Mesquita Filho", Rio Claro, Brazil, <sup>3</sup> Post Graduate Program in Physical Education and Sport, School of Physical Education and Sport of Ribeirão Preto, Ribeirão Preto, Brazil

Purpose: Training methods based on small-sided games (SSGs) seem to promote physiological and tactical benefits for soccer players, as they present characteristics more specific to the game. Thus, the main objective of the present study was to analyze the acute hormonal, biochemical, and autonomic responses to an SSG and their recovery dynamics (up to 72 h after).

#### Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Filipe Manuel Clemente, Polytechnic Institute of Viana do Castelo, Portugal Giovanni Messina, University of Foggia, Italy

\*Correspondence:

Marcelo Papoti mpapoti@usp.br; mpapoti@yahoo.com.br

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 06 February 2018 Accepted: 19 June 2018 Published: 11 July 2018

#### Citation:

Mascarin RB, De Andrade VL, Barbieri RA, Loures JP, Kalva-Filho CA and Papoti M (2018) Dynamics of Recovery of Physiological Parameters After a Small-Sided Game in Women Soccer Players. Front. Physiol. 9:887. doi: 10.3389/fphys.2018.00887

Methods: Thirteen professional female soccer players participated in the study (18.8 ± 0.8 years, body mass 59.4 ± 6.2 kg, and height 1.68 ± 0.05 m). During and after the SSG session (4 × 4 min bouts separated by 3 min of passive rest, with 120 m<sup>2</sup> coverage per player), autonomic modulation was analyzed in the time and frequency domains using heart rate variability, and blood samples (5 ml) were collected before (0 h) and after (10 min and 24, 48, 72 h) the SSG for biochemical and hormonal analysis.

Results: The SSG induced an increase effect for LF (low frequency) (92.52%; Very likely increase) and a decrease effect for HF (high frequency) values (−65.72%; Very likely decrease) after 10 min of recovery. The LF/HF ratio increased after 10 min of recovery (386.21%; Very likely increase). The RMSSD (square root of the mean squared differences of the successive N–N intervals) and pNN50 (proportion of adjacent N–N intervals that differ by more than 50 ms) values presented a decrease effect 10 min after the SSG (61.38%; Very likely decrease and −90%; Very likely decrease, respectively). The CK (creatine kinase) values presented no changes 10 min after the SSG. The LDH (lactate dehydrogenase) values presented an increase effect 10 min after the SSG (19.22%; Likely increase). Testosterone and cortisol concentrations behaved similarly after the SSG, with no alterations observed after 10 min (<0.37%; Most likely trivial).

Conclusion: The SSG promoted significant cardiovascular stress that was restored within the first 24 h of recovery. Parasympathetic parameters continued to increase while sympathetic parameters declined significantly during the 72 h of recovery. In addition, the SSG did not alter biochemical or hormonal responses during the 72 h.

Keywords: training, soccer, heart rate variability, muscle damage, hormone, fatigue, recovery, sport science

## INTRODUCTION


Decisive actions during an official football match are carried out at maximum intensity over short periods of time (i.e., anaerobic efforts); however, the majority of the energy required during a match is supplied by aerobic metabolism (Jones and Drust, 2007; Hill-Haas et al., 2009a,b; Casamichana and Castellano, 2010). As a result, several training methods, with and without the ball, have been tested (Helgerud et al., 2001; Hoff et al., 2002; Impellizzeri et al., 2006; Little and Williams, 2006; Rampinini et al., 2007; Iaia et al., 2009).

In this sense, different small-sided games (SSGs) have become widely used alternatives, mainly because they include actions with the ball, opponents, and specific situations of the game such as defensive or offensive numerical superiority or inferiority (a recurring and decisive context in a game of soccer) (Costa et al., 2009). SSGs present a high degree of specificity, subjecting the participant to the technical, tactical, and physical aspects inherent in soccer practice, because their characteristics are very close to those of the formal game (i.e., physical and physiological impact, ball actions, and the presence of opponents and teammates) (Owen et al., 2004; Little and Williams, 2006; Jones and Drust, 2007; Michailidis, 2013). By exposing the athletes to a certain level of physical stress, SSGs promote changes in blood lactate concentration, rating of perceived exertion, and heart rate (HR), as well as alterations in the autonomic nervous system (Boullosa et al., 2013). One method used to evaluate the autonomic nervous system and its sympathetic and parasympathetic branches is heart rate variability (HRV), which describes the dynamics of the intervals between consecutive heart beats.

Vanderlei et al. (2009) described that part of the control of the cardiovascular system is performed by the autonomic nervous system and is closely linked to heart rate. Thus, the increase in HR is a consequence of the greater action of the sympathetic pathway and the lower parasympathetic activity. The authors state that irregularities in HRV indicate the heart's ability to respond to multiple stimuli such as exercise. Thus, the rigorous training programs that professional athletes follow lead to significant changes in the mechanisms of cardiovascular adaptation, improving cardiac function (Francavilla et al., 2018).

After acute physical exercise, HRV can allow easy and noninvasive analysis of the neural control of heart rate, besides being able to measure important modifications in the functioning of the cardiovascular system and its mechanisms of autonomic adjustments (Alonso et al., 1998). The cardiac autonomic modulation index has been used as a marker of the quality of cardiac function, representing a technique that allows the evaluation of risks of sudden cardiac death (Sessa et al., 2018) and also of the stress induced by exercise (Mazon et al., 2013). This analysis is an attempt to avoid states of fatigue, in order to promote adequate recovery, thus optimizing the training (Bricout et al., 2010). Moreover, it presents sensitivity to the effects of the SSG on the autonomic system as observed by Hammami et al. (2016). The study found low parasympathetic reactivation 10, 20, and 30 min after an SSG effort.

In addition to variables related to the cardiovascular system, the determination of injury biomarkers and physiological stress are also frequently used to determine the internal training load (Nakamura et al., 2010; Souza et al., 2010; Coelho et al., 2013; Mazon et al., 2013). Previous studies have shown that both soccer training and formal games can alter plasma concentrations of catecholamines (adrenaline and noradrenaline), cortisol, testosterone, creatine kinase, and lactate dehydrogenase as a consequence of the efforts (Coelho et al., 2011, 2013; Silva et al., 2012, 2014), which could be partially attributed to intermittent repetitions of intense eccentric activation (Ispirlidis et al., 2008).

Different responses are observed between the sexes, mainly in the inflammatory profile (Souglies et al., 2015). Bowtell et al. (2016) investigated the CK response in women with little or no experience of American football during two sessions of different SSGs and found elevated levels of the protein up to 48 h post-game. Ispirlidis et al. (2008) investigated performance, muscle damage, and inflammation during a 6 day recovery period in elite soccer players after a simulated game and found elevated CK and LDH levels up to 96 and 72 h after, respectively, while cortisol levels reached a peak immediately after the game and returned to baseline within the first 24 h of recovery. No change was observed in testosterone levels.

Although it is possible that the delay in HRV recovery after exercise may be indicative of the overall magnitude of the induced stress response, the course of recovery time does not indicate total recovery from the systemic stress response (Seiler et al., 2007). Therefore, simultaneous evaluation of HRV and other markers of stress and fatigue is of utmost importance. Thus, the main objective of the present study was to determine and understand the recovery dynamics of autonomic, biochemical, and hormonal parameters after SSG effort in soccer players.

SSGs seem to be advantageous for the training routine; however, little is known about the dynamics of recovery of physiological parameters after this stimulus. Therefore, the main innovative factor of this research was the determination and understanding of the recovery dynamics of autonomic, biochemical, and hormonal parameters after an SSG in women soccer players.

#### MATERIALS AND METHODS

#### Participants

Thirteen athletes belonging to a professional women's soccer team participated in the study. They competed in state championships, had a minimum of 5 years of systematized training experience, and were all affiliated with the Brazilian Football Confederation (CBF) [age: 18.8 ± 0.8 years; body weight: 59.4 ± 6.2 kg (Evolution Sanny Professional Precision-Scale); height: 1.68 ± 0.05 m (Sanny Standard Stadiometer); VO2max: 36.07 ± 7.50]. All procedures were approved by the University's Institutional Review Board for Human Subjects (Human Research Ethics Committee) and were conducted according to the Declaration of Helsinki. Athletes were informed about the experimental procedures and risks and signed an informed consent form prior to participation in the study. This study was performed in accordance with international ethical standards (Harriss and Atkinson, 2015).

**Abbreviations:** AM, before midday; C, cortisol; CK, creatine kinase; HF, high frequency; HR, heart rate; HRV, heart rate variability; LDH, lactate dehydrogenase; LF, low frequency; pNN50, proportion of interval differences of successive N–N intervals greater than 50 ms; RMSSD, square root of the mean squared differences of the successive N–N intervals; RR, interval beat-to-beat; SDNN, standard deviation of the normal-to-normal intervals; SSG, small-sided game; T, testosterone; VO2, oxygen consumption; VO2MAX, maximum oxygen consumption.

#### Experimental Design

The evaluations were performed 2 months after the competitive period (August, 2017), and all sessions took place on synthetic grass (where the team's formal games took place) with the players wearing cleats. On the first day, a progressive test (20 m shuttle run) was performed on the field to determine maximum oxygen consumption (VO2max). The SSG was applied on the second day of evaluations, 1 week after the progressive test. None of the players practiced any physical activity in the 48 h preceding the SSG.

Heart rate variability and HR were evaluated constantly (i.e., prior to, during the SSG, and in the first 30 min and 24, 48, 72 h of recovery). At 0 h, 30 min, 24, 48, and 72 h after the SSG, HRV monitoring was performed for 20 min. Blood samples for biochemical and hormonal analysis were collected prior to, and 5 min and 24, 48, 72 h after the SSG session.

The SSG took place at the team training center in atmospheric conditions of 25–28 °C, 40–44% humidity, wind of 13 km/h, and atmospheric pressure of 1013–1016 hPa (The Weather Channel app).

### Progressive Test and Backward Extrapolation Technique

Before the beginning of the tests, the athletes remained seated for 5 min to determine baseline blood lactate concentration and oxygen consumption (VO2). The participants then performed 20 m shuttle runs (out and back) on the soccer field. The test started at 8 km/h, and the speed increased by 1 km/h every 3 min. The intensity of each stage was controlled by sound signals, and the athletes were instructed to reach the 20 m demarcation lines at each signal. Exhaustion was defined as the player's inability to continue the test or failure to complete the 20 m by the beep on three consecutive occasions.
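The pacing implied by this protocol can be illustrated with a short calculation. The sketch below assumes that one sound signal corresponds to one 20 m length and that stages are numbered from 1; the function names are hypothetical and are not taken from the study.

```python
def beep_interval_s(speed_kmh: float, distance_m: float = 20.0) -> float:
    """Seconds available to cover one 20 m length at the given speed."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return distance_m / speed_ms


def stage_speed_kmh(stage: int, start_kmh: float = 8.0, step_kmh: float = 1.0) -> float:
    """Speed of a 1-indexed stage: 8 km/h at stage 1, +1 km/h per 3 min stage."""
    return start_kmh + step_kmh * (stage - 1)


# Print the signal spacing for the first five stages (illustrative only).
for stage in range(1, 6):
    v = stage_speed_kmh(stage)
    print(f"stage {stage}: {v:.0f} km/h, one signal every {beep_interval_s(v):.2f} s")
```

Under these assumptions, the first stage (8 km/h) allows 9 s per 20 m length, and the interval shortens as the speed increases.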

After each effort, athletes were instructed to breathe immediately into a face mask connected to a gas analyzer system (VO2000, Medgraphics, USA). VO<sub>2</sub> values were log-transformed and plotted against time, and a linear fit was applied. The y-intercept was then taken as the VO<sub>2</sub> at the end of exercise (Montpetit et al., 1981) and assumed to be the first point of recovery.
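A minimal sketch of the backward extrapolation step is given below. It assumes a natural-log transform and an ordinary least-squares line, with the intercept converted back to the original units by exponentiation; the text does not state the log base or the fitting software, and the sample values are invented for illustration.

```python
import numpy as np


def backward_extrapolate_vo2(time_s, vo2):
    """Estimate end-of-exercise VO2 from post-exercise samples.

    Fits ln(VO2) = a + b * t by least squares and returns (exp(a), b),
    where exp(a) is the value extrapolated back to t = 0 (end of exercise).
    """
    t = np.asarray(time_s, dtype=float)
    log_vo2 = np.log(np.asarray(vo2, dtype=float))
    slope, intercept = np.polyfit(t, log_vo2, 1)
    return float(np.exp(intercept)), float(slope)


# Hypothetical recovery samples (not study data): a mono-exponential decay
# starting at 40 units with a rate constant of 0.02 per second.
t = np.array([5.0, 10.0, 15.0, 20.0])
vo2 = 40.0 * np.exp(-0.02 * t)
vo2_end, slope = backward_extrapolate_vo2(t, vo2)
print(round(vo2_end, 2), round(slope, 3))
```

Because the synthetic samples follow an exact exponential decay, the fit recovers the starting value; with measured data the estimate depends on how many recovery samples are included and how noisy they are.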

#### Small-Sided Game

The coverage area per player was set at 120 m<sup>2</sup> (Kelly and Drust, 2009; Jastrzebski et al., 2016). The evaluated model was 4 vs. 4; each session lasted 25 min, with 16 min of effort (four efforts of 4 min) and 9 min of passive rest (three rest intervals of 3 min). To stagger the evaluations and respect the minimum interval between post-SSG evaluations, a total of 16 sequential games of 4 min, each followed by a 3 min interval, were played. The players warmed up before the start of the SSG with three laps around the field and short runs with changes of direction for 8 min.
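The session structure described above can be checked with simple arithmetic. The sketch below assumes two teams of four outfield players (eight players in total), so the total playing area is an implied value rather than one reported in the text.

```python
PLAYERS_PER_TEAM = 4        # assumed 4 vs. 4 format, no additional players
AREA_PER_PLAYER_M2 = 120    # coverage area per player reported in the text
BOUTS, BOUT_MIN = 4, 4      # four efforts of 4 min
RESTS, REST_MIN = 3, 3      # three passive rest intervals of 3 min

pitch_area_m2 = 2 * PLAYERS_PER_TEAM * AREA_PER_PLAYER_M2
effort_min = BOUTS * BOUT_MIN
rest_min = RESTS * REST_MIN
session_min = effort_min + rest_min

print(pitch_area_m2, effort_min, rest_min, session_min)  # 960 16 9 25
```

The effort, rest, and session durations match the 16, 9, and 25 min stated for each session.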

The game consisted of passing the end lines with the ball controlled and possession of the ball was alternated, that is, when a team scored or exceeded the demarcation limits of the game, the ball was quickly returned to the other team. The athletes were motivated by the coaches throughout all games.

### Analysis of Heart Rate Variability (HRV)

The HRV was analyzed pre-SSG, 10 min after, and at 24, 48, and 72 h of recovery after the SSG. With the exception of the collections 10 min after play, the collections were the players' first daily activity. The players woke up at the training center and went to a pre-determined room for the evaluation to begin at 6:30 AM. The HRV was recorded beat-to-beat (RR intervals) in a continuous manner by a Polar Team<sup>2</sup> heart rate monitor (Polar<sup>®</sup>, Kempele, Finland) and later transmitted to a computer through an IR interface (Polar<sup>®</sup>, Finland) for analysis with the "Kubios HRV" software for Windows (Polar Electro Oy, Kempele, Finland, 2010).

Heart rate variability was analyzed in the frequency domain: the power of the high frequencies (HF: 0.15–0.40 Hz) and low frequencies (LF: 0.04–0.15 Hz) in normalized units, and the LF/HF ratio, computed from spectral power in ms<sup>2</sup>. In the time domain, the following indices were used: mean RR (mean of RR intervals), SDNN (standard deviation of all normal RR intervals recorded in a time interval, expressed in ms), RMSSD (square root of the mean of the squared differences between adjacent normal RR intervals in a time interval, expressed in ms), and pNN50 (percentage of adjacent RR intervals with a duration difference greater than 50 ms) (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996).
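The time-domain indices defined above can be computed directly from a series of RR intervals. The following is a minimal sketch, not the routine used by the analysis software: it assumes artifact-free RR intervals in milliseconds and uses the sample standard deviation (n − 1 denominator) for SDNN, which is a choice made here for illustration. The RR values are invented.

```python
import numpy as np


def time_domain_hrv(rr_ms):
    """Return mean RR, SDNN, RMSSD (all in ms) and pNN50 (in %)."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)  # differences between adjacent RR intervals
    return {
        "mean_rr": float(rr.mean()),
        "sdnn": float(rr.std(ddof=1)),                    # assumed n - 1 denominator
        "rmssd": float(np.sqrt(np.mean(diffs ** 2))),
        "pnn50": float(100.0 * np.mean(np.abs(diffs) > 50.0)),
    }


# Hypothetical RR series in ms (illustrative only, not study data).
indices = time_domain_hrv([800, 860, 790, 850, 840])
print(indices)
```

For this short series, three of the four adjacent differences exceed 50 ms, so pNN50 is 75%; real recordings of several minutes contain hundreds of intervals and would normally be screened for ectopic beats first.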

TABLE 1 | Mean ± standard deviation (SD) values of the anthropometric characteristics and the physiological variables of the players.


BMI, body mass index; LAN, anaerobic threshold; iVO2max, intensity of maximum oxygen consumption; VO2max, maximum oxygen consumption.

### Blood Collection and Analysis

All venous blood collections were performed under the responsibility of an accredited nurse, following all hygiene and asepsis care. Analyses of the samples were performed by the Clinical Analysis Service (CAS) of the Faculty of Pharmaceutical Sciences of Ribeirão Preto. The athletes were instructed to maintain a 12-h fast, not to practice physical activities, and not to consume alcohol or drinks containing caffeine. While the athletes were still fasting, 5 mL of blood was collected (7 AM) at moments 0, 24, 48, and 72 h after the SSG in a predetermined room in the training center. The collection 5 min after the game was performed between 10 and 12 AM in a room next to the field where the SSG was played.

For collection and storage of blood samples, BD Vacutainer<sup>®</sup> EDTA tubes with separator gel were used (1 Becton Drive, Franklin Lakes, NJ, United States). After collection, the blood was centrifuged for 8 min at 3000 rpm and 8 °C and stored at 8 °C for further biochemical and hormonal analysis.

For quantification of cortisol and free testosterone, specific radioimmunoassay procedures were used through the IMMULITE/IMMULITE 1000 Total Testosterone and Cortisol Kit (Siemens Medical Diagnostics, Los Angeles, CA, United States). As markers of muscle damage, CK and LDH were determined with a specific kit provided by Wiener lab. CK was measured in serum using the optimized UV method (IFCC), and LDH was measured in serum using the optimized UV method (SFBC).

#### Statistical Analysis

The normality of the data was confirmed using the Shapiro–Wilk test, which allowed the variables to be described as mean ± standard deviation. The values observed at each recovery time were compared with baseline values using magnitude-based inferences, with the spreadsheets proposed by Hopkins et al. (2009). The effects on HRV, biochemical, and hormonal parameters were classified qualitatively as an increase effect, trivial effect, or decrease effect. For this, the differences from baseline values were expressed as standardized differences (Cohen's d), and the smallest standardized change was assumed to be 0.20 (Cohen, 1988). Qualitative inferences were classified as most unlikely (<1%), very unlikely (1–5%), unlikely (5–25%), possibly (25–75%), likely (75–95%), very likely (95–99%), and most likely (>99%). The inference was deemed unclear when the chances of both the increase and the decrease effects were >5%.
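The qualitative scale described above can be expressed as a small lookup. The sketch below is not the spreadsheet of Hopkins et al. (2009); it only maps chances (in %) to the labels listed in the text. Two choices are assumptions made for illustration: values falling exactly on a boundary are assigned to the higher category (except 99%, which stays "very likely"), and the reported outcome is the one with the highest chance.

```python
def likelihood_label(p_percent: float) -> str:
    """Map a chance in % to the qualitative term listed in the text."""
    if p_percent < 1:
        return "most unlikely"
    if p_percent < 5:
        return "very unlikely"
    if p_percent < 25:
        return "unlikely"
    if p_percent < 75:
        return "possibly"
    if p_percent < 95:
        return "likely"
    if p_percent <= 99:
        return "very likely"
    return "most likely"


def qualitative_inference(p_decrease: float, p_trivial: float, p_increase: float) -> str:
    """Combine the three chances (in %) into an inference such as 'Likely decrease'."""
    # Unclear when both the decrease and the increase chances exceed 5%.
    if p_decrease > 5 and p_increase > 5:
        return "Unclear"
    chances = {"decrease": p_decrease, "trivial": p_trivial, "increase": p_increase}
    outcome = max(chances, key=chances.get)  # assumed: report the most probable outcome
    return f"{likelihood_label(chances[outcome]).capitalize()} {outcome}"


print(qualitative_inference(96, 3, 1))
print(qualitative_inference(0, 77, 23))
print(qualitative_inference(30, 40, 30))
```

With chances of 96/3/1 (decrease/trivial/increase) the sketch yields "Very likely decrease", and with 0/77/23 it yields "Likely trivial"; a case such as 30/40/30 is flagged as unclear because both directional chances exceed 5%.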

### RESULTS

### Anthropometric and Physiological Characteristics

**Table 1** presents the values referring to the anthropometric characteristics and physiological variables found in the progressive test performed by the players.

#### Small-Sided Game

The players presented a mean blood lactate concentration ([La]mean) of 2.66 ± 0.95 mM at the anaerobic threshold during the incremental test. During the SSG, the [La]mean and %HRmax attained were 6.35 ± 2.22 mM and 94.67 ± 0.87%, characterizing the high energy demand in this activity.

### Heart Rate Variability

The HRV responses are shown in **Table 2**. In the frequency domain (**Figure 1**), the SSG induced an increase effect for LF (92.52%; Very likely increase) and a decrease effect for HF values (−65.72%; Very likely decrease) after 10 min of recovery. Both LF and HF returned to baseline values after 24 h (<2.13%; Very likely trivial effect) and presented effects related to the autonomic adaptation after 48 h (Likely decrease for LF and Likely increase for HF), which were maintained after 72 h. The LF/HF ratio increased after 10 min of recovery (386.21%; Very likely increase), returned to baseline values after 24 h (13.44%; Possibly trivial), and decreased after 48 and 72 h of recovery (−53%; Likely decrease). In the time domain (**Figure 2**), the RMSSD values presented a decrease effect 10 min after the SSG (61.38%; Very likely decrease) but showed an increase effect from 24 h of recovery (>57.04%; Likely increase). The same behavior was observed for pNN50, where a decrease effect occurred after 10 min (−90%; Very likely decrease), which was followed by an increase from 24 h of recovery (>15.28%; Likely increase). Although the SDNN values demonstrated no alterations 10 min after the SSG (−13.52%; Possibly trivial), an increase effect was also observed from 24 h of recovery (>49.03%; Likely increase).

RR, interval beat-to-beat; LF, low frequency; HF, high frequency; LF/HF, low frequency/high frequency; RMSSD, square root of the mean squared differences of the successive N–N intervals; SDNN, standard deviation of the normal-to-normal intervals; pNN50, proportion of interval differences of successive N–N intervals greater than 50 ms.

CK, creatine kinase; LDH, lactate dehydrogenase; T/C, testosterone/cortisol.

#### Biochemical and Hormonal Examinations

Biochemical and hormonal responses are presented in **Table 3**. **Figure 3** shows the muscle damage values obtained before and during recovery after the SSG. The CK values presented no changes 10 min after the SSG (2.72%; Most likely trivial) and decreased progressively from 24 h of recovery (> −19.59%; Likely decrease until 48 h and Very likely decrease at 72 h). Although the LDH values presented an increase effect 10 min after the SSG (19.22%; Likely increase), these concentrations decreased progressively from 24 h of recovery (> −7.68%; Likely decrease until 48 h and Very likely decrease at 72 h). Testosterone and cortisol concentrations presented the same behavior after the SSG (**Figure 4**): no alterations were observed after 10 min (<0.37%; Most likely trivial), a decrease effect occurred after 24 h (> −32.65%; Very likely decrease) and 48 h (>8.92%; Likely decrease), and values returned to baseline after 72 h of recovery (< −0.09%; Most likely trivial for testosterone and Likely trivial for cortisol).

#### DISCUSSION

The investigation of the autonomic, biochemical, and hormonal parameters pre- and post-SSG demonstrates that the stimulus disrupted the organic homeostasis of the soccer players. In this sense, we determined and monitored the recovery dynamics of the autonomic, biochemical, and hormonal parameters that promote specific and desired adaptations in women soccer players in the training routine.

The temporal course of cardiac autonomic recovery reflects the restoration of cardiovascular homeostasis, which is an important component of general recovery (Stanley et al., 2013). Thus, HRV indices may be useful for monitoring the effects of soccer training, as they are sensitive to periods of stress and recovery (Bara-Filho et al., 2013). In relation to this, the findings of Boullosa et al. (2012), with male and female Spanish soccer players, suggest that a higher baseline HRV may allow greater use of autonomic resources in soccer players' responses to stress. Dutra et al. (2013) investigated baseline HRV indices in women divided into three groups according to aerobic capacity and found values similar to those of the present study for RR, LF, HF, and the LF/HF ratio. In a study conducted with trained and highly trained runners (Seiler et al., 2007), baseline values for all autonomic indices were consistent with the data of the present study. Similar values were also found in soccer players during the pre-season (Oliveira, 2012).

inference analysis for hormonal variables. Chances of effects (decrease/trivial/increase; inference) for cortisol values were: after 10 min (0/100/0; Most likely trivial), 24 h (96/3/1; Very likely decrease), 48 h (85/15/0; Likely decrease), and 72 h (0/77/23; Likely trivial). For testosterone, they were: after 10 min (0/100/0; Most likely trivial), 24 h (96/3/2; Very likely decrease), 48 h (79/21/0; Likely decrease), and 72 h (0/100/0; Most likely trivial). For the testosterone/cortisol ratio, they were: after 10 min (12/88/0; Likely trivial), 24 h (86/14/0; Likely decrease), 48 h (0/100/0; Most likely trivial), and 72 h (8/92/0; Likely trivial).

The results of the present study demonstrate a high mean RR and HF (parasympathetic predominance index) pre-game, followed by a significant decrease 10 min after the SSG. The pre-SSG HF values are consistent with a study conducted with female professional basketball players (Messina et al., 2012). The authors suggest that a lower resting heart rate is a consequence of high vagal tone due to the training effect. In relation to the LF and LF/HF ratio (indices related to the predominance of sympathetic action on the heart), low pre-SSG means were observed, followed by a significant increase in the first 10 min of recovery. These results reflect an increase in sympathetic stimulation or an attenuated parasympathetic modulation induced by the SSG (Boullosa et al., 2012), which brings the autonomic nervous system (ANS) to a stress condition and, consequently, to low HRV values that are attributed to a decrease in efferent vagal tone and a lower β-adrenergic response capacity (Dong, 2016). That is, during exercise, as HR increases, autonomic changes such as vagal inhibition and increased sympathetic activation occur (Buchheit et al., 2009). This post-exertion behavior has been reported in several studies with varied efforts in soccer among young trained individuals, untrained individuals, players, and elite players (Bricout et al., 2010; Boullosa et al., 2012, 2013; Bara-Filho et al., 2013; Dellal et al., 2015; Flatt et al., 2016, 2017; Hammami et al., 2016).

The recovery data show that, 24 h after exercise, HRV values had returned to baseline and then continued to decrease (LF and LF/HF) or increase (RR, HF, RMSSD, pNN50, and SDNN) in the following hours. This is due to parasympathetic cardiac reactivation. We emphasize the decrease in LF and the significant increase in HF found at 72 h in relation to the pre-game and 24 h recovery moments. This result may be associated with the recommendation not to practice any physical activity for only 48 h preceding the test. It is possible that, if training had been paused for the 72 h preceding the SSG, the pre-SSG values would not have differed significantly from those at 72 h. Another hypothesis is that the SSG and the other collections performed in the study may have altered the daily autonomic control of the players. The participants in the study of Boullosa et al. (2012) presented significantly lower HRV before and after a football match compared to the day of rest. The authors state that concern or mental preparation for the soccer game may lead to an increased sympathetic response and/or attenuated parasympathetic modulation, resulting in lower player HRV.

Based on the results, the players demonstrated significant cardiovascular stress during the SSG, with decreased cardiac autonomic control evidenced in the first minutes of recovery (10 to 30 min) in relation to the pre-game. Previous studies with a simulated formal game in soccer players observed low HRV for up to 10 h of recovery (Boullosa et al., 2013). In contrast, Seiler et al. (2007), in a study with highly trained runners, observed recovery at approximately 120 min post-exercise, regardless of the intensity of the training. Stanley et al. (2013), however, demonstrate that the time required for complete autonomic cardiac recovery after a single aerobic training session is up to 24 h after low-intensity exercise, 24–48 h after moderate exercise, and at least 48 h after high-intensity exercise. The authors also suggest that individuals with higher fitness are more resistant to training stress and require less time to recover due to lower variations and faster recovery of cardiac parasympathetic activity after exercise. In the present study, although the SSG was an intense aerobic activity (92.7–94.77% HRmax), cardiovascular autonomic recovery occurred after 24 h.

The baseline plasma CK concentrations of the present study are close to those found by Coelho et al. (2011) when evaluating soccer players of the first division of Brazilian soccer. The authors evaluated the team throughout the training period and, therefore, values close to 300 U/L are expected during the season. These values are also similar to those of Zoppi et al. (2003), Ascensão et al. (2008), and Souglies et al. (2015). In contrast, Lazarim et al. (2009) found higher resting values (493 U/L) and Andersson et al. (2007) found lower values (158 ± 33 U/L) when analyzing protein concentrations in professional soccer players. The latter authors, however, did not report the interval between CK collection and team training. In relation to LDH, Bezerra et al. (2016) found resting values close to those of the present study and reported that the interval between the final training and collections was 24 h, suggesting that the values found were influenced by the daily training. Ispirlidis et al. (2008), when evaluating professional soccer players pre- and post-game, found CK and LDH resting values below 200 U/L; however, the authors state that the athletes did not practice any strenuous activity for 7 days before and after the game. In the present study, the players stopped training 48 h before the SSG, and for this reason it was possible to observe a significant reduction in CK and LDH at 48 h relative to 5 min and 24 h of recovery.

Despite resting values close to those reported in the literature, there was no significant increase in CK in the 72 h of recovery in relation to rest. On the other hand, a significant increase in LDH was observed soon after (10 min) the SSG. Considering the other results, this increase is associated with a decrease in the O<sub>2</sub> demand in the muscle and, therefore, an intensification of lactate formation to provide energy for muscular action.

No studies were found that assessed muscle damage in response to an SSG with female soccer players. Bowtell et al. (2016) analyzed the CK response using soccer SSGs in untrained women; however, the resting values presented were much lower (69 ± 23 U/L) than in the present study. After the SSG, the authors found values significantly higher than pre-game, with the peak at 48 h of recovery (108 ± 39 U/L). The literature is vast concerning responses to game stimuli and muscular damage in soccer players (Andersson et al., 2007; Ascensão et al., 2008; Ispirlidis et al., 2008; Coelho et al., 2011; Souglies et al., 2015). Thus, it can be concluded that although the practice of the SSG chosen may be intense, it does not impose stimuli that produce muscular stress when compared to the formal game, probably because of its short duration.

By monitoring the quantitative changes in hormones with anabolic and catabolic properties, such as testosterone and cortisol, it is possible to identify a momentary catabolic state (Mazon et al., 2013). Several studies have reported the behavior of these hormones against stimuli from formal male and female soccer games (Ispirlidis et al., 2008; Oliveira et al., 2009; Maya et al., 2016). Haneishi et al. (2007), in addition to evaluating the post-game responses, analyzed the cortisol responses after training of 105 min. The authors found that the post-game cortisol response was 250% higher than the post-training values, which did not present any significant differences in relation to the pre-training evaluation. Competitive events (i.e., games) are more likely to generate acute hormonal responses than routine training activities (e.g., SSGs), as they promote an early increase in cortisol levels to prepare the individual for action (Oliveira et al., 2009). In the present study, there was no significant increase in cortisol or testosterone during the 72 h of recovery after the SSG. Waal (2017) evaluated the acute endocrine responses of soccer players in an SSG close to the model proposed in the present study and also found no significant difference after the SSG in relation to rest. This appears to be the only study to evaluate hormonal responses from stimuli using SSGs. The author also concludes that training based on SSGs or unofficial (i.e., friendly) matches does not seem to produce the same significant hormonal responses to the stimulus as the competitive environment. Consequently, no significant alterations were observed in the T/C ratio.

It can be concluded that the athletes presented cardiovascular stress during the SSG with reduced cardiac autonomic control, evidenced in the first minutes of recovery. The parasympathetic cardiac reactivation was reestablished after 24 h although the values at 72 h still demonstrated a significant reduction. However, although the physical requirements related to the SSG caused a decrease in the autonomic parameters, the hormonal and muscle damage markers were not altered.

A limitation of the present study was the relatively small number of participants. Thirteen players were evaluated; however, a total of 23 players took part in the study so that each of the SSG efforts could be scheduled appropriately (a fundamental requirement for meeting the post-SSG collection times for each player) and as a precaution against possible injuries from the SSG. Nevertheless, the study offers valuable insights into the SSG among women soccer players.

Further studies should be devoted to verifying the influence and recovery time required for autonomic, neuromuscular, inflammatory, and hormonal parameters using generic training methods (e.g., interval aerobic training, intermittent high-intensity training) which seek improvement in aerobic fitness and game performance in male and female amateur and professional soccer players. In addition, new efforts should be directed in an attempt to simulate competitive scenarios using SSGs and generic training methods.

As a practical implication, it is important that high performance coaches simulate competitive practice environments in order to make training, based on internal loads, as close as possible to the context and physiological demand experienced during a formal competitive football match. Thus, the understanding and monitoring of certain stress markers during the season could contribute to the systematization and optimal control of individual training loads in an attempt to minimize the onset of the fatigue process and enhance performance of the athletes.

#### AUTHOR CONTRIBUTIONS

RM, VDA, RB, CK-F, and MP: conceived and designed the experiments. RM and VDA: performed the experiments. RM, VDA, and RB: analyzed data. RM, VDA, RB, and CK-F: contributed materials and analysis tools. RM, VDA, JL, CK-F, and MP: wrote the paper.

#### FUNDING

This study was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), process No. 2015/24833-7.

#### REFERENCES

specific tests: interests and limits. Asian J. Sports Med. 4:e25723. doi: 10.5812/asjsm.25723


performance responses following a soccer game. Clin. J. Sport Med. 18, 423–431. doi: 10.1097/JSM.0b013e3181818e0b


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mascarin, De Andrade, Barbieri, Loures, Kalva-Filho and Papoti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Tracking Performance in Endurance Racing Sports: Evaluation of the Accuracy Offered by Three Commercial GNSS Receivers Aimed at the Sports Market

#### Øyvind Gløersen<sup>1,2</sup>\*, Jan Kocbach<sup>3</sup> and Matthias Gilgien<sup>2,4</sup>

<sup>1</sup> Condensed Matter Physics, Department of Physics, University of Oslo, Oslo, Norway, <sup>2</sup> Department of Physical Performance, Norwegian School of Sport Sciences, Oslo, Norway, <sup>3</sup> Centre for Elite Sports Research, Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway, <sup>4</sup> Norwegian Ski Federation, Alpine Skiing, Oslo, Norway

Edited by:

Billy Sperlich, Universität Würzburg, Germany

#### Reviewed by:

Fabio Rubens Serpiello, Victoria University, Australia

Matthias Wilhelm Hoppe, University of Wuppertal, Germany

#### \*Correspondence:

Øyvind Gløersen, o.n.gloersen@fys.uio.no

#### Specialty section:

This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology

Received: 28 February 2018 Accepted: 19 September 2018 Published: 09 October 2018

#### Citation:

Gløersen Ø, Kocbach J and Gilgien M (2018) Tracking Performance in Endurance Racing Sports: Evaluation of the Accuracy Offered by Three Commercial GNSS Receivers Aimed at the Sports Market. Front. Physiol. 9:1425. doi: 10.3389/fphys.2018.01425

Advances in global navigation satellite system (GNSS) technology have resulted in smaller and more accurate GNSS receivers, which have become increasingly suitable for calculating instantaneous performance parameters during sports competitions, for example by providing the difference in time between athletes at any location along a course. This study investigated the accuracy of three commercially available GNSS receivers directed at the sports market and evaluated their applicability for time analysis in endurance racing sports. The receivers evaluated were a 1 Hz wrist-worn standalone receiver (Garmin Forerunner 920XT, Gar-920XT), a 10 Hz standalone receiver (Catapult Optimeye S5, Cat-S5), and a 10 Hz differential receiver (ZXY-Go). They were validated against a geodetic, multi-frequency receiver providing differential position solutions (accuracy < 5 cm). Six volunteers skied four laps on a 3.05 km track prepared for cross-country skiing, with all four GNSS receivers measuring simultaneously. Deviations in position (horizontal plane, vertical, direction of travel) and speed (horizontal plane and direction of travel) were calculated. In addition, the positions of all receivers were mapped onto a mapping trajectory along the ski track, and a time analysis of all 276 possible pairs of laps was performed. Specifically, the time difference between any two skiers for each integer meter along the track was calculated. ZXY-Go, Cat-S5, and Gar-920XT had horizontal plane position errors of 2.09, 1.04, and 5.29 m (third quartile, Q3), and vertical precision 2.71, 3.89, and 13.35 m (interquartile range, IQR), respectively. The precision in the horizontal plane speed was 0.038, 0.072, and 0.66 m s<sup>−1</sup> (IQR) and the time analysis precision was 0.30, 0.13, and 0.68 s (IQR) for ZXY-Go, Cat-S5, and Gar-920XT, respectively. However, the error was inversely related to skiing speed, implying that for the low speeds typically attained during uphill skiing, substantially larger errors can occur. Specifically, at 2.0 m s<sup>−1</sup> the Q3 was 0.96, 0.36, and 1.90 s for ZXY-Go, Cat-S5, and Gar-920XT, respectively. In summary, the differential (ZXY-Go) and 10 Hz standalone (Cat-S5) receivers performed substantially better than the wrist-worn receiver (Gar-920XT) in terms of horizontal position and horizontal speed calculations. However, all receivers produced sub-second accuracy in the time analysis, except at very low skiing speeds.

Keywords: global navigation satellite systems, GPS, speed, position, time, validity, human performance

#### INTRODUCTION

In most endurance sports such as cycling, running, rowing, or cross-country skiing, athletes move from a start point along a pre-defined track to finish in the shortest time possible. To provide athletes, coaches, and spectators with information describing the development of a race, intermediate times are commonly used to provide section time information. Such information provides some insight into the development of a race, but is limited, since changes in athletes' performance often occur at a higher rate than the time elapsed in the individual sections. This limitation in analysis detail can be overcome if the athlete's position is tracked instantaneously along the course from start to finish using wearable positioning devices such as global navigation satellite systems (GNSS) or local positioning systems (LPS). Instantaneous performance can be characterized by instantaneous time analysis, providing the relative difference in time between athletes at any location along the course. Such instantaneous time analysis allows the identification of events where athletes gain or lose time compared to their competitors, and can even provide the rate at which time is gained and lost from start to finish of the entire race (Self et al., 2012; Bolger et al., 2015; Johnson et al., 2015; Gilgien et al., 2016; Losnegard et al., 2016; Sandbakk et al., 2016; Marsland et al., 2017). For cases where athletes follow a given track, differences in time between athletes are explained by differences in speed between the athletes. Hence, the measurement of instantaneous time and speed differences between athletes provides a more detailed performance analysis compared to the commonly used discrete intermediate time analysis. To allow instantaneous performance analysis, an athlete's position and speed need to be tracked continuously during the race using methodologies that cause the least possible interference with the athlete's sporting action, but that exhibit sufficient accuracy.

To track athletes' positions and speed instantaneously, the primary technologies used are video-based tracking, LPS, and GNSS (Muthukrishnan, 2009). Video-based tracking is only applicable if the athletes are in the field of view of a camcorder throughout the race and is therefore not often used in racing and endurance sports. LPS is typically used for indoor sports but can also be used in outdoor sports that are held in limited space, such as on track loops (Self et al., 2012; Swarén et al., 2016; Swarén and Eriksson, 2017). GNSS does not have the two limitations described above and is therefore the most commonly applied wearable technology used to track athletes in outdoor sports.

The rapid development in GNSS technology over recent decades has substantially increased the number of different commercially available GNSSs suitable for sports applications. The GNSS receivers used in sports devices range from single-frequency chips incorporated in smartphones and wrist-worn training computers, to standalone units solely designed for athlete tracking and high-end geodetic receivers, which are typically carried on the athlete's back and developed for purposes different from sports (tracking of planes, drones, etc.). Hence, the GNSS technologies applied in sports differ substantially in hardware and software quality and complexity (Supej and Cuk, 2014), which has an impact on measurement accuracy (Muthukrishnan, 2009). The major characteristics of GNSS properties that have impacts on position accuracy are: Antenna and GNSS board type; GNSSs used; GNSS frequencies used; and GNSS processing method (standalone, differential, precise point positioning, etc.) (Madry, 2015). Since GNSS receivers applied in sports should be small, light, and user-friendly, the manufacturers of wearable GNSS receivers need to find a trade-off between form factor, simplicity, system performance, and cost. Watches and smartphones obviously have limited space for a GNSS antenna and board and limited accuracy is expected, while receivers carried on the back can have a larger form factor. The number of GNSSs and satellites available has increased substantially over the last decade; with NAVSTAR GPS, GLONASS, Beidou, and the launching of Galileo, four functioning global systems are available. The number of GNSSs and satellites used also increases the accuracy and stability of position solutions for applications in sport (Gilgien et al., 2014b). Therefore, GNSS receivers used in sports increasingly tend to combine more than one GNSS. GNSS satellites send information on several frequencies. Use of multiple frequencies helps cancel out inaccuracies caused by the ionosphere. 
However, most GNSS receivers used in sports use only one frequency. Also, most GNSS receivers used in sports use only the GNSS information from the receiver carried by the athlete to calculate position (standalone solution). Combining the GNSS signal information from the receiver on the athlete with the GNSS information captured by a stationary GNSS receiver in close proximity (short baseline) substantially improves the position accuracy in dynamic applications (kinematic double difference method, hereafter called differential method) (Gilgien et al., 2014b). Further, position accuracy and robustness can be enhanced if GNSS data are combined with inertial measurement units (IMU) (Skaloud and Limpach, 2003; Wägli, 2009; Fasel et al., 2016). GNSS solutions aimed at sports with reduced position accuracy requirements (i.e., most wrist-worn receivers or smartphones) apply single frequency analysis to one or two GNSSs in standalone mode (Terrier et al., 2000; Edwards et al., 2002; Townshend et al., 2008; Jennings et al., 2010a,b; Wisbey et al., 2010; Aughey, 2011; Clark et al., 2011; Macutkiewicz and Sunderland, 2011; Waldron et al., 2011; Bolger et al., 2015; Sandbakk et al., 2016). However, in sports with high demands for position accuracy, geodetic GNSS receivers are used in differential mode using multiple signal frequencies from one or several GNSSs to calculate position, speed, and acceleration (Larsson and Henriksson-Larsen, 2001; Skaloud and Limpach, 2003; Wägli, 2009; Andersson et al., 2010; Supej, 2010; Supej and Holmberg, 2011; Supej et al., 2012; Gilgien et al., 2013, 2014a,b, 2015a,b; Bucher Sandbakk et al., 2014; Nemec et al., 2014; Fasel et al., 2016; Kröll et al., 2016). Speed can be derived from time differentiation of the position data, or by using the Doppler principle on the GNSS signal (Zhang et al., 2006; Wang and Xu, 2011; Boffi et al., 2016); acceleration can be derived from position or measured with inertial sensors (Gilgien et al., 2014b; Supej and Cuk, 2014; Boffi et al., 2016). The accuracy of GNSS methods used in sports has been assessed for position (Townshend et al., 2008; Gilgien et al., 2014b, 2015b; Fasel et al., 2016), displacement (Townshend et al., 2008; Coutts and Duffield, 2010; Jennings et al., 2010a; Waldron et al., 2011; Hoppe et al., 2018), speed (Schutz and Herren, 2000; Witte and Wilson, 2004, 2005; Barbero-Alvarez et al., 2009; Coutts and Duffield, 2010; Waldron et al., 2011; Gilgien et al., 2015b; Boffi et al., 2016; Fasel et al., 2016), and acceleration (Gilgien et al., 2013, 2015b). However, most of these validations exhibited at least one of the following limitations: (1) Only one receiver was assessed per study, which does not allow a direct comparison between receivers/studies, since studies were conducted under different GNSS conditions and in different applications; (2) some studies applied a reference method that did not allow for instantaneous accuracy comparisons; (3) between-device reliability was not assessed. Further, only one of the validations focused on accuracy for split times and section times when validating a differential high-end receiver (Supej and Holmberg, 2011).

Therefore, the aim of this study was to assess three different classes of GNSS receivers that are frequently applied in sports for position, speed, and segment time accuracy in endurance racing sports. The receivers assessed were a 1 Hz low-grade wrist-worn receiver (Garmin Forerunner 920XT), a 10 Hz standalone receiver (Catapult Optimeye S5), and a 10 Hz differential GNSS receiver (ZXY Go). The accuracy of the three receivers was assessed by comparison with measurements using a high-end differential, multi-frequency, and multi-GNSS receiver (reference system) (Gilgien et al., 2013, 2014b, 2015b) for position, speed, and time analysis.

### MATERIALS AND METHODS

### Participants and Test Protocol

The data presented in this study were collected during the Norwegian national cross-country skiing team's training camp at Sognefjell, Norway (61° 33′ 53.79″ N, 7° 59′ 51.54″ E, elevation 1434 m) on May 31, 2017. Six volunteers were recruited from the team's support group. All participants were able skiers, but none of them were actively competing. The participants gave their written consent to participation, and the study was approved by the ethics board at the Norwegian School of Sport Sciences.

All participants were instructed to ski four laps of a specified track section (L = 3048 m, **Figure 1**). Between each lap, they were allowed a rest of approximately 1 min. They were instructed to ski at a pace close to their own typical racing speed. The participants were divided into two equally sized groups, with group 1 starting at approximately 10:15 a.m., and group 2 at approximately 4:45 p.m. Since GNSS conditions change with time (due to changes in constellations and atmospheric effects), the results were expected to vary between the two groups. The differences are highlighted in the results when these were substantial.

#### Materials

Each participant was equipped with one high-end differential GNSS receiver used as a reference, and the three GNSS receivers whose performance was to be evaluated. The reference system consisted of a differential multi-frequency and multi-GNSS receiver. Specifically, the base station consisted of a GNSS antenna (Grant-G3T, Javad, San Jose, CA, United States) and receiver (Alpha-G3T, Javad, San Jose, CA, United States) and was placed at the start of the ski track allowing for short baseline differential solutions. The athletes carried a GNSS antenna (G5Ant-2AT1, Antcom, Torrance, CA, United States, 160 g) mounted on a cycling helmet, and a GNSS receiver (Alpha-G3T, Javad, San Jose, CA, United States, 430 g) was carried in a small backpack (**Figure 2**). The sampling frequency was set to 10 Hz, which was the same frequency as the highest sampling frequencies of the evaluated receivers.

#### Evaluated Receivers

The Catapult Optimeye S5 (Firmware version 7.18, abbreviated as Cat-S5) has a 10 Hz GNSS with an external antenna, packaged with an IMU in a casing with dimensions: 96 × 52 × 13 mm. The sensor is intended to be worn in a harness on the torso and has a mass of approximately 67 g. In the current study it was placed in the athlete's backpack, close to where it would be placed in the harness (**Figure 2**). The receiver was oriented in an erect position as recommended by the manufacturer.

A Garmin Forerunner 920XT (Garmin International, Inc., Olathe, KS, United States, abbreviated as Gar-920XT) was worn on the wrist. It samples at 1 Hz, has a mass of 61 g, and measures 45 × 55 × 13 mm.

FIGURE 2 | Experimental setup. (A) The reference system antenna was mounted on a bicycle helmet and coupled to the receiver in the backpack (hidden under the start bib). (B) Arrangement of receivers in the backpack. The ZXY-Go receiver was positioned just below the Cat-S5 receiver, and is not visible in this image. The Garmin receiver was worn on the wrist, and is also not visible. (C) The three evaluated receivers (on top, from left to right): Gar-920XT, Cat-S5, and ZXY-Go. Below: Reference receiver (Javad Alpha-G3T).

The ZXY-Go system (ChyronHego Norge A/S, Oslo, Norway) consists of tracking receivers intended to be worn in a harness on the torso. They measure 45 × 90 × 15 mm, have a mass of 63 g, and sample at 10 Hz. The current version of the receivers did not have local storage; data were sent in real time to a base station and were subsequently post-processed. This implies that position solutions were only calculated in periods when the receiver on the athlete was in the line of sight of the base station, which was not the case for the entire track (**Figure 3**). Future versions of this type of receiver tailored for the endurance sports market are expected to have local storage and/or a different radio transmission technology, avoiding the line-of-sight limitation. For post-processing, the GNSS data of the base station from the reference system were used. In the current study the receiver was placed in the athlete's backpack, directly beside the Cat-S5 receiver, and was oriented based on the manufacturer's recommendation. All three receivers apply single frequency (L1) analysis on GPS and GLONASS signals.

## Data Analysis

#### Reference System

Geodetic short baseline position solutions were calculated using dual frequency (L1 and L2) data from NAVSTAR GPS and the GLONASS satellite systems. The ambiguities of the differential position solutions were solved for all athletes and the entire time periods when athletes were skiing, using the kinematic algorithm of the geodetic post-processing software Justin (Javad, San Jose, CA, United States).

#### GNSS Position Solution Calculation

The conditions for GNSS measurements were excellent, with a position dilution of precision (PDOP) of 1.23 ± 0.15. Data from the ZXY-Go system were processed by ZXY staff according to their best practice principles, but were not filtered by the manufacturer. To reduce system bias, the GNSS base station data of the reference system were used in this processing, and the resulting position solutions were sent to the authors as text files. GNSS solutions for Cat-S5 and Gar-920XT were calculated using their respective automated processing procedures and position results were exported to text files using Catapult Sprint software version 5.1.7, and Fit CSV Tool version 1.0.12.20, respectively. Data from the Cat-S5 and Gar-920XT were passed through their manufacturers' proprietary filters. GNSS coordinates were expressed in the WGS84 coordinate frame. The Cat-S5 and Gar-920XT adjust for geoid height. The offset between orthometric height and GPS ellipsoidal height was calculated to be 46.022 m at the recording location, and was removed from the data (Wong and Gore, 1969). All subsequent analyses were conducted using Matlab R2017a (The MathWorks, Natick, MA, United States).

#### Time Synchronization

The ZXY-Go receivers were synchronized with the reference receiver using its GPS time stamps. Both Gar-920XT and Cat-S5 lacked support to export accurate GPS time. Therefore, they were first synchronized using their local time. In a second step, the synchronization offset (Δt) from the reference receiver time was estimated from the slope of the position difference (Δs) vs. speed (|v|) relationship:

$$
\Delta s = |v| \times \Delta t + k.
$$

Here Δs refers to the position difference along the skiing direction, defined as the direction of the reference receiver's horizontal plane velocity vector (**Figure 4**). The constant k was the systematic offset due to different antenna mounting positions (see the section "Correction of antenna mounting locations"). The regression was performed using a robust regression scheme with a bi-square weighting function and a tuning constant of 4.685.
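As an illustration of the slope-based offset estimate, the following Python sketch fits the relationship between the position difference and speed by iteratively reweighted least squares with Tukey bi-square weights and a tuning constant of 4.685. It is a simplified stand-in and not the routine used in the study: the function name `bisquare_slope`, the residual scale based on the median absolute deviation, the fixed iteration count, and the synthetic data (offset 0.35 s, mounting bias 0.20 m) are all assumptions made for the example.

```python
import random
import statistics


def bisquare_slope(speed, delta_s, tuning=4.685, iterations=50):
    """Fit delta_s = slope * speed + k by iteratively reweighted least squares.

    Returns (slope, k); the slope plays the role of the time offset.
    """
    weights = [1.0] * len(speed)
    slope, intercept = 0.0, 0.0
    for _ in range(iterations):
        # Weighted least-squares estimate of slope and intercept.
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, speed)) / sw
        my = sum(w * y for w, y in zip(weights, delta_s)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, speed))
        sxy = sum(w * (x - mx) * (y - my)
                  for w, x, y in zip(weights, speed, delta_s))
        slope = sxy / sxx
        intercept = my - slope * mx
        # Robust residual scale from the median absolute deviation (assumed).
        residuals = [y - (slope * x + intercept) for x, y in zip(speed, delta_s)]
        med = statistics.median(residuals)
        mad = statistics.median(abs(r - med) for r in residuals)
        scale = (mad / 0.6745) or 1e-12
        # Tukey bi-square weights: zero beyond tuning * scale.
        weights = []
        for r in residuals:
            u = r / (tuning * scale)
            weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return slope, intercept


# Synthetic example (hypothetical values, not data from the study).
random.seed(1)
true_dt, true_k = 0.35, 0.20
speed = [random.uniform(1.0, 10.0) for _ in range(500)]
delta_s = [true_dt * v + true_k + random.gauss(0.0, 0.05) for v in speed]
for i in range(0, 500, 50):
    delta_s[i] += 5.0          # a few gross outliers
dt_hat, k_hat = bisquare_slope(speed, delta_s)
```

In this sketch the gross outliers receive zero weight after the first reweighting step, so the estimated slope stays close to the simulated offset.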

#### Correction of Antenna Mounting Locations

The position data from each evaluated receiver were corrected for the typical offsets due to different anatomical mounting locations. Specifically, mean displacement vectors between markers positioned close to the different GNSS receivers' positions were calculated based on optical motion capture marker positions from a previous study (Myklebust et al., 2015; Gløersen et al., 2017). A marker on the superior section of the head was used to represent the reference antenna location; a marker on the 10th thoracic vertebra was used to represent the two receivers in the backpack; and a marker located on the distal end of the left radius was used to represent the wrist-worn receiver. These vectors were added to the evaluated receiver position measurements (by transforming them to the East-North-Up coordinate frame). Specifically, the two receivers in the backpack were translated 33 cm forward (i.e., in the skiing direction), and 43 cm vertically upward. The wrist-worn device was translated 5 cm forward, 52 cm vertically upward, and 33 cm medially.
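The translation of a body-fixed offset into the East-North-Up frame can be sketched as follows. This is a minimal Python illustration under stated assumptions: the forward axis is taken from the horizontal velocity direction, the medial shift of the left wrist is interpreted as a shift to the skier's right, and the function name `apply_mounting_offset` and the example coordinates are hypothetical.

```python
import math


def apply_mounting_offset(east, north, up, v_east, v_north,
                          forward=0.33, right=0.0, upward=0.43):
    """Shift an ENU position by a body-frame offset given in metres.

    The forward axis is the horizontal velocity direction; 'right' is
    perpendicular to it in the horizontal plane.
    """
    speed = math.hypot(v_east, v_north)
    if speed == 0.0:
        raise ValueError("heading undefined at zero horizontal speed")
    fe, fn = v_east / speed, v_north / speed    # unit vector pointing forward
    re, rn = fn, -fe                            # unit vector pointing right
    return (east + forward * fe + right * re,
            north + forward * fn + right * rn,
            up + upward)


# Skier heading due north; local ENU coordinates are made up for the example.
backpack = apply_mounting_offset(100.0, 200.0, 1434.0, 0.0, 5.0)
wrist = apply_mounting_offset(100.0, 200.0, 1434.0, 0.0, 5.0,
                              forward=0.05, right=0.33, upward=0.52)
```

The default arguments correspond to the backpack receivers (33 cm forward, 43 cm upward), and the second call uses the wrist-worn offsets.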

#### Mapping Trajectory

For the time analysis, the GNSS measurements were mapped onto a common trajectory (mapping trajectory). Because of the relatively narrow ski track (approximately 3 m), each athlete's position in the direction perpendicular to the track was neglected. The trajectory computed from the reference system from the first lap of one of the subjects was used as a mapping trajectory. During this lap, our reference receiver had a fixed solution throughout the lap. The coordinates of the mapping trajectory were filtered with a 0.3 Hz low pass filter to remove frequencies caused by postural movements (see the section "Filtering and parameter calculation"). The filtered coordinates were then resampled to every integer meter and interpolated using a cubic spline.

The criterion for mapping onto the mapping trajectory was to minimize the Euclidean distance between a measured position and any given point along the mapping trajectory. Only the two horizontal coordinates were used for the mapping. To avoid situations where the mapping could suddenly jump to incorrect sections of the track (i.e., when two sections of the ski track passed close to each other), a piecewise mapping onto track segments of length max(10 m, Δt × 20 m s<sup>−1</sup>) was performed. Here Δt denotes the time since the last measurement. If there was a gap in the measurements of more than 5 s (relevant only for the ZXY-Go receivers), the mapping was done onto the whole mapping trajectory for the next position measurement. To minimize the likelihood of the solver finding only a local minimum, the track was partitioned into four sub-segments (**Figure 1**), and only the solution that returned the minimal Euclidean distance was kept.
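The effect of restricting the search to a short track segment can be illustrated with a small Python sketch. It is a simplification made under assumptions: the track is represented by nodes spaced 1 m apart, the nearest node (not a point between nodes) is returned, the search window extends forward from the previously mapped node, and the function name `map_to_track` and the hairpin-shaped test track are hypothetical.

```python
import math


def map_to_track(track, position, last_index, dt, base_window=10, max_speed=20.0):
    """Return the index of the nearest track node within a forward window.

    track: list of (east, north) nodes spaced 1 m apart.
    last_index: node index of the previously mapped position.
    dt: seconds since the previous measurement.
    """
    window = int(math.ceil(max(base_window, dt * max_speed)))
    stop = min(len(track), last_index + window + 1)
    best_index, best_dist = last_index, float("inf")
    for i in range(last_index, stop):
        dist = math.hypot(track[i][0] - position[0], track[i][1] - position[1])
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hairpin: 50 m east, a short turn, then back west only 2 m to the north.
track = [(float(i), 0.0) for i in range(51)]
track += [(50.0, 1.0), (50.0, 2.0)]
track += [(50.0 - j, 2.0) for j in range(1, 51)]

measured = (10.0, 1.2)   # noisy fix on the outbound leg
windowed = map_to_track(track, measured, last_index=8, dt=0.1)
unrestricted = min(range(len(track)),
                   key=lambda i: math.hypot(track[i][0] - measured[0],
                                            track[i][1] - measured[1]))
```

Here the unrestricted nearest-node search selects a node on the return leg, whereas the windowed search keeps the position on the outbound leg.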

The distance along the track was calculated from a piecewise linear curve through the mapping trajectory, starting at the first point and ending at the mapped position, with a node every integer meter. The start time was defined as the time of the reference system at the first sample after crossing the virtual start position, i.e., the first sample with a non-zero distance along the mapping trajectory.

#### Filtering and Parameter Calculation

The reference method measurements were filtered using smoothing splines weighted by their fixed/float status and predicted accuracy (Skaloud and Limpach, 2003) using a smoothing parameter of p = 0.995, as implemented in Matlab's curve fitting toolbox. In a second filtering step, weights were set equal to zero for any samples having an acceleration norm greater than 25 m s<sup>−2</sup>, before reevaluating the smoothing spline. The smoothing spline was evaluated at the same times as the evaluated receivers, enabling an estimate of the reference receiver position at the time of each receiver's position measurement.

Because the receivers were not positioned on the same anatomical locations, the GNSS positions of all receivers (including the reference receiver) were low pass filtered using a second order Butterworth filter with a cutoff frequency of 0.3 Hz. This cutoff frequency was determined based on the frequency spectrum of similar anatomical locations during treadmill ski skating. Specifically, the displacements of the head, hand, and 10th thoracic vertebra were determined using marker positions sampled at 250 Hz (data from a previous study; Myklebust et al., 2015; Gløersen et al., 2017). The frequency spectra of these measurements indicated that most of the signal's power was confined to frequencies greater than 0.5 Hz. Velocity was calculated from differentiation of the position data using a five-point finite difference algorithm (Gilat and Subramaniam, 2008), and was filtered with the same 0.3 Hz low pass filter as the position measurements.
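The filtering and differentiation steps can be sketched in Python as follows. Several choices here are assumptions and not details taken from the study: the filter is applied in a single forward pass (the text does not state whether a zero-phase implementation was used), the five-point scheme is the central-difference variant, the second filtering of the velocity is omitted, and the 2 Hz oscillation added to the test signal is a made-up stand-in for postural movement.

```python
import math


def butterworth_lowpass(signal, cutoff_hz, sample_hz):
    """Second-order Butterworth low-pass (bilinear transform), forward pass only."""
    k = math.tan(math.pi * cutoff_hz / sample_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1, b2 = 2.0 * b0, b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    out = []
    x1 = x2 = y1 = y2 = signal[0]   # start in steady state at the first sample
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def five_point_derivative(values, dt):
    """Central five-point first derivative; the two samples at each end stay None."""
    n = len(values)
    deriv = [None] * n
    for i in range(2, n - 2):
        deriv[i] = (values[i - 2] - 8.0 * values[i - 1]
                    + 8.0 * values[i + 1] - values[i + 2]) / (12.0 * dt)
    return deriv


# 60 s of along-track position at 10 Hz: 5 m/s plus a 2 Hz, 5 cm oscillation.
fs, fc = 10.0, 0.3
t = [i / fs for i in range(600)]
position = [5.0 * ti + 0.05 * math.sin(2.0 * math.pi * 2.0 * ti) for ti in t]
raw_speed = five_point_derivative(position, 1.0 / fs)
smooth_speed = five_point_derivative(butterworth_lowpass(position, fc, fs), 1.0 / fs)
```

In this example the unfiltered speed fluctuates by several tenths of a metre per second around 5 m s<sup>−1</sup>, while the speed derived from the filtered positions stays close to the underlying value once the start-up transient has decayed.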

Horizontal plane speed was defined as the vector magnitude of the easting and northing velocity vector components. Speed along the mapping trajectory was obtained from numerical differentiation of the distance moved along the track using the same five-point finite difference algorithm (Gilat and Subramaniam, 2008), and was filtered using the same filter as the horizontal plane speed. Distance covered, i.e., the length of the trajectory traveled by the athlete, was calculated as the cumulative sum of Euclidean distances between each horizontal-plane GNSS position measurement. Hence, the distance covered could be calculated for each position measurement from the receivers. Due to gaps in the ZXY-Go position measurements (periods when the receivers did not have radio contact with the base station), distance covered could not be evaluated for the ZXY-Go receivers.
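The two definitions in this paragraph translate directly into a few lines of Python. The sketch below is illustrative only; the function names and the small coordinate lists are hypothetical, and the coordinates are assumed to be local easting and northing values in metres.

```python
import math
from itertools import accumulate


def horizontal_speed(v_east, v_north):
    """Vector magnitude of the easting and northing velocity components."""
    return [math.hypot(ve, vn) for ve, vn in zip(v_east, v_north)]


def distance_covered(east, north):
    """Cumulative sum of Euclidean distances between successive positions."""
    steps = [math.hypot(e1 - e0, n1 - n0)
             for e0, e1, n0, n1 in zip(east, east[1:], north, north[1:])]
    return [0.0] + list(accumulate(steps))


east = [0.0, 3.0, 3.0, 6.0]
north = [0.0, 4.0, 8.0, 12.0]
covered = distance_covered(east, north)
speeds = horizontal_speed([3.0, 0.0], [4.0, 2.0])
```

Because each step length is strictly non-negative, position noise always adds to the cumulative distance, which is one reason the calculation is sensitive to the filtering applied beforehand.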

Both Catapult and Garmin calculate their own measurements of speed and distance covered using proprietary algorithms. Because of the filtering procedure specified in the previous paragraph, and to ensure a fair comparison against the ZXY-Go measurements, we decided to perform identical speed and distance covered calculations based on the GPS positions for all evaluated receivers. Deviations between proprietary measurements of speed or distance covered and the calculations performed in this study are briefly discussed later.

#### Time Analysis

To evaluate the time difference between athletes at identical positions along the mapping trajectory, the timestamps from each receiver were linearly interpolated to every integer meter along the mapping trajectory. Using the evaluated time points, both a split time (i.e., time from the common start time to any given position along the track) and a segment time (i.e., time between two given positions along the track) analysis were conducted. In both analyses, all 276 possible pairs of laps were analyzed.

In the split time analysis, the time difference between each pair of laps was calculated for every integer meter (starting at 10 m), disregarding measurements where one or more receivers were not recording. The segment time analysis also compared all possible pairs of laps, and the track was divided into equal length segments between 20 and 180 m, in steps of 20 m. The first segment started at the start line, and the subsequent segments started every 20 m. The time taken to complete the segment in each possible pair of laps was then compared. Segments where the ZXY-Go receiver was missing data at the end points were omitted from the analysis. Time analysis precision and accuracy of each GNSS method were then judged from the difference to the reference receiver results.
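A minimal Python sketch of the split and segment time calculations is given below. It assumes that each lap is stored as a strictly increasing list of distances along the mapping trajectory with matching timestamps in seconds since that lap's start; the function names and the two constant-speed laps are hypothetical. The pair count follows from the protocol: six skiers with four laps each give 24 laps and therefore 276 unordered pairs.

```python
from bisect import bisect_left
from itertools import combinations


def time_at_distance(distance, time, target):
    """Linearly interpolate the time at which 'target' metres was reached.

    Returns None if the target lies outside the recorded range.
    """
    if target < distance[0] or target > distance[-1]:
        return None
    i = bisect_left(distance, target)
    if distance[i] == target:
        return time[i]
    frac = (target - distance[i - 1]) / (distance[i] - distance[i - 1])
    return time[i - 1] + frac * (time[i] - time[i - 1])


def split_time_difference(lap_a, lap_b, target):
    """Difference in time from the start to 'target' metres between two laps."""
    ta = time_at_distance(lap_a["distance"], lap_a["time"], target)
    tb = time_at_distance(lap_b["distance"], lap_b["time"], target)
    return None if ta is None or tb is None else ta - tb


def segment_time(lap, start, end):
    """Time taken to travel from 'start' to 'end' metres within one lap."""
    t0 = time_at_distance(lap["distance"], lap["time"], start)
    t1 = time_at_distance(lap["distance"], lap["time"], end)
    return None if t0 is None or t1 is None else t1 - t0


# Two made-up laps sampled at 1 Hz: 5 m/s and 4 m/s.
lap_a = {"distance": [0.0, 5.0, 10.0, 15.0], "time": [0.0, 1.0, 2.0, 3.0]}
lap_b = {"distance": [0.0, 4.0, 8.0, 12.0, 16.0], "time": [0.0, 1.0, 2.0, 3.0, 4.0]}
diff_at_12m = split_time_difference(lap_a, lap_b, 12.0)
n_pairs = len(list(combinations(range(24), 2)))
```

Returning `None` outside the recorded range mirrors the rule that positions where a receiver was not recording are disregarded.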

#### Statistics

Position errors were quantified as horizontal plane deviations (vector magnitude), vertical deviation, and the difference in distance measured along the mapping trajectory. The error distributions were visualized as histograms displaying the count density in each bin, where the bin spacing was chosen according to the Freedman–Diaconis rule (Freedman and Diaconis, 1981). The area of the histogram columns was normalized to unity. For the speed we calculated the difference in horizontal plane speed, and the difference in speed along the mapping trajectory. Robust statistical measures were used as descriptive statistics of the distributions. Specifically, median error (Med) and interquartile range (IQR) were used to quantify accuracy and precision, respectively. For the strictly positive horizontal plane deviations, distribution mode and third quartile (Q3) were used instead. In addition, the typical error of the estimate (TEE) was calculated as described by Hopkins et al. (2009) to allow comparison with studies where TEE was used. Measurements deviating by more than three median absolute deviations from the median were considered outliers and were omitted from the calculation of TEE. The 95% confidence intervals for the statistics were calculated using a bootstrap approach valid for stationary time series (Politis and Romano, 1994). Each empirical distribution was subsampled block-wise using block lengths of n<sup>2/3</sup>, where n was the number of measurements in the empirical distribution. All statistics are presented in the text as 95% confidence intervals. Two of the laps contained short periods (a few seconds) where the reference receiver's position ambiguities could not be resolved (i.e., the double difference ambiguities were float and accuracy not as good as when ambiguities are fixed).
These two laps were omitted from the analyses of distances covered, because the reduced accuracy of the reference receiver during these time periods will affect measurements of distance covered throughout the lap.
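The descriptive statistics described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, and the confidence interval uses a fixed-length circular block bootstrap with block length n<sup>2/3</sup>, which is a simplification and not necessarily the exact procedure of Politis and Romano (1994).

```python
import random
import statistics


def quartiles(x):
    """Return (Q1, median, Q3) using inclusive linear interpolation."""
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q1, q2, q3


def freedman_diaconis_width(x):
    """Histogram bin width from the Freedman-Diaconis rule: 2 * IQR * n^(-1/3)."""
    q1, _, q3 = quartiles(x)
    return 2.0 * (q3 - q1) / len(x) ** (1.0 / 3.0)


def mad_inliers(x, k=3.0):
    """Keep values within k median absolute deviations of the median."""
    med = statistics.median(x)
    mad = statistics.median(abs(v - med) for v in x)
    return [v for v in x if abs(v - med) <= k * mad]


def block_bootstrap_ci(x, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from a circular block bootstrap (assumed variant).

    Blocks of length round(n^(2/3)) are drawn with wrap-around until n
    values have been collected, so serial correlation inside a block is kept.
    """
    rng = random.Random(seed)
    n = len(x)
    block = max(1, round(n ** (2.0 / 3.0)))
    estimates = []
    for _ in range(n_boot):
        sample = []
        while len(sample) < n:
            start = rng.randrange(n)
            sample.extend(x[(start + i) % n] for i in range(block))
        estimates.append(stat(sample[:n]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A call such as `block_bootstrap_ci(errors, statistics.median)` would then give an interval for the median error, where `errors` stands for one receiver's error time series.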

### RESULTS

Results are reported directly in the text or figures, and the main results are summarized in **Tables 1** and **2**.

#### Position Errors

Typical horizontal plane position errors were similar for the ZXY-Go and Cat-S5 receivers (distribution modes [0.46, 1.21] and [0.34, 0.51] m, respectively), but the ZXY-Go exhibited a heavier tail (Q3 [1.79, 2.55] m compared to Cat-S5 [0.95, 1.11] m; see also **Figures 5A,D**). The Gar-920XT receiver showed substantially larger errors compared to the two other receivers (distribution mode [2.54, 3.28] m, **Figure 5F**).

All errors are reported as the observed value with 95% confidence intervals. Distribution mode and third quartile (Q3) are reported for vector magnitudes, otherwise median, IQR, and TEE are used. Nomenclature: δxy, horizontal plane position error; δz, vertical position error; δl, error along mapping trajectory; δ|v|xy, horizontal plane speed error; δ|v|<sup>l</sup>, error in speed along mapping trajectory; δd<sup>s</sup> t<sup>−1/2</sup>, stochastic error in distance covered normalized by the square root of time elapsed since the start of the lap; δt, split time error.

TABLE 2 | Accuracy (median error) and precision (IQR and TEE) of the evaluated receivers' position measurements, calculated for each individual lap.

The results are presented as the mean ± SD of all laps. When calculating TEE, measurements with a Euclidean difference exceeding three median absolute deviations from the median were considered outliers and were omitted from the analysis. The fraction of discarded measurements is presented in the last column.

FIGURE 5 | Position errors. (A,D,F) Distributions of horizontal plane errors for ZXY-Go, Cat-S5, and Gar-920XT, respectively. Dashed lines, distribution mode; dotted lines, third quartile. Horizontal axes are equally scaled. (B,C,E) Distributions of vertical error for ZXY-Go, Cat-S5, and Gar-920XT, respectively. Dashed lines, median error; dotted lines, IQR. Horizontal axes are equally scaled. The vertical error distributions of the two standalone receivers were clearly multi-modal, which suggests that the offset changed with time (as indicated by the different color saturation for the two groups of skiers, G1 and G2). Therefore, the analysis was also done on a lap-by-lap basis to evaluate accuracy and precision over shorter (∼9 min) time intervals (Table 2).

The vertical position accuracy was best for ZXY-Go (distribution median [−1.50, 2.61] m), while Gar-920XT underestimated (median [−4.41, −0.54] m) and Cat-S5 overestimated (median [4.70, 5.47] m) vertical position slightly. However, as is apparent from **Figure 5**, the vertical accuracy changed substantially between the two groups of participants who started at different time points, especially for the Gar-920XT and Cat-S5 receivers. This implies that the IQR calculated from the aggregated data probably overestimates the expected variation over a typical race duration. Therefore, the median and IQR of the position deviations (both vertical and horizontal plane) were also calculated on each individual lap. The results of this analysis are presented in **Table 2**, and show that the IQRs of vertical deviation evaluated over a single lap were 1.16 ± 1.16, 0.92 ± 0.28, and 1.57 ± 0.31 m (mean ± SD of all laps) for ZXY-Go, Cat-S5, and Gar-920XT, respectively.

#### Mapping Onto Mapping Trajectory

To reduce the position error, position data were mapped onto the mapping trajectory. The error in mapped position, measured as the distance between the receiver position and the reference position along the mapping trajectory, was similar for ZXY-Go and Cat-S5 (IQR [0.80, 1.51] and [0.65, 0.81] m, respectively, **Figure 6**), while Gar-920XT exhibited a substantially larger error (IQR [3.93, 4.66] m). Example measurements and their corresponding mapped coordinates are plotted in **Figure 6B**.
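One way to picture the mapping step is a nearest-point projection onto a polyline, sketched below. This is an assumption made for illustration: the section does not specify the mapping algorithm, and the function name and the representation of the trajectory as a list of (x, y) vertices are hypothetical.

```python
import math


def map_to_trajectory(px, py, traj):
    """Project the point (px, py) onto the polyline `traj`.

    Returns (distance along the trajectory, mapped x, mapped y) for the
    closest point on any segment of the polyline.
    """
    best = None
    s0 = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Normalized position of the orthogonal projection, clamped to the segment
        t = ((px - x0) * dx + (py - y0) * dy) / seg_len ** 2
        t = min(1.0, max(0.0, t))
        qx, qy = x0 + t * dx, y0 + t * dy
        d = math.hypot(px - qx, py - qy)
        if best is None or d < best[0]:
            best = (d, s0 + t * seg_len, qx, qy)
        s0 += seg_len
    return best[1], best[2], best[3]
```

With this representation, the mapped position error is the difference between the along-track distances of the evaluated receiver and the reference receiver at the same instant. A global nearest-point search can jump between parts of the track that pass close to each other, so a practical implementation would probably restrict the search to a window around the previous mapped position.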

#### Speed Errors

The horizontal plane speed error distributions are plotted in **Figures 7A,C,E**. The ZXY-Go receivers were most precise (IQR [0.036, 0.043] m s<sup>−1</sup>), followed by the Cat-S5 receivers (IQR [0.070, 0.075] m s<sup>−1</sup>) and Gar-920XT (IQR [0.614, 0.835] m s<sup>−1</sup>). Both ZXY-Go and Cat-S5 were accurate (median [−0.001, 0.002] and [0.010, 0.013] m s<sup>−1</sup>, respectively), while Gar-920XT overestimated horizontal plane speed (median [0.076, 0.097] m s<sup>−1</sup>). Speed along the mapping trajectory (**Figures 7B,D,F**) showed similar precision to the horizontal plane distributions (IQR [0.041, 0.067], [0.077, 0.087], and [0.701, 0.927] m s<sup>−1</sup> for ZXY-Go, Cat-S5, and Gar-920XT, respectively), but Gar-920XT accuracy was improved (median [0.002, 0.027] m s<sup>−1</sup>).

FIGURE 6 | Mapping on mapping trajectory. (B) Section of the track showing how a subset of receiver coordinates were mapped onto the mapping trajectory (black line). The gray line shows the trajectory of the reference receiver for the given trial. The dots are the receivers' coordinates sampled at the same time, with 3-s intervals (see legend for color specification). (A,C,D) Distributions of the mapped position errors, quantified as the distance to the reference receiver position along the mapping trajectory [see legend in (D) for color specification].

#### Errors in Distance Covered

Both Cat-S5 and Gar-920XT overestimated the distance covered during one lap compared to the length of the reference receiver trajectory (median errors 9.0 and 34.8 m, respectively). Precision was also better for Cat-S5 compared to the Gar-920XT (IQR 1.8 and 14.2 m, respectively), as apparent from **Figure 8B**. The variation (IQR) in distance covered between single laps (measured with the reference receiver) was 10.1 m. Hence, the precision of Cat-S5 is better than the differences that can be expected due to different trajectories used by the athletes over a 3.05 km course.

For both Cat-S5 and Gar-920XT, the time evolution of the error in distance covered was a combination of a linear drift, which was equal to the mean error in speed, and a stochastic error (**Figures 8A,C**). The linear drifts were 1.7 and 67 mm s<sup>−1</sup> for Cat-S5 and Gar-920XT, respectively. If the stochastic errors are independent, identically distributed, and zero mean, Donsker's theorem implies that the mean-squared deviations from the linear trend line caused by systematic errors in speed should increase linearly with time. Although the assumptions of independence (due to the low pass filtering) and identical distributions (due to changing receiver conditions) are violated in this study, the mean-squared residuals still appeared to increase approximately linearly with time (**Figure 9**), except for some regions of the track. The color-coding in **Figure 9** suggests that changes in skiing speed could explain at least some of these deviations. The slope of the linear regression line of squared residuals was 0.0043 and 0.27 m<sup>2</sup> s<sup>−1</sup> for Cat-S5 and Gar-920XT, respectively (**Figure 9**). These findings imply that the expectation value for the error in distance covered increased linearly with time, and that the root mean-squared (RMS) deviation from the expectation value increased by the square root of time (**Figures 8A,C**). Therefore, the stochastic error in distance traveled divided by the square root of time elapsed was approximately constant throughout the lap. For the Cat-S5 and Gar-920XT receivers, the IQR of the stochastic error divided by the square root of time elapsed was [0.085, 0.104] and [0.64, 0.80] m s<sup>−1/2</sup>, respectively (**Table 1**).

evaluated receivers and the reference receiver. In addition, stochastic errors cause a deviation from the linear drift line which was approximately proportional to the square root of time. The gray shaded regions indicate the RMSD from the linear drift. (B) Box plot of the errors in distance covered evaluated at the end of each lap. Both Cat-S5 and Gar-920XT overestimated distance covered compared to the reference receiver, but Cat-S5 was substantially more precise.
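The square-root-of-time scaling can be checked numerically from the reported regression slopes. The sketch below uses hypothetical function names and assumes, purely for illustration, that the stochastic error is approximately Gaussian, so that its IQR equals about 1.349 standard deviations. That assumption is not made in the text.

```python
import math
from statistics import NormalDist


def rms_stochastic_error(slope_m2_per_s, t_s):
    """RMS deviation from the linear drift after t_s seconds.

    If the mean-squared residual grows as slope * t, the RMS grows as sqrt(slope * t).
    """
    return math.sqrt(slope_m2_per_s * t_s)


def normalised_iqr(slope_m2_per_s):
    """IQR of (stochastic error / sqrt(t)) under a Gaussian assumption.

    The standard deviation of the normalized error is sqrt(slope), and the
    IQR of a normal distribution is 2 * z(0.75) * sigma, about 1.349 * sigma.
    """
    sigma = math.sqrt(slope_m2_per_s)
    return 2.0 * NormalDist().inv_cdf(0.75) * sigma


# Slopes reported for Cat-S5 and Gar-920XT (m^2 s^-1)
cat_s5_iqr = normalised_iqr(0.0043)
gar_920xt_iqr = normalised_iqr(0.27)
```

Under that assumption, the slopes of 0.0043 and 0.27 m<sup>2</sup> s<sup>−1</sup> give normalized IQRs of roughly 0.09 and 0.70 m s<sup>−1/2</sup>, which fall inside the reported intervals of [0.085, 0.104] and [0.64, 0.80] m s<sup>−1/2</sup>.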

#### Split Time Analysis

The split time analysis resulted in precision (IQR) values of [0.27, 0.40], [0.12, 0.14], and [0.64, 0.75] s for ZXY-Go, Cat-S5, and Gar-920XT, respectively (**Figures 10A,C,D**). The split time error showed an inverse relationship with speed at the location where the split time was evaluated (**Figure 10B**).

#### Segment Time Analysis

Segment time error increased with segment length, but appeared to plateau for segment lengths >100 m, particularly for ZXY-Go and Cat-S5 (**Figures 11A–C**). When averaged over the four segment lengths >100 m, the absolute error (Q3) was 0.19 s for ZXY-Go, 0.11 s for Cat-S5, and 0.85 s for Gar-920XT (**Figure 11D**).

#### ZXY-Go Data Transmission

The ZXY-Go receivers successfully transmitted data on average for 33% (range: 21–44%) of the track length (**Figure 3**).

### DISCUSSION

The aim of this study was to assess the accuracy of three different classes of GNSS receivers (1 Hz wrist worn, 10 Hz standalone, and 10 Hz differential), to measure position, speed, and segment time accuracy in endurance racing sports. The key findings of the study were: (1) there were substantial differences in accuracy between the three GNSS receivers, which need to be considered if applied to endurance racing sports; (2) split time error was strongly dependent on (and inversely related to) the athlete's speed; and (3) segment time error increased with increasing segment length.

FIGURE 9 | Scatter plots of the squared deviations from the linear drift line in Figure 8 for Cat-S5 (A) and Gar-920XT (B). The residuals are color-coded based on skiing speed. Solid colored lines show the mean-squared deviation at the given time. Black lines are the least squares fit (with zero y-intercept) to all the measurements. If the errors in speed were independent, identically distributed, and zero-mean, the expectation value of the squared error in distance covered would increase linearly with time (by Donsker's theorem). In this experiment, these assumptions are violated due to changing receiver conditions and low-pass filtering of the trajectories. Nonetheless, after subtraction of the linear drift, the squared error increases approximately linearly with time.

Few other studies have evaluated the performance of multiple GNSS receivers simultaneously in sports applications. One study evaluated three different receivers, but the experiment was aimed at typical team sports exercises (Coutts and Duffield, 2010). Furthermore, most sports-specific GNSS receiver validations have used straight line distances and optical speed traps (or chronometers) as reference measures for distance and average speed (Schutz and Herren, 2000; Townshend et al., 2008; Barbero-Alvarez et al., 2009; Coutts and Duffield, 2010; Waldron et al., 2011). The average speed determined from speed traps is not an ideal reference for evaluating GNSS receiver errors for three reasons: (1) during human locomotor tasks the GNSS receiver will seldom follow a straight line between two speed traps; (2) care must be taken to average over the same time interval, particularly if the sampling interval is not negligible compared to the averaging time; and (3) average speed provides only limited insight in sport applications. Therefore, to assess receiver position and speed the reference tracking system should be capable of measuring the true instantaneous trajectory of the receivers, using systems such as video-based tracking (Gilgien et al., 2013, 2014b, 2015b; Fasel et al., 2016), reflective marker-based tracking (Nedergaard et al., 2015) or, as in this study, a high-end GNSS receiver previously validated against video-based systems or similar. 
This study extends previous studies on sport-specific GNSS applications in three ways: (1) by comparing three different GNSS receiver technologies under the same conditions; (2) by comparing the trajectories in a dynamic situation where each receiver's position could be validated instantaneously by comparison with the reference receiver's smoothing spline; and (3) by investigating the accuracy of split times and segment times obtained from GNSS receivers aimed at the sports market, in a situation relevant for typical endurance racing sports (i.e., running, cycling, or cross-country skiing).

#### Position Error

Position itself was not of primary interest in this study, as differences in choice of trajectory were not assessed in the performance analysis. However, position error was of interest since speed, split, and segment time are derived directly from position. Comparing the instantaneous position errors found in this study with the instantaneous position error found in a GNSS method validation in a racing sport application (Gilgien et al., 2014b) indicates that not only the GNSS method applied but also the receiver and antenna type and positioning of the GNSS antenna on the athlete play an important role in position error. The GNSS conditions (PDOP) were comparable between the studies, being very good in the alpine skiing study and excellent in the present study, while the dynamics were more pronounced in the alpine skiing study, resulting in overall more challenging measurement conditions in the Gilgien et al. (2014b) study. The present study agrees with the findings of Gilgien et al. (2014b) that position error can be reduced by using a differential solution (ZXY-Go) compared to a standalone solution (Gar-920XT) and shows that there are substantial differences between different standalone solutions. Although the Cat-S5 receiver and the ZXY-Go receiver were similar in many of the evaluated parameters, there was a clear indication that the ZXY-Go measurements were less robust than those obtained with the Cat-S5. This can be clearly seen from the heavier distribution tail in **Figure 5** and the number of outliers in **Table 2**. One explanation is that the ZXY-Go position solutions were not filtered by the manufacturer, leaving potential for further position accuracy enhancement (although all receivers were filtered using the same low pass filter in the data processing). Comparing the standalone GNSS solutions of the Gar-920XT and Cat-S5 with the standalone GNSS code position method (E) in the Gilgien et al. (2014b) study, the position error was substantially larger for the Gar-920XT and smaller for Cat-S5. A more than 10 times larger error was found for the kinematic differential solution by ZXY-Go compared to a similar solution (Gilgien et al., 2014b) in the alpine skiing study. The fact that a geodetic high-end receiver was used in the alpine skiing study, combined with the large differences in position error for a given GNSS method between the present study and the Gilgien et al. (2014b) study, indicates that not only the GNSS method applied but also antenna and receiver size and quality are of importance for position accuracy in sport applications. Hence, the large position errors in the Gar-920XT might be associated not only with the heavily compromised antenna size and the receiver quality and processing procedure, but also with the mounting point on the athlete. The mounting point of the Gar-920XT, the wrist, which is swung back and forth continuously during skiing, causes changes in antenna orientation and GNSS signal reception, which challenges the GNSS processing (Weaver et al., 2015). GNSS signal shading by the athlete's body may also reduce the performance of the Gar-920XT compared to the other receivers.

#### Speed Error

With respect to horizontal plane speed, a study comparing five GNSS receivers ranging from a mobile phone receiver to a high-end differential receiver found larger errors in speed for a standalone wrist watch and a standalone handheld receiver than the present study could find (Supej and Cuk, 2014). The authors related the large error partly to the latency of about 2 s in the speed readings of these receivers. Latency effects were removed in the present study using the time synchronization procedure. The removal of latency could be an important reason for the reduced speed errors found in the present study compared to Supej and Cuk (2014). An alpine skiing study (Gilgien et al., 2015b), validating the speed of the center of mass approximation using GNSS and modeling, found larger speed errors than the present study. However, these were based on three-dimensional position data and included the error from the modeling approximation of the center of mass. A study conducted on a roller coaster, simulating the dynamics of racing sports, found errors in the range of cm/s for consumer-grade receivers targeted to dynamic applications (Boffi et al., 2016), which is similar to the results of the present study.

ZXY-Go, Cat-S5, and Gar-920XT, respectively. Maximal whisker length is 1.5 × IQR. Horizontal grid lines are equally separated (0.5 s). (D) Third quartile of the absolute segment time error, with error bar indicating 95% CI. Segment time error increased with increasing segment length, but started to flatten out for segment lengths of 100 m, particularly for ZXY-Go and Gar-920XT. The dashed lines show the mean of segments with length >100 m.

We found only minor differences in the precision of speed measurements between horizontal plane speed and speed along the mapping trajectory. However, speed accuracy was improved by the mapping procedure, particularly for Gar-920XT. The speed used in the current study was deduced by differentiating the GPS positions (before or after mapping onto the mapping trajectory). Most GNSS receiver manufacturers calculate speed using other (proprietary) algorithms. For the Gar-920XT, the manufacturer's speed estimate was similar in precision to the speed reported in the current study, but was more accurate (exhibiting only a trivial overestimation). The Cat-S5 can calculate speed based on the Doppler principle. The precision was similar to the speed reported in the current study, but it tended to overestimate speed slightly compared to our reference receiver. A likely explanation for this overestimation is the low pass filter applied to the GNSS coordinates of the reference receiver in the current study. This filtering process removes high frequency movements within each technique cycle, effectively shortening the true trajectory of the receiver prior to differentiation. In contrast, the Doppler method measures speed directly based on the receiver's true trajectory. Therefore, the two speed measurements are not directly comparable even when treated with the same low pass filter.

The accuracy requirements to assess instantaneous speed differences during a race would obviously depend on the specific sport. To elucidate these requirements for cross-country ski racing, we compared the intra-athlete variation in instantaneous speed on successive laps. Specifically, we compared the speed on laps 1 vs. 3, and 2 vs. 4, evaluated at every integer meter along the track (**Figure 12**). From these data it was clear that the Gar-920XT receiver would fail in most instances to report reliable instantaneous speed differences (speed differences were greater than 0.5 IQR for only 43% of the measurements). In contrast, both the ZXY-Go and the Cat-S5 could be used to differentiate typical speed differences observed in this study (speed differences greater than 0.5 IQR in 98 and 94% of the measurements, respectively).
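The detectability criterion used in this comparison can be written as a short calculation. The sketch below assumes that "0.5 IQR" refers to half the IQR of a receiver's speed error, which is an interpretation of the text; the function name and the sample values are hypothetical.

```python
def fraction_detectable(speed_lap_a, speed_lap_b, speed_error_iqr):
    """Fraction of track positions where the lap-to-lap speed difference
    exceeds half the receiver's speed error IQR (assumed criterion)."""
    threshold = 0.5 * speed_error_iqr
    diffs = [abs(a - b) for a, b in zip(speed_lap_a, speed_lap_b)]
    return sum(d > threshold for d in diffs) / len(diffs)


# Illustrative speeds (m/s) at four track positions on two laps; not study data
lap_1 = [5.0, 5.2, 6.0, 4.0]
lap_3 = [5.0, 5.0, 5.5, 4.1]
coarse = fraction_detectable(lap_1, lap_3, speed_error_iqr=0.7)
fine = fraction_detectable(lap_1, lap_3, speed_error_iqr=0.04)
```

With the same pair of laps, a receiver with a wide speed error distribution resolves fewer of the differences than a receiver with a narrow one, which is the pattern reported for Gar-920XT compared with ZXY-Go and Cat-S5.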

#### Error in Distance Covered

The findings of the current study suggest that errors in the distance covered consist of a systematic drift that is linear in time, with a rate equal to the mean error in speed, and a stochastic component whose RMS magnitude increases with the square root of time. For many applications, the latter effect will be the most important, for instance when comparing several trials using the same GNSS receiver. The results also show that measurements of the distance covered by a GNSS receiver cannot be used for the time analysis purposes in the current study, because differences in the length of the athletes' trajectories accumulate over time. This problem cannot be wholly resolved by using more accurate position measurements, but requires a common mapping trajectory, as used in the time analysis in the current study.

#### Split and Segment Time Error


An obvious, but important, prerequisite for using GNSS for time analysis is that a "meaningful difference" in performance is encompassed by a position difference greater than the receiver error, for two athletes starting simultaneously. Therefore, and as the results of this study imply, it is beneficial to segment the track so that the athlete has a high speed when passing the segment boundaries. For instance, for the Gar-920XT receiver, split time accuracy (Q3) was 1.90 s where the speed was 2 m s<sup>−1</sup>, and 0.25 s where the speed was 15 m s<sup>−1</sup> at the evaluated position.
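The inverse relationship between split time error and speed follows from a first-order approximation in which a fixed along-track position error is converted to a time error by dividing by the local speed. The sketch below uses hypothetical function names, and the position error of about 3.8 m is back-calculated from the two reported values rather than stated in the text.

```python
def split_time_error(position_error_m, speed_m_per_s):
    """First-order estimate: time error = along-track position error / speed."""
    return position_error_m / speed_m_per_s


def implied_position_error(time_error_s, speed_m_per_s):
    """Position error implied by a time error at a given speed."""
    return time_error_s * speed_m_per_s


# Gar-920XT: 1.90 s at 2 m/s implies an along-track error of about 3.8 m
slow_error_m = implied_position_error(1.90, 2.0)
# The same position error at 15 m/s predicts a time error of about 0.25 s
fast_time_error_s = split_time_error(slow_error_m, 15.0)
```

The two reported Gar-920XT values are therefore consistent with a roughly constant position error, which is why placing segment boundaries where the athlete moves fast reduces the timing error.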

Furthermore, the error in the time analysis decreases with decreasing segment length. This is most likely due to correlated position errors at both segment end points, resulting in a cancelation of the errors, given that the track is relatively straight. It is important to note that if the evaluated segment of the track includes a sharp turn and the track points in approximately opposite directions at the endpoints, the errors will most likely no longer cancel. However, as a minimal criterion, short track segments should be avoided for time analysis.

Time analysis accuracy requirements are typically a function of segment duration, since the relative time difference is almost independent of competition duration (Stöggl and Müller, 2008). However, for endurance racing sports, individual choices of pacing strategy (de Koning et al., 2011) or differences in technical skill level can result in considerable differences over relatively short segments. The results for section time accuracy presented in the current study may help to define sections for analysis in which the applied GNSS system provides the required accuracy.

### Methodological Considerations

#### Validity of the Reference System

Under circumstances with excellent conditions for GNSS measurements (PDOP < 2), the reference system used in this study has previously been shown to have a position accuracy of about 5 cm (Gilgien et al., 2014b). This is small, but not negligible, compared to the distribution modes in **Figure 5**. Furthermore, the four GNSS antennas were mounted on different anatomical locations. We corrected for the average position differences by translating the evaluated receiver's position measurements, but individual differences in anthropometrics and changes in posture will introduce deviations from the ideal situation of identical antenna positions. The magnitude of these errors can be estimated by calculating the distances from the head-mounted antenna to the translated wrist or thoracic antennas. Using the measurements from previous studies (Myklebust et al., 2015; Gløersen et al., 2017), this error was estimated to be 0.26 m (RMS) for the wrist-worn receiver, and 0.09 m (RMS) for the backpack-mounted receivers. This is about 10 and 20% of the distribution modes (**Figure 5**) for the wrist-worn and backpack-worn receivers, respectively. It is therefore likely to have had some influence on the calculated errors. The error in speed derived from the reference receiver has not been validated directly, but was estimated to be <10 mm s<sup>−1</sup> using numerical simulations based on the expected position uncertainties (5 cm) and the filtering procedure applied in the current study. This estimate is in agreement with the findings of Boffi et al. (2016), who evaluated speed using a lower-end receiver than the reference receiver used in the current study.

#### Mapping Procedure

The mapping of the measured positions onto a common trajectory was necessary for a successful time analysis, because the distance covered by the individual athletes during each lap varied from lap to lap. We chose to omit the vertical position dimension when performing the mapping procedure. Because the vertical dilution of precision (DOP) is often substantially higher than the horizontal DOP, including the vertical position is likely to reduce the mapped position accuracy.

This mapping procedure would also be useful in calculations of the mechanical work rate of the athletes. On a track with substantial inclines, the energy required to raise the center of gravity is often the dominant work athletes need to perform. Having accurate measurements of vertical position is a key prerequisite to making reliable estimates of this work. The validity of mechanical work rate estimations using GNSS receivers was not addressed in the current study, but it should be considered in future studies.

#### Limitations

Because the conditions for GNSS measurements during these experiments were excellent, our findings reflect a best-case situation. Therefore, further assessment in sub-optimal conditions (higher PDOP and more challenging signal multipath conditions) is necessary to investigate how the different receiver methods are affected by changes in measurement conditions. Furthermore, the accuracies reported here cannot be generalized to sports with substantially higher speeds or accelerations (e.g., motor sports or alpine skiing). Large vertical speed and displacement can also cause the receiver accuracy to deteriorate, because of changes in the atmospheric signal transmission properties.

The differential receiver (ZXY-Go) evaluated in the current study did not have local storage and, due to frequent lack of line of sight, lost the data transfer link between receiver on the athlete and the base station, leading to loss of data in those time periods. However, both these issues can be resolved in future receivers. Because small carrier-phase differential receivers have the potential to substantially increase the three-dimensional accuracy of position tracking in sports applications, we decided to include this receiver in the study even if the current version is not suitable for time analysis in cross-country skiing.

Between-device reliability and test–retest reliability were not addressed in the current study, but could be of interest for further research.

### SUMMARY AND CONCLUSION

The results of this study revealed substantial variation in the accuracy obtained using commercially available GNSS receivers aimed at sports applications, which should be considered when a GNSS receiver is chosen for a specific application in endurance racing sports. In summary, the ZXY-Go (differential) and Cat-S5 (standalone) receivers performed substantially better than the wrist-worn Gar-920XT receiver for horizontal plane position, speed, and time analysis calculations. The receivers' horizontal plane speed errors suggested that the ZXY-Go and Cat-S5 can detect typical instantaneous speed differences in cross-country ski racing, while the Gar-920XT cannot.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Norwegian Centre for Research Data. The protocol was approved by the Ethics Committee at the Norwegian School of Sport Sciences. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

### AUTHOR CONTRIBUTIONS

ØG, JK, and MG: conception and design, and data collection. ØG and MG: data analysis. MG: manuscript draft introduction. ØG: manuscript draft methods and results. ØG, MG, and JK: manuscript draft discussion. All authors contributed to manuscript revision, read, and approved the submitted version.

### FUNDING

This study was funded by the Norwegian School of Sport Sciences, Oslo, Norway; Olympiatoppen, Oslo, Norway; and the Norwegian Research Council (project 216699).

### ACKNOWLEDGMENTS

The authors are grateful for the assistance of the Norwegian Ski Federation, which provided accommodation during the data collection period, and to the participants who volunteered for the study.

### REFERENCES

cross-country skiers. Scand. J. Med. Sci. Sports 24, 708–716. doi: 10.1111/sms.12063

ramp and outdoors on snow. Sport. Biomech. 14, 273–286. doi: 10.1080/14763141.2015.1052543



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Gløersen, Kocbach and Gilgien. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
