Validity and Reliability of a Phone App and Stopwatch for the Measurement of 505 Change of Direction Performance: A Test-Retest Study Design

Purpose: The aim of this study was to explore the validity and reliability of a phone app [named: change of direction (COD) timer] and stopwatches for measuring COD performance. Methods: Sixty-two youth basketball players (age: 15.9±1.4 yrs., height: 178.8±11.0 cm, and body mass: 70.0±14.1 kg) performed six trials of the 505 COD test (first with the left leg as the plant leg, then with the right). Completion time was measured simultaneously with timing gates (with error correction processing algorithms), the phone app, and stopwatches. Results: There was an almost perfect correlation and agreement between the timing gates and the COD timer (r=0.978; SEE=0.035 s; LoA=−0.08~0.06 s), but a lower correlation and agreement between the timing gates and the stopwatch (r=0.954; SEE=0.050 s; LoA=−0.17~0.04 s), with a statistically significant difference in completion time (ES=1.29, 95% CI: 1.15–1.43, p<0.01). The coefficient of variation revealed similar levels of dispersion for the three timing devices (timing gates: 6.58%; COD timer: 6.32%; stopwatch: 6.71%). Inter-observer reliability (ICC=0.991) and test-retest reliability (ICC=0.998) were excellent for the COD timer, whereas inter-observer reliability was lower for the stopwatches (ICC=0.890). Conclusion: In the 505 COD test, the COD timer provided valid and reliable measurements. In contrast, the stopwatch is not recommended because of its large error. Thus, if timing gates are unavailable, practitioners can adopt the COD timer app to assess 505 COD times.


INTRODUCTION
Change of direction (COD) encompasses the skills and abilities needed to change movement direction, velocity, or mode (Nimphius et al., 2017) and plays a pivotal role in match-winning situations in team sports (Taylor et al., 2017; Wen et al., 2018; Loturco et al., 2019; Stojanović et al., 2019). Naturally, evaluating players' COD performance has received considerable attention from coaches and sports scientists (Baker and Newton, 2008; Chaouachi et al., 2012; Nimphius et al., 2017). The 505 COD test is one of the protocols developed to measure COD performance; it involves a high-intensity cut that is often performed in competition and is therefore widely applicable to many team and racquet sports (Gabbett et al., 2008; Stewart et al., 2014; Nimphius et al., 2016). In addition, by measuring the completion time for the left and right sides (as defined by the plant limb), the 505 COD test can be used to assess between-limb imbalance (Wen et al., 2018). Furthermore, the 505 COD test is relatively short (2-3 s; Draper and Lancaster, 1985; Sayers, 2015) compared with other tests (~13 s; Pauole et al., 2000; Lockie et al., 2013, 2014; Wilkinson et al., 2019), so it places more emphasis on COD ability (some COD tests have been criticized for containing sections overly dependent on linear sprinting ability; Nimphius et al., 2016).
In practice, electronic timing gates, radar guns, and photo-finish cameras have been widely adopted as gold-standard instruments for timing the 505 COD test (Haugen and Buchheit, 2016; Altmann et al., 2018). However, their high cost can be prohibitive for practitioners with limited budgets. The stopwatch, in contrast, is a more portable and less expensive alternative with acceptable relative reliability (ICC = 0.92-0.99; Hetzler et al., 2008; Mayhew et al., 2010). Although previous studies have shown that manual timing has large absolute errors during linear sprinting tasks (Brechue et al., 2008; Haugen et al., 2016), no study has explored the validity and reliability of the stopwatch in COD tests.
More recently, several cost-effective smartphone apps have been developed to measure components of physical performance, such as vertical jump height or barbell velocity, using the slow-motion function of the camera, and they have proved practical and accurate (Gallardo-Fuentes et al., 2016; Balsalobre-Fernández et al., 2018; Haynes et al., 2019; Perez-Castilla et al., 2021). Among these, the COD timer app was developed specifically to measure completion time during COD tests, and its developers have reported it to be valid and reliable (Balsalobre-Fernández et al., 2019). However, that study investigated the COD timer app only in the 5 + 5 COD test, and it is questionable whether its findings apply to other COD tests because the starting styles differ between tests (i.e., static vs. flying). Moreover, the validity and reliability of the COD timer app have not been investigated by any party other than the developers themselves. In addition, several meaningful aspects of reliability (e.g., inter-observer and test-retest reliability), which determine how consistently completion time is recorded, have not been reported. Therefore, the aim of the present study was to assess the validity and reliability of the COD timer and the stopwatch using the 505 COD test.
We hypothesized that the COD timer app would be a valid and reliable alternative for the measurement of completion time in the 505 COD test, with better validity and reliability than the stopwatch.

Participants
Sixty-two healthy youth basketball players (age 15.9 ± 1.4 yrs., height 178.8 ± 11.0 cm, and body mass 70.0 ± 14.1 kg) with at least 4 years of basketball training experience volunteered to participate in this study. Drawing on the work of Balsalobre-Fernández et al. (2019), an a priori power analysis in G*Power (Version 3.1, University of Dusseldorf, Germany), using an effect size of 0.19, a power of 0.95, an alpha level of 0.05, and a correlation among repeated measures of 0.964, indicated a minimum sample size of 42. Prior to the study, the subjects were informed of the test procedure and the potential risks. Written informed consent was obtained from participants and their coaches in advance. Ethics approval was provided by the Shanghai University of Sport.

Design and Procedures
The present study used an observational design in which each participant completed the 505 COD test in a single session. All trials were timed simultaneously with timing gates (SmartSpeed Pro, Fusion Sport, Australia), a phone app (COD timer, Apple Inc., United States), and three different individuals using stopwatches (SW141, Seiko, Japan), and the resulting times were compared to assess validity and reliability. Times were measured to the nearest 0.01 s. All tests were performed in the afternoon (4-6 p.m.) under similar temperature (24-26°C) and humidity (76-80%) conditions over 2 days.

Timing Gates
A pair of timing gates with error correction processing (ECP) algorithms (SmartSpeed Pro, Fusion Sport, Australia) was placed at the finish line, with the infrared transmitter and the reflector set 2 m apart. The gates were set approximately 0.9 m above the ground, corresponding to the subjects' hip height, as previously recommended to avoid the gates being triggered prematurely by a swinging arm or leg (Dos'Santos et al., 2019). The timing gates with ECP algorithms, sampling at 1000 Hz (accuracy to 1/1000th of a second), were used as the reference measure of trial completion time in this study (Strutzenberger et al., 2016; Altmann et al., 2018).

COD Timer
The COD timer app was installed on an iPad (iPad Pro, Apple Inc., United States) running iOS 14.0. The iPad was mounted on a tripod 6 m from the lane, perpendicular to it and in line with the finish line, and recorded video of all trials. The start and finish of each trial were defined as the first frames in which the subject's torso crossed the timing gates. Two authors of this study analyzed all the videos independently twice, 1 week apart.

Frontiers in Physiology | www.frontiersin.org
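To make the video-timing principle concrete, the sketch below converts two manually selected frame indices into a completion time. It assumes a constant capture rate of 240 fps (the slow-motion rate mentioned in the Discussion); the function names are hypothetical, and this is not the COD timer app's internal implementation, which the text does not describe.

```python
def completion_time(start_frame: int, finish_frame: int, fps: float = 240.0) -> float:
    """Elapsed time (s) between two manually selected video frames.

    Assumes a constant frame rate. start_frame and finish_frame are the first
    frames in which the torso crosses the timing line at the start and the
    finish of the trial (hypothetical helper, for illustration only).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if finish_frame <= start_frame:
        raise ValueError("finish_frame must come after start_frame")
    return (finish_frame - start_frame) / fps


def worst_case_frame_error(fps: float = 240.0, frames_off_each_end: int = 1) -> float:
    """Largest timing error (s) if both selected frames are off by N frames in opposite directions."""
    return 2 * frames_off_each_end / fps


if __name__ == "__main__":
    t = completion_time(start_frame=120, finish_frame=720)
    print(f"completion time: {t:.3f} s")  # 600 frames / 240 fps = 2.500 s
    print(f"worst-case 1-frame error: {worst_case_frame_error():.4f} s")  # 2/240 = 0.0083 s
```

At 240 fps one frame lasts about 0.004 s, so a one-frame misjudgment at each end of a trial bounds the error at roughly 0.008 s, below the 0.01 s resolution at which times were reported.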

Stopwatch
Three experienced timers stood 3 m from the lane, perpendicular to it and in line with the finish line. They were instructed to start and stop their watches independently when the subject's torso passed through the finish line, based on their visual perception (Mann et al., 2015). No communication was allowed among timers during the test.

COD Test
Each subject performed six trials of the 505 COD test: three with the left leg as the plant leg, followed by three with the right. Each trial was self-initiated, and 3 min of rest was provided between trials. All tests were conducted on a wooden basketball court to ensure ecological validity relative to the subjects' playing environment. Familiarization was conducted 1 week before the formal test, when players practiced the 505 test under the supervision of the primary researcher. Prior to the trials, subjects completed a standardized 15-min warm-up comprising jogging, dynamic stretching (two sets of four moving knee hugs, four walking quad stretches, two inchworms, and two world's greatest stretches on each side), and activation exercises (two 5-s maximal-effort runs). Subjects then performed the 505 COD test: starting from the start line in a standing posture, they sprinted past the vertical marker, reached the turning line, turned 180°, and re-accelerated through the finish line as fast as possible (Figure 1).

Statistical Analyses
IBM SPSS Statistics 26 for Windows (IBM Co., United States) and JASP 0.9.2 for Windows (University of Amsterdam, Netherlands) were used to analyze the data. The validity analysis compared the two observers (phone app) and the three timers (stopwatch) with the electronic timing gates across six trials of the 505 COD test. Linear regression, with Pearson's correlation coefficient (r), the standard error of the estimate (SEE), and the slope of the regression line, was used to assess the concurrent validity of the COD timer app and the stopwatch against the timing gates. The Durbin-Watson test was used to check the regression residuals for autocorrelation. The strength of the r coefficients was interpreted as follows: trivial (<0.10), small (0.10-0.29), moderate (0.30-0.49), high (0.50-0.69), very high (0.70-0.89), or practically perfect (>0.90; Hopkins et al., 2009). Paired-samples t-tests and Bland-Altman plots (Bland and Altman, 1986) were used to identify potential systematic bias from the mean bias and the regression line fitted to the Bland-Altman plots. Cohen's d was used to assess the mean differences between the measures obtained with each instrument and was rated as trivial (<0.2), small (0.2-0.59), moderate (0.6-1.19), or large (1.2-2.0; Rhea, 2004). Paired-samples t-tests and Cohen's d effect sizes (with 95% confidence intervals) were also calculated to identify mean differences between observers. A one-way ANOVA with Bonferroni post-hoc testing was used to evaluate differences among the three hand timers. The coefficient of variation (CV) was used to analyze the stability of the timing systems, with CV < 10% considered acceptable reliability (Atkinson and Nevill, 1998). The intraclass correlation coefficient with 95% CI (ICC; two-way random, absolute agreement) was used to assess test-retest reliability (phone app) and inter-observer reliability (phone app and stopwatch).
ICC was interpreted as follows: poor (<0.50), moderate (0.50-0.74), good (0.75-0.89), and excellent (≥0.90).

FIGURE 1 | Layout of the 505 COD test.
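The concurrent-validity statistics described above (Pearson's r, regression slope, SEE, the Durbin-Watson statistic) and the CV can be sketched in plain Python. This is a minimal illustration, not the SPSS/JASP procedure the authors ran. It assumes the device times are regressed on the timing-gate times and that the residuals are listed in recording order, which the Durbin-Watson statistic requires. The CV is computed as sample SD divided by mean, an assumption because the text does not state whether CV was computed across or within participants. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def regression_validity(reference, device):
    """Pearson r, slope, intercept, SEE and Durbin-Watson for device regressed on reference.

    reference: criterion times (e.g., timing gates); device: times from the
    instrument under test, paired trial by trial and in recording order.
    """
    n = len(reference)
    if n != len(device) or n < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    mx, my = mean(reference), mean(device)
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in device)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, device))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    residuals = [y - (intercept + slope * x) for x, y in zip(reference, device)]
    ss_res = sum(e ** 2 for e in residuals)
    see = math.sqrt(ss_res / (n - 2))  # standard error of the estimate
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation of residuals.
    if ss_res == 0:
        dw = float("nan")  # undefined when the fit is perfect
    else:
        dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_res
    return {"r": r, "slope": slope, "intercept": intercept, "see": see, "durbin_watson": dw}


def coefficient_of_variation(values):
    """CV (%) = sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100


if __name__ == "__main__":
    # Toy numbers for illustration only (not study data).
    res = regression_validity([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
    print(res)  # r = 0.8, slope = 0.8, SEE = sqrt(0.9) ≈ 0.949, DW = 3.4
```

Regressing the device on the criterion keeps the timing gates on the x-axis, so a slope near 1 and an SEE near 0 indicate close correspondence with the reference.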

RESULTS
After excluding invalid trials, namely slips (three cases) and blurred images caused by the camera failing to focus in time (two cases), a total of 367 trials and 1,101 cases were included in the final analysis. All mean data are presented in Table 1.
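One reading of these counts, offered here as an assumption because the text does not define "case", is that each retained trial contributed three cases, one per timing system (timing gates, COD timer, and stopwatch):

$$
62 \times 6 = 372 \text{ planned trials}, \qquad 372 - (3 + 2) = 367 \text{ retained trials}, \qquad 367 \times 3 = 1{,}101 \text{ cases}
$$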

COD Timer
The COD timer exhibited excellent concurrent validity in the 505 COD test in comparison with the timing gates (r = 0.978; SEE = 0.035 s; slope of the regression line = 0.968; p < 0.001; Figure 2). The Durbin-Watson test indicated no autocorrelation of the residuals (DW = 2.4). A significant but trivial difference was observed between the COD timer and the timing gates (mean difference = 0.007 s; d = 0.19, 95% CI = 0.09-0.29; p < 0.001). The mean bias (−0.01 s) and 95% limits of agreement (−0.08 to 0.06 s) between the COD timer and the timing gates indicated a trivial difference. The regression line in the Bland-Altman plot showed no heteroscedasticity in the distribution of the differences between devices (r² = 0.006; Figure 3).
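The Bland-Altman quantities reported here (mean bias, 95% limits of agreement, and the r² of the regression line through the plot) can be sketched as follows. This is a minimal illustration under stated assumptions: differences are taken as device minus reference (reversing the subtraction flips the sign of the bias and limits), the limits use bias ± 1.96 SD of the differences, and the significance test of the regression slope is omitted. The function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, device, z=1.96):
    """Mean bias and 95% limits of agreement, with differences = device - reference."""
    if len(reference) != len(device) or len(reference) < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    diffs = [d - r for r, d in zip(reference, device)]
    means = [(r + d) / 2 for r, d in zip(reference, device)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return {"bias": bias, "loa": (bias - z * sd, bias + z * sd), "diffs": diffs, "means": means}


def heteroscedasticity_r2(means, diffs):
    """r^2 of the least-squares line of differences on pair means (0.0 if either has no spread)."""
    mx, my = mean(means), mean(diffs)
    sxx = sum((x - mx) ** 2 for x in means)
    syy = sum((y - my) ** 2 for y in diffs)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, diffs))
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Toy times in seconds, for illustration only (not study data).
    gates = [2.40, 2.55, 2.62, 2.48, 2.70]
    app = [2.41, 2.54, 2.63, 2.49, 2.69]
    ba = bland_altman(gates, app)
    print(ba["bias"], ba["loa"], heteroscedasticity_r2(ba["means"], ba["diffs"]))
```

An r² close to 0, as reported for the COD timer, means the size of the disagreement does not depend on how fast the trial was.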

Stopwatch
Pearson's correlation coefficient showed a very high relationship between the completion times measured with the stopwatch and the timing gates (r = 0.954, SEE = 0.05 s, slope of the regression line = 0.913, p < 0.001). The Durbin-Watson test indicated no autocorrelation of the residuals (DW = 2.2). Significant and large differences were observed between the stopwatch and the timing gates (mean difference = 0.067 s; d = 1.29, 95% CI = 1.15-1.43; p < 0.001). Analysis of the Bland-Altman plot revealed a systematic bias between the stopwatch and the timing gates (bias = 0.07 s; 95% LoA = −0.07-0.13 s). Finally, the regression line in the Bland-Altman plot showed significant heteroscedasticity in the distribution of the differences between devices (r² = 0.022; Figure 3). There were significant, moderate to large differences between each timer and the timing gates (timer 1: d = 1.18, 95% CI = 1.05-1.31; timer 2: d = 0.61, 95% CI = 0.50-0.72; timer 3: d = 1.12, 95% CI = 0.99-1.25).
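The text reports Cohen's d for paired device comparisons without stating which standardizer was used. The sketch below computes two common paired variants, d_z (mean difference divided by the SD of the differences) and d_av (mean difference divided by the average of the two SDs), plus an approximate 95% CI from the large-sample standard error sqrt(1/n + d²/(2n)). Both the choice of variants and the CI approximation are assumptions for illustration, not necessarily what SPSS or JASP computed; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d_paired(reference, device):
    """Return (d_z, d_av) for paired times, with differences = device - reference.

    d_z  = mean difference / SD of the differences
    d_av = mean difference / average of the two sample SDs
    """
    if len(reference) != len(device) or len(reference) < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    diffs = [d - r for r, d in zip(reference, device)]
    md = mean(diffs)
    d_z = md / stdev(diffs)
    d_av = md / ((stdev(reference) + stdev(device)) / 2)
    return d_z, d_av


def d_confidence_interval(d, n, z=1.96):
    """Approximate 95% CI for d using the large-sample SE sqrt(1/n + d^2 / (2n))."""
    se = math.sqrt(1 / n + d ** 2 / (2 * n))
    return d - z * se, d + z * se
```

Because trial-to-trial differences between devices vary far less than completion times vary between players, d_z is typically much larger than d_av for the same data, so the two variants can fall into different magnitude categories on the scale given in the Statistical Analyses.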

DISCUSSION
The aim of this study was to assess the concurrent validity, inter-observer agreement, and test-retest reliability of the COD timer smartphone app and the stopwatch in measuring 505 COD completion time. The results showed that the COD timer was highly valid and reliable and can serve as an economical and portable alternative for measuring COD performance. In contrast, the stopwatch should be avoided because of its large measurement errors in the 505 COD test.
Compared with traditional laboratory equipment, smartphone apps are affordable and easy to operate, which makes them increasingly viable for sports researchers and fitness coaches (Peart et al., 2019). In agreement with Balsalobre-Fernández et al. (2019), the present study revealed very high concurrent validity of the COD timer app with respect to the timing gates. The linear regression analysis showed a very high association (r = 0.978), and the slope was very close to that of the identity line (slope = 0.968). Put simply, the values measured with both devices were highly consistent, which was supported by the Bland-Altman plots: most values were close to the mean difference between instruments, and the regression line through the data points showed a very low r² of 0.006 with a slope close to 0, indicating that the differences between devices were almost negligible. Nevertheless, completion times differed significantly between the COD timer app and the timing gates (p < 0.01). This may be explained by the number of records analyzed, which far exceeded the sample size required by the power analysis; with so many observations, even trivial differences reach statistical significance, so the p value alone risks a misleading inference. Indeed, the effect size indicated only a trivial difference. These findings have been made possible by rapid advances in technology that have greatly enhanced the functions of smartphones, as evidenced by their ability to record video at 240 frames per second (fps) in 1080p quality. A potential limitation of the COD timer app, however, is that the observer must select the start and finish frames manually, which may introduce measurement error when calculating completion time.
That said, our results suggest that, with frame-by-frame analysis, manual frame selection does not compromise concurrent validity. A high level of reliability is also necessary for timing devices. To our knowledge, only a few studies have considered the inter-observer reliability of smartphone apps. Romero-Franco et al., testing a 40 m sprint, showed near perfect agreement (ICC = 0.998, 95% CI = 0.997-0.998) and no significant differences between two independent observers (mean difference = 0.004 ± 0.03, p = 0.999). Balsalobre-Fernández et al. also reported a high level of inter-observer agreement when measuring mean barbell velocity (ICC = 0.941, 90% CI = 0.922-0.955; Balsalobre-Fernández et al., 2018) and CMJ height (ICC = 0.999, 95% CI = 0.998-0.999; Balsalobre-Fernández et al., 2015). Similarly, we found very high agreement between the two observers (ICC = 0.991), and inter-observer differences were non-significant with a trivial effect size (mean difference = 0.007 s, p = 0.419, d = 0.04). The consistency between these studies and the current study supports the reliability of slow-motion apps and highlights the usability of the COD timer app. Furthermore, the saved videos were re-analyzed with the COD timer app after 1 week, and completion times showed near perfect consistency between the first and second analysis sessions (ICC = 0.998; Table 1). From a practical standpoint, this means that practitioners can re-analyze recorded videos with the COD timer and can process a large number of trials whenever convenient. Continuing on this practical theme, the COD timer app costs only 11 USD, which is far cheaper than timing gates and equal to the cost of a stopwatch.
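The ICC model named in the Statistical Analyses (two-way random effects, absolute agreement) can be sketched from two-way ANOVA mean squares. This minimal version computes the single-measures form, ICC(A,1) in McGraw and Wong's notation (equivalent to Shrout and Fleiss's ICC(2,1)); the text does not say whether single- or average-measures ICCs were reported, and confidence intervals are omitted. The function name is hypothetical.

```python
def icc_2_1(data):
    """ICC, two-way random effects, absolute agreement, single measures.

    data: one row per trial, each row holding k ratings of that trial
    (e.g., observer 1 and observer 2, or session 1 and session 2).
    """
    n = len(data)
    k = len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 rows with the same number (>= 2) of ratings")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Two-way ANOVA sums of squares: trials (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # The ms_cols term penalizes systematic offsets between raters (absolute agreement).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For example, `icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])` returns about 0.667: the two raters rank the trials identically, but the constant 1-unit offset lowers absolute agreement, whereas a consistency-type ICC would return 1.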
Taken together, the COD timer can be considered a valid, reliable, and cost-effective alternative for practitioners who need to time the 505 COD test but do not have access to more expensive electronic timing gates.
In contrast, although the stopwatch showed a high correlation with the timing gates (r = 0.954), a large difference was found in completion time (mean difference = 0.067 s, p < 0.01, d = 1.29). This was further supported by the Bland-Altman analysis, in which the regression line indicated significant heteroscedasticity (r² = 0.022). Interestingly, the stopwatch consistently recorded slower times than the timing gates in the 505 COD test, by approximately 0.07 s. This contrasts with previous literature reporting faster times with stopwatches, with differences of approximately 0.20-0.24 s compared with electronic timing systems (Brechue et al., 2008; Hetzler et al., 2008; Mayhew et al., 2010; Mann et al., 2015). To the authors' knowledge, no previous studies have explored the validity and reliability of a stopwatch during a COD test. Although the correlation between the stopwatch and the timing gates was classified as practically perfect, the large absolute error cannot be considered acceptable. Differences in speed performance between elite and average players are relatively small: the difference between stopwatch and timing-gate times is close to the difference between the 50th and 10th percentiles for 10 m sprint time in male soccer players (Haugen et al., 2014). Consequently, if practitioners opt to use a stopwatch, the large differences in timers' reaction times may mask the genuine variation in COD performance within a group of athletes.
Despite the novelty and usefulness of the present study, there were a few limitations which should be acknowledged.
First, the conclusions of this study apply only to the 505 COD test; future research should determine the validity and reliability of the COD timer in other tests, such as the pro-agility test. Second, the COD timer is available only for the iOS operating system. Thus, an equivalent app for Android smartphones would need to be developed to increase its usability in the field.

APPLICATIONS AND CONCLUSION
Accuracy and repeatability are essential for timing systems used in the 505 COD test. The results of the present investigation add to the literature by showing that the short completion times of the 505 COD test can be measured easily, validly, and reliably using slow-motion video analysis with the COD timer app, which is available on the App Store (Apple Inc., United States). By contrast, the stopwatch is not recommended because of the large measurement errors between the timing gates and each timer.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, and further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Prior to the study, the subjects were informed of the test procedure and the potential risks. Written informed consent was obtained from each participant and their parents/coaches in advance. Ethics approval was provided by the Shanghai University of Sport.

AUTHOR CONTRIBUTIONS
CZ: acquisition of data, conception and design of study, analysis of data, and drafting the manuscript. BC: acquisition of data. LK, CB, and LY: revising the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to all the volunteers who took part in this study.