Reliability of Wearable-Sensor-Derived Measures of Physical Activity in Wheelchair-Dependent Spinal Cord Injured Patients

Physical activity (PA) has been shown to have a positive influence on functional recovery in patients after a spinal cord injury (SCI). Hence, it can act as a confounder in clinical intervention studies. Wearable sensors are used to quantify PA in various neurological conditions. However, there is a lack of knowledge about the inter-day reliability of PA measures. The objective of this study was to investigate the single-day reliability of various PA measures in patients with a SCI and to propose recommendations on how many days of PA measurements are required to obtain reliable results. For this, PA of 63 wheelchair-dependent patients with a SCI were measured using wearable sensors. Patients of all age ranges (49.3 ± 16.6 years) and levels of injury (from C1 to L2, ASIA A-D) were included for this study and assessed at three to four different time periods during inpatient rehabilitation (2 weeks, 1 month, 3 months, and if applicable 6 months after injury) and after in-patient rehabilitation in their home-environment (at least 6 months after injury). The metrics of interest were total activity counts, PA intensity levels, metrics of wheeling quantity and metrics of movement quality. Activity counts showed consistently high single-day reliabilities, while measures of PA intensity levels considerably varied depending on the rehabilitation progress. Single-day reliabilities of metrics of movement quantity decreased with rehabilitation progress, while metrics of movement quality increased. To achieve a mean reliability of 0.8, we found that three continuous recording days are required for out-patients, and 2 days for in-patients. Furthermore, the results show similar weekday and weekend wheeling activity for in- and out-patients. To our knowledge, this is the first study to investigate the reliability of an extended set of sensor-based measures of PA in both acute and chronic wheelchair-dependent SCI patients. The results provide recommendations for sensor-based assessments of PA in clinical SCI studies.


INTRODUCTION
Neurological disorders such as Spinal Cord Injury (SCI) are characterized by the different degrees of impairment of motor and sensory function. Earlier studies have investigated the impact of physical activity (PA) on functional recovery and found a positive effect in various neurological diseases (1)(2)(3). Past intervention studies in SCI focused on the integration of activitybased therapies with various intensities, duration, and type of PA, into rehabilitation programs to improve functional recovery. The outcome of these studies, however, are contradictory, with some of them showing improved strength or functional ability of the upper limbs (4-7) and performance in daily life (8), whereas others could not show any significant effect on the functional recovery (9,10). One reason for such divergent results could be the subjective and non-comprehensive assessments of PA performed by the patient outside the controlled interventions. Thus, PA needs to be objectively assessed to better estimate the effects of interventions and the impact of PA on patient recovery in general.
In the past 15 years, accelerometers and inertial measurement units (IMUs) have been introduced to quantify PA more objectively. The use of accelerometers is well established in health sciences, especially in quantifying PA in the ablebodied population (11), elderly (12), children (13), and patients with various neurological conditions such as stroke (14,15), Parkinson's (16), and multiple sclerosis (17). In SCI, studies have been conducted to develop metrics to capture PA in wheelchairbound SCI patients (18,19).
The levels of PA change throughout the rehabilitation process due to neurological recovery and compensation (20,21), and can differ between individuals (22). Furthermore, they may vary from day to day as well due to environmental factors, but also due to patient characteristics like motivation, or general health status and pain. Therefore, there is a need to quantify how much the PA varies between single days within one patient and how many days are required to account for this variability to obtain a reliable representation of the overall PA level of the subject.
Guidelines on how many days the PA has to be monitored to obtain a reliable representation of the overall PA already exist for healthy adults and children. For healthy adults, a measurement period of 1 week has been suggested (23), while a measurement period of up to 11 days has been suggest for children (24). In older adults, a desired measurement duration of one to 2 days has been reported to achieve good reliabilities for sedentary, low and moderate-to-vigorous physical activity (25). In neurological diseases, e.g., in multiple sclerosis, guidelines suggest 4-6 days for sedentary behavior and 3-7 days for low and moderate-tovigorous physical activity (26). The existing guidelines, however, cannot easily be translated to the SCI population, and especially Abbreviations: SCI, Spinal Cord Injury; PA, physical activity; IMU, Inertial Measurement Unit; ICC, Intraclass Correlation Coefficient; AC, activity counts; LAT, laterality; MET, metabolic equivalent of task; SED, sedentary activity; LPA, low physical activity; MVPA, moderate-to-vigorous activity; DIST TOT , total distance; DIST ACT , actively wheeled distance; VEL, mean velocity during active wheeling. not to wheelchair-dependent patients because of the completely different PA patterns such as wheeling instead of walking.
Because of the novelty of PA research in SCI, no comprehensive guidelines on measurement periods exist for this population. Sonenblum et al. (27) proposed a measurement period of 1 week to obtain reliable estimations of PA related solely to wheelchair usage, such as distance wheeled and duration of wheeling episodes. Yet, this conclusion is drawn from a limited number of patients with different neurological conditions, and only in their chronic stages. Since the variability between days might change between the stages and it might also be different for the different metrics of PA, we propose guidelines for the wheelchair-dependent patients on how many days of measurement are required for various measures of PA during the different stages of rehabilitation after the incidence of SCI. The primary aim of this study was to estimate the reliability of several sensor-based metrics of PA at different time points during the rehabilitation progress. The secondary aim was to compare the reliability of PA measures across days of active rehabilitation and on weekends.

Patients
In total, 63 patients with SCI were included in this analysis, participating in two observational studies (for information about the protocol see Measurement procedure).
Patients suffering from a traumatic or non-traumatic acute SCI with all NLI and levels of lesion completeness were admitted to this study. Any neurological disease other than SCI, and any orthopedic or psychiatric disorders, were considered as exclusion criteria. Additionally, only wheelchair-dependent patients, defined by a value of < 3 in all the mobility domains (12, 13, and 14) of the Spinal Cord Independence Measure III (SCIM III) (28) were considered for the analysis.
The NLI and completeness of the lesion (AIS) was assessed following the International Standards for Neurological Classification of SCI (ISNCSCI) (29). Patients with an NLI from C1 to Th1 were classified as tetraplegic, while patients with an NLI from Th2 to S2 were classified as paraplegic. Recruitment took place from 2014 until 2017, at the sites of the Swiss Paraplegic Center in Nottwil, the Rehab Basel in Basel and the Balgrist University Hospital in Zurich, Switzerland. All patients signed a written consent before participating in the study in accordance with the Declaration of Helsinki. The study was approved by the ethical committees of the cantons of Zurich (KEK-ZH-Nr. 2013-0202), Lucerne (EK 13018), and Basel (EK 34313) and is registered on clinicaltrials.gov (Identifier: NCT02098122).

Measurement Procedure
For this study, the ReSense modules (30) were used as a measurement device. The ReSense modules are compact IMUs recording 3D acceleration, 3D angular velocity, 3D magnetic field strength, and barometric pressure for more than 24 h continuously. By turning all sensors except the accelerometer off, the battery life can be extended to over 2 weeks. In this study, only the acceleration data were used.
At all time points, patients were equipped with several ReSense modules (Figure 1). One sensor measuring acceleration was attached to each wrist with AlphaStrap Blue (North Coast) and Velcro Straps (Velcro) for a duration of three consecutive weekdays to capture upper limb movements. Patients were asked to wear the sensors continuously for about 72 h during day-and nighttime and just take them off for showering or swimming activities. They were told that their amount of activity was being measured and that they should engage in their everyday life actives. Due to the limited battery lifetime, the sensors were exchanged once a day and recharged. Additionally, one module measuring acceleration was mounted on the right wheel of each wheelchair for the duration of seven consecutive days to capture wheeling metrics precisely (18,31).
Data collection was conducted in the context of two observational studies (Figure 2). In the first observational study, patients were measured at five different time points during rehabilitation, each time for 3 consecutive days wearing the wrist sensors, and 7 consecutive days using the wheelchair sensor, respectively. The first four time points were within the clinical rehabilitation facilities ("in-patient"), whereas the last time point took place after discharge ("out-patient"). The inpatient rehabilitation was divided into four distinct stages, which conform to the time windows of the European Multicenter Study about SCI (EMSCI 1 :): very acute (VA), acute 1 (A I), acute 2 (A II), and acute 3 (A III), which are 2 weeks (0-15 days), 1 month (16-40 days), 3 months (70-98 days), and 6 months (150-186 days after injury), respectively. The last time point (out-patient) was defined to be 1 year after injury (chronic stage-C, 300-400 days). It is important to note that at stage A III, some patients were already discharged from the rehabilitation facility and were therefore analyzed within the out-patient group. Dividing the 1 www.emsci.org rehabilitation process into different stages has the advantage of stratifying the clinical picture of the patients and enables to analyze the patient data in short time windows in which a minimal functional change can be expected. In the second observational study, different patients were measured once after discharge for 3 consecutive days wearing the wrist sensors, and 7 consecutive days using the wheelchair sensor, respectively, at least 1 year after injury.
Not every patient was measured at every timepoint due to later recruitment or early drop-out and the number of patients of the 3-day measurement can differ from the number of the patients of the 7-day measurement due to technical issues with the sensors ( Table 1).

Data Analysis And Statistics
The number of included patients varied depending on the specific analysis and time point ( Table 1). For the analyses focusing on the whole in-patient group, data from stages VA, A I, A II, and partly A III of the 1st observational study were pooled. Similarly, data of the whole out-patient group were pooled from the 2nd observational study, and from stage A III (partly) and C of the 1st observational study.

Preprocessing
The desired sampling rate of the sensor was 50 Hz. However, the exact sampling rate of the ReSense sensors can vary between 49 and 51 Hz. Therefore, the raw data was resampled to 50 Hz using a common set of time points for all the modules that were used together (32). The periods of not wearing the sensors were removed from the data using a semi-automatic algorithm. The algorithm labels periods with 20 min of consecutive zero-counts as potential non-wear times (33). Thereafter, the labeled periods were visually inspected by an expert and manually adapted where necessary.

Sensor-Based Metrics
Sensor-based metrics were divided into 4 major categories: activity counts of overall upper limb movement, PA intensity levels (time spent in sedentary PA, low PA and moderate-tovigorous PA), metrics of wheeling quantity (total and actively wheeled distance), and metrics of movement quality (upper limb movement laterality and mean wheeling velocity as a proximate of wheeling performance).

Overall upper limb activity
Activity counts (AC) were used to enumerate total forearm activity in a generalized way and were calculated by applying the discrete integral over the acceleration magnitude in epochs with the length of 1 min (34) and subsequently averaging the AC values over all epochs. AC of the right and left wrist were summed up.

PA Intensity levels
Different intensity levels of PA were defined by using AC cutoff values. These cut-off values were derived from previous energy expenditure measures in combination with IMU data (35). The intensity levels were defined by means of the metabolic equivalence of task (MET) adapted for SCI (36), where sedentary FIGURE 2 | Measurement protocol. This study consists of two observational studies. In the 1st observational study, patients were measured at 5 time points during the rehabilitation process. In the 2nd observational study, a different patient cohort was measured only once, at least 1 year after injury. In stages VA, A I, A II, and partly A III of the 1st observational study, patients were in-patients (red). In the 2nd observational study, as well as partly in A III, and stage C of the 1st observational study, patients were out-patients (blue). At each time point (*), acceleration and angular velocity of the right and left wrists were recorded for 3 days, while the acceleration of the right wheel of the wheelchair was recorded for 7 days. Overall upper limb activity (AC) and PA based on energy expenditure (SED, LPA, and MVPA) were calculated based on the 3 day recordings. All wheeling-related measures (DIST TOT , DIST ACT , and VEL) were calculated based on the 7 day recordings. *pooled data set. Detailed numbers are given for all single stages constituting the 1st observational study: 2 weeks after injury (VA), 4 weeks after injury (A I), 3 months after injury (A II), 6 months after injury (A III), and 1 year after injury (C). In stage A III, in-as well as out-patients are included. For a combined analysis of all in-patients, data of stages VA, A I, A II, and A III (partly) were pooled. Data of stages A III (partly), stage C, as well as the 2nd observational study were pooled for a combined analysis of all out-patients. "Tetraplegics" are defined by a lesion level from C1 to Th1.
activities (SED) corresponded to a MET level below 1.5, low physical activity (LPA) to a MET value between 1.5 and 3, and moderate-to-vigorous activities (MVPA) corresponded to a MET level above 3. SED, LPA, and MVPA are expressed in minutes spent in the respective intensity level per 24 h.

Metrics of wheeling quantity
To calculate wheeling-related metrics, a previously published algorithm (18) was used to (i) detect the phases of wheeling activity by applying heuristic rules, and (ii) to classify these phases into active and passive wheeling by using support vector machine classifiers. The total distance (DIST TOT ) and the distance wheeled actively (DIST ACT ) were extracted from the data and normalized to 24 h.

Metrics of movement quality
Whereas, the three aforementioned categories described how often movements were performed, the following metrics describe how the movements were performed.
Upper limb movement laterality (LAT) represents the symmetry of upper limb movements in general. LAT was calculated by computing the AC in epochs of 2 s for the right and left hand, dividing AC of the right hand and left hand and log transforming this ratio. The median value of the absolute log transform was used for the analysis. Details about the calculation can be found in Brogioli et al. (19). Scores for LAT range from minus to plus infinity quantifying the amount of LAT, with zero for no LAT.
Mean velocity (VEL) can be interpreted as a proximate measure for the quality of wheeling. Patients with improved functional ability will be able to wheel on average faster than patients in earlier stages of rehabilitation, or with more severe impairments.
VEL was defined as the mean absolute velocity of active propulsion, and was extracted using the aforementioned wheeling algorithm (18).

Statistics
First, the single-day reliabilities of all sensor-based metrics were calculated. Then the number of days needed for a reliable measurement was identified.
Single-day reliabilities for AC, SED, LPA, MVPA, and LAT were calculated based on the 3-day measurements, because they require information of the wrist sensors. Single-day reliabilities for DIST TOT , DIST ACT , and VEL were calculated based on the 7day measurements, because they require information of the wheel sensor only.
Single-day reliability was defined as the Intraclass Correlation Coefficient (ICC), which was calculated using a variance portioning approach based on a one-way random effects model, with the random effect being on the subject level (37) where σ 2 s is the between-subject variance and σ 2 res the residual variance. This approach is a well-established method especially in the field of PA research (23,38,39).
The confidence intervals for ICC were calculated based on the exact confidence limit equation (40).
According to Koo and Li (41), ICC values higher than 0.9 are considered as excellent, between 0.75 and 0.9 as good, between 0.5 and 0.75 as moderate, and lower than 0.5 as poor reliability.
To calculate the number of days needed for a reliable measurement (N), the Spearman Brown prophecy formula was used (42), where ICC t is the desired level of reliability and ICC s is the singleday reliability. The desired reliability was set to 0.8, which is considered as an acceptable value according to literature (43).
To assess the relation of the wheeling-related metrics during weekdays and the weekend, equivalence tests were used. For normally distributed data, the Two Sided T-test (TOST) approach was used (44). In TOST, an epsilon (ε) has to be defined that corresponds to the level of practical equivalence (LOPE). We chose ε as the mean value of all the standard deviations of the respective metric: where n is the number of patients, and σ metric i is the standard deviation of the i-th subject for the metric of interest.
For non-normally distributed data, the TOST procedure was adapted by using the non-parametric Mann-Whitney-Wilcoxon Test instead of the Student's t-test.
A sample size calculation was performed according to the method presented in (45). The results of this analysis can be found in Table S1.
Preprocessing and calculation of the output metrics was conducted using MATLAB R2017a (MathWorks, Natick, MA, USA). Statistics were computed using R (The R project for Statistical Computing, R Core Team).

Patient Characteristics
The mean age of all patients was 49.  Table 1.

Single-Day Reliabilities
Single-day reliabilities of metrics of PA varied depending on the time after SCI (i.e., rehabilitation progress) ranging from excellent to poor reliability levels (Figure 3). ICC of metrics describing movement quantity (AC, SED, LPA, MVPA, DIST TOT , and DIST ACT ) tended to decrease during the rehabilitation progress (Figures 3A-C) and decreased e.g., from excellent reliability levels (0.93) for LPA in stage VA to poor levels (0.44) for LPA in out-patients. In contrast, measures describing movement quality (LAT and VEL) tended to increase during rehabilitation ( Figure 3D). Especially, reliability of VEL improved from a poor level of the ICC (0.19) at stage VA to a moderate level (0.66) for out-patients. Overall upper limb activity (AC) showed excellent ICC levels (ICC > 0.92) during the first three acute stages with a decrease at later stages of rehabilitation to a good level (0.79) and a moderate level (0.65) after discharge (out-patient).
Overall single-day reliabilities were higher in tetraplegic patients than in paraplegic patients for most metrics (Figure 4). One exception to this was found in the reliability of MVPA in the out-patients, were the single-day reliability for tetraplegic patients was poor (0.24) and thus lower than the moderate level (0.65) for the paraplegic patients. Furthermore, the single-day reliability of LPA is poor (0.03) in paraplegic out-patients."  , moderate-to-vigorous activity (MVPA), total distance traveled in a wheelchair (DIST TOT ), distance traveled actively in a wheelchair (DIST ACT ), laterality (LAT), and mean velocity during active wheeling (VEL) for wheelchair-dependent paraplegic patients (full circle, solid lines) compared to wheelchair-dependent tetraplegic patients (empty circle, dotted lines) for the in-patients (from 2 weeks after injury to 6 months after injury) and out-patients (> 6 months after injury). The dashed horizontal lines depict the ICC level of 0.8, which was chosen as a requirement for a reliable measurement. Solid and dotted lines indicate the confidence intervals. Indicated patient numbers n are the pooled numbers.

Required Number of Days
A mean reliability of 0.8 is reached when monitoring inpatients for 2 days and out-patients for 3 days for all metrics (Figures 5A,C). A 7-day measurement is estimated to reach excellent reliabilities for all metrics in both in-and out-patients (Figures 5B,D).

Influence of Weekday vs. Weekend
With the chosen LOPEs and a significance level of 0.05, equivalence could be established for DIST TOT as well as DIST ACT between weekdays and the weekend in all in-patients and out-patients, as well as single stages VA and A I (Table 2, Figure 6A). At stages A II and A III, no equivalence could be shown ( Figure 6A) for DIST TOT and DIST ACT . Results for active distance are very similar to total distance for all stages, and thus not presented.
For VEL, equivalence could be shown in all in-patients and out-patients, as well as at single stages A I, A II, whereas in stage VA and A III no equivalence could be established ( Table 2 and Figure 6B).

DISCUSSION
To our knowledge, this study is the first to investigate the reliabilities of a comprehensive set of sensor-based measures of PA in both acute and chronic wheelchair-dependent SCI patients. These findings provide recommendations for the application  2 | Descriptive statistics for total distance traveled in a wheelchair (DIST TOT ), distance traveled actively in a wheelchair (DIST ACT ), and mean velocity during active wheeling (VEL) performed during weekdays and weekends (mean ± SD and median (IQR) for pooled in-patients and out-patients, as well as all single stages, VA (very acute−2 weeks after injury), A I (acute I−4 weeks after injury), A II (acute II−3 months after injury), A III (acute III−6 months after injury).

Workday mean (± SD)
Weekend mean (± SD) The last three columns contain the calculated limits of practical equivalence (LOPE) used for the equivalence tests, the confidence intervals, and the p-values resulting from the equivalence tests. a denotes p-values resulting from the Mann-Whitney-Wilcoxon Test of equivalence for non-normally distributed data, the remaining p-values were calculated using the TOST procedure for normally distributed data.
of sensor-based assessments of PA that enable non-obstructive long term recordings throughout clinical studies in in-and out patients.

Single-Day Reliabilities
Single-day reliabilities of PA metrics depend highly on the clinical condition, i.e., the stage of rehabilitation and the extent of functional impairment. The reliability of measures of movement quantity such as activity counts of upper limb activity, PA intensity levels and wheeling-related metrics decreased during in-patient rehabilitation and in the out-patient setting, while in general was found to be higher for tetraplegic patients. One reason for this might be that in-patients have a more regular daily schedule due to preplanned therapy sessions as observed by therapists at the different rehabilitation centers, resulting in lower variability of daily activities. Reliabilities of metrics of movement quantity are higher in patients with a higher impairment like in tetraplegia and in the early stages of rehabilitation, as the use of the upper limbs for these patients is mostly limited to the very structured therapy sessions, lowering the variability between single days. Additionally, these patients might also reach their upper limits of PA during their daily schedules, resulting in a very low variability between single days.
High reliability levels of upper limb activity in terms of AC compared to the other quantitative metrics suggest that this measure, although widely used in PA-research, may provide a rather rough approximation of PA levels in SCI patients, lacking the detailed information about PA intensity patterns and specific movements like wheeling.
Metrics based on PA intensity levels show lower single-day reliability levels than AC, suggesting that these metrics capture more detailed information about PA levels which likely vary between single days. Another possible explanation could be that the AC thresholding for this analysis introduces noise into the estimates. Future studies are needed to investigate this in more detail.
The reliability of LPA in paraplegic out-patients was considerably lower compared to the remaining metrics based on PA intensity levels. Since the reliability is calculated by dividing the between-subject variance by the total variance (Equation 1), for LPA in paraplegic out-patients, the poor reliability can be explained by a very low variance between the individual patients compared to the variance between the single days. While tetraplegic patients show a much higher between-subject variance, this might be a hint that LPA could be influenced by the level of impairment of the upper limbs. LPA is likely to represent activities of daily living, as demonstrated earlier (35). FIGURE 6 | Boxplots for total distance traveled in a wheelchair (A) and mean velocity during active wheeling (B) during weekdays vs. weekends in all single in-patient stages (VA, AI, AII, AIII), as well as out-patients. */ + denotes p-value of < 0.05, **/ ++ a p-value of < 0.01, ***/ +++ a p-value of < 0.001, respectively. P-values were calculated using the TOST procedure for normally distributed data (*), respectively, the adapted equivalence test based on the Mann-Whitney-Wilcoxon Test for non-normally distributed data ( + ).
Assuming that a large amount of activities of daily living (not involving mobility), e.g., feeding, showering, and dressing are equally presented in each patient, LPA should not vary strongly between single patients. This hypothesis is valid for patients that are not impaired in the upper-limbs, i.e., paraplegics, as can be seen in the low variability of LPA between the patients (LPA: SD: 1.1h, Min: 9h, Max: 13h). In contrast, the level of impairment of the upper limbs varies strongly in the tetraplegic group, which might be the reason for a higher variability of LPA between the patients (LPA: SD: 2.8h, Min: 6h, Max: 15 h).
In contrast to the increased reliability in LPA for tetraplegic outpatients, MVPA shows a decreased single-day reliability in these patients. A possible reason is that these patients are challenged to even reach moderate-to-vigorous intensities (46, 47) and thus show it only occasionally and not in everyday PA.
In contrast to metrics of movement quantity, single-day reliabilities of metrics of movement quality like LAT and VEL increased during the rehabilitation and stayed on a higher level after discharge. During in-patient rehabilitation, patients learn various skills to handle their impairment, e.g., wheeling techniques or compensatory strategies for activities of daily living, which may result in higher variability between single days at the earlier stages of rehabilitation. Moreover, at the beginning of the rehabilitation process arm rehabilitative training is often unilateral (e.g., with ArmeoPower (48) training), leading to a high discrepancy of LAT between specific therapy sessions and leisure time and thus increasing the variability of LAT between different days. Furthermore, the therapy schedules may vary strongly between different days at these stages, which can result in more variable measures of movement quality on different days due to dedicated rehabilitation sessions with specific training aims such as improving the function of the more impaired side in tetraplegics leading to a higher variability in measures of movement quality as can be seen for LAT in the AI stage. After the patients learn certain strategies, they may apply them more consistently during their daily activities, resulting in higher single-day reliabilities at the later stages of rehabilitation. The high reliability of LAT in the stage VA might be due to the fact that patients are mainly bound to the bed, and not showing much PA in general, which may lead to a higher reliability.

Required Number of Days For Reliable Measures
Based on our data, metrics of movement quality should optimally be measured for 4 days to achieve a mean reliability of 0.8, which is commonly used in the field of PA research (23,24,26,49). In the in-patient setting, we suggest measuring metrics of movement quantity for 2 days to achieve a mean reliability of 0.8, while in the out-patient setting measuring on average for 3 days is required to achieve the same reliability. The findings of high reliability of 2-days recordings increase the applicability of sensor measurements in the clinical routine where, especially in acute patients, wearing sensors for too long may be an additional burden. However, we suggest 4 days to capture all analyzed metrics of movement quantity reliably, and one to 2 days for the measures of movement quality in the out-patient setting.
Measuring for 7 days would yield excellent reliabilities for all metrics in all patients, which might be relevant in research and clinical studies where even the detection of small changes in PA patterns may have an impact on outcomes. However, measurement duration is always in tradeoff with clinical applicability and patient compliance.

Difference Between Weekdays And Weekend
We investigated whether it makes a difference to measure PA on weekends or during the week. For this, only wheeling-related metrics (DIST TOT , DIST ACT , and VEL) were analyzed, as 7day recordings were only available from the wheel-mounted sensor. In in-patients as well as in out-patients we could show equivalence of DIST TOT and DIST ACT between weekdays and weekends. This suggests that measurements can be taken on any day of the week, keeping in mind that single days might represent unexpected outliers due to an event not occurring regularly. Nevertheless, the results for the in-patients have to be taken with precautions. Splitting up the in-patients into the single stages, equivalence between weekdays and weekends of DIST TOT and DIST ACT could only be shown at the very early stages of rehabilitation (VA and AI), suggesting that for the stages of A II and A III both weekdays and weekends should be measured in order to obtain a comprehensive picture of the patients' overall PA. During early phases of rehabilitation, patients typically receive individual therapy instead of group therapies and their therapy schedules are less tight as observed by therapist at the different centers. We hypothesize that this could explain the observation of similar amounts of activity on the weekends and during the week. At later stages, however, the therapy schedule of the patients gets tighter during the week, which is why they might use the weekends for recovery. The fact that some patients can leave the rehabilitation facility over the weekend at later stages of rehabilitation might have an additional impact on their different behaviors during weekdays and weekends.
One could hypothesize that in out-patients the wheeling distances differ during weekdays and weekends mainly due to the fact that patients might work during the week and thus show different activity patterns than on the weekends. However, in out-patients, equal wheeling distances (DIST ACT and DIST TOT ) were found during the week and on the weekends, which might indicate that the patients we measured were not yet, or if then only partially back to work (50) or worked rather from home instead of having a working space away from home.
Equivalence of VEL could be shown in all in-and outpatients, suggesting that measurements can be taken on any day of the week to reliably capture VEL. However, when examining individual stages of the in-patient rehabilitation, at stage VA as well as at A III, no equivalence could be shown. In the latter stage, patients showed a higher VEL during the week than on the weekends, which might be due to the integration of sports activities into their therapy schedule, as already shown for ambulatory SCI patients (51). The result found for the stage VA is based on only a limited number of observations as most of these patients do not wheel actively, and thus has to be taken with care.

Comparison to Literature
Our results for the reliabilities of wheeling-related metrics in the out-patients are in line with literature, proposing up to 1 week of measuring wheeling-related PA in wheelchair-dependent chronic SCI patients (27). For AC as well as PA intensity times (SED, LPA, and MVPA), we can only compare our results to those of the able-bodied population. Single-day reliability for AC was found to be moderate in able-bodied individuals (23), which is consistent with our results in the out-patients. Similarly, single-day reliabilities for SED and MVPA are moderate and comparable to our results (23,25). In contrast to a low single-day reliability found in our study, a good single-day reliability for LPA has been reported in the able-bodied population (23). This low single-day reliability happens to be distinctive to the wheelchairdependent SCI population, and thus should not be compared to the able-bodied population.
To the best of our knowledge, this is the first work to analyze the reliability of physical activity metrics in the in-patient setting and thus no comparable data is available.

Choice of Accelerometer Cut-Points
Cut-off points are commonly used to define intensity levels and were established in previous studies for the healthy population (52,53) and for stroke survivors (54). However, appropriate cutoff values depend on populations and type of wearable sensors used (55). Furthermore, changes in those cut-off values directly influence the metrics of intensity levels (56). Therefore, we defined cut-off values specific for our population of interest and for the wearable sensor used in this study. We calculated our cutoff points based on indirect calorimetry values as commonly done in the field (52)(53)(54)(55)(56). Transferring our results to methodologies using different accelerometer cut-off points has to be done carefully, as the influence of the cut-off points on the reliability is unknown, and reliability values might change.

Study Limitations
We would like to emphasize three main limitations of our study. The first one is the moderate sample size particularly in the very acute stage. Sample size is often a problem in SCI research. Especially in very acute stages recruitment of the patients and measuring these is challenging. In reliability studies, low sample size results in larger confidence intervals, making the interpretation of the results more difficult. Nevertheless, our sample size is reasonable if compared to other studies in the SCI population.
A further limitation is recording for only 3 days with the sensors attached to the wrists. Especially in tetraplegic patients, there is a risk of pressure sores caused by wearing the sensor straps for too long. Thus, a longer measurement time would expose the patients to an increased risk of damage to the skin. Furthermore, compliance decreases with increased number of measurement days. This limitation might result in larger confidence intervals.
A sample size calculation was conducted. Assuming an acceptable confidence interval width of 0.2, our sample sizes in the inpatient-setting are sufficient. In the outpatient setting, higher sample sizes would be required to make more precise statements. The problem of large confidence intervals in reliability studies has been addressed previously (45). This issue can be resolved by either increasing the number of subjects or the number of measurement days, and should be considered for further studies.
One limitation in studies using wearable sensors in general are possible behavioral reactions, i.e., subjects could alter their behavior because of the knowledge of being measured. Conflicting statements about the amount of reactivity have been made in literature (57,58). However, the accuracy of PA measures based on wearable sensors is higher than the accuracy of the conventional questionnaires (59,60) and thus better suitable to estimate PA levels.
Lastly, PA intensity levels were estimated from activity counts based on a previous study. Dedicated algorithms for the direct estimation of energy expenditure (35,61) or direct measurements of energy expenditure might lead to slightly altered results, but the latter is very challenging to perform especially with acute patients due to the required equipment and the extensive protocol including standardized food intake and calibration phases.

CONCLUSION
We conclude that single-day reliabilities of metrics to capture PA in acute and chronic wheelchair-dependent SCI patients vary considerably depending on the clinical setting. With increasing functional recovery of the patients, metrics of movement quantity tend to become less reliable, whereas metrics of movement quality become more reliable. Depending on the specific metrics, 2 days are required on average to capture PA reliably in in-patients, whereas 3 days are required for out-patients. Furthermore, we suggest using AC only as a rather general measure for assessing the overall PA level of patients, and only in combination with more detailed metrics, e.g., PA intensity levels and wheeling-related metrics. This avoids a possible loss of information about the variability of PA during a whole day. Our results are based on a reasonable sample size for this population and thus provide robust recommendations on how to design clinical studies investigating PA as a primary outcome, or as a confounder in intervention studies in order to better evaluate the actual intervention effect.