Evaluation of Postural Sway in Post-stroke Patients by Dynamic Time Warping Clustering

Post-stroke complications are the second most frequent cause of death and the third leading cause of disability worldwide. The motor function of post-stroke patients is often assessed by measuring their postural sway during quiet standing, based on sway measures, such as sway area and velocity, which are obtained from temporal variations of the center of pressure. However, such approaches have hardly been successful in establishing a relationship between the sway measures and patients' demographic factors (e.g., days after onset). This study instead evaluates the postural sway features of post-stroke patients using a machine-learning clustering method. First, we collected the stroke patients' multi-variable motion-capture standing-posture data and processed them into data slots t s in length. Then, we clustered the t-s data slots into K cluster groups using the dynamic-time-warping partition-around-medoid (DTW-PAM) method. DTW measures the similarity between two temporal sequences that may vary in speed, whereas PAM identifies the centroids for the DTW clustering method. Finally, we used a post-hoc test and found that the sway amplitudes of markers in the shoulder, hip, knee, and center-of-mass are more important than their sway frequencies. We separately plotted the marker amplitudes and frequencies in the medial-lateral direction during a 5-s data slot and found that the post-stroke patients' postural sway frequency lay within the bandwidth of 0.5–1.5 Hz. Additionally, with an increase in the days after onset, the cluster index of cerebral hemorrhage patients gradually transitioned in a four-cluster solution. However, the cerebral infarction patients did not exhibit such pronounced transitions over time. Moreover, we found that the postural-sway amplitude increased in clusters 1, 3, and 4. However, the amplitude of cluster 2 did not follow this pattern, possibly owing to age-related changes in postural sway.
A rehabilitation doctor can utilize these findings as guidelines to direct the post-stroke patient training.

INTRODUCTION
A stroke is mainly caused by a lack of oxygen when the brain's blood flow is interrupted by a blockage (i.e., cerebral infarction, CI) or an artery rupture (i.e., cerebral hemorrhage, CH). Stroke patients tend to exhibit an irregular postural sway during quiet standing (Chern et al., 2010), which increases the risk of falling. Evaluating their postural sway during quiet standing is therefore essential.
Researchers have evaluated the quiet standing postural sway of post-stroke patients for many years. However, the postural sway features (e.g., sway amplitude) of the patients can differ because of different patient demographic factors (e.g., days after onset, age, and the level of damage in lesion regions) (Bansil et al., 2012; Cho et al., 2014; Halmi et al., 2020). Unfortunately, the relationship between postural sway features and patient demographic factors has not been well studied. Some researchers focused only on sway amplitude. For example, Mizrahi et al. (1989) measured and analyzed the bilateral forces of the supporting limbs of stroke patients and found that they had significantly higher sway activity compared with normal controls. In the anterior-posterior and medial-lateral (ML) directions, Wang et al. (2017) found that stroke patients had a more pronounced center-of-pressure (COP) sway than healthy people. Paillex and So (2005) demonstrated that temporal patterns of the difference between the COP and center-of-gravity could be characterized differently for healthy subjects and patients.
Furthermore, machine-learning methods (e.g., multivariate time-series clustering) can find postural sway features using complete time-series data. For Parkinson's disease, Das et al. (2011) explored the motor symptoms of patients using a motion-capture system and a support vector machine. However, only a few studies have thus far analyzed the differences in post-stroke patients' quiet standing postural sway. Furthermore, there is no consensus with regard to the best method or feature set for analyzing motion-capture data to understand and assess post-stroke postural sway.
Hence, this study evaluates postural-sway features of post-stroke patients using a motion-capture system with a multivariate time-series machine-learning clustering technique.
The remainder of this paper is organized as follows. Section 2 reports the method of clustering and parameter distribution calculation. Section 3 presents the results. Section 4 discusses the implications of the results. Finally, we present the conclusions in section 5.

METHODS
In this section, we describe our research method (Figure 1). First, the kinematic posture data of patients are measured, as detailed in section 2.1. Then, we extract features, as presented in section 2.2. Next, the clustering method is described in section 2.3. Finally, we calculate the parameter distribution, as detailed in section 2.4.

Study Population
Fujita Health University recruited the study subjects for our research. In accordance with the protocol approved by the University Ethics Committee (HM18-467), all subjects provided written informed consent. We enrolled 10 male subjects: five had CH and five had CI. Their statistical information is shown in Table 1. We added new data to previous research results (Li et al., 2021), including ages, days after onset, and hemiplegia. In our previous work (Li et al., 2021), we assumed that CH and CI presented differences in standing posture and leveraged a support vector machine for classification. In this paper, by contrast, we evaluate the postural sway of post-stroke patients in a quiet standing position via clustering.

Patient Data Collection
A 3-dimensional (3D) motion analysis system, KinemaTracer© (KISSEI COMTEC, Matsumoto, Japan), was used to precisely measure the quiet standing posture of post-stroke patients. The system hardware comprised one recording/analyzing laptop and four charge-coupled-device cameras arranged around a standing platform on a level floor without a handrail. As shown in Figure 2, we attached markers (30-mm diameter) on both sides of the subject at the acromion, the hip joint (positioned on the line connecting the anterior superior iliac spine and greater trochanter at 1/3 of the distance from the greater trochanter), the knee joint (the AP midpoint of the lateral epicondyle of the femur), the ankle joint (exterior), and the toes (fifth metatarsal head). Data were sampled at 60 Hz for 30 s. A more detailed description of our methodology and mathematical formulas for data collection can be found in Matsuda et al. (2016) and Li et al. (2021).

Kinematic Collected Variables
In this study, we utilized 33 kinematic variables extracted from the 3D motion analysis system to evaluate the postural sway of post-stroke patients. The kinematic variables comprise the 3D displacements (X-axis, Y-axis, Z-axis) of the center of mass (COM) and of the 10 markers during the 30-s period, that is, 3 × (1 + 10) = 33 variables.

Feature Extraction
We extracted features so that the post-stroke patient data could be clustered with high performance. The methodology of this process is described in Figure 3. First, the variables were offset by setting the location of the ankle markers as midpoints, as the patient stands on the front part of the level floor.
Next, we eliminated the anthropometric differences between subjects using the distance between the shoulder and ankle markers, as in Equation (1):
$$p'_i = \frac{p_i}{l_1 + l_2 + l_3}, \quad i = 1, 2, \ldots, n \tag{1}$$
where $p_i$ is the i-th X/Y/Z-axis position of the kinematic variable; $p'_i$ is the i-th X/Y/Z scaled marker position; $l_1$ is the average distance between the shoulder and hip markers; $l_2$ is the average distance between the hip and knee markers; and $l_3$ is the average distance between the knee and ankle markers. n is the number of kinematic variables, and the value of n was set to 33.
Next, we used a double-pass, second-order Butterworth low-pass filter with a cutoff frequency of 12 Hz to filter the $p'$ data. Thus, the noise arising from changes in the orientation of the subject's body and other factors during measurement (Abdulhay et al., 2018) was removed. Finally, to increase the data set size for each cluster, we divided every 30-s data sample into several data slots t seconds in length. For example, the division of a 30-s data sample into six 5-s data slots is shown in Figure 4. To determine the best value of t, we tested a range (t = 3, 5, 6, and 10) by evaluating the clustering results.
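The scaling and slotting steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that the scaling of Equation (1) divides each coordinate by the shoulder-to-ankle length l1 + l2 + l3, it omits the Butterworth filtering step, and the function names, segment lengths, and synthetic recording are hypothetical.

```python
def scale_positions(positions, l1, l2, l3):
    """Divide every coordinate by the shoulder-to-ankle length (assumed form of Eq. 1)."""
    body_length = l1 + l2 + l3
    return [p / body_length for p in positions]


def split_into_slots(samples, fs_hz, slot_s):
    """Cut one recording into consecutive, non-overlapping slots of slot_s seconds."""
    slot_len = int(fs_hz * slot_s)
    n_slots = len(samples) // slot_len
    return [samples[k * slot_len:(k + 1) * slot_len] for k in range(n_slots)]


# Hypothetical single channel: 30 s sampled at 60 Hz gives 1800 samples.
recording = [0.001 * k for k in range(60 * 30)]
# Hypothetical segment lengths in meters (shoulder-hip, hip-knee, knee-ankle).
scaled = scale_positions(recording, l1=0.5, l2=0.4, l3=0.4)
slots = split_into_slots(scaled, fs_hz=60, slot_s=5)
print(len(slots), len(slots[0]))  # prints: 6 300
```

With t = 5 s, each 30-s recording yields six slots of 300 samples per kinematic variable, which matches the slot dimensions used for clustering.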

Clustering
To find the postural-sway patterns of the post-stroke patients and to identify the relationships between the patient body displacement and their different characteristics (e.g., age, days after onset, and hemiplegia side), we used the multivariate time-series (MTS) clustering method. In related fields, numerous MTS methods have been explored (Montero and Vilar, 2014; Brandmaier, 2015; Genolini et al., 2015; Sardá-Espinosa, 2019).
As one method of MTS clustering, partitional clustering with dynamic time warping (DTW) is used to evaluate the similarity of different data slots (Malik and Lai, 2017; Rybarczyk et al., 2018). As illustrated in Figure 5, DTW compares the similarity between two temporal sequences (Data A and B) that may vary in speed. DTW is used for temporal sequences, such as video, audio, and graphics data. Moreover, compared with other MTS clustering methods (e.g., permutation distribution clustering), DTW provides faster calculations (Montero and Vilar, 2014; Sardá-Espinosa, 2019). Therefore, we used the DTW method to evaluate the similarity of data slots from different patients. To illustrate the similarity exhibited by 5-s data slots in the case of postural sway, Figures 5B-D show the similarity of 5-s slot data taken from the same individual as well as from different patients. Moreover, DTW lets us evaluate the sway amplitude and frequency measured from a data slot as its similarity features. Furthermore, this method calculates the distance between all points in the data; hence, the smaller the gap, the closer the match. DTW could work well on the 33 kinematic variables because human postural sway is periodic (Giveans et al., 2011): all periods of sway data are similar, and we therefore assumed that they can be detected by DTW. Moreover, to avoid cutting such periods off, we divided the 30 s of data into t-s data slots based on the period. Other researchers have also used DTW to detect hip sway (Cuntoor et al., 2003).
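To make the idea of comparing sequences that vary in speed concrete, the following is a minimal sketch of the classic DTW recurrence in plain Python. It is illustrative only: the absolute-difference local cost, the channel-wise summation in `multivariate_dtw`, and the toy sequences are assumptions, since the text does not state how the 33 variables were combined.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the first i samples of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def multivariate_dtw(slot_a, slot_b):
    """Assumed combination rule: sum of per-channel DTW distances."""
    return sum(dtw_distance(ch_a, ch_b) for ch_a, ch_b in zip(slot_a, slot_b))


# The same sway shape at two speeds aligns perfectly under DTW.
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
print(dtw_distance(slow, fast))            # prints: 0.0
print(dtw_distance([0, 0, 0], [1, 1, 1]))  # prints: 3.0
```

The first pair differs only in speed, so the warped distance is zero; the second pair differs in level at every sample, so no warping can reduce the cost.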
After assessing the similarity between different slots, DTW clustering divides data slots into K clusters, and each cluster has a centroid. Here, the cluster implies a group wherein the data slots are more similar than those in other groups. Moreover, the centroid is the cluster center. We attempted two different centroid calculation methods. One is the partition-around-medoid (PAM) method, and the other is DTW barycenter averaging (DBA) (Sardá-Espinosa, 2019).
Figure 5 (caption fragment): The orange line is the COM X data of a data slot from CH3. (D) Dynamic time warping (DTW) with real postural sway data from different patients in different clusters. The blue line is the COM X data of a data slot from CH2. The orange line is the COM X data of a data slot from CH5. Because the similarity of these two data slots is low, no similarity lines could be drawn.
PAM is a method of finding the K medoids of a clustering (Mannor et al., 2011). First, PAM selects K representative medoids (the most central objects of the clusters) to construct an initial clustering. Then, it repeatedly changes the medoids to find better cluster representatives that further reduce the distortion function. In each iteration, the best medoids found for the clusters form the new set of medoids. As shown in the 2-dimensional example of Figure 6, the medoid center (red dot in the right figure) is the most central object in the cluster, with the smallest sum of distances from the other data, which differs from the mean center (red dot in the left figure).
DBA is another method of finding the cluster centroids. Here, the centroid is defined as the average sequence of a set of sequences. The clusters are divided based on the distance between the average sequence and the set of sequences (Petitjean et al., 2011). Real postural sway (marker) data are used to further illustrate this process in Figure 7.
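The medoid search described above can be sketched as a simple swap procedure on a precomputed distance matrix. This is a hedged illustration rather than the implementation used in the study: the deterministic initialization (the study used random initialization), the function names, and the 1-D toy points are assumptions, and in practice the matrix would hold DTW distances between data slots.

```python
def total_cost(dist, medoids):
    """Sum over all objects of the distance to their nearest medoid (the distortion)."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k, initial=None):
    """Swap-based k-medoids: accept any medoid/non-medoid swap that lowers the cost."""
    n = len(dist)
    medoids = list(initial) if initial is not None else list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign every object to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Toy data: two well-separated groups on a line (hypothetical values).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = pam(dist, k=2)
print(sorted(medoids))  # prints: [1, 4]
```

The medoids are actual members of the data set (here the points 1 and 11), which is what distinguishes a medoid center from a mean center.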
We present the set of 5-s data slots as an example. We had 31 recordings of 30-s experimental data, and every 30-s recording was divided into six 5-s slots. Hence, the input data consisted of 31 × 6 = 186 sets of (33 × 300)-dimensional vector data representing the standardized X/Y/Z-axis positions of the 10 markers and the COM. Because the measurement sampling frequency was 60 Hz, the number of columns for 5-s data is 60 Hz × 5 s = 300. The DTW clustering compared each vector of one data slot with the corresponding vector of another. We randomly initialized the centroids of the clusters and, to avoid the effect of random errors in a centroid and to choose the best clustering solution, repeated the process 10 times for the data set. Next, to determine the best number of clusters K, we evaluated the cluster solutions using three cluster indices separately: the Davies-Bouldin (DB) index (Davies and Bouldin, 1979), the Calinski-Harabasz (C-H) index (Caliński and Harabasz, 1974), and the Dunn (D) index (Dunn, 1974). These three indices evaluated the minimum value of the product of the mean and the SD of the intra-cluster gap.
The DB index is shown in Equation (2), where K is the number of clusters. In cluster i, $\delta_i$ is the mean gap between the data units and their cluster center, $c_i$. In cluster j, $\delta_j$ is the mean gap between the data units and their cluster center, $c_j$. $d(c_i, c_j)$ is the gap between the cluster centers $c_i$ and $c_j$. The best cluster solution has the minimum DB value.
$$DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\delta_i + \delta_j}{d(c_i, c_j)} \tag{2}$$
The C-H index is defined in Equation (3), where K is the number of clusters and N is the size of the data set. BGSS indicates the sum of squares of the partition between clusters, and WGSS represents the sum of squares of the partition within a cluster. The best cluster result has the biggest index value.
$$CH = \frac{BGSS / (K - 1)}{WGSS / (N - K)} \tag{3}$$
In the cluster, the D index is defined as the value calculated by dividing the smallest distance ($d_{min}$) within the cluster by the biggest distance ($d_{max}$), as shown in Equation (4):
$$D = \frac{d_{min}}{d_{max}} \tag{4}$$
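As a rough illustration of how such validity indices are evaluated from a distance matrix, the sketch below computes the DB and Dunn indices for a toy two-cluster solution. Several things here are assumptions: the medoids are used as the cluster centers, the Dunn index follows its common definition (smallest between-cluster distance divided by largest within-cluster distance), and the points, medoids, and labels are hypothetical.

```python
def davies_bouldin(dist, medoids, labels):
    """DB index with medoids as cluster centers; lower values are better."""
    k = len(medoids)
    # delta[c]: mean distance from the members of cluster c to its medoid
    delta = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        delta.append(sum(dist[i][medoids[c]] for i in members) / len(members))
    worst = []
    for i in range(k):
        worst.append(max((delta[i] + delta[j]) / dist[medoids[i]][medoids[j]]
                         for j in range(k) if j != i))
    return sum(worst) / k


def dunn(dist, labels):
    """Common Dunn definition: min between-cluster / max within-cluster distance."""
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
    within = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)


# Toy solution: two compact, well-separated clusters (hypothetical values).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
medoids = [1, 4]               # indices of the medoid objects
labels = [0, 0, 0, 1, 1, 1]
db_value = davies_bouldin(dist, medoids, labels)
dunn_value = dunn(dist, labels)
print(round(db_value, 4), dunn_value)  # prints: 0.1333 4.0
```

A small DB value and a large Dunn value both indicate compact clusters that lie far apart, which is why the two indices are read in opposite directions when choosing K.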

Parameter Distribution Calculation
After obtaining the clustering solutions for the t-s data slots and K clusters, we analyzed the sway features of each kinematic variable of the t-s data slots between clusters by applying a post-hoc test.
Here, based on previous research (Petri, 2002; Paillex and So, 2005; Abdulhay et al., 2018), we calculated three kinds of sway feature parameters for each kinematic variable of the t-s data slots: amplitude, standard deviation (SD), and sway frequency. Amplitude is defined as the gap between the maximum and minimum values of one kinematic variable in one t-s data slot. The SD is defined as the standard deviation of one kinematic variable in one t-s data slot. Sway frequency is defined as the frequency value corresponding to the first most prominent peak in the frequency-domain map obtained by the fast Fourier transform.
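The three sway parameters can be sketched for a single kinematic variable as follows. This is an illustrative approximation under stated assumptions: the sample SD (n - 1 denominator) is used because the text does not specify the variant, the "most prominent peak" is taken to be the largest-magnitude non-DC bin, a plain discrete Fourier transform stands in for the FFT, and the 1-Hz test signal is synthetic.

```python
import cmath
import math


def amplitude(x):
    """Gap between the maximum and minimum values in the slot."""
    return max(x) - min(x)


def standard_deviation(x):
    """Sample standard deviation (n - 1 denominator; an assumption)."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))


def sway_frequency(x, fs_hz):
    """Frequency of the largest non-DC spectral bin, via a direct DFT."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fs_hz / n


# Synthetic 5-s slot at 60 Hz: a unit sine swaying at 1 Hz.
fs = 60
slot = [math.sin(2 * math.pi * 1.0 * t / fs) for t in range(5 * fs)]
print(sway_frequency(slot, fs))  # prints: 1.0
```

For a 5-s slot sampled at 60 Hz, the frequency resolution is 60 / 300 = 0.2 Hz, so sway frequencies are resolved in 0.2-Hz steps.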
Before implementing the post-hoc test, we first determined the post-hoc test method by observing the distribution and the homogeneity level of the parameters using the Shapiro-Wilk (Mohd Razali and Bee Wah, 2011) and Bartlett's tests (Tobias and Carlson, 1969). If the parameter data follow a normal distribution and display homogeneity of variance, we can use the Tukey-Kramer post-hoc method. Otherwise, we use a pairwise Wilcoxon test (Pohlert, 2014) with the Benjamini-Hochberg p-value adjustment method. We then performed the post-hoc test to find which cluster pairs differed significantly for each parameter.
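The Benjamini-Hochberg adjustment applied to the pairwise comparisons can be sketched as follows. The p-values below are invented purely for illustration (they are not results from the study), and the 0.05 threshold and function name are assumptions; with K = 4 clusters there are six cluster pairs to compare.

```python
from itertools import combinations


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Six cluster pairs for K = 4: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
pairs = list(combinations(range(1, 5), 2))
raw_p = [0.001, 0.20, 0.03, 0.04, 0.004, 0.50]  # hypothetical raw p-values
adj_p = benjamini_hochberg(raw_p)
significant = [pair for pair, p in zip(pairs, adj_p) if p < 0.05]
print(significant)  # prints: [(1, 2), (2, 4)]
```

In this invented example, the raw p-values 0.03 and 0.04 would pass an unadjusted 0.05 threshold but are adjusted to 0.06, so only two cluster pairs remain significant.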
We consider the case of the left shoulder marker when the data slot is 5 s and K = 4 as an example. First, based on the clustering result, we grouped the 186 data slots into four groups.
Then, for each group, we extracted the x, y, and z values of the left-shoulder marker from each 5-s slot. Next, we calculated the amplitude, SD, and sway frequency for each kinematic variable in each data slot. Finally, we implemented the post-hoc test to find the significant kinematic variables for discussion.

RESULTS
General Cluster Performance
In this work, we used two models, DTW-PAM and DTW-DBA. Based on the cluster validity evaluation, we compared the cluster results for data-slot lengths t of 3, 5, 6, and 10 s; K values of 3, 4, and 5; and the DTW-PAM and DTW-DBA methods, as shown in Tables 2, 3. We found that the DTW-PAM model with the t = 5-s data slot and K = 4 was better than the other results. Hence, we inferred from the cluster indices that there was a difference in the standing postures of post-stroke patients. Hereafter, only the detailed DTW-PAM solution is presented.
Finally, based on the clustering result of the t = 5-s data slot with K = 4 on the DTW-PAM model, we observed and calculated the median value of days after onset, age, and disease-type percentage from the first to the fourth clusters, as shown in Table 4.

Parameter Distribution Analysis
To analyze the sway features of each kinematic variable of the t-s data slots between clusters, we first applied the Shapiro-Wilk test and found that the parameters did not follow a normal distribution and displayed homogeneity of variance. Therefore, we used the pairwise Wilcoxon test (Pohlert, 2014) as the post-hoc test to find clusters with significant differences. The post-hoc test subjects and results are listed in Table 5. For each axis of each body marker on the left or right side (indexed from 1 to 33), we performed a post-hoc test to determine which cluster pairs had significant differences. The results show that the differences between clusters are mainly explained by amplitude and SD. In Table 5, the shoulder, hip, knee, and COM variables, whose contributions are particularly significant, are colored blue. Accordingly, we present Figures 8, 9, whose x-axes represent the ML amplitude after normalization and the postural-sway frequency, respectively. The kinematic variables of the three axes (x, y, z) are likely to be strongly correlated with each other in Table 5. Thus, only the values of the index in the X-axis (ML) are shown. Because the left and right sides are similar, only the left side is shown.
Meanwhile, from Table 4, we find that from clusters 1 to 4, the days after onset value decreased. From Figure 8, we observed a pattern from clusters 1, 3, and 4 in which the amplitude increased, meaning that as days from post-stroke onset decreased, the postural-sway amplitude increased.
Then, we created the days-after-onset cluster table to explore the transition pattern of days after onset in CH and CI separately in Table 6, where we used numbers 1-4 and colors from blue to red to represent the clustered data from days after onset. The numbers from 0 to 1,000 indicate the days after onset for each cluster group. We can see that as the days after onset increased, CH patients transitioned from cluster 4 to 3, to 2, and to 1, in order, whereas CI patients did not show a pronounced transition over time. Moreover, in Table 6, the data slots of CH3, CI1, CI2, and CI4 were assigned to more than one cluster, because it is difficult to cluster them perfectly with machine learning. For these patients, the clusters into which many of their data slots fall are considered to be the main clusters.
Table 5 note: Taking the first line of Amplitude as an example, "1&2" means that the kinematic variable COM X was shown to differ significantly between clusters 1 and 2 by the post-hoc test. The contributions of the shoulder, hip, knee, and COM variables were particularly significant (indicated by blue).
Frontiers in Human Neuroscience | www.frontiersin.org

DISCUSSION
This study aimed to determine and understand the postural-sway features of post-stroke patients in quiet standing postures. Using DTW-PAM, differences were observed between patient clusters. The markers' amplitude, SD, and frequency indicated that days after onset and disease subtype (CH or CI) contributed more to postural-sway features than did other factors. After analyzing Table 5, we determined that amplitude showed a level of significance similar to that of the SD and greater than that of frequency in the clusters, meaning that amplitude and SD were more valuable than frequency for distinguishing the clusters. In particular, the differences were more pronounced in the shoulder, hip, and knee. This finding may provide a focus area for post-stroke patient therapy. From Figure 8, we found that the upper parts of the body (e.g., shoulder, hip, and COM) had significantly larger amplitude values than did the lower parts (e.g., knee, ankle, and toes). This finding is similar to a previous study that showed that waist sway was more significant than leg sway (Dickstein and Abulaffio, 2000). From Table 4 and Figure 8, we found that as days from post-stroke onset decreased, the postural-sway amplitude increased in clusters 1, 3, and 4. However, the amplitude of cluster 2 did not follow this pattern, which may be due to age effects, because postural sway changes with age (Kim et al., 2010).
Frequency was not significantly different in the post-hoc test, but we found that the frequency of postural sway fell within 0.5-1.5 Hz for the COM, and in cluster 3, CH subjects of lower age and fewer days after onset kept their sway frequencies within 0.6-0.7 Hz. A previous study found that the frequency of body sway fell in the range of 0.1-0.2 Hz (Koltermann et al., 2019), and another found that stroke increases sway frequency (Mizrahi et al., 1989); our finding is therefore reasonable.
Furthermore, in Table 6, we observed that the postural sway of CH patients gradually changed as days after onset increased (clusters 4 to 1). Meanwhile, the postural sway of CI patients did not show the same correlation with days after onset. Other researchers found that CH patients made more significant recovery gains, although they had greater functional (motor) impairments (e.g., standing and walking) than CI patients. They also found that CH patients having the most severe disability improved more than those with CI of comparable severity (Kelly et al., 2003; Katrak et al., 2009). From this knowledge, we assumed that the correlation with days after onset in CH patients might emerge from their better recovery ability. This finding may give researchers and practitioners new ideas about sway-pattern changes during post-stroke patient rehabilitation. In addition, Table 6 shows that the data slots of CH3, CI1, CI2, and CI4 were clustered into different clusters. After observing their data plots and analyzing the patient demographic factors, we found that they all had fewer days after onset (less than 180 days, the subacute phase). During the subacute phase, the stroke patient recovers more noticeably and is more unstable in muscle force than in the chronic phase (Kiran, 2012; Chow and Stokic, 2014). Hence, these patients exhibit different sway patterns, even over the same experiment duration, and data from subacute-phase patients are considered to be difficult to cluster. This might be the reason why different slots for one person were clustered into different clusters.
This study has certain limitations. The first is that the number of subjects was only 10. With more subjects, more calculations and analyses could be performed. The second is that we only considered male patients. We plan to investigate more subjects, including female patients, and analyze their postural-sway characteristics in future research. The third is that we did not perform clustering for healthy age-matched and young subjects. If we add such subjects, we could compare the postural sway of healthy people and stroke patients to determine which postural-sway characteristics are important, which would add meaning to the study.

CONCLUSION
This study evaluated the postural-sway features of post-stroke patients using a motion-capture system to collect standing posture data. After collecting stroke patients' multi-variable motion-capture standing posture data, we processed them into data slots t seconds long. Subsequently, we determined the optimal length of the data slots and number of clusters, and clustered the t-s data slots into K cluster groups using the DTW-PAM method. Finally, to find the critical kinematic variables, we performed a post-hoc test. We found that the shoulder, hip, knee, and COM played essential roles in clustering, and the amplitude of the marker was more helpful than its frequency. Furthermore, we created a days-from-onset clustering table and box plots of the shoulder, hip, knee, and COM variable amplitudes and frequencies separately in the ML direction using 5-s data slots. We found that as the days after onset increased, CH patients transitioned from cluster 4 to clusters 3, 2, and 1 of a four-cluster solution, whereas CI patients did not show such pronounced transitions over time.
The above findings may provide researchers with new ideas about sway-pattern changes for post-stroke patient rehabilitation. In future research, we plan to increase the number of subjects.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Fujita Health University, the University Ethics Committee. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
DL conceived the ideas, designed the methodology, performed the programming, and wrote the paper. JO and KK supervised the paper and provided comments on the research directions. MM performed the experiments and provided study materials. RC and KT reviewed and edited the paper. All authors have read and approved the final manuscript.

FUNDING
This work has been partially supported by JSPS KAKENHI 19H05730 and the Mohammed bin Salman Center for Future Science and Technology for Saudi-Japan Vision 2030 at The University of Tokyo (MbSC2030).