Item Difficulty of Fugl-Meyer Assessment for Upper Extremity in Persons With Chronic Stroke With Moderate-to-Severe Upper Limb Impairment

Background and Purpose: Limited research has been conducted with the aim of understanding which upper extremity movements are difficult for persons with severe chronic stroke. The purpose of this study was to test the structure of the Fugl-Meyer Assessment for Upper Extremity (FMA-UE) using Rasch analysis in persons with chronic stroke with moderate to severe deficits and to determine the item difficulty hierarchy. Methods: This was a secondary analysis of data from previous randomized, controlled trials, or clinical trials. The participants were 101 persons with chronic stroke with moderate to severe hemiparesis (time after onset of stroke, 1375.3 ± 1157.9 days; the 33-item FMA-UE, 31.1 ± 12.8). Principal component analysis and infit statistics were used to evaluate dimensionality. Rasch analysis using a rating scale model was performed, and item difficulty was determined. Results: Six misfit items were removed. The results showed that the 27-item FMA-UE was unidimensional. Rasch analysis showed that the movements performed within synergies were among the easiest items. Shoulder and elbow movements were among the easiest items, whereas forearm and wrist movements were among the moderately to most difficult items. Hand items spanned various difficulty levels. Discussion and Conclusions: The FMA-UE is a valid assessment tool of upper extremity motor function in persons with chronic stroke with moderate to severe deficits. The results showed that item difficulty was consistent with the stepwise recovery course proposed by Fugl-Meyer. The movements that are difficult for patients with moderate to severe chronic paresis were determined, which would enable comparison of each movement using a measure of motion difficulty in future studies.


INTRODUCTION
With advances in technology, severe chronic motor impairment of the upper limbs is now one of the major targets in stroke rehabilitation (1). Although motor recovery in persons post-stroke was once thought to reach a plateau in the chronic phase, substantial motor improvement has been shown after repetitive task training or constraint-induced movement therapy (2)(3)(4)(5). These training regimens have been developed based on the theory of motor learning (6)(7)(8), defined as the repetition-mediated increase in the speed and accuracy of a newly acquired motor behavior (9). With repetition of the selected behaviors, the highly stereotyped motor skill is finally acquired, and this process results in an expansion of neuron ensembles in the motor cortex (10). Technology-aided interventions, such as robotics and neuromuscular electrical stimulation, offer the opportunity of repetitive motor training for persons post-stroke with severe motor deficits. However, conflicting reports (1,4,11) show a lack of consensus regarding the effectiveness of proximal vs. distal or mono- vs. multijoint approaches. The lack of studies about key movements predicting motor recovery or response to interventions might contribute to this. Importantly, no studies have evaluated which upper extremity movements are difficult to perform for persons with severe chronic stroke. Clarification of these issues would facilitate the decision of where to set priorities in planning rehabilitation strategies.
The Fugl-Meyer Assessment (12) is the gold standard to assess motor function of post-stroke hemiparesis (13,14). The Fugl-Meyer Assessment for Upper Extremity (FMA-UE) has sound psychometric properties of reliability (15)(16)(17)(18)(19)(20), validity (15)(16)(17)(20), and responsiveness (15,16,19,20). Each item consists of movements reflecting motor function in post-stroke hemiparesis, spanning from proximal to distal joints. Determining the item difficulty of the FMA-UE is not only useful for an accurate evaluation of upper extremity paresis, but it is also applicable to rehabilitation practice. Woodbury et al. analyzed the structure of the FMA-UE using Rasch analysis and reported the item difficulty hierarchy (21,22). However, the majority of the target population were persons with acute stroke with mild to moderate motor deficits. It is therefore necessary to test whether the item difficulty hierarchy is the same in persons with severe chronic stroke. Accordingly, the purpose of this study was to test the structure of the FMA-UE using Rasch analysis in persons with chronic stroke with moderate to severe upper extremity motor deficits, and to determine the item difficulty hierarchy.

METHODS
Study Design
This study was a secondary analysis of data from previous randomized, controlled trials or clinical trials (23)(24)(25)(26). Outlines of each study, including ethical approval and clinical trial registration numbers, are provided as Supplementary Material. In this study, all persons who had participated in these trials at Keio University Hospital between April 2017 and June 2019 were included. For participants who were hospitalized several times during the study period, the assessment implemented on the first admission was used, excluding the data from the second and subsequent admissions. One of the authors, blinded to the interventions, extracted the data from the medical records of the participants.
This study was approved by the institutional ethics review board (20190144). The outline of the study was published on the public website, and the participants were guaranteed the right to refuse participation.

FMA-UE
The FMA-UE was used as an outcome measure in the clinical trials. The FMA-UE consists of 30 items assessing motor function and 3 items assessing reflex function. Each item is given the score most applicable to task performance, from "0, inability," through "1, beginning ability," to "2, normal" (total score range, 0-66). Based on the standardized guideline developed by Platz et al. (27), the FMA-UE was administered by trained physiatrists before and after the treatment/intervention. This study used the pre-treatment/intervention data. The assessors were trained as follows: they were instructed to (a) review the standardized guideline developed by Platz et al. (27); (b) watch the training video developed by See et al. (19) (ArmFM_TrainingVideo); (c) watch the subject 1 video (ArmFM_TestSubject1) (19) and score the patient; (d) review the answer (ArmFM_AnswerKey_Subject1) (19); (e) repeat processes (c) and (d) for subject 2; (f) watch the assessment by an attending physiatrist with more than 10 years of experience and score the patient at the same time; (g) review the two scores and note scoring discrepancies; (h) repeat processes (f) and (g) until the score discrepancy became < 2 points (a criterion set below the minimal detectable change [MDC] of 3.2 points (19), under which differences can be regarded as measurement error); and (i) assess the patient using the FMA-UE and get feedback from the attending physiatrist (at least 3 times).

Participants
The participants were 101 persons with chronic stroke (time after onset of stroke, 1375.3 ± 1157.9 days). The participants' demographic characteristics are presented in Table 1. The Stroke Impairment Assessment Set (28) was used to assess motor and sensory impairment in the affected upper limb. The modified Ashworth scale (29) was used as a measure of resistance to passive movement. The severity of motor impairment for a paretic upper limb was evaluated using the 33-item FMA-UE. In this study, FMA-UE >45 was defined as mild, ≥30 and ≤45 was defined as moderate, and <30 was defined as severe, according to the previous study (1). Most of the participants (85%) were classified as moderate or severe, whereas a small number (15%) were classified as mild. Similar proportions of persons with different severities were included in a previous study (21), and therefore, it was decided to include mildly affected persons in the subsequent analysis.
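The severity cut-offs described above can be written as a small helper. This is a minimal sketch in Python, not the authors' code; the function name `classify_fma_ue_severity` is hypothetical, and the thresholds are the ones stated in this section for the 33-item FMA-UE total score.

```python
def classify_fma_ue_severity(total_score: int) -> str:
    """Classify impairment from the 33-item FMA-UE total score (0-66).

    Cut-offs follow the text: >45 mild, 30-45 moderate, <30 severe.
    """
    if not 0 <= total_score <= 66:
        raise ValueError("FMA-UE total score must be between 0 and 66")
    if total_score > 45:
        return "mild"
    if total_score >= 30:
        return "moderate"
    return "severe"
```

For example, a total score of 31 falls in the moderate band under these cut-offs, and a score of 29 falls in the severe band.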
Participants received hybrid assistive neuromuscular dynamic stimulation (HANDS) therapy (23) or participated in other randomized, controlled trials or clinical trials (24)(25)(26). The common characteristics of the population were as follows: the time from stroke onset was longer than 180 days; participants had the ability to walk independently with/without

Score Distribution and Local Independence
Before performing Rasch analysis, the distribution of each item score was reviewed, and items for which all participants had the same scores were removed. Subsequently, the items were screened to determine whether they would violate the two assumptions of Rasch analysis: local independence and unidimensionality. Local independence means that performance on (and the score of) one item is independent of performance on any other item. That is, performance on one item should never determine the score of any other item. Unidimensionality means that each item of a measurement scale measures only one construct, that is, motor function of a paretic upper extremity in this context.

Dimensionality
To evaluate dimensionality, principal component analysis (PCA) and infit statistics were used. These statistical methods are commonly used to test the dimensionality of upper extremity outcome measures (30). For PCA, if a measurement scale could measure only one construct, in this case, upper extremity motor function, then the variance (i.e., eigenvalue) explained by the first factor would be very large. In the present study, factors with eigenvalues > 1 were extracted. The percent of total variance accounted for by the first factor was assumed to be 20-40% (31). In the present study, 40% was considered acceptable. Factor loadings, the extent to which each item is related to (i.e., loads on) the factors, were determined. Fit statistics, calculated with Rasch analysis, are one of the most common indicators for testing the degree to which an item deviates from the assumption of unidimensionality (31). The values obtained from fit statistics are mean squares (MnSq) of residuals, the difference between observed scores for an item and expected values predicted by the model. High MnSq values indicate that the item does not fit the model. Infit statistics are less susceptible to outliers than outfit statistics. In the present study, infit MnSq values <1.7 were considered acceptable (31).
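The two dimensionality checks described above can be sketched as follows. This is an illustrative Python sketch rather than the R analysis used in the study. It assumes that the PCA is run on the item correlation matrix and that the model-expected scores and model variances for each person-item pair are already available from a fitted Rasch model. The function names are hypothetical.

```python
import numpy as np

def first_factor_share(scores):
    """Return eigenvalues (largest first) of the item correlation matrix
    and the share of total variance explained by the first component.

    scores: array-like of shape (persons, items); every item must vary.
    """
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigvals, eigvals[0] / eigvals.sum()

def item_fit_mnsq(observed, expected, variance):
    """Return (infit, outfit) mean squares for each item (column).

    observed, expected, variance: arrays of shape (persons, items), where
    expected and variance come from the fitted model (assumed given).
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    variance = np.asarray(variance, dtype=float)
    resid_sq = (observed - expected) ** 2
    # Outfit: plain mean of squared standardized residuals (outlier-sensitive).
    outfit = np.mean(resid_sq / variance, axis=0)
    # Infit: residuals weighted by model variance (information-weighted).
    infit = resid_sq.sum(axis=0) / variance.sum(axis=0)
    return infit, outfit
```

Under the criteria stated in the text, factors with eigenvalues > 1 would be extracted from the first function's output, and an item whose infit mean square from the second function is not below 1.7 would be flagged as a candidate misfit.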

Rasch Analysis and Item Difficulty Hierarchy
After misfit items had been removed, another Rasch analysis was performed using a rating scale model (32). The rating scale model is an extension of the original dichotomous Rasch model and can be applied to measurement scales with polytomous response categories (e.g., the 3-point scale of the FMA-UE) (30). This analysis was used to determine the item difficulty hierarchy, calibrated in logits. R (R Foundation for Statistical Computing, Vienna, Austria) was used for the statistical analysis.
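To make the rating scale model concrete, the sketch below computes the category probabilities for one person-item pair under the standard Andrich formulation, in which the log-odds of each step depends on person ability, item difficulty, and a step threshold shared by all items. It is written in Python for illustration and is not the R code used in the study. The parameter names `theta`, `delta`, and `taus` are assumptions introduced here.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the rating scale model.

    theta: person ability (logits)
    delta: item difficulty (logits)
    taus:  step thresholds shared across items; two values for a 0/1/2 scale
    Returns a list [P(score=0), P(score=1), ...].
    """
    # Cumulative log-numerators: 0 for score 0, then add one step per category.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

With two thresholds the function returns three probabilities, one for each FMA-UE item score (0, 1, 2). As ability rises relative to item difficulty, probability mass shifts toward the higher scores.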

RESULTS
Score Distribution and Local Independence
After the distribution of each item score had been reviewed, the following four items were omitted: "biceps reflex," "triceps reflex," "elbow flexion," and "finger mass flexion." For these items, all participants had the same or similar scores (that is, all participants obtained the highest score for the two reflex items; for the other two items, the majority of participants had the highest score, and none of them scored zero), so these four items could not be dealt with in the rating scale model. In addition, the item "normal reflex" was removed because it was only assessed when the previous three items received the highest possible score, which thus interfered with local independence. For the other items, local independence was assumed to be maintained. Consequently, five items were removed before PCA and infit statistics.

Dimensionality
The PCA identified five factors with eigenvalues > 1. The first factor accounted for 40.0% of the total variance, and the other four factors accounted for 13.4, 7.1, 4.9, and 3.8% of the variance, respectively. These results were then compared to those of previous studies (21,22), and it was concluded that unidimensionality was preserved. The infit statistics revealed that the "hook grasp" item exceeded the acceptable range (Table 2). Because infit statistics beyond the acceptable range made the item a candidate for removal, the outfit statistics and factor loading were also reviewed. This item showed abnormally high outfit statistics (outfit MnSq, 2.06; Table 2) (31). In addition, the factor loading value was not high (r = 0.46). These findings indicated that the "hook grasp" item was a misfit, so the item was removed from the subsequent analysis. With removal of this item, the Akaike Information Criterion decreased by 143.8, which indicated improvement of the goodness of fit. Finally, Rasch analysis of the 27-item FMA-UE was performed.
Table note: The three reflexes, elbow flexion, finger mass flexion, and hook grasp were removed. *The item difficulty measures were based on the logit value indicating transition from inability to beginning ability; values were adjusted so that the mean was 0 and the standard deviation was 1.

Rasch Analysis and the Item Difficulty Hierarchy
The results of the Rasch analysis are shown in Table 3. The item difficulty measures, calibrated in logits, were adjusted (i.e., normalized) so that the mean was 0 and the standard deviation (SD) was 1. The error values were standard errors of the item difficulty measures, obtained by dividing the raw error values by the SD of the item difficulty measures on the original scale.
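The normalization described above amounts to a linear rescaling of the logit measures. A minimal Python sketch is given below; the function name is hypothetical, and the use of the sample SD (`ddof=1`) is an assumption, since the text does not state which SD estimator was used.

```python
import numpy as np

def normalize_item_difficulties(measures, errors):
    """Rescale item difficulties to mean 0 and SD 1, and rescale errors.

    measures: item difficulty measures in logits (original scale)
    errors:   raw standard errors on the same scale
    """
    measures = np.asarray(measures, dtype=float)
    errors = np.asarray(errors, dtype=float)
    sd = measures.std(ddof=1)  # assumption: sample SD
    normalized = (measures - measures.mean()) / sd
    # Standard errors are only divided by the SD; they are not re-centered.
    return normalized, errors / sd
```

Because the transformation is linear, the ordering of items and the relative spacing between them are unchanged; only the unit and origin of the scale differ from the original logits.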

Synergies vs. Coordinated Voluntary Movements
The movements performed within flexor/extensor synergies were confined to the easiest items. In contrast, all coordination/speed items were at the most difficult levels. After the synergy items, the next easiest item was "shoulder flexion to 90° with elbow extended."

Shoulder/Elbow/Forearm
For each joint movement, all the shoulder and elbow movements except for "shoulder flexion 90-180, elbow extended" were among the easiest items. Forearm movements were among the moderately difficult items, and alternating movements such as forearm pronation/supination were more difficult than stabilized movements.

Wrist
Wrist movements were among the moderate to most difficult items. The difficulty increased from stability, alternating movement, to circumduction, regardless of elbow position.

Hand
Finger movements spanned various difficulty levels. "Finger mass extension" was in the middle overall. The difficulty of each grasp increased from "cylindrical (the easiest)," "spherical," "pincer," to "thumb adduction (the most difficult)." The person-item map is presented in Figure 1. The person's ability is plotted as a histogram in the upper panel, and the item difficulty is plotted in the lower panel. The horizontal axis of the lower panel is a parameter of the item difficulty (calibrated in logits; values not normalized), with higher logits representing higher item difficulty. The horizontal axis of the upper panel is a parameter of the person's ability using the same scale as the parameter of item difficulty in the lower panel. The left white dot depicts the item difficulty measures, based on the logit value indicating transition from inability to beginning ability. Item scores are likely to be 0 if a participant's ability is on the left side of the left white dot, 1 if a participant's ability is between the two white dots, and 2 if a participant's ability is on the right side of the right white dot. Figure 1 visualizes the distribution of persons capable of each upper extremity movement. For example, the item "finger mass extension" is highlighted, and the vertical line divides the participants; this figure suggests that over half of the participants were incapable of extending their fingers.
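The way the person-item map is read can be expressed as a short sketch. The Python below is illustrative only; the function names are hypothetical, the threshold arguments stand for the positions of the two white dots on the logit scale, and the treatment of an ability lying exactly on a threshold is an arbitrary choice made here.

```python
def likely_item_score(ability, lower_threshold, upper_threshold):
    """Most likely item score (0, 1, or 2) given where a person's ability
    falls relative to an item's two thresholds (the white dots)."""
    if lower_threshold > upper_threshold:
        raise ValueError("lower_threshold must not exceed upper_threshold")
    if ability < lower_threshold:
        return 0  # left of the left dot: inability
    if ability <= upper_threshold:
        return 1  # between the dots: beginning ability
    return 2      # right of the right dot: normal

def share_unable(abilities, lower_threshold):
    """Fraction of persons whose ability lies left of the lower threshold."""
    abilities = list(abilities)
    if not abilities:
        raise ValueError("abilities must not be empty")
    return sum(a < lower_threshold for a in abilities) / len(abilities)
```

Applied to the highlighted "finger mass extension" item, `share_unable` corresponds to the proportion of the histogram lying to the left of the vertical line, which the text reports as over half of the participants.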

DISCUSSION
In this study, the difficulty hierarchy of the FMA-UE was determined using Rasch analysis in persons with chronic stroke with moderate to severe upper extremity motor deficits. Rasch analysis has some advantages, such as the interval scale, item difficulty hierarchy, and unidimensionality (30). Rasch analysis enables determining the item difficulty hierarchy and comparing a person's ability with item difficulty using equal interval scales, which can be helpful for setting appropriate rehabilitation goals targeted at the individual's ability. For example, Woodbury et al. generated the FMA-UE keyform recovery maps using Rasch analysis, thus translating a standardized measurement scale into a tool for designing treatment plans to provide optimally challenging tasks and progress task difficulty according to a person's ability (33). However, the item difficulty hierarchy of a measurement scale obtained using Rasch analysis changes across different target populations (30). The participants in this study differed from those in the study by Woodbury et al. (21) in several characteristics: one was the chronicity (time after stroke onset, 1375.3 ± 1157.9 vs. 16.9 ± 31.2 days), and another was the motor severity. This study, in which the 33-item FMA-UE was used, showed that the majority (85%) of the participants were persons with moderate to severe deficits. In contrast, Woodbury et al. (21), using the Orpington prognostic scale to define the severity of stroke, noted that the participants were predominantly persons with mild to moderate deficits, including only 10% with severe deficits.
Although direct comparison is not possible because the definition of severity differs in the two studies, one finding suggests that the participants in the present study had more severe upper limb motor deficits than those in the previous study (21); the present study showed that over half of the participants were incapable of finger extension, whereas the motion was easy for the participants in the study by Woodbury et al. (21). Thus, the present study was conducted on the assumption that the item difficulty hierarchy of the FMA-UE in persons with chronic stroke with moderate to severe deficits would differ from that of the previous study (22).

Dimensionality
Unidimensionality, which is one of the assumptions of Rasch analysis, can also indicate the structural validity of a measurement scale; that is, only one construct is measured, which in this case was upper limb motor function. The present PCA and infit statistics results showed that the 27-item FMA-UE was unidimensional. The PCA showed that the percent of variance explained by the first factor was relatively low compared with the results reported by Woodbury et al. (21). Post-stroke hemiparesis is associated with increased spasticity, increased stiffness (and reduced compliance) of muscles, soft tissue contracture, reduced muscle strength, and maladapted synergy formation over time (34)(35)(36). Furthermore, persons with more severe paresis are reported to have a higher risk of developing spasticity (37). In fact, the percentage of the participants with MAS ≥1 in this study was higher than in the previous report by Wissel et al. (38). Similarly, abnormal upper limb synergy and compensatory movements are likely to be observed in moderately to severely impaired persons post-stroke (39,40). These alterations may have made the structure of the FMA-UE more variant.
"Hook grasp" was identified as a misfit in the present study, and this item was also erratic in the participants at 6 months post-stroke (22). The FMA-UE items were generated based on Brunnstrom motor testing. Brunnstrom (41) described the hook grasp as "holding onto the handles of a handbag placed in the hand," which can be performed within flexor synergies, whereas Fugl-Meyer (12) defined this movement as "extending the metacarpophalangeal joints of digits II-V and flexing the proximal and distal interphalangeal joints," which requires extensors and flexors individually. This modification possibly made the task different from the originally intended movement for the paretic hand, thus making the item deviate from the construct measured by the other items. However, it remains controversial whether the hook grasp reflects motor function of a paretic upper extremity, because no motor control theory has been provided to support this, as noted in a previous study (22).

Item Difficulty Hierarchy
In addition to the three reflex items that were also removed in the study by Woodbury et al. (21), "elbow flexion" and "finger mass flexion" were omitted in the present study. Almost all participants obtained the highest score possible for these items in the present study, which suggests that the reflex and synergistic movements were the easiest items. The present results are consistent with the stepwise recovery course proposed in Fugl-Meyer's original article (12). However, the studies by Woodbury et al. showed that the item difficulty order did not follow the expected stepwise sequence, with synergies and each joint movement spanning the difficulty hierarchy (21,22). As Woodbury et al. noted, the item difficulty hierarchy in persons with acute stroke with mild to moderate deficits would be arranged according to inherent task-specific demands of the movements (21). In contrast, the present findings suggest that the item difficulty in persons with chronic stroke with moderate to severe deficits would reflect synergies, as Fugl-Meyer had originally described (12). The difficulty hierarchy of grasp in the present study showed that gross movements using a whole hand (e.g., cylindrical and spherical) were easy, whereas coordinated movements using digits (e.g., pincer and thumb) were difficult, and these results were consistent with those of the previous studies, regardless of stroke chronicity (21,42). The difficulty of finger mass extension fell between cylindrical and spherical grasp in the present study. These findings suggest that the greater the space for grasping an object, the more difficult the movement will be, because it requires the ability to extend the fingers.
Finally, the movements that are difficult for persons with moderate to severe chronic upper limb paresis were identified. Although a few previous studies reported key movements predicting motor recovery or response to interventions in persons with severe chronic stroke (43, 44), limited research has been conducted with the aim of understanding how difficult they were compared to other movements. The present study fills this gap and enables comparison of each movement of paretic upper limbs using a measure of motion difficulty. For example, "shoulder flexion to 90° with elbow extended" was reported as a key movement (43). This item was among the easiest items next to synergies in the present study, and it might be a candidate initial target for stroke rehabilitation and for technology-aided/robotic therapy. However, we do not assume that this approach is effective; its effectiveness would need to be investigated in clinical trials. Thus, the present findings provide an important piece of basic knowledge for rehabilitation targeted at persons with moderate to severe chronic stroke. This knowledge might help in selecting treatment targets, in which case using the 27-item FMA-UE might be beneficial. Creation of keyform recovery maps for persons with moderate to severe chronic stroke from these results would also be possible, but it requires further investigation in a population with a wider range of severity to determine how far the difficulty hierarchy is maintained across different levels of upper limb function.
The present study had several limitations. First, the cut-off values of severity for the FMA-UE were not established, and various values have been reported (45, 46). This study was designed around neurorehabilitation for persons with moderate to severe chronic stroke, so the cut-off values in a recent review of this field were used, and caution should be taken when using these results. MAS was also used to assess resistance to passive movement, and future work could include a more specific measure of spasticity, such as the Tardieu scale, to account for this outcome (47). Second, although the FMA-UE includes multi-joint movements, not all combinations are assessed. The motion difficulty would change according to the positions of proximal joints, so care should be taken when interpreting the present results. Third, the participants were recruited in a single center, and institution-specific patient characteristics may limit the generalizability of the results. For example, persons with apparent contracture on the paretic upper limb were excluded, so clinicians should be cautious when applying these results to persons with severe contractures, although this population is generally not likely to be eligible for upper limb motor rehabilitation. In addition, the difficulty of items that matched the ability of the participants in this sample was estimated fairly accurately, but items that were too easy or too difficult for these participants were less accurately estimated; thus, the results of the present study cannot be applied to persons with the most severe or mildest deficits.

CONCLUSIONS
The FMA-UE is a valid assessment tool of upper extremity motor function in persons with chronic stroke with moderate to severe deficits. The present results showed that item difficulty was consistent with the stepwise recovery course proposed by Fugl-Meyer. The upper extremity movements that are difficult for patients with moderate to severe chronic paresis were determined, which would enable comparison of each movement using a measure of motion difficulty in future studies.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: Anonymized data sets are stored by researchers in accordance with ethical committee approval requirements. Requests to access these datasets should be directed to Michiyuki Kawakami, michiyukikawakami@hotmail.com.

ETHICS STATEMENT
This study was approved by the institutional ethics review board (20190144). The outline of the study was published on the public website, and the participants were guaranteed the right to refuse participation.

AUTHOR CONTRIBUTIONS
NH and MK contributed to the study concept and design, data acquisition and analysis, data interpretation, and drafting of the manuscript. RI contributed to the study design, data acquisition and analysis, data interpretation, and drafting of the manuscript. KT and TN contributed to the data interpretation and data acquisition. KO and ML contributed to the data interpretation and editing of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This research was partially supported by a grant from the JSPS KAKENHI (Grant nos. JP18H03135 and 19K19886).