Measuring outcomes of rehabilitation among the elderly—a feasibility study

A feasible system for measuring patient outcomes of rehabilitation is required for assessing the real-world cost-effectiveness of rehabilitation. This study aims to assess the feasibility of measuring outcomes of rehabilitation among elderly individuals with early-stage Alzheimer's. We used the principles of Design Science to construct a set of metrics consisting of standardized PROM (Patient-Reported Outcome Measure) questionnaires, clinician-reported measures, and observational measures of functioning. We used standardized questionnaires whenever possible to ensure the validity and reliability of the questionnaires. The set of metrics was piloted on 16 individuals living at home with regular home care services. After the pilot, we further refined the set of metrics based on relevance, sensitivity to change, and applicability. We found that measurement was feasible and we propose the final set of metrics as a minimum set, which could be further improved upon by addition of metrics relevant to each subgroup of elderly individuals. We also found that using self-reported questionnaires in this population is not without difficulties. We therefore suggest that the role of informal caregivers be considered, and that accessibility of outcome questionnaires be improved.


Introduction
Rehabilitation of the elderly is often an underused intervention that has potential to improve functioning, prevent falls, and delay the need for more intensive services (1)(2)(3)(4).From the point of view of the payer, cost-effectiveness of rehabilitation is key: whether rehabilitation can delay the need for hospice care long enough to offset the cost of rehabilitation.Calculating the costs is a relatively simple, while by no means negligible, task.What is missing from the equation, though, are the outcomes.
Much interest has been invested in measuring the outcomes of health care services in recent years and decades.Current consensus is that the set of measures should includewhenever possible-patient-reported measures to complement clinician-reported (ClinROM) and observational measures (5).Furthermore, a standardization of measures used is necessary for comparison (6).As the patient is often not the payer, it is also important to include the point-of-view of the payer, meaning costs and outcomes relevant to the payer (5).Furthermore, outcome data could be used in patient and intervention selection.
Another much-discussed aspect in healthcare is value-based purchasing.In theory, tying a part of the remuneration to outcome targets could incentivize the service producer to aim for best possible outcomes for the patient.Although there are many ways to design a value-based purchasing model (7) and it is unclear how exactly to design a successful one (8), a balanced set of measures is a necessary step.
Rehabilitation is a difficult setting for measuring outcomes, though, because the individual goals for each patient can vastly differ.The universal goal, according to the United Nations, is to enable persons with disabilities to "reach and maintain their optimal physical, sensory, intellectual, psychiatric and/or social functional levels, thus providing them with the tools to change their lives towards a higher level of independence" (9).WHO, on the other hand, emphasizes the integration of the individual and their environment (10), which is also reflected in their International Classification on Functioning (ICF) and its practical tools, ICF Core Sets.
Whether emphasis is on the individual or their surroundings or the interaction between the two, the ultimate goal of rehabilitation is to maximize the independence and autonomy of the individual.What differs from one individual to the other are the intermediary goals: what must happen for independence to be maximized.One way to quantify the attainment of individual goals in a comparable way is Goal Attainment Scaling (GAS) (11).
The outcome of rehabilitation of the elderly is often measured through mobility, which is a good proxy for Activities of Daily Living, ADL (12).A systematic review by Vermeulen et al. (13) found that a low ADL was preceded by weight loss, decrease in grip strength, and poor balance, among other things.ADL itself is also an often-used measure of rehabilitation outcomes in this population (1,3,13), as is the Functional Independence Measure (FIM) (1,12).
Much like PROMs, measures of patient experience (Patient-Reported Experience Measures, PREM) also range from the generic [such as Net Promoter Score (NPS), widely used across industries] to the specific (such as PREM-SBC for secondary breast cancer).The dimensions measured are often similar in the generic and specific measures, and usually include communication, professionalism, etc.No PREM questionnaires developed specifically for rehabilitation or physiotherapy have been translated into Finnish.
Overall, literature is ripe with outcome measures used in studying rehabilitation.However, what is necessary in a randomized trial may not be feasible in real life -that is, the measuring in a randomized trial is much more comprehensive than what the professionals have time for in real life.
As populations in developed countries are getting older, the prevalence of different forms of dementia are increasing (19).Over 46 million people worldwide live with dementia, and this has been estimated to increase to over 130 million by 2050 (19).The total estimated cost of dementia was 1 trillion $ in 2018, estimated to double by 2013 (19).Alzheimer's disease is by far the leading cause of dementia, causing at least 70% of dementia cases worldwide (20).
The cognitive decline in people with dementia necessitates that particular attention be paid to accessibility (21).It is therefore that we chose Alzheimer's patients as our target group: a measuring system that is feasible in this population will likely be feasible in the wider elderly population.
Our primary aim was to assess the feasibility of a system for measuring outcomes, with elderly, early-stage Alzheimer patients as our target group.Thus, we primarily aimed to identify the relevant and applicable outcome metrics among already standardized and validated instruments.We then assessed their sensitivity to change in order to trim down the set of metrics based on this information.

Materials and methods
This is a qualitative feasibility study, which utilizes Design Science to design a system of outcomes measurement.In the context of this study, we define feasibility as consisting of three critical aspects: relevance, sensitivity to change, and applicability.In terms of relevance, we focus on the point-of-view of the patient and the professional, while also incorporating the pointof-view of the payer.
Design Science is solution-oriented and directly tied to practice and deals with the understanding and improving of an artifact that is intended to solve a problem (22).The artifact constructed in this study is a system for measuring the outcomes of rehabilitation in our chosen segment.We chose Design Science as a method because of the nature of the problem at hand: we set out to construct a system, an artifact to fulfill a practical need, and Design Science is a research paradigm for doing precisely that.
Taking a Design Science approach often involves a review of existing literature, followed by repeating cycles of synthesis and evaluation (23).Figure 1 describes the steps taken in this study, highlighting the cyclical nature of the process.

Pre-pilot phase
In developing the measurement model (the set of metrics and their measurement interval), we utilized the principles outlined in the literature above.We referred to the ICF Rehabilitation Core Set (24), which includes 30 items, and the TOIMIA Functioning Measures Database (25) which, among other things, bridges measures to ICF.Based on this information, we calculated the coverage of each alternative PROM, ClinROM, and observational measure, and of combinations thereof.
For PROM selection, we referred to Finnish TOIMIA guidelines for measurement of self-reported functioning (26) for both which questionnaires to use, and the time points of measurement.We also had workshops of the preliminary sets of metrics with the physiotherapists and with the management of the physical therapy provider.
Due to the heterogeneity of elderly people in terms of both health and activity, we decided to focus on a specific group of elderly.As elderly with Alzheimer's disease are prone to a decrease in their physical activity, possible interventions should yield positive outcomes.Thus, using measures on a rather homogenous group of patients we could consider if our set of metrics could be implemented on a wider scale with elderly patients.
The inclusion and exclusion criteria were designed to include elderly patients with a mild cognitive impairment and an adequate level of physical functioning which enables living at home with home care.Thus, the inclusion criteria were: being at least 65 years of age, retired from work, a diagnosis of Alzheimer's disease (ICD-10 codes G30.1, G30.9), living at home, and able to walk for 10 meters (independently or with a mobility aid).The exclusion criteria were other reasons for dementia: e.g., Parkinson's disease, Lewy body dementia, head trauma, and early-onset Alzheimer's (G30.0).
Using these criteria, we recruited 19 patients undergoing an individualized three-month rehabilitation intervention which consisted of cognitive and physical exercise with a physiotherapist (PT).One of the patients passed away during the study, and two were unable to complete the study for other reasons.Thus, our research population consisted of 16 individuals.The patients were 78-92 years of age (with a mean age of 85), and 69% (11)   The steps of this study.

Post-pilot phase
After interventions and data gathering had completed, we interviewed both of the physiotherapists separately (semistructured interview and structured scoring of each metric).Each interview lasted about two hours and was conducted by two researhers, LP and JN.To increase the rigor of the qualitative part of the study, we asked the physiotherapists to score each questionnaire used.The assessment questions were formulated based on literature (27,28) on patient experience of PROM usage-modified for professionals filling out the questionnaires in collaboration with the patient.The dimensions of metric assessment were relevance, understandability, usefulness, and overall usability.The questions used were as follows: 1. Were the questions relevant?2. Were the questions understandable?3. Were the questions useful (i.e., did they bring out topics that would not have been discussed otherwise?)? 4. Overall, would you use this questionnaire in the future?
The physiotherapists were asked to score each questionnaire on each dimension on a scale of 1 to 5.
We then had a workshop with the physiotherapists and with the management to finalize the set of metrics.

Pre-pilot phase
We chose to adhere to standardized questionnaires to ensure the reliability and validity of the metrics.
In the TOIMIA guidelines (26), a measurement interval of three months was recommended, which coincided with the planned duration of the intervention.When choosing the measures used, we strived to minimize the amount of additional work for all stakeholders, especially the professionals.We therefore decided to include ADL, measured through the Barthel Index (29), as a ClinROM, since it is a part of RAI [Resident Assessment Instrument (30)] and therefore widely available.Similarly, for observational measures, we chose SPPB [Short Physical Performance Battery (31)] and Grip Strength, both already used by physiotherapists.
We also discussed questionnaire choice with the physiotherapists and agreed that while all three were somewhat negatively worded (for example, in WHODAS 2.0, all items are formulated as "In the past 30 days, how much difficulty did you have in: -", thus emphasizing difficulties), PROMIS-10 was the least negative.Based on this, and the coverages mentioned above, we chose PROMIS-10, along with PASS (Patient Acceptable Symptom State) and GRC (Global Rate of Change), as recommended by TOIMIA (26).
For patient satisfaction measurement, we chose NPS (Net Promoter Score) for easy comparison, and additionally modified a short survey based on the patient satisfaction survey used by The Finnish Institute for Health and Welfare in their biennial national FinSote survey (32).
To measure obtained goals, GAS (GAS 1) (33) was chosen, as it is often used in Finland, including in rehabilitation.
Table 1 lists the metrics used in the pilot, and whether measurements were done both before and after the intervention, or only after.

Post-pilot phase
The idea was that the patients were to have a mild cognitive impairment only, so as to be able to fill out questionnaires independently.However, they turned out to be more cognitively impaired than expected.Thus, they were not able to fill out questionnaires by themselves, but required the help of the physiotherapist.Also, for the most part, the patients were unable to reflect on their symptoms for the last 7 days (or however many days the recall period of each questionnaire was), but rather tended to answer based on their experience on that particular day.Furthermore, many found the questionnaires too difficult and too complex in their wording and had trouble understanding the 5-point Likert scale used in many of the questionnaires.Some patients had a relative present, and the relative sometimes divulged information that conflicted the information divulged by the patient.
Based on the pilot data and the post-pilot interviews, Table 2 displays the fraction of patients with a change detected per metric (reflecting the sensitivity), and their average score for relevance and applicability (arithmetic average of the four scores given by the two physiotherapists; in total, eight scores per metric).The final set of metrics was chosen based on the scores in Table 2, and interview data was utilized to form the rationale for inclusion.Based on the scores, we decided to leave out three metrics.PASS turned out to be too crude a metric (scale "yes" or "no"), especially for the cognitively impaired, whose answers are more likely to reflect their general attitude towards their functioning than any changes in it.The cognition-related PROM, while specifically aimed at cognitively impaired persons, seems unreliable-it might be useful if filled out by relatives of the patient.Pain is included in most generic PROMs and thus is redundant in this set of metrics-furthermore, cognitively impaired persons seem prone to answer using the extremes of the scale (9/12 gave either 0 or 10 pre-intervention, 10/15 postintervention).
Table 2 shows which metrics were included in the final set of metrics, with comments on each metric.

Discussion
We aimed to assess the feasibility of a system for measuring patient outcomes in rehabilitation of early-stage Alzheimer's patients, and to design a set of metrics through an iterative process.
We were able to design a set of metrics which adheres to the critical aspects defined earlier: relevance, sensitivity for change, and applicability.The set of metrics is presented in Table 2. To the best of our knowledge, such a set of metrics for measurement of realeffectiveness of rehabilitation has not been published before.The ICHOM Older Person Standard Set (34) is somewhat similar, yet it does not focus specifically on the outcomes of rehabilitation and is thus too generic.The set we present is proposed as a minimum set, which could be further improved upon by the addition of some items discussed below.Furthermore, cost per patient is also needed to assess cost-effectiveness, which is essential for the payer.
We found that measurement was feasible.Despite the aforementioned problems, we were able to conclude a few things about which metrics seem to work for this population and which do not.
It would seem that patients with cognitive impairment, even if it is mild, need simpler questionnaires.This is congruent with the findings of Kramer & Schwartz (35), who suggest that the field of rehabilitation could benefit from considering the cognitive accessibility of PROM use.In practice, we suggest that both the questions and the multiple choice options need to be simpler.One way to achieve this could be by using a visual scale.
One barrier to adaptation of PROMs in routine use is patient burden.In the field of maternity care, a systematic review by Chen et al. (36) found that one factor hindering the acceptability of PROM collection was that the patients were concerned about the length of the questionnaires and the inadequacy of instructions.They offered suggestions for increasing response rate, such as explaining the purpose of the questionnaires to the patients and giving clear instructions, which could be applicable in the elderly population as well.In this study, physiotherapists collected PROMs through interviews, and thus we cannot comment on patient burden as such.We did strive to minimize patient burden by trimming down the set of metrics to a minimum, but real-life response rate remains to be seen.People with Alzheimer's disease tend to have difficulties with abstract thinking and understanding of text, which are both essential for filling out questionnaires.Assistance and explanation with questions from different surveys were needed from a physiotherapist and/or a relative.Including observational measures in our set of metrics helps attain a more comprehensive view of changes in patients' well-being.This reduces reliance on one source of information and the impact of cognitive abilities of the patient.On the other hand, using a more heterogeneous group of elderly patients with a wider range of cognitive abilities would reduce the problem of written questionnaires.Nonetheless, using a questionnaire tailored for elderly patients with cognitive impairments should be further studied.
Also, we found that for many of the patients the most important outcome was an increase in mobility and specifically confidence in leaving one's home.Future studies could benefit from questionnaires measuring this phenomenon.CONFbal is a questionnaire which measures balance confidence (37), which could be one component of the phenomenon.However, we suggest that confidence in leaving one's home is about more than just balance, particularly for cognitively impaired individuals.
While some metric of personal goals and their attainment is needed, GAS is highly person-dependent and thus gives uneven results.Some service providers also consider it "too unwieldy for routine use" (38).On the other hand, some studies have found GAS to be more responsive to change than standardized outcome measures: Stole and Rockwood have studied this both in a geriatric setting (39) and in a cognitive rehabilitation setting (40).For lack of a better measure, we decided to include GAS in the final set of measures.However, we believe that standardization of goals and attainment levels, such as that developed by the Geriatric Assessment and Rehabilitation Unit some 25 years ago (38), could improve its uniformity in the future.
It should be noted that the changes mentioned in Table 2 can be small depending on the sensitivity of the metric.Therefore, the fraction of patients with change detected probably reflects two things: the sensitivity of the metric and the suitability of the metric for this particular population and context.For example, PROMIS-10 uses a scale of 0-100, while PASS uses a scale of 0-1, which may partly affect the sensitivity of the questionnaire.In future studies, we suggest that clinically meaningful changes should be assessed and taken into consideration.Furthermore, in a cognitively intact population the QRC question could be used to estimate the meaningfulness of change much like Osoba et al. (41) have used Subjective Significance Questionnaire to assess the subjective significance of changes in PROMs.
The physiotherapists involved in this study reported that there was sometimes a discrepancy between the views of the patient and their relatives regarding the health and well-being of the patient.This is probably attributable in part to the cognitive deterioration of the patients, but the physiotherapists also noted that sometimes the relatives' judgement can be affected by their beliefs and attitudes: their view (often more negative than that of the patient) is not necessarily correct.
Relatives could also take a bigger role in helping patients fill out questionnaires.In this study the questionnaires were filled out with the help of the physiotherapist.In real life relatives could fill this role.However, as mentioned in the preceding paragraph the possible bias of the relative may affect results, which must be considered.
Overall, the pilot results presented many surprises.The usefulness of the metrics and the feasibility of measuring were affected by many unforeseen details.We feel that this emphasizes the importance of a feasibility study, particularly in populations of compromised functioning: without piloting the questionnaires it is impossible to know what will work.
Outcomes data can be used in estimating the real-effectiveness (42, 43) of a rehabilitation intervention when benchmarking between providers.However, our study did not aim to provide outcomes data to assess the effectiveness of the intervention.
The set of metrics we present includes assessment of functioning by both the patient and the professional.The scoping review by Ravn et al. (44) analyzes 26 trials, of which 10 measured either ADL of QoL, 8 measured both, and 8 measured neither.In the 2023 Cochrane review on cognitive rehabilitation for people with dementia (45) 5 out of 6 trials measured quality of life.It seems that while ADL and QoL are often-used outcome metrics in studies on this subject, there is still a lack of comprehensive measuring of outcomes from multiple points-of-view.A set of metrics comprising different points-of-view could improve the evidence base of rehabilitation of the elderly.

Limitations of the study
We aimed to measure MMSE (Mini Mental State Examination) in addition to the Finnish questionnaire on memory, so we could compare the patient-reported score with the MMSE, which is clinician-reported.However, due to human error (communication breakdown between researchers and the physical therapy provider), MMSE was not measured.
Furthermore, as only 7 out of 16 patients had finalized GAS forms, we decided to disregard these in the analysis.Due in part to the untimely passing of one of the physiotherapists during the study, the reasons for the incomplete filling out of GAS questionnaires remain unclear.
Notably, for some of the patients rehabilitation began in the winter months and ended in the spring, while for others rehabilitation began in the early summer and ended in late summer.In Finland, weather conditions vary greatly between the seasons: in the winter, streets can be icy and slippery, rendering many elderly people unable to leave their home on their own.Spring, on the other hand, is ideal for outdoor activity.Likewise, summer can be quite hot at times and as Finnish homes rarely have air conditioning many elderly people suffer from the heat.Thus, the timing of the rehabilitation intervention could affect the results.
The relatively advanced state of cognitive deterioration among the patients meant that they had difficulty recalling their symptom state for the last 7 days as required by some of the questionnaires used.Thus, the results may indicate the difference between the situation on a particular day rather than a true change in their symptom state within those three months.Therefore, random factors may have had a greater effect than they would among cognitively sound individuals.
As patients filled out the questionnaires with the help of their physiotherapist-who was also the provider of the service being evaluated-there may be some bias in the results, especially regarding customer satisfaction.

Conclusions
We found that measuring the outcomes was feasible.We were able to design a set of metrics which is relevant, sensitive for change, and applicable.The iterative process we employed provided results quite quickly.A similar process could also be useful in developing other such sets of metrics, such as metrics used in national quality registers.
Feasibility is ensured by using a concise set of metrics devoid of overlapping items.Using the ICF Core Set allowed us to maximize coverage while minimizing overlap.
A feasible system for measuring patient outcomes of rehabilitation is a step towards assessing the real-world cost-effectiveness of rehabilitation.This, in turn, could be used to optimize rehabilitation use in order to achieve better quality of life for elderly patients, and lower total cost for the payer.It could also be a step forward in outcomes-based payment systems.
Future studies could benefit from the inclusion of informal caregivers as a source of information regarding patients' health and wellbeing.This can be achieved using standardized questionnaires and/or conducting interviews.Some quality-of-life measures offer different versions of the questionnaire for the informal caregivers.
were female.Patient recruitment and interventions commenced in early 2022.Interventions were conducted by two physiotherapists with 5 and 25 years of experience in rehabilitation of the elderly.Interventions completed in August 2022.Measurements were conducted both before and after the intervention.

TABLE 1
The pilot set of metrics.

TABLE 2
Assessment of the suitability of the metrics.