Performance of Regression-Based Norms for Cognitive Functioning of Persons With Multiple Sclerosis in an Independent Sample

Background: Cognitive impairment is common in multiple sclerosis (MS). Interpretation of neuropsychological tests requires the use of normative data. Traditionally, normative data have been reported for discrete categories such as age. More recently continuous norms have been developed using multivariable regression equations that account for multiple demographic factors. Regression-based norms have been developed for use in the Canadian population for tests included in the MACFIMS and BICAMS test batteries. Establishing the generalizability of these norms is essential for application in clinical and research settings. Objectives: We aimed to (i) test the performance of previously published Canadian regression-based norms in an independently collected sample of Canadian healthy controls; (ii) compare the ability of Canadian and non-Canadian regression-based norms to discriminate between healthy controls and persons with MS; and (iii) develop regression-based norms for several cognitive tests drawn from batteries commonly used in MS that incorporated race/ethnicity in addition to age, education, and sex. Methods: We included 93 adults with MS and 96 healthy adults in this study, with a replication sample of 104 (MS) and 39 (healthy adults). Participants reported their sociodemographic characteristics, and each was administered the oral Symbol Digit Modalities Test (SDMT), the California Verbal Learning Test (CVLT-II), and the Brief Visuospatial Memory Test-Revised (BVMT-R). From the healthy control data, we developed regression-based norms incorporating race, age, education and sex. We then applied existing discrete norms and regression-based norms for the cognitive tests to the healthy controls, and generated z-scores which were compared using Spearman rank and concordance coefficients. We also used receiver operating characteristic (ROC) curves to compare the ability of each set of norms to discriminate between participants with and without MS. 
Within the MS samples we compared the ability of each set of norms to discriminate between differing levels of disability and employment status using relative efficiency. Results: When we applied the published regression norms to our healthy sample, impairment classification rates often differed substantially from expectations (7%), even when the norms were derived from a Canadian (Ontario) population. Most, but not all of the Spearman correlations between z-scores based on different existing published norms for the same cognitive test exceeded 0.90. However, concordance coefficients were often lower. All of the norms for the SDMT reliably discriminated between the MS and healthy control groups. In contrast, none of the norms for the CVLT-II or BVMT-R discriminated between the MS and healthy control groups. Within the MS population, the norms varied in their ability to discriminate between disability levels or employment status; locally developed norms for the SDMT and CVLT-II had the highest relative efficiency. Conclusion: Our findings emphasize the value of local norms when interpreting the results of cognitive tests and demonstrate the need to consider and assess the performance of regression-based norms developed in other populations when applying them to local populations, even when they are from the same country. Our findings also strongly suggest that the development of regression-based norms should involve larger, more diverse samples to ensure broad generalizability.

INTRODUCTION
Over 40% of persons with multiple sclerosis (MS) are thought to experience cognitive impairment, which adversely affects social participation, independence, and employment (1,2). Cognitive impairment at diagnosis has been found to be associated with disability progression over time (3). Neuropsychological assessments objectively evaluate cognitive function, and are increasingly important in the care of persons with MS, as new rehabilitative strategies and pharmacologic therapies for cognitive impairments continue to emerge. Given that access to comprehensive neuropsychological assessments is often limited, several abbreviated test batteries have been recommended for use in persons with MS, including the Brief International Cognitive Assessment for Multiple Sclerosis (BICAMS) (4), the Brief Repeatable Battery of Neuropsychological Tests (BRB-N) (5), and the Minimal Assessment of Cognitive Function in MS (MACFIMS) (5,6). Interpretation of test results for both research and clinical practice requires the use of normative data, although most available published normative data for these tests were developed in American populations. Application of American norms to Canadian populations is not recommended due to differences in performance between Canadian and American adults on measures of intellectual ability (7). Moreover, published norms were often established in samples that no longer reflect contemporary demographics; for example, the proportion of individuals with higher levels of education was lower than in the present-day population. Notably, Intelligence Quotient scores have risen over time (8), and use of outdated norms may lead to misclassification of cognitive status by underestimating the normal range of performance (9). In consideration of these issues, recommendations for international validation of the BICAMS were made to encourage its adoption (10).
Traditionally, normative data have been reported for discrete categories, such as age and/or education. More recently, continuous norms have been developed using multivariable regression equations that account for multiple demographic factors simultaneously. Regression-based norms for use in the Canadian population were recently developed for tests included in the MACFIMS battery (11), including the subset of tests included in BICAMS. Because these norms were derived from control populations recruited for other purposes, the number of participants available was fewer than the recommended 100 participants for some tests. In addition, while developed for use in Canada, the controls were drawn from only one region of Canada (i.e., province of Ontario), and the performance of these norms in an independently collected sample of healthy Canadian persons has not yet been assessed. Establishing the generalizability of norms is essential to determine if they may be appropriately applied in clinical and research settings more broadly than those from which the normative samples were drawn.
We sought to (i) test the performance of the previously published Canadian (Ontario) regression-based norms in independently collected samples of healthy controls from other Canadian regions; (ii) develop local regression-based norms for the tests included in the BICAMS; (iii) examine differences in impairment classification rates in local healthy controls when applying BICAMS regression-based norms from different populations; and (iv) examine the ability of Canadian and non-Canadian norms to discriminate between local healthy and MS samples.

METHODS
We conducted the primary analysis using MS and healthy control samples from Manitoba, Canada. Manitoba is a central Canadian province with a population of ∼1.4 million people. We replicated our analyses in MS and healthy control samples from the eastern Canadian province of Nova Scotia (population ∼1.0 million), which are described further in the replication section.

Setting and Participants
In Manitoba, we enrolled a subgroup of persons with MS participating in a longitudinal study of immune-mediated inflammatory diseases (the "IMID" study) as previously described (12). Participants were recruited from the single specialized care center for persons with MS in the province. This subgroup of 111 participants attended an IMID study visit between September 2016 and July 2017 which included cognitive testing (13). MS participants were aged ≥18 years, with adequate knowledge of the English language to provide informed consent.
We enrolled healthy controls from September 2018 to September 2019. Inclusion criteria were age ≥18 years and adequate knowledge of the English language to provide informed consent. Exclusion criteria included any chronic medical condition, known cognitive impairment, any positive response to the Structured Clinical Interview for DSM-IV (SCID-IV) screening questions for depressive or anxiety disorders, any head injury associated with loss of consciousness or amnesia, or chronic medication use with the exceptions of contraceptives, hormone replacement therapy, transient antibiotic use, or multivitamins (14). Hypertension, as identified during the study visit (see below), was also an exclusion criterion even if not reported as a diagnosed condition by the participant. We recruited participants using multiple methods including posters placed in hospital, university, and community settings throughout Winnipeg; mail-outs of a study poster to homes in Winnipeg; and word of mouth. Sample size requirements for the development of regression-based norms are 2.5- to 5.5-fold smaller than for the development of discrete norms, while retaining similar or better precision (15), and samples of 100-500 persons are sufficient. Thus, our target sample size was 100.

Participant Characteristics
All participants, including those with MS and healthy participants, underwent standardized assessments and completed questionnaires (12). Participants reported their sociodemographic characteristics including sex, date of birth, ethnicity, years of education, and annual household income as described in detail previously (12). Participants also reported their smoking status; we classified participants who had smoked at least 100 cigarettes as ever smokers (16). We determined body mass index (BMI, kg/m²) based on height and weight measured at the study visit. Only participants with MS underwent a neurological examination for calculation of the Expanded Disability Status Scale (EDSS) score by an EDSS-certified neurologist.

Neuropsychological Measures
We were primarily interested in the development of local regression-based norms to support an ongoing study examining the influence of vascular and psychiatric comorbidity on cognition in MS (13). The neuropsychological tests conducted examined the cognitive domains most often affected in MS and in the comorbidities of interest (17,18), and included tests of information processing speed, verbal learning and memory, and visual learning and memory. From these tests we examined the test scores comprising the BICAMS, i.e., the oral Symbol Digit Modalities Test (SDMT) (19), the California Verbal Learning Test (CVLT-II; Trial 1-5 total recall score) (20), and the Brief Visuospatial Memory Test-Revised (BVMT-R; summed recall score for all three learning trials) (21). Each participant also completed the Wechsler Test of Adult Reading (WTAR) as an estimate of premorbid IQ.

Analyses
First, we summarized participant characteristics using descriptive statistics including mean, standard deviation (SD), frequency and percent (%).
Second, to develop regression-based norms in our healthy control group we adapted the approach previously described by Berrigan et al. (22). Specifically, we converted raw scores to scaled scores with a mean of 10 and standard deviation (SD) of 3 based on the cumulative frequency distribution in our control group. Then, we developed a separate regression model for each test or subtest of interest, where the scaled test score was the dependent variable. To account for the bounded distribution of the scaled scores and ensure that predicted values did not fall outside the range of possible values, we used truncated rather than linear regression models. The independent variables were sex (coded as 1 = male, 2 = female), years of education (continuous), age (continuous), age-squared (continuous), and race/ethnicity (coded as 1 = white, 0 = non-white). We included an age-squared term to account for potential non-linear relationships (22). We included race/ethnicity given that cognitive tests may assess individuals of different racial backgrounds differently (23,24). We did not include estimated pre-morbid IQ as this variable was not included in the development of regression-based norms in MS. For consistency with published Canadian norms, we also report norms without race/ethnicity as a predictor, and in individuals aged 65 years and under. For each regression model we report the constant and non-standardized coefficients that generate the normative formulae. Model fit was assessed using a pseudo-R² calculated as the squared correlation of the observed and predicted values of the dependent variable (25). We assessed assumptions of homoscedasticity using the White test and residual plots, and assessed assumptions of normality using quantile-quantile plots.
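As a rough illustration of the scaling and model-fit steps described above, the following Python sketch converts raw scores to scaled scores (mean 10, SD 3) from the cumulative frequency distribution and computes a pseudo-R² as the squared correlation of observed and predicted values. The mid-rank percentile, the rounding to whole scaled scores, the function names `scaled_score_table` and `pseudo_r2`, and the example data are assumptions made for illustration; they are not taken from the study's analysis code.

```python
from statistics import NormalDist, mean


def scaled_score_table(raw_scores):
    """Map each observed raw score to a scaled score (mean 10, SD 3)."""
    n = len(raw_scores)
    std_normal = NormalDist()  # standard normal, turns percentiles into z-values
    table = {}
    for value in sorted(set(raw_scores)):
        below = sum(1 for r in raw_scores if r < value)
        ties = sum(1 for r in raw_scores if r == value)
        # Assumption: mid-rank cumulative proportion, which stays inside (0, 1).
        percentile = (below + 0.5 * ties) / n
        table[value] = round(10 + 3 * std_normal.inv_cdf(percentile))
    return table


def pseudo_r2(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical raw SDMT scores from a small control group.
controls = [40, 45, 50, 55, 60, 65, 70]
table = scaled_score_table(controls)
print(table[55])  # the median raw score maps to a scaled score of 10
```

The lookup table plays the role of a raw-to-scaled conversion table; the scaled scores would then serve as the dependent variable in the regression models.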
Third, we applied previously published regression-based Canadian norms for the tests where available (11,22). Two sets of norms were available for the SDMT; we tested both the norms developed using only Ontario participants (11) and the norms developed using participants from Ontario and Nova Scotia (hereinafter Ontario/Nova Scotia) (22). Because these norms were developed in persons aged 18 to 65 years (Supplementary Table e1), and accordingly may not perform adequately in older participants, we excluded study participants over age 65 years when examining their performance. Z-scores of ≤-1.5 were classified as impaired. We expected that if the norms performed well, based on a normal distribution ∼7% of our healthy control sample would be classified as impaired on each test.
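The expected rate of ∼7% follows from the standard normal distribution, where the probability of a z-score at or below -1.5 is about 6.7%. The sketch below shows that calculation together with one common way to turn a regression-based norm into a z-score (observed minus predicted scaled score, divided by the residual SD). That z-score formula, the coefficient values, and the function names are assumptions for illustration only; the published norms define their own coefficients and residual SDs.

```python
from statistics import NormalDist

IMPAIRMENT_CUTOFF = -1.5  # z-scores at or below this value count as impaired


def expected_impairment_rate(cutoff=IMPAIRMENT_CUTOFF):
    """Proportion of a standard normal population at or below the cutoff."""
    return NormalDist().cdf(cutoff)


def regression_z(scaled_score, sex, education, age, white, coef, residual_sd):
    """z-score of an observed scaled score relative to its demographic prediction.

    sex: 1 = male, 2 = female; white: 1 = white, 0 = non-white.
    """
    predicted = (coef["constant"]
                 + coef["sex"] * sex
                 + coef["education"] * education
                 + coef["age"] * age
                 + coef["age_sq"] * age ** 2
                 + coef["race"] * white)
    return (scaled_score - predicted) / residual_sd


def impairment_rate(z_scores, cutoff=IMPAIRMENT_CUTOFF):
    """Observed proportion of z-scores classified as impaired."""
    return sum(z <= cutoff for z in z_scores) / len(z_scores)


# Hypothetical coefficients, NOT those of any published norm.
example_coef = {"constant": 12.0, "sex": 0.5, "education": 0.1,
                "age": -0.05, "age_sq": 0.0, "race": 0.0}
z = regression_z(8, sex=2, education=14, age=40, white=1,
                 coef=example_coef, residual_sd=2.9)
print(round(expected_impairment_rate(), 3), z <= IMPAIRMENT_CUTOFF)
```

Comparing `impairment_rate` in a healthy sample against `expected_impairment_rate()` mirrors the check of whether a set of norms classifies roughly 7% of controls as impaired.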
Fourth, we compared the Canadian regression-based norms with non-Canadian regression-based norms after applying the norms to generate z-scores. Other norms examined included regression-based norms developed in two other English-speaking populations [Buffalo, New York, United States (hereafter "New York"); Dublin, Ireland (hereafter "Ireland")] (26), the discrete norms available from the published test manuals for each test, and the recently published discrete norms for the SDMT by Strober et al., which were intended to update the previous discrete norms (27). We did not examine regression-based norms for BICAMS developed in non-English-speaking populations (28). The characteristics of the samples used to develop these norms are shown in Supplementary Table e1. For these comparisons, we examined the Spearman correlations between the z-scores. We considered correlations of ≤0.39 as low, 0.40-0.59 as moderate, 0.60-0.79 as strong, and ≥0.80 as very strong (29). Because Spearman correlations can establish whether the rank order of participant z-scores is the same, but not whether the same z-score values are assigned, we also examined the concordance coefficients (30). To assess the ability of the various norms to discriminate between persons with MS and healthy individuals, we compared the area under the receiver operating characteristic (ROC) curve between the various norms, using binary logistic regression, where the dependent variable was MS vs. healthy participant classification.
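The distinction between rank agreement and absolute agreement can be made concrete with a small sketch. It assumes the concordance coefficient is Lin's concordance correlation coefficient, and it estimates the area under the ROC curve with a rank-based (Mann-Whitney) shortcut rather than the binary logistic regression used in the study; for a single z-score predictor where lower scores indicate MS, the two give the same area. All function names and data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lin_concordance(x, y):
    """Lin's concordance coefficient; penalizes shifts in location and scale."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def auc(ms_z, control_z):
    """Probability that a random control scores higher than a random MS participant."""
    wins = sum((c > m) + 0.5 * (c == m) for m in ms_z for c in control_z)
    return wins / (len(ms_z) * len(control_z))


# Two hypothetical norms that rank participants identically but differ by 1 z-unit.
norm_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
norm_b = [z + 1.0 for z in norm_a]
print(spearman(norm_a, norm_b), lin_concordance(norm_a, norm_b))  # 1.0 and 0.8
```

In this toy case the Spearman correlation is perfect while the concordance coefficient is reduced, because one norm assigns systematically higher z-scores than the other.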
Given prior reports of an increased frequency of cognitive impairment in persons with MS at greater levels of disability, we examined the ability of each set of norms to discriminate between differing levels of neurologic disability amongst the MS sample (31). We categorized MS participants according to their EDSS scores into mild (0-2.5), moderate (3.0-4.0), and severe (≥4.5) disability groups. We also examined the ability of the norms to discriminate between employed and unemployed persons with MS, where employment status was determined based on the Work Productivity and Impairment Scale (32). Discriminating ability was examined using relative efficiency (RE), where the RE of each set of norms was calculated as the ratio of between group (3 EDSS levels; or 2 employment categories) ANOVA F-statistics. The largest F-statistic represents the greatest discriminative ability.
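A minimal sketch of the relative efficiency calculation follows. It computes a one-way ANOVA F-statistic for each set of norms and divides each by the largest F, so the best-discriminating norm has an RE of 1. Using the largest F as the reference is an assumption consistent with how RE values are reported relative to the best norm; the function names and z-scores are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of z-scores."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(z_by_norm):
    """RE of each norm: its F-statistic divided by the largest F-statistic."""
    f_stats = {name: anova_f(groups) for name, groups in z_by_norm.items()}
    best = max(f_stats.values())
    return {name: f / best for name, f in f_stats.items()}


# Hypothetical z-scores for mild, moderate, and severe disability groups.
z_by_norm = {
    "local": [[-0.5, 0.0, 0.5], [-1.0, -0.5, 0.0], [-2.5, -2.0, -1.5]],
    "external": [[-0.5, 0.0, 0.5], [-1.0, -0.5, 0.0], [-1.5, -1.0, -0.5]],
}
print(relative_efficiency(z_by_norm))
```

For the employment comparison the same functions apply with two groups (employed and unemployed) instead of three.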

Replication
Data from an independent sample of MS participants and healthy controls, collected in Nova Scotia, Canada, were used to repeat the analyses comparing Canadian and non-Canadian regression-based norms, including correlations between the norms and their ability to discriminate between healthy and MS samples. These participants were enrolled in an ongoing longitudinal study of attention network functioning in MS and were recruited from the single specialized MS care center in that province. Unlike the Manitoba sample, these MS participants were selected to have an EDSS <4.5, with an age range from 20 to 60 years old. Exclusion criteria included insufficient visual acuity or impaired dexterity that would impede performance on cognitive tasks, or comorbid conditions that were likely to have a significant impact on their cognition (e.g., neurologic disorders other than MS, diagnosed learning disability, previous head injury with loss of consciousness, and sub-optimally managed psychiatric disorder as determined by clinic staff). As the independent Nova Scotia sample was selected to have no more than moderate levels of neurologic disability, only one participant fell within the "severe" EDSS category of >4.5 used in the previous analyses. Therefore, these participants were instead divided into only two categories: mild (0-2.5) and moderate (3.0-4.5). The data of 104 MS participants, tested between August 2016 and July 2018, were used in the current study replication. Healthy control participants (n = 39) recruited over this time period met the same exclusion criteria as the MS group but had no history or family history of MS and no history of psychiatric disorder; they were matched to the MS group based on age, years of education, and sex. Although all necessary cognitive measures were available in this dataset, several demographic variables were not collected: ethnicity, annual household income, smoking status, and body mass index.

RESULTS
Throughout, we present the findings in Manitoba followed by the findings in the Nova Scotia replication sample. Of the 103 healthy participants from Manitoba, 96 were under age 65 years, and of 111 participants with MS, 93 were under age 65 years. The healthy participants were younger on average, but the age range of the healthy participants (18.2-64.4 years) was similar to that of the participants with MS (20.8-63.8 years). Most participants in each group were women, although the proportion who were women was higher in the MS group (Table 1). The average number of years of education was consistent with at least some post-secondary education in both groups, although the healthy control group averaged 2.4 more years of education than the MS group. Race/ethnicity did not differ between the two groups, nor did estimated household income.
In the replication sample, most participants were also women, and the average number of years of education was consistent with at least some post-secondary education (Table 1). Table 2 shows raw score to scaled score conversions used to develop the regression-based norms in healthy controls aged 65 years and younger in Manitoba. Table 3 shows the regression-based formulae with and without race as a covariate. The degree of variance in the cognitive tests explained by demographic factors varied slightly between tests.

Impairment Classification Rates
When we applied the published regression norms to the healthy Manitoba sample, the impairment classification rates often differed substantially from the expected rate of 7%, even when the norms were derived from another Canadian (Ontario) population. The exceptions for the SDMT were the regression-based norms from Ontario/Nova Scotia and New York; the exceptions for the CVLT-II were the regression-based norms from New York and the discrete norms (Figure 1A).
When the published regression norms and locally developed Manitoba norms were applied to the independent Nova Scotia healthy sample, impairment classification rates were lower and more often within the expected range based on a normal population distribution (i.e., 7%) (Figure 1B). However, there were notable outliers: 30.8% and 28.2% of healthy controls in the replication sample were impaired on the CVLT-II and BVMT-R, respectively, using the New York norms; 25.6% were impaired on the BVMT-R using the Ontario norms; and 25.6% were impaired on the BVMT-R using the discrete norms.

Correlations and Concordance Between Norms
In the Manitoba sample, most, but not all, of the Spearman correlations between z-scores based on existing published norms for the same cognitive test exceeded 0.90 (Table 4). However, concordance coefficients were often lower, ranging from 0.45 to 0.96 (Table 4). The discrepancies between norms appeared to be greatest between the norms from Ireland as compared to all other norms. This pattern of high correlation coefficients, with the greatest discrepancies between the norms from Ireland and other norms, was replicated in the independent Nova Scotia sample (Supplementary Table e2). In addition, correlations between the locally developed Manitoba norms and all other norms showed the same pattern.

Ability of BICAMS Norms to Discriminate Between MS and Healthy Control Groups
All of the norms for the SDMT discriminated between the MS and healthy control groups, based on ROC analyses, but they differed in their ability to do so (Figure 2A). Similarly, based on ROC analyses of the independent Nova Scotia sample, all norms for the SDMT discriminated between the MS and healthy control groups, while none of the norms for the BVMT-R total recall discriminated between groups (Supplementary Figure e1). However, unlike the Manitoba sample, all norms for the CVLT-II verbal learning did discriminate between MS and healthy control groups.

Ability of Different Norms to Discriminate Between MS Participants With Differing Levels of Disability or Employment Status
We next examined whether application of the various norms influenced the extent to which the tests discriminated between differing levels of disability based on the EDSS, amongst individuals within the Manitoba MS cohort. For the SDMT, the Manitoba norms were best able to discriminate between disability groups (Table 5). The relative efficiency (RE) for the Ontario and Strober norms exceeded 0.92 compared to the Manitoba norms, but the remaining norms had substantially lower RE of 0.52-0.54. For the CVLT-II verbal learning, the Manitoba norms were again best able to discriminate between disability groups. The New York norms had a similar discriminating ability with a RE of 0.97. The remaining norms had lower RE of 0.36-0.69. For the BVMT-R total recall, the discrete norms had the best discriminating ability, while the New York norms had the lowest RE. Considering only the Manitoba norms, the BVMT-R best discriminated between differing disability levels, followed by the SDMT and CVLT-II. This same pattern was seen for the Ontario, Ireland and discrete norms from the manual, but not for the New York norms where the BVMT-R had the poorest discriminating ability.
Similar to the findings for disability, the various norms differed in their ability to discriminate between employed and unemployed participants with MS. For the SDMT, the Manitoba norms best discriminated between employed and unemployed participants. For the CVLT-II verbal learning, the Ontario norms were the best discriminator, followed closely by the Manitoba norms which were similar with a RE of 0.95. For the BVMT-R, the discrete norms from the manual discriminated best between employed and unemployed participants. Considering only the Manitoba norms, the BVMT-R discriminated better than the SDMT, followed by the CVLT-II. This pattern was consistent for the Ontario, Ireland, and discrete norms from the manual, but not for the New York norms where the BVMT-R had the poorest discriminating ability.
In the sample of 104 MS participants from Nova Scotia, for the SDMT, the Ireland norms were best able to discriminate between the two (i.e., mild vs. moderate) disability groups (Table 5). The New York and Ontario/Nova Scotia norms had the next highest RE at 0.83 and 0.81, respectively. Regardless of the norms used, the CVLT-II verbal learning and BVMT-R total recall were unable to discriminate between mild vs. moderate disability groups. Employment data were not collected in the Nova Scotia replication sample.

DISCUSSION
In this cross-sectional study, we applied a set of previously developed regression-based norms from Ontario, Canada for tests comprising the BICAMS, to an independently collected healthy sample from Manitoba, Canada to assess their generalizability. We also replicated our findings in a second, smaller, normative sample from Nova Scotia, Canada. In healthy controls, the rates of impairment differed from standard population expectations, sometimes being higher than expected and sometimes being lower. The application of regression-based norms developed in other non-Canadian English-speaking populations also produced variable impairment rates that differed from expectations, as did the discrete norms from the test manuals. All of the norms differed in their ability to discriminate between MS and healthy populations from Manitoba, and between Manitobans with MS who had differing levels of disability or employment status. The local Manitoba norms generally had better discriminating ability in the Manitoba sample than other norms, but the CVLT-II and BVMT-R were still poor at discriminating between healthy participants and participants with MS. A prior report in a Belgian sample also found that the CVLT-II did not discriminate between persons with and without MS (33). Prior studies examining the sensitivity of neuropsychological tests suggest that the SDMT discriminates best between people with and without MS (34), and the SDMT is commonly found to be the test most associated with other clinically relevant factors (3). This high sensitivity of the SDMT to cognitive impairment in MS has been attributed to its assessment of commonly affected cognitive abilities including processing speed and working memory, as well as its requirements for efficient visual scanning and oculomotor functioning (27).
Overall, our findings indicate that using regional norms to interpret all BICAMS tasks is likely to be most informative. Spearman correlations between the different norms all exceeded 0.75 and most correlations exceeded 0.90. However, concordance coefficients were lower, indicating that while the norms rank ordered participants similarly, the absolute z-scores differed. Notably, in the Manitoba and Nova Scotia samples, concordance was lowest between the norms from Ireland and the other English language norms, which were developed in regions of Canada or the United States, potentially reflecting greater cultural differences between Ireland and North America than among North American regions for this verbal memory test. A prior study found that nationality influences performance on all three BICAMS tests, even after adjusting for age and years of education (35). That study highlighted the importance of considering both the language and culture of the individual being tested and called for additional studies across countries with common languages to address the potential influences of cultural factors. An approach by which BICAMS can be validated in other languages has been recommended (10), and a systematic review in 2018 reported on the performance of BICAMS as translated from English into 11 languages, following which performance was assessed (28). However, within countries, including Canada, where inhabitants may use one or more languages and/or are members of different cultural groups, there may be a need for particular effort to ensure appropriate norms are applied. In principle, clinicians and researchers may choose to use discrete norms that are commercially available for the cognitive tests they employ, locally validated norms, or regression-based norms from other populations. For example, regression-based norms derived from a Canadian sample have been employed in Sweden, albeit modified to exclude educational level (36).
A large multi-center trial of exercise and cognitive rehabilitation will be applying Dutch norms at the Denmark site (37). Notably, even when we employed only norms developed in other regions of Canada, our local norms, and discrete norms from the manuals for each test that are used in clinical practice, we observed meaningful variations in impairment classification rates and in the ability to discriminate between and within groups. This reflects the differences in the absolute z-scores, as demonstrated by the lower concordance coefficients than correlation coefficients. These differences may reflect differences in the healthy populations enrolled, as well as differences in the approaches used to develop the norms. For example, Walker et al. used raw test scores in their regression models and did not incorporate a non-linear term for age (11), while Berrigan et al. used scaled scores and incorporated a non-linear term for age that reflected non-linear findings reported in large samples (22). Our findings suggest that methodological issues such as these constitute an important component of the wide variation in the frequency of cognitive impairment reported in the MS literature [reviewed in Chiaravalloti and DeLuca (38)]. Differences in the ability to discriminate between healthy and MS groups, and between groups of persons with MS at differing levels of neurologic disability and employment status, also highlight how the use of different norms affects the identification of factors influencing cognitive outcomes.
Within the Manitoba healthy sample, the contribution of demographic characteristics to cognitive performance also varied across the three cognitive tests evaluated, with the variance explained ranging from 15 to 21%, consistent with prior reports (26). The poorer performance seen on the SDMT and BVMT-R with increasing age is consistent with prior reports in healthy populations (39,40). Sex was associated with performance on the CVLT-II, but not the SDMT or BVMT-R. One prior report suggested that the association of sex with SDMT performance is only seen for the written version of this test, with women having better scores than men, whereas this is not the case for the oral version used here and recommended for persons with MS (39). Education was not associated with cognitive performance, but most of our healthy sample was well-educated. Race predominantly contributed to performance on the SDMT in our sample, although the association between race, ethnicity and performance on cognitive tests is well-recognized (40).
Raw scores on cognitive tests have been demonstrated to have higher sensitivity than demographically-corrected scores for discriminating between persons with and without cognitive impairment, but demographically-corrected scores have higher specificity (41). Several options exist for demographically correcting scores. Discrete norms are easy to develop but require continuous variables such as age to be categorized. This creates somewhat arbitrary and discontinuous changes in expected performance for individuals at the boundaries of those categories, and relatively large sample sizes are required to develop precise norms with smaller categories that address this issue (15). Regression-based norms have become popular because they do not categorize continuous variables, and the improved efficiency of estimation allows for the use of substantially smaller sample sizes while providing more precise estimates. For the BICAMS, the international validation standards recommend a minimum sample size of 65 healthy volunteers, provided that they are group matched on demographics to an MS sample (10). Samples of ≥150 persons are encouraged for generalizability. We used linear regression models to develop our norms, as is common in the literature. This approach is affected by whether model assumptions are met, and model assumptions were met in this study. Nonetheless, skewness may interfere with norm accuracy (42), and outliers may exert a substantial influence on the norms that are developed, particularly in smaller samples. Linear regression examines the relationship between the conditional mean of the dependent variable and the independent variables of interest, and assumes that this adequately represents relationships across the entire distribution of the dependent variable.
Moreover, traditional linear regression does not account for the fact that cognitive tests typically have a limited range of scores; we therefore employed a truncated regression model to account for this issue.
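The contrast between discrete and regression-based norms described above can be sketched as follows. This is a generic illustration of how a regression-based z-score is commonly computed, as the observed score minus the demographically predicted score, divided by the residual standard deviation. It uses an ordinary linear form rather than the truncated regression model used in this study, and every coefficient, age band, and score below is hypothetical and not drawn from any published norms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionNorm:
    """Hypothetical regression-based norm for a single cognitive test."""
    intercept: float
    b_age: float
    b_age_sq: float      # non-linear (quadratic) age term
    b_female: float      # 1 = female, 0 = male
    b_education: float   # per year of education
    residual_sd: float

    def predicted(self, age, female, education_years):
        """Expected score for a healthy person with these demographics."""
        return (
            self.intercept
            + self.b_age * age
            + self.b_age_sq * age ** 2
            + self.b_female * female
            + self.b_education * education_years
        )

    def z_score(self, observed, age, female, education_years):
        """Standardized difference between observed and expected scores."""
        expected = self.predicted(age, female, education_years)
        return (observed - expected) / self.residual_sd


def discrete_expected(age):
    """Hypothetical discrete norm with two age bands (18-39 and 40-65)."""
    return 60.0 if age < 40 else 52.0


# Invented coefficients, for illustration only.
norm = RegressionNorm(
    intercept=70.0,
    b_age=-0.25,
    b_age_sq=-0.0025,
    b_female=1.0,
    b_education=0.5,
    residual_sd=10.0,
)

z = norm.z_score(observed=49, age=40, female=1, education_years=14)

# Change in expected score when a person moves from age 39 to age 40.
discrete_jump = discrete_expected(40) - discrete_expected(39)
regression_step = norm.predicted(40, 1, 14) - norm.predicted(39, 1, 14)

print(f"z-score = {z:.2f}")
print(f"Expected-score change at 39 -> 40: discrete = {discrete_jump:.2f}, "
      f"regression = {regression_step:.2f}")
```

Under these invented values, the expected score under the discrete norm changes abruptly at the band boundary, whereas the regression-based expectation changes only slightly for a one-year difference in age.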
Limitations of this study should be recognized. To ensure comparability with existing Canadian-Ontario regression-based norms, we did not include participants over age 65 years. After restricting our analyses to persons who were aged ≤65 years, we had 96 participants for developing local norms in Manitoba. While this exceeds the minimum of 65 persons recommended in the BICAMS international standards for validation (10), it is slightly below the 100 recommended based on simulation studies (15). Like the healthy samples used to develop regression-based norms for BICAMS that we evaluated here (Supplementary Table e1), our healthy sample predominantly included women (n = 32 men). Most of our study population were white; thus, further work is needed to develop norms that account for the racial/ethnic diversity in Canada and elsewhere. This is particularly important as recognition grows of the burden of MS in populations traditionally considered to be at a lower risk of MS, such as Indigenous Canadians and African Americans (43,44). We did not capture acculturation, which may also influence performance of norms (45). On average, the healthy control sample in Manitoba was younger than the MS sample, and more highly educated; differences in sex distribution were more modest, as indicated by the standardized difference of <0.20. Norms should be applied cautiously in populations with different characteristics than those in which they were developed due to limitations in generalizability, as illustrated by our findings. However, while the samples differed on average, the age and years of education distributions overlapped.
Regression-based norms have advantages over discrete norms. However, our findings emphasize the value of local norms when interpreting the findings of cognitive tests (46) and demonstrate the need to consider and assess the performance of regression-based norms developed in other populations when applying them to local populations, even when they are from the same country. This is important to avoid misclassifying individuals as to whether they are cognitively impaired or unimpaired. Our findings also strongly suggest that the development of regression-based norms should involve larger, more diverse samples to ensure broad generalizability. Specifically, greater representation is needed of men, individuals over age 65 years, and of varying racial, ethnic, and social backgrounds.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because some participants did not agree to data sharing; some data may be accessible to qualified investigators with the appropriate ethical approvals and data use agreements. Requests to access the datasets should be directed to Ruth Ann Marrie, rmarrie@hsc.mb.ca.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Manitoba Health Research Ethics Board and the Nova Scotia Health Authority Research Ethics Board. The patients/participants provided their written informed consent to participate in this study.