Onset Symptom Clusters in Multiple Sclerosis: Characteristics, Comorbidities, and Risk Factors

Background: Multiple sclerosis (MS) symptoms are expected to aggregate in specific patterns across different stages of the disease. Here, we studied the clustering of onset symptoms and examined their characteristics, comorbidity patterns, and associations with potential risk factors. Methods: Data stem from the Swiss Multiple Sclerosis Registry, a prospective study including 2,063 participants by November 2019. MS onset symptoms were clustered using latent class analysis (LCA). The latent classes were further examined using information on socio-demographic characteristics, MS-related features, potential risk factors, and comorbid diseases. Results: The LCA model with six classes (frequencies ranging from 12 to 24%) was selected for further analyses. The latent classes comprised a multiple symptoms class with high probabilities across several symptoms, contrasting with two classes with solitary onset symptoms: vision problems and paresthesia. Two gait classes emerged between these extremes: the gait-balance class and the gait-paralysis class. The last class was the fatigue-weakness class, which also featured depression symptoms, memory problems, and gastro-intestinal problems. The classes varied moderately by sex and by MS type. The multiple symptoms class showed increased comorbidity with other autoimmune disorders. Like the fatigue-weakness class, the multiple symptoms class showed associations with angina, skin diseases, migraine, and lifetime prevalence of smoking. Mononucleosis was more frequently reported in the fatigue-weakness and the paresthesia classes. Familial aggregation did not differ among the classes. Conclusions: Clustering of MS onset symptoms provides new perspectives on the heterogeneity of MS. The clusters are associated with different potential risk factors and comorbidities, pointing toward different risk mechanisms.


INTRODUCTION
Multiple sclerosis (MS) research has always faced the challenge of marked heterogeneity of phenotypes and a variety of potential risk mechanisms. This applies to research both across and within the most common MS subtypes: the primary-progressive form (PPMS), the relapsing-remitting form (RRMS), and its secondary-progressive continuation (SPMS). Heterogeneity became a salient topic in the 1990s on the basis of research on diverse pathogenic mechanisms in MS (1,2). Earlier epidemiological investigations had already documented sex-specific changes in MS prevalence (3), a readily observable indicator of risk heterogeneity. In the early 1980s, Canadian immunologists identified two clearly distinguishable MS types based on the occurrence of infectious events before MS onset (4). More recently, machine learning algorithms have emerged as promising tools for building classifications on multiple characteristics (5). Last but not least, disease-modifying treatments have provided additional evidence for heterogeneous types in MS by showing that immunomodulatory drugs have varying effects across patients (6).
Methodologically, it is crucial to appropriately assess and reproduce the heterogeneity of subtypes, or equivalently of heterogeneous patient subgroups, in complex diseases (7), and MS can in fact be regarded as a complex disease (8). This study is another effort in this vein: it focuses on the onset symptoms of MS in order to identify subgroups of persons sharing the same symptom configurations at the beginning of the disease.
MS comprises a broad spectrum of symptoms, including vision impairment, sensorimotor deficits, paralysis, dizziness, balance problems, spasms and pain, paresthesia, bladder and intestinal dysfunction, as well as neurobehavioral, neuropsychiatric, and various further problems. So far, only a few studies have examined whether MS symptoms aggregate into specific patterns (9)(10)(11)(12). These studies focused on the consequences and outcomes related to specific symptom clusters, independently of when the symptoms occurred during the disease course. Onset symptoms have largely escaped the attention of MS researchers applying classification analyses.
Onset symptoms represent a potential link to processes that precede the clinical onset of MS. Some of these initial processes might emerge long before the first symptoms occur and include a variety of risk mechanisms, whereas other processes seem to occur closer to the clinical onset. Recent clues have come from research on the MS prodrome (13)(14)(15), which documented increased use of health services for mental and physical problems over several years before the clinical manifestation of MS. Research on the clinically isolated syndrome (CIS) (16) and the radiologically isolated syndrome (RIS) (17,18) has further supported this focus on early stages.
In sum, focusing on early stages of MS may advance knowledge of processes preceding and following the clinical onset. In this study, we clustered MS onset symptoms reported by participants in the Swiss MS Registry (SMSR). To characterize the clusters, we examined their socio-demographic features, MS-specific characteristics, and associations with potential risk factors, preceding infectious diseases, and comorbid inflammatory diseases.

Study and Participants
The SMSR started in June 2016 (19,20) as a nationwide patient-centered longitudinal survey funded by the Swiss MS Society (see http://www.Clinical-Trials.gov identifier: NCT02980640). Participants in the SMSR are adults (≥18 years) with a CIS or MS diagnosis confirmed by their treating physician. A separate part of the SMSR is reserved for relatives and close friends. Participation of persons with MS (PwMS) is limited to those living in Switzerland or receiving care through the Swiss health system and is based on informed consent.
All SMSR surveys were provided in the three national languages (German, French, or Italian) and were completed either online or on paper. Participants entered the registry by completing a short initial questionnaire followed by a comprehensive baseline questionnaire. Further questionnaires followed semi-annually and focused on specific MS-related topics (diagnosis process (21,22), patient satisfaction (23), profession and job, depression, and nutrition).
Up to November 2019, a total of N = 2,159 participants were enrolled in the SMSR. Data for 2,063 participants had been checked and were available for the analysis. Information on onset symptoms and other basic information (socio-demographic data, clinical MS type, time of diagnosis, familial aggregation, MS-specific therapies) was taken from the short initial questionnaire. Data on potential risk factors and comorbid diseases/disorders stem from the baseline questionnaire. Information on MS type was also updated using data from subsequent questionnaires.

Analyzed Variables
The onset symptoms covered in this analysis comprise vision, fatigue, speech, dysphagia, weakness, paralysis, paresthesia, dizziness, pain, gait, balance, bladder, spasms, tics, tremor, bowel, epilepsy, sexual problems, memory, and depression symptoms. For clustering analyses, the least frequent symptoms (<8% of the sample) were omitted. A sum variable was created to represent the symptom load.
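The symptom filtering and the sum variable can be sketched as follows. This is a minimal Python illustration, not the registry's actual code: it assumes each participant is a dict of 0/1 indicators keyed by hypothetical symptom names, keeps symptoms reported by at least 8% of the sample, and, as an assumption, counts the symptom load over all recorded symptoms rather than only those retained for clustering.

```python
from typing import Dict, List

# Hypothetical data layout: one dict per participant, symptom name -> 0/1.
Record = Dict[str, int]


def frequent_symptoms(records: List[Record], min_share: float = 0.08) -> List[str]:
    """Return symptoms reported by at least `min_share` of participants."""
    n = len(records)
    symptoms = sorted({name for record in records for name in record})
    return [
        name
        for name in symptoms
        if sum(record.get(name, 0) for record in records) / n >= min_share
    ]


def symptom_load(record: Record) -> int:
    """Sum variable: number of onset symptoms a participant reported."""
    return sum(record.values())


# Toy example: 25 participants, all report vision problems,
# one reports tics (4%, dropped), three report gait problems (12%, kept).
records = [{"vision": 1, "tics": 0, "gait": 0} for _ in range(25)]
records[0]["tics"] = 1
for i in (1, 2, 3):
    records[i]["gait"] = 1

print(frequent_symptoms(records))  # ['gait', 'vision']
print(symptom_load(records[0]))    # 2
```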
The clinical MS type was defined by three categories, with CIS and relapsing-remitting MS combined in a composite RRMS category, separate from the PPMS and SPMS types. The age of onset was represented by two variables: one based on the date of diagnosis and one based on the first symptoms reported by the participant. For the latter, missing values and values higher than the age at diagnosis were replaced by the age at diagnosis. For outliers (>3 standard deviations for the difference between the age at first symptoms and the age at diagnosis), the symptom-based onset variable was set to missing. The disability status was represented by an expanded disability status scale (EDSS) proxy, i.e., a three-category variable based on walking distance, use of walking aids, and use of a wheelchair, which was proposed by our group in a former study [for more details see (24)]. Further MS-related variables comprised the number of relapses, use of immunomodulatory therapies (current and lifetime), and current use of alternative medicine. Familial aggregation was defined as having at least one first-degree relative with MS.
Health-related quality of life was assessed with a visual analog scale, used as a supplement to the European Quality of Life 5-Dimension Scale (EQ5d) (25). In addition, a screening instrument for depression, the WHO-5 Well-Being Index (26), was applied.
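The onset-age rules can be sketched in Python as below, assuming ages in years and reading the outlier rule as a diagnostic delay (age at diagnosis minus age at first symptoms) that deviates by more than three standard deviations from the mean delay. Applying the outlier rule after the replacement step is also an assumption; the text does not state the order.

```python
import statistics
from typing import List, Optional


def symptom_based_onset(
    first_symptom_age: List[Optional[float]],
    diagnosis_age: List[float],
    sd_cutoff: float = 3.0,
) -> List[Optional[float]]:
    # Step 1: missing ages, or ages later than diagnosis, fall back to the age at diagnosis.
    filled: List[float] = [
        d if s is None or s > d else s
        for s, d in zip(first_symptom_age, diagnosis_age)
    ]
    # Step 2 (assumed order): diagnostic delay per participant.
    delays = [d - s for s, d in zip(filled, diagnosis_age)]
    mean = statistics.mean(delays)
    sd = statistics.stdev(delays)
    # Delays more than `sd_cutoff` SDs from the mean are set to missing (None).
    return [
        None if sd > 0 and abs(delay - mean) > sd_cutoff * sd else s
        for s, delay in zip(filled, delays)
    ]


# Toy data: 20 participants diagnosed at 40; one missing, one implausible
# (45 > 40), one with a 30-year delay, the rest with a 1-year delay.
first = [None, 45.0, 10.0] + [39.0] * 17
diag = [40.0] * 20
onset = symptom_based_onset(first, diag)
print(onset[:4])  # [40.0, 40.0, None, 39.0]
```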
Sociodemographic variables included sex, birth year and age, education level (high school vs. lower level), and nationality (Swiss vs. other). Potential risk factors that were assessed from the beginning of the SMSR comprised smoking (here dichotomized as lifetime smoker vs. other), alcohol consumption (daily/weekly, less frequent, never), and body mass index (BMI). Among comorbid diseases and disorders, only the most frequent ones were included in the analysis:
• mononucleosis
• angina/tonsillitis
• skin diseases (acne, psoriasis)
• herpes/fever blisters
• cystitis
• migraine
• gastro-intestinal disorders (colitis ulcerosa, Crohn's disease, gastritis, irritable bowel syndrome)
• atopic diseases (hay fever, asthma, eczema, food allergies)
• drug allergy
• other autoimmune disorders.
For all comorbid diseases and disorders, the data represent lifetime prevalence; no information about the year of onset was available. However, most of the listed conditions typically emerge before the usual age of MS onset.

Statistical Analysis
The analysis comprised four steps: descriptive statistics, clustering of onset symptoms, and, following a conventional design, bivariate association analyses and multinomial regression analyses based on the symptom clusters. In the multinomial regression analysis, the variable representing the classes was regressed on a selection of potential risk factors and comorbidities that were significant at the 5% level, or showed a trend toward significance, in the bivariate association analyses. The clinical MS type was not included since we considered it an outcome rather than a predictor of the latent classes. Backward and forward selection outcomes were compared to confirm the results. A predictor was included at each step if p < 0.05 and excluded if p > 0.10.
MS onset symptoms were clustered using latent class analysis (LCA), a classification model comparable to factor analysis or cluster analysis. In contrast to factor analysis, a variable-centered approach that places variables along dimensions or factors, LCA is a person-centered approach, i.e., it aims to group individuals into homogeneous classes (27,28). In LCA, the expected proportion of participants in each class is given by the class probabilities. Depending on the selection of variables, the classes in an LCA can be interpreted as representing subtypes of a disease.
Initially, LCA models with one to seven latent classes were fitted to the data to determine the optimal number of latent classes for the final model. We considered several fit indices: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the sample-size adjusted BIC (ABIC), and the Lo-Mendell-Rubin likelihood ratio test (LMR-LRT) (29). Typically, we preferred models with a number of classes between the number suggested by the BIC and the number suggested by the AIC (30). Model selection was also guided by the distinctness of the classes, their size, and their theoretical adequacy.
In analyses that focus on pattern recognition, either through a classification model such as an LCA or, implicitly, in analyses of groups of markers and disorders, we deliberately refrained from adjusting for multiple testing [see also (31)]. The analyses were conducted with Mplus (version 7 for Macintosh) and SPSS (version 23.0 for Macintosh).
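The entry and removal thresholds can be illustrated with a generic forward stepwise loop. This is a hedged sketch, not the SPSS procedure used in the study: `pvalues` is a hypothetical callback that would refit the multinomial model with the given terms and return one p-value per term (for example from likelihood-ratio tests), and the toy callback below simply returns invented numbers.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: fit a model with these terms, return term -> p-value.
PValueFn = Callable[[List[str]], Dict[str, float]]


def forward_stepwise(
    candidates: Sequence[str],
    pvalues: PValueFn,
    p_enter: float = 0.05,
    p_remove: float = 0.10,
    max_steps: int = 100,
) -> List[str]:
    model: List[str] = []
    for _ in range(max_steps):  # guard against add/remove cycles
        changed = False
        # Entry: add the remaining candidate with the smallest p-value if p < p_enter.
        remaining = [c for c in candidates if c not in model]
        if remaining:
            trial = {c: pvalues(model + [c])[c] for c in remaining}
            best = min(remaining, key=lambda c: trial[c])
            if trial[best] < p_enter:
                model.append(best)
                changed = True
        # Removal: drop the weakest term in the current model if p > p_remove.
        if model:
            current = pvalues(model)
            worst = max(model, key=lambda t: current[t])
            if current[worst] > p_remove:
                model.remove(worst)
                changed = True
        if not changed:
            break
    return model


def toy_pvalues(terms: List[str]) -> Dict[str, float]:
    # Invented numbers: "skin" loses significance once "angina" is in the model.
    p = {"smoking": 0.01, "skin": 0.02, "angina": 0.03, "cystitis": 0.40}
    if "angina" in terms:
        p["skin"] = 0.50
    return {t: p[t] for t in terms}


print(forward_stepwise(["smoking", "skin", "angina", "cystitis"], toy_pvalues))
# ['smoking', 'angina']
```

A backward variant would start from all candidates and apply the same removal threshold first; comparing both directions, as the study did, is a simple check that the final model does not depend on the search order.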
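To make the person-centered idea concrete, the sketch below computes, for one response pattern, the posterior probability of belonging to each latent class under the standard LCA assumption of local independence, and assigns the person to the most likely (modal) class. The class and item-response probabilities are invented for illustration; in practice they are estimated from the data, here with Mplus.

```python
from typing import List, Sequence


def posterior_membership(
    y: Sequence[int],
    class_probs: Sequence[float],
    item_probs: Sequence[Sequence[float]],
) -> List[float]:
    """P(class k | y) for binary indicators under local independence."""
    joint = []
    for pi_k, p_k in zip(class_probs, item_probs):
        likelihood = pi_k
        for y_j, p_kj in zip(y, p_k):
            # Bernoulli probability of the observed answer given class k.
            likelihood *= p_kj if y_j else 1.0 - p_kj
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


def modal_class(posterior: Sequence[float]) -> int:
    return max(range(len(posterior)), key=lambda k: posterior[k])


# Invented parameters for three symptoms (vision, gait, fatigue) and two classes.
class_probs = [0.5, 0.5]
item_probs = [
    [0.9, 0.1, 0.1],  # class 0: mostly vision problems only
    [0.6, 0.8, 0.8],  # class 1: multiple symptoms
]
print(modal_class(posterior_membership([1, 0, 0], class_probs, item_probs)))  # 0
print(modal_class(posterior_membership([1, 1, 1], class_probs, item_probs)))  # 1
```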
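The information criteria named above have standard closed forms. The sketch below computes them from a model's maximized log-likelihood ln L, its number of free parameters p, and the sample size n, using AIC = -2 ln L + 2p, BIC = -2 ln L + p ln n, and the sample-size adjusted BIC, which replaces n by (n + 2)/24. For an LCA with K classes and J binary indicators, p = (K - 1) + K·J. The log-likelihoods in the example are invented, not values from this study; the helper returns the range between the BIC- and AIC-preferred class numbers described in the text.

```python
import math
from typing import Dict, Tuple


def lca_n_params(n_classes: int, n_items: int) -> int:
    """Free parameters of an LCA with binary items: class sizes + item probabilities."""
    return (n_classes - 1) + n_classes * n_items


def fit_indices(loglik: float, n_params: int, n: int) -> Dict[str, float]:
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "ABIC": deviance + n_params * math.log((n + 2) / 24),
    }


def preferred_range(logliks: Dict[int, float], n_items: int, n: int) -> Tuple[int, int]:
    """Class numbers minimizing BIC and AIC, returned as (low, high)."""
    table = {
        k: fit_indices(ll, lca_n_params(k, n_items), n) for k, ll in logliks.items()
    }
    k_bic = min(table, key=lambda k: table[k]["BIC"])
    k_aic = min(table, key=lambda k: table[k]["AIC"])
    return min(k_bic, k_aic), max(k_bic, k_aic)


# Invented log-likelihoods for 1-4 classes, 10 binary items, n = 1000.
logliks = {1: -5000.0, 2: -4800.0, 3: -4750.0, 4: -4720.0}
print(preferred_range(logliks, n_items=10, n=1000))  # (3, 4)
```

Because BIC penalizes each parameter by ln n rather than 2, it usually favors fewer classes than AIC in samples of this size, which is why the two criteria bracket a range rather than a single answer.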

RESULTS
The analysis is based on 2,063 persons with MS or CIS, of whom 1,503 were women (72.9%). The onset symptoms are listed in Table 1. The most frequent symptoms (proportions >25%) were paresthesia, vision problems, fatigue, weakness, gait problems, balance problems, and paralysis. The LCA of onset symptoms (n = 1,942; 115 data points missing due to lacking information or to symptoms not included in the LCA) favored a solution with six classes. The choice was based on the BIC values (lowest value), the flattening decline of the AIC and ABIC values, and the interpretability of the latent classes (see model fits for 1-7 classes in Supplementary Table 1). To facilitate comparisons, the five- and seven-class solutions are also described below.
The probabilities of the onset symptoms per latent class are shown in Figures 1A-F. Separate analyses for men and women are documented in Supplementary Tables 2, 3 and Supplementary Figures 1, 2. The results were closely comparable, with six classes in the larger subsample of women and five classes in the smaller subsample of men.
At one end of the spectrum was the multiple symptoms class (LC1, 14.1%), with high probabilities across most of the onset symptoms. At the other end were the classes with a solitary onset symptom: LC5 (vision problems, 21.3%) and LC6 (paresthesia, 15.4%). Between these poles, three further classes emerged: one related to gait problems in combination with balance problems and dizziness (LC2, 13.5%, gait-balance class), another related to gait problems in combination with paralysis, weakness, and spasms (LC4, 23.9%, gait-paralysis class), and a final class (LC3, 11.7%, fatigue-weakness class) characterized by weakness and fatigue, but also by increased probabilities of dizziness, depression symptoms, memory problems, and gastro-intestinal problems. The classes also differed in the average number of onset symptoms, which was about 9 in the multiple symptoms class, about 1.4 in the vision and paresthesia classes, and about 4 in the other classes; this number did not vary significantly by sex or by MS type (results not shown).
When comparing the six-class solution with the five- and seven-class solutions, the differences concerned the gait problem classes. In the five-class solution, there was only one gait problem class, combining the two gait classes of the six-class solution, gait-balance (LC2) and gait-paralysis (LC4). In the seven-class solution, the gait-paralysis class (LC4) split into a class with less marked probabilities of gait-paralysis symptoms and a class with more pronounced and multiple symptoms, which also incorporated cases from the multiple symptoms class (LC1).

MS-Specific Characteristics
The differences by sex between classes reflected the key symptoms [see the sex ratio for gait problems and in the gait-paralysis class (LC4)]. The same applies to differences by clinical MS type (i.e., PPMS vs. other; see gait problems in LC4 and in PPMS; details in Tables 1, 2). Here and in other instances (age at onset, number of relapses, EDSS proxy), the gait-paralysis class (LC4) occupied one extreme, contrasted either by the fatigue-weakness class (LC3) or by the vision problems (LC5) and paresthesia (LC6) classes. The overall burden, as measured by the EQ5d visual analog scale, was greatest in the multiple symptoms class (LC1), whereas the psychological burden, as represented by the WHO-5 Well-Being Index, was greatest in the fatigue-weakness class (LC3). In both instances, the paresthesia class (LC6) showed the least burden. Other MS-specific features, such as the use and continuation of immunomodulatory therapies or familial aggregation, did not differ between classes.

Risk Factors and Comorbidities
The multiple symptoms class (LC1) stood out with regard to other potential risk factors. Notably, it was associated with a low education level (57.1 vs. 41-49% in other classes; see Table 2). Together with the fatigue-weakness class (LC3), it had a higher proportion of lifetime smokers (65%) than the other classes (50-58%).
These two classes repeatedly shared the highest proportions of specific comorbid diseases and disorders (see Table 3): skin diseases, migraine, and, together with LC5 (vision problems), cystitis, drug allergy, and angina/tonsillitis. Mononucleosis was most frequently reported in the fatigue-weakness class (LC3) and the paresthesia class (LC6) (∼20 vs. 11-13% in other classes), whereas comorbidity with other autoimmune disorders appeared most frequently in the multiple symptoms class (LC1, 11.7 vs. 3-6% in other classes). Overall, the high comorbidity in the multiple symptoms (LC1) and fatigue-weakness (LC3) classes contrasted with low comorbidity in the gait-problem classes.
Table 4 shows the results of the multinomial logistic regression analysis with the latent classes as the outcome variable. The fatigue-weakness class (LC3) was used as the reference category. The strong associations from the bivariate analyses were retained, whereas weaker associations relating to skin diseases and drug allergies disappeared. Forward and backward selection procedures yielded the same final model, comprising sex, education level, lifetime smoking, and lifetime comorbidity with mononucleosis, angina/tonsillitis, migraine, and other autoimmune disorders as predictors.
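Using LC3 as the reference category means that each coefficient describes a change in the log-odds of belonging to a given class rather than to LC3. The sketch below shows this standard multinomial-logit parameterization, with the reference class's linear predictor fixed at zero; the class labels follow the text, but the predictor coding and coefficient values are hypothetical and are not the estimates in Table 4.

```python
import math
from typing import Dict, Sequence


def class_probabilities(
    x: Sequence[float],
    coefs: Dict[str, Sequence[float]],
    reference: str,
) -> Dict[str, float]:
    """Multinomial logit; the reference class has all coefficients fixed at 0."""
    scores = {reference: 0.0}
    for label, beta in coefs.items():
        scores[label] = sum(b * xi for b, xi in zip(beta, x))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical coding: x = [intercept, lifetime smoker (0/1)].
coefs = {"LC1": [0.0, math.log(2.0)]}  # odds ratio 2 for LC1 vs. LC3
print(class_probabilities([1.0, 0.0], coefs, reference="LC3"))  # {'LC3': 0.5, 'LC1': 0.5}
smoker = class_probabilities([1.0, 1.0], coefs, reference="LC3")
print(smoker["LC1"] / smoker["LC3"])  # approximately 2.0, i.e., exp(beta)
```

With six classes, each of the five non-reference classes gets its own coefficient vector, and exp(beta) for a predictor is the odds ratio for that class versus LC3.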

DISCUSSION
This study is among the first to explore the heterogeneity of MS through clustering of onset symptoms. It identified six typical configurations (classes) of onset symptoms that are characteristic of different groups of PwMS: a multiple symptoms class with many onset symptoms, three classes with four or five symptoms on average (gait-paralysis, gait-balance, fatigue-weakness), and two solitary classes (vision problems, paresthesia). Each symptom can belong to two or more classes and can therefore have quite different implications. Similarly, MS characteristics (for example, the clinical MS subtype), comorbidities (for example, migraine, other autoimmune diseases), and potential risk factors (for example, upper respiratory tract infections, smoking) are differentially related to specific classes.

Configurations of MS Onset Symptoms
The classes in this study represent typical configurations of MS onset symptoms. These configurations partly overlap with common theoretical assignments to dysfunction domains (e.g., motor, sensory, optic neuropathy, and cerebellar/ataxia/brainstem in (32); brainstem, sensory, bowel and bladder, cerebral, and vision dysfunction in Tao et al. (33)). This is underlined by the multiple symptoms class, which was characterized by an overall increased symptom load. Other classes assembled fewer symptoms.
With four to five symptoms on average, the fatigue-weakness class also aggregated dizziness and neuropsychiatric symptoms (depression symptoms, memory problems), which might be indicative of limbic pathway lesions in MS (34) and might shed new light on the phenomenon of isolated cognitive relapses (35). Interestingly, bowel problems also featured in this class, suggesting a possible contribution of gastro-intestinal inflammation (36). Gait problems were mainly assigned to two separate classes with four to five onset symptoms, relating to pyramidal symptoms (paralysis) and cerebellar dysfunction (balance, dizziness), respectively. Finally, the analysis revealed two monosymptomatic classes: the vision problems and the paresthesia class. Previous research has already pointed to such solitary onset symptoms, labeling them monofocal (37), monoregional (38), or single-attack (39). The MS-related characteristics of the six classes showed only slight differences between men and women, across the age of onset, or between the general MS types (PPMS, RRMS, SPMS). In the future, a better understanding of the classes may shed more light on these differences. Only marginal differences were found regarding familial aggregation and the use of immunomodulatory therapies.

Characterizing the Latent Classes
In the following, we aim to provide a more precise picture of each latent class (LC). The multiple symptoms LC combines the most unfavorable features of MS risk: many different onset symptoms and several potential risk factors. Nevertheless, it is a peculiar LC. While MS is traditionally considered to be more frequent in middle and upper socio-economic classes than in lower socio-economic classes (70)(71)(72), the multiple symptoms LC seems to be the exception: a lower education level, less frequently reported infectious mononucleosis [signifying an earlier age of childhood infections (73) such as EBV], and increased smoking prevalence (74) are typical features of lower socio-economic classes. These findings are largely congruent with the predictors of a high number of impaired functional domains found in the study of Briggs et al. (5). The prevalence of other autoimmune diseases is markedly increased in this class and indicates a more generalized deficiency of the immune system that goes beyond a selective predisposition for MS. Smoking increases the probability of comorbidity between MS and other autoimmune diseases (75). The paresthesia and vision problems LCs show a contrasting picture. Both are associated with a higher educational level, a lower proportion of PPMS, fewer relapses than other LCs, and a lower age at onset. However, they differ strikingly in the proportion of participants reporting mononucleosis. Compared with the multiple symptoms class, it seems plausible that the pathogenic mechanisms in both monosymptomatic classes are limited or restricted in some way.
The fatigue-weakness LC was related to an increased rate of previous mononucleosis, but also to conditions indicating upper respiratory tract inflammation (angina/tonsillitis and smoking) and possibly gut inflammation, as suggested by bowel problems at onset and the trend association with drug allergy. The gait-paralysis LC has a less skewed sex ratio than other LCs, an older age at onset, a higher number of relapses, and a higher proportion of PPMS. Despite its one-sided onset symptom profile, it shares these burdening features with the multiple symptoms LC. The profile of the gait-balance LC is similar but attenuated. In terms of comorbidities, both gait problem LCs contrast with the fatigue-weakness and multiple symptoms LCs, showing low proportions of angina/tonsillitis and drug allergy and, for LC2, also of migraine and skin diseases.

Strengths and Weaknesses of This Study
This study benefited from the large number of participants in the SMSR and the comprehensive assessment of different characteristics of MS. The methodology used in this study enabled a fine-grained analysis of MS subtypes. Associations that are specific to subtypes typically vanish in analyses of pooled data or produce persistently inconsistent research findings.
The price to pay for an LCA in this context is the lack of comparability with healthy persons or controls: only internal comparisons are available in a first step. The current analysis was confined to variables assessed with the initial and baseline questionnaires of the SMSR. Thus, some MS characteristics and several potential risk factors were not available for this analysis, notably vitamin D (76,77), gastro-intestinal inflammation (36), and traumatic experiences and stressful life periods (78). Moreover, no information about the onset year of any potential risk factor or comorbid condition was available.
For many PwMS, pathological processes begin months or years before the onset symptoms. Thus, clinical onset symptoms might also include "later" symptoms and contain a certain amount of noisy information. Additional noise emerged from the fact that we could not control for the age of onset of most conditions; we assume that they typically occur prior to the onset of MS.
• IM is more frequent when the EBV infection occurs later than in childhood, i.e., in adolescence and adulthood (43)
• EBV is the most securely established risk factor in MS (41,44)
• IM is per se an additional risk factor for MS (43)
• Higher proportions of IM hint at delayed EBV infections
• A delayed EBV infection with (and without) IM increases the MS risk
• Subjects with a "resilient," i.e., well-trained and well-regulated, immune system less frequently experience manifest outcomes of common infections (30,45) and thus also report lower rates of mononucleosis (e.g., LC4 and LC2 members)
Angina/Tonsillitis
LC3 with increased proportion
• Migraine (in particular migraine with aura) could lead to an increase of the BBB permeability (61)
• Migraine could emerge in a pre-symptomatic MS phase (61)
Skin diseases
LC3 with increased proportion
• Reported associations between MS and skin diseases relate to psoriasis (62-64)
• Onset of psoriasis preceding MS onset yields a severity-response relationship (63)
• Increased levels of TNF-α and IL17 in both diseases (63)
Autoimmune diseases
LC1 with increased proportion
• Increased comorbidity with autoimmune diseases typically includes inflammatory bowel disease, thyroid disease, psoriasis (65)(66)(67)(68)
• Comorbidity between RA and MS may be reduced (69)
• The comorbidity with other autoimmune and chronic inflammatory diseases indicates a more generalized deficiency of the immune system
This study shares further common limitations of studies based on self-reported data, including various forms of recall bias and imprecise information, which contribute to more noise in the data.
Last but not least, classification models like the LCA come with some specific problems. It is important to acknowledge that the number of classes in an LCA clearly depends on the sample size and the selection of variables introduced in the analysis. Model fit indices help to identify an optimal range rather than to fix the exact number of classes. Therefore, the interpretability of the LCA outcomes also plays an important role.
A more specific concern relates to the fact that the LCA aims to group observations into homogeneous classes. Such a clear-cut delimitation of classes is a rough simplification, as is evident from the multitude of onset symptom configurations in clinical practice. This simplification introduces additional noise, which typically becomes apparent in subsequent analyses using the LCA outcomes.

CONCLUSIONS
MS can be differentiated into several clusters based on onset symptoms, revealing a new perspective on the heterogeneity of MS. These clusters show slight differences regarding MS characteristics such as clinical MS type (PPMS, RRMS, SPMS), sex ratio, or age at onset, but they diverge strongly with regard to potential risk factors and comorbidities. The clusters open prospects for a better understanding of basic issues in MS, such as relations between onset and later symptoms, differences between MS types, and, last but not least, the dynamics behind the current increase in MS incidence and prevalence.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee Zurich (Study number PB-2016-00894). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
VA-G designed and conducted the analysis and drafted the manuscript. NS assisted in the collection and maintenance of the data and was involved in drafting the manuscript. GH assisted in the collection and maintenance of the data and revised the manuscript for intellectual content. SR assisted in analysis of the data and revised the manuscript for intellectual content. MK assisted in the collection and maintenance of the data and revised the manuscript for intellectual content. YX assisted in interpretation of the analysis and revised the manuscript for intellectual content. CK assisted in the design of the study and revised the manuscript for intellectual content. JK assisted in the design of the study and revised the manuscript for intellectual content. Z-MM assisted in the design of the study and revised the manuscript for intellectual content. CZ assisted in the collection of the data and revised the manuscript for intellectual content. PC assisted in the design of the study and revised the manuscript for intellectual content. MP provided guidance on drafting of the manuscript and revised the manuscript for intellectual content. VW designed and conceptualized the study, provided guidance on interpretation of the data, and revised the manuscript for intellectual content. All authors contributed to the article and approved the submitted version.

FUNDING
The authors are grateful to the Swiss Multiple Sclerosis Society for funding the Swiss MS Registry and supporting this work.