Potential Fluid Biomarkers and a Prediction Model for Better Recognition Between Multiple System Atrophy-Cerebellar Type and Spinocerebellar Ataxia

Objective This study screened potential fluid biomarkers and developed a prediction model, based on information readily obtained at initial examination, to identify ataxia patients who are more likely to have multiple system atrophy-cerebellar type (MSA-C). Methods We established a retrospective cohort of 125 ataxia patients from southwest China between April 2018 and June 2020. Demographic and laboratory variables obtained at hospital admission were screened using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and the selected variables were combined by logistic regression to construct a diagnostic score. Receiver operating characteristic (ROC) and decision curve analyses were performed to assess the accuracy and net benefit of the model. Independent validation in 25 additional ataxia patients was carried out to verify model performance. The model was then translated into a visual, interactive web application using RStudio and the Shiny package. Results From 47 indicators, five variables were selected and integrated into the prediction model: age of onset (AO), direct bilirubin (DBIL), aspartate aminotransferase (AST), estimated glomerular filtration rate (eGFR), and α-synuclein. The prediction model achieved an area under the curve (AUC) of 0.929 in the training cohort and 0.917 in the testing cohort. Decision curve analysis (DCA) showed a good net benefit for the model, and external validation confirmed its reliability. The web application is freely available to the public. Conclusion A prediction model built on laboratory and demographic variables obtained from ataxia patients at hospital admission might help clinicians differentiate MSA-C from spinocerebellar ataxia.


INTRODUCTION
Multiple system atrophy (MSA) is a sporadic, progressive neurodegenerative disorder (Gilman et al., 2008). MSA has two main subtypes, with predominant parkinsonism (MSA-P) or predominant cerebellar ataxia (MSA-C), and MSA-C is the more common subtype in East Asian populations (Watanabe et al., 2002; Gilman et al., 2005; Yabe et al., 2006). There is currently no effective treatment for MSA-C, but clinical intervention in the early stages of the disease might improve patients' quality of life and prolong survival (Klockgether et al., 1998; Wenning et al., 2013; Jacobi et al., 2015; Fanciulli et al., 2019). Early diagnosis of MSA-C is therefore a central focus of current research.
No specific, objective biomarkers are known for MSA-C. Disease history, clinical manifestations, neurological examination, and some neuroimaging features are the methods currently used to diagnose MSA-C. However, because of individual variation and differences in disease stage, it is often difficult to diagnose MSA-C accurately from these conventional features, and MSA-C is easily confused with other ataxias, particularly hereditary spinocerebellar ataxia (SCA) (Palma et al., 2018). Objective biomarkers that reliably distinguish the two diseases would therefore be of great help when initial clinical features are similar. Numerous studies have sought candidate biomarkers for MSA-C in cerebrospinal fluid (CSF) and peripheral blood (Jellinger, 2017). CSF is an ideal biological sample because it is more likely to reflect specific neurophysiological changes, but obtaining it requires an invasive procedure (lumbar puncture). Peripheral blood, by contrast, is safer and easier to obtain, and the many proteins, lipids, and other metabolites it contains could serve as diagnostic and prognostic markers.
The fluid biomarkers selected in our study fall into two main groups. The first comprises metabolic indicators, which are basic clinical indicators routinely tested for diagnostic purposes. Previous studies have shown that abnormal metabolite changes may occur in neurodegenerative diseases including Alzheimer's disease (AD), Parkinson's disease (PD), and MSA (Zhou et al., 2016; Nam et al., 2018; Takae et al., 2018; Nho et al., 2019). Notably, several studies have shown that the levels of metabolism-related markers, including uric acid (URIC) and homocysteine, are aberrant in MSA patients (Lee et al., 2011; Chen et al., 2015; Zhou et al., 2016). Screening such markers, which reflect patients' metabolic status and are widely available in clinical laboratories, may therefore provide clues for the diagnosis and pathogenesis of MSA. The second group comprises proteins associated with inflammation, neurodegeneration, regeneration, and related processes. Previous studies have indicated that glial inflammation may play a role in MSA progression (Yokoyama et al., 2007). A study of CSF cytokine, chemokine, and growth factor profiles in MSA-C and SCA found that pro-inflammatory cytokines such as IL-6, GM-CSF, and MCP-1 correlated specifically with disease stage in MSA-C (Yamasaki et al., 2017). In addition, several proteins, including calbindin D, amyloid precursor protein (APP), S100B, and synuclein-alpha (α-synuclein), have been investigated in neurodegenerative diseases such as AD, Huntington's disease (HD), multiple sclerosis, and MSA (Steiner et al., 2011; Stefanits et al., 2014; van Waalwijk van Doorn et al., 2016; Mavroudis et al., 2020).
Meanwhile, investigating other proteins, such as carbonic anhydrase, CD117/c-kit, progranulin, and kallikreins, which may play roles in neural circuit development and maintenance, stress responses, innate immunity (including brain innate immunity), and aging, may open new avenues for the study of MSA (Greco et al., 2012; Dukic et al., 2016; Chitramuthu et al., 2017; Gennarini et al., 2017; Hsieh et al., 2019).
Alongside the continuing search for specific biomarkers, recent efforts have focused on clinical prediction models that integrate demographic characteristics, clinical variables, and laboratory indicators to improve the diagnosis or predict the survival of patients with neurological diseases, producing a quantitative risk estimate from a limited number of relatively objective predictors. We therefore screened potential fluid biomarkers of MSA-C and combined them with demographic characteristics to establish a clinical prediction model for improving the early identification and diagnosis of MSA-C.

Participants
Seventy-nine MSA-C patients and 46 patients with hereditary ataxia were enrolled at the Department of Neurology, West China Hospital, Sichuan University, between April 2018 and June 2020. MSA-C was diagnosed according to the widely adopted second consensus statement on the diagnosis of MSA (Gilman et al., 2008). Briefly, MSA-C patients had (1) sporadic, progressive, adult-onset disease (age > 30 years) with predominant cerebellar syndromes, including gait ataxia, dysarthria, limb ataxia, or cerebellar oculomotor dysfunction; (2) autonomic failure involving urinary incontinence, erectile dysfunction, and orthostatic hypotension, or parkinsonism with a poor levodopa response; and (3) no genetic diagnosis of a common hereditary ataxia. Hereditary ataxia was diagnosed according to the diagnostic criteria for SCA (Muzaimi et al., 2004; Klockgether et al., 2019), which required (1) symptom onset after the age of 18 years with predominantly progressive cerebellar ataxia and a disease duration longer than 1 year; and (2) a family history of a similar disorder, with molecular genetic testing confirming that the patient carried an SCA-related mutation. We screened for SCA1, SCA2, SCA3, SCA6, SCA7, SCA8, SCA10, SCA12, SCA17, and dentatorubral-pallidoluysian atrophy (DRPLA); only SCA1, SCA2, SCA3, and SCA6 patients were identified in our cohort.
Individuals were not included in the study if they exhibited secondary ataxia caused by cerebrovascular disease, tumors, alcoholism, vitamin B1 or B12 deficiency, folate deficiency, drug use, neurosyphilis, multiple sclerosis, paraneoplastic cerebellar degeneration, immune-mediated cerebellitis, or hypothyroidism.
From August 2019 to October 2020, we enrolled an additional 25 patients with undiagnosed ataxia in an independent validation cohort. The research design is shown schematically in Figure 1.

Information on the Collection and Detection of the Fluid Biomarkers
For each patient, demographic and clinical characteristics and laboratory test results were collected at first admission, before any treatment. The laboratory tests, namely metabolic or biochemical indicators, included total bilirubin (TBIL), direct bilirubin (DBIL), indirect bilirubin (IBIL), alanine aminotransferase (ALT), aspartate aminotransferase (AST), total protein (TP), albumin (ALB), globulin (GLB), urea (UREA), creatinine (CREA), cystatin C (CysC), uric acid (URIC), triglyceride (TG), cholesterol (CHOL), high-density lipoprotein cholesterol (HDLC), low-density lipoprotein cholesterol (LDLC), alkaline phosphatase (ALP), γ-glutamyl transpeptidase (GGT), estimated glomerular filtration rate (eGFR), sodium (NA), potassium (K), lactate dehydrogenase (LDH), hydroxybutyrate dehydrogenase (HBDH), creatine kinase (CK), and glucose (GLU). These are basic clinical indicators routinely measured for diagnostic purposes. The analytes were tested by qualified laboratory personnel following the standard operating procedures of the Department of Laboratory Medicine, West China Hospital of Sichuan University (WCH-LM-CHE-SOP-T1), on a Roche Cobas 702 automatic biochemical analyzer (Roche, Mannheim, Germany) with the corresponding reagents, calibrators, and quality control materials. The specific method for each analyte is listed in Supplementary Table 1.
Frontiers in Aging Neuroscience | www.frontiersin.org
Twenty additional proteins were measured: C-C motif chemokine ligand 2 (CCL2)/monocyte chemoattractant protein-1 (MCP-1), CCL11, CD117/c-kit, α-synuclein, contactin-1, interleukin-1 receptor antagonist (IL-1ra), IL-1β, IL-6, IL-15, IL-7, GM-CSF, carbonic anhydrase, S100B, APP, calbindin D, progranulin, kallikrein 3, kallikrein 5, kallikrein 6/neurosin, and urokinase. These proteins were detected with the Human Magnetic Luminex Screening Assay (LXSAHM; R&D Systems, Minneapolis, MN, United States) on the Bio-Plex 200 detection platform (Bio-Rad, California, United States) according to the manufacturer's instructions. The serum samples for the Luminex assays were residuals of blood samples drawn for routine clinical tests at first admission. They were centrifuged for 15 min at 1,000 × g and then stored at −80 °C until use. On the day of the assay, the previously frozen serum samples were immediately centrifuged at 16,000 × g for 4 min, and 50 µl of each serum sample was diluted twofold with Calibrator Diluent RD6-52 provided in the kit. Sample concentrations were calculated from the standard curve determined for each analyte, which was derived from serial dilutions of the standard. No sample exceeded the upper detection limit or fell below the lower detection limit. The standards were tested in duplicate; for the standard curves, the coefficient of variation (CV) did not exceed 20%, and the recovery rate was between 80 and 120%. The detailed principles and protocols are described in Supplementary Sheet 1.
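The acceptance limits stated for the standards (duplicate CV of at most 20% and recovery between 80 and 120%) can be illustrated with the short Python sketch below. It assumes that CV is the sample standard deviation of the duplicates divided by their mean, and that recovery is the concentration back-calculated from the curve divided by the nominal concentration of the standard; the function names are hypothetical, and this is not the assay software's implementation.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


def recovery_percent(back_calculated, nominal):
    """Recovery in percent: concentration read back from the curve / known standard concentration."""
    return 100.0 * back_calculated / nominal


def standard_point_passes(duplicates, back_calculated, nominal,
                          max_cv=20.0, recovery_range=(80.0, 120.0)):
    """Apply the limits described in the Methods: CV <= 20% and recovery within 80-120%."""
    cv = cv_percent(duplicates)
    recovery = recovery_percent(back_calculated, nominal)
    low, high = recovery_range
    return cv <= max_cv and low <= recovery <= high


def undiluted_concentration(curve_value, dilution_factor=2.0):
    """Scale a concentration read off the curve back up by the twofold sample dilution."""
    return curve_value * dilution_factor


# Toy standard point (invented numbers): duplicates 100 and 110, back-calculated 105, nominal 100.
ok = standard_point_passes([100.0, 110.0], back_calculated=105.0, nominal=100.0)
```

Keeping the two criteria as separate functions makes it easy to report which limit a failing standard point violated.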

Core Variable Selection and Identification of the Established Model
Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed to select core variables. LASSO constrains the sum of the absolute regression coefficients, shrinking each coefficient toward zero and eliminating variables whose coefficients reach exactly 0, independent of statistical significance (Tibshirani, 1997). Forty-seven candidate indicators, including age of onset (AO), gender, 25 metabolic markers, and 20 proteins, were initially entered into the LASSO analysis. This approach identifies the variables most representative of the disease outcome and yields a refined generalized linear model with reduced overfitting, which makes it well suited to studies with small sample sizes (Corey et al., 2018). The remaining core variables were then integrated into a model using logistic regression, and the Shiny R package was used to build an interactive web application. These steps were performed in R, version 3.5.0, for Mac.
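A minimal sketch of how L1-penalized (LASSO) logistic regression sets some coefficients to exactly zero is shown below, written in Python rather than the R used in the study. It assumes standardized predictors, an unpenalized intercept, and a fixed penalty `lam`, and it fits the model by proximal gradient descent with soft-thresholding. The paper does not state how the penalty was tuned (cross-validation, for example with `cv.glmnet` from the R glmnet package, is a common choice). The toy data are synthetic, and all function names are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward 0, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Center each column and scale it to unit (population) SD so that one penalty fits all columns."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimize mean log-loss + lam * sum(|beta_j|) by proximal gradient descent.

    The intercept is not penalized. Returns (intercept, coefficients) on the standardized scale.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability - label) at the current coefficients.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam)
    return b0, beta


# Synthetic toy data: column 0 is informative; columns 1 and 2 are pure noise,
# built symmetrically (every row appears with +1 and -1 in each noise column).
base = [(-2, 0), (-1, 0), (0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (-1, 1)]
X_raw, y = [], []
for x1, label in base:
    for s2 in (1.0, -1.0):
        for s3 in (1.0, -1.0):
            X_raw.append([float(x1), s2, s3])
            y.append(label)

X = standardize(X_raw)
intercept, coefs = lasso_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(coefs) if b != 0.0]  # indices of retained "core" variables
```

With this penalty only the informative column keeps a nonzero coefficient, which mirrors how the retained variables become the inputs to the subsequent logistic model; with a large enough penalty every coefficient is zero.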

Statistical Analysis
The distributions of variables were assessed using Kolmogorov-Smirnov tests and quantile-quantile plots. Normally distributed continuous variables are presented as mean ± SD, non-normally distributed continuous variables as median (lower and upper quartiles), and categorical variables as frequencies. The two groups were compared using the χ² test for categorical variables and Student's t-test or the Mann-Whitney U test for continuous variables. The diagnostic performance of the model was displayed using receiver operating characteristic (ROC) analysis and quantified by the area under the curve (AUC). Decision curve analysis (DCA) was used to measure net clinical benefit. All statistical analyses were carried out using SPSS, version 25.0, and R, version 3.5.0, for Mac. All statistical tests were two-tailed, and P < 0.05 indicated statistical significance.
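The AUC has a direct probabilistic reading: it equals the Mann-Whitney U statistic divided by the number of case-control pairs, that is, the probability that a randomly chosen MSA-C patient receives a higher model score than a randomly chosen SCA patient, with ties counted as one half. The Python sketch below computes it that way; the scores are invented illustrative values rather than study data, and the study's own analysis was done in SPSS and R.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), i.e. U / (n_pos * n_neg)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(scores_pos) * len(scores_neg))


# Toy scores: "positive" = MSA-C, "negative" = SCA (made-up numbers).
msa_c_scores = [0.9, 0.8, 0.4]
sca_scores = [0.3, 0.5, 0.4]
auc = auc_mann_whitney(msa_c_scores, sca_scores)
```

The pairwise loop is quadratic in sample size, which is negligible for cohorts of this size; a rank-based U computation gives the same value more efficiently for large samples.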

Standard Protocol Approvals, Registrations, and Patient Consent
The protocols used in this study were approved by the West China Hospital, Sichuan University Medical Ethics Committee. Written informed consent was obtained from all participants.

Demographic and Clinical Information
One hundred twenty-five patients were included in the derivation cohort; 82 patients (31 SCA and 51 MSA-C) were randomly assigned to a training cohort and 43 patients (15 SCA and 28 MSA-C) to a testing cohort. The proportion of MSA-C in the training cohort (62.20%) did not differ significantly from that in the testing cohort (65.12%). Medical information from an additional 25 patients with ataxia was collected using the same criteria for external independent validation. The demographic and clinical characteristics of participants in the derivation cohort are shown in Table 1. The median age of onset (AO) differed significantly between MSA-C and SCA patients. Information on the SCA subtypes is displayed in Supplementary Table 2.
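The comparison of MSA-C proportions between the training (51/82) and testing (28/43) cohorts can be reproduced with a Pearson χ² test on the 2 × 2 table, the test named for categorical variables in the Statistical Analysis section. The paper does not report the statistic or whether a continuity correction was applied, so the Python sketch below (uncorrected, 1 degree of freedom, p-value from the complementary error function) is an assumption rather than the authors' exact computation.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: training, testing cohort; columns: MSA-C, SCA (counts from the text).
train_pct = round(100 * 51 / 82, 2)
test_pct = round(100 * 28 / 43, 2)
chi2, p = chi2_2x2(51, 31, 28, 15)
```

The resulting p-value is well above 0.05, consistent with the statement that the two cohorts did not differ in their MSA-C proportion.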
Among the fluid markers assessed in the training set, IL-7 was the only neuroinflammation-related cytokine that differed significantly between MSA-C and SCA patients, with higher levels in SCA patients (Table 2 and Supplementary Figure 1). Four metabolic indicators also differed between the two groups: AST, GLU, and CysC were relatively higher, whereas eGFR was lower, in MSA-C patients. However, a different pattern of differentially expressed markers was observed in the testing set, where additional metabolic indicators differed between groups (Table 2).

Core Variable Selection and Establishment of the Identification Model
We investigated whether MSA-C patients could be identified from the candidate variables. Using LASSO regression for multivariable analysis, we selected five core variables (AO, DBIL, AST, eGFR, and α-synuclein) from 47 candidate indicators to form a disease panel. In univariate analysis, DBIL did not differ significantly between the two groups, whereas AST was higher (P = 0.027) and eGFR lower (P < 0.001) in MSA-C patients. The five core variables were then entered into a logistic regression model, which assigned each a weighting coefficient, and combining the variables according to these coefficients yielded a scoring formula.
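The scoring formula takes the usual logistic form: a weighted sum of the five core variables plus an intercept, mapped to a probability by the logistic function. Because the fitted coefficients are not reported in this section, the Python sketch below takes them as inputs. The coefficient values and the patient values in the usage lines are hypothetical placeholders chosen only to show the mechanics, not the published model. The default classification threshold is the 0.707 cutoff reported for the training set, assuming that cutoff is on the probability scale.

```python
import math

CORE_VARIABLES = ("AO", "DBIL", "AST", "eGFR", "alpha_synuclein")


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def msa_c_probability(values, coefficients, intercept):
    """p = logistic(b0 + sum_k b_k * x_k) over the five core variables."""
    score = intercept + sum(coefficients[name] * values[name] for name in CORE_VARIABLES)
    return logistic(score)


def classify(probability, cutoff=0.707):
    """Label as MSA-C when the predicted probability reaches the (assumed probability-scale) cutoff."""
    return "MSA-C" if probability >= cutoff else "not MSA-C"


# HYPOTHETICAL coefficients for illustration only; NOT the fitted model from this study.
hypothetical_coefs = {"AO": 0.08, "DBIL": -0.1, "AST": 0.03, "eGFR": -0.02, "alpha_synuclein": 0.001}
hypothetical_intercept = -2.0
# Made-up patient values, units unspecified.
patient = {"AO": 55, "DBIL": 4.0, "AST": 25, "eGFR": 90, "alpha_synuclein": 500}

p = msa_c_probability(patient, hypothetical_coefs, hypothetical_intercept)
label = classify(p)
```

Passing coefficients as a dictionary keyed by variable name keeps the scoring step independent of how the model was fitted, so refitted weights can be substituted without code changes.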

The Performance of the Model
ROC curves were used to evaluate the predictive accuracy of the model. The AUC was 0.929 (95% CI: 0.872-0.985) in the training set and 0.917 (95% CI: 0.829-0.995) in the testing set, indicating good and consistent discrimination. The cutoff value for the training set was 0.707. DCA demonstrated a high net clinical benefit across the range of threshold probabilities (Figures 2A,B).
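Decision curve analysis compares the model's net benefit with "treat all" and "treat none" strategies at each threshold probability p_t, using the standard net-benefit formula NB = TP/n − (FP/n) × p_t/(1 − p_t), so that each false positive is weighted by the threshold odds. At a 60% threshold, for example, a false positive counts 0.6/0.4 = 1.5 times as much as a true positive. The Python sketch below is a minimal illustration; the labels and predicted probabilities are invented toy values, and the study's curves were produced in R.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone: prevalence - (1 - prevalence) * threshold odds."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def decision_curve(y_true, probs, thresholds):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB = 0)."""
    return [(t, net_benefit(y_true, probs, t), net_benefit_treat_all(y_true, t), 0.0)
            for t in thresholds]


labels = [1, 1, 1, 0, 0]            # toy outcomes (1 = MSA-C), not study data
scores = [0.9, 0.8, 0.3, 0.7, 0.1]  # toy predicted probabilities
curve = decision_curve(labels, scores, [0.2, 0.4, 0.6])
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat none).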

External Independent Validation
We included 25 suspected ataxia patients to validate the model independently, applying it before their final diagnoses were available. The model predicted that 15 of these individuals had MSA-C. Final diagnoses were then established by combining clinical evaluation, neuroimaging results, and genetic testing and compared with the model's predictions: 13 of the 15 predicted individuals were confirmed to have MSA-C (positive predictive value = 13/15 = 86.67%). Of the 10 patients the model did not classify as MSA-C, two were confirmed as MSA-C after comprehensive diagnostic evaluation (negative predictive value = 8/10 = 80%).
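The external-validation figures follow from a 2 × 2 table of model predictions against final diagnoses. Reading the counts above as 13 true positives, 2 false positives, 8 true negatives, and 2 false negatives, the Python sketch below reproduces the reported predictive values; the sensitivity and specificity it also returns are derived from the same counts and are not stated in the text.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard 2 x 2 diagnostic accuracy measures."""
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # true-positive rate among confirmed MSA-C
        "specificity": tn / (tn + fp),  # true-negative rate among non-MSA-C
    }


# Counts implied by the external validation of 25 patients.
metrics = diagnostic_metrics(tp=13, fp=2, tn=8, fn=2)
```

Because the validation cohort is small, each of these proportions would carry a wide confidence interval.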

Construction of the Web Application
The Shiny R package was used to transform the prediction model into a visual, interactive web application¹ that integrates all five selected factors. Dragging the slider below each variable changes the corresponding value, and the resulting point total represents the predicted probability of MSA-C (Figure 2C).

DISCUSSION
Over the past decade, many clinicians have summarized disease characteristics and conducted research to better define and diagnose MSA-C (Koga and Dickson, 2018). Because clinical characteristics vary with disease stage and between individuals, MSA-C is easily misdiagnosed as a similar disease such as SCA; conversely, without pedigree and genetic information, a definite diagnosis of SCA can also be difficult. However, progress has been limited by the lack of sufficiently specific biomarkers, and no specific biomarker for MSA-C has been found in this study or in previous studies. Although some candidate biomarkers in our study differed significantly between the groups, their specificity for a diagnosis of MSA-C was not convincing.
When specific biomarkers are insufficient for disease diagnosis, a clinical prediction model that combines demographic characteristics, clinical variables, and laboratory indicators might improve diagnostic efficiency for some neurological diseases, reduce certain biases, and provide relatively objective predictions. For example, Wei et al. developed a nomogram based on seven predictive factors (AO, rate of disease progression, hemoglobin A1c level, body mass index, creatinine, creatine kinase, and noninvasive positive pressure ventilation) to predict longer survival in patients with amyotrophic lateral sclerosis, which attained an AUC of 0.92 (95% CI: 0.88-0.96) (Wei et al., 2018). Similar models have been proposed and shown to be useful in the diagnosis and subsequent health care management of many diseases. We therefore hypothesized that combining variables from different types of assessment could yield a successful predictive model for identifying MSA-C patients.
¹ https://guoshuo.shinyapps.io/shuo/
In this study, we screened five predictors (AO, DBIL, AST, eGFR, and α-synuclein) and combined them as a panel to construct a predictive diagnostic model for MSA-C. Together, these five predictors improved the identification of MSA-C patients. AO was an independent positive indicator of MSA-C, consistent with the natural history of the two diseases, as the peak AO of MSA-C is later than that of SCA (Jellinger and Wenning, 2016). AST and eGFR differed significantly between the MSA-C and SCA groups, whereas DBIL did not. Accumulating evidence suggests that misfolded α-synuclein may be a key component of the pathogenic pathway leading to neurodegeneration, and the postmortem finding of α-synuclein-containing protein aggregates, known as glial cytoplasmic inclusions (GCIs), is regarded as essential for a definitive diagnosis of MSA (Trojanowski et al., 2007; Ubhi et al., 2011; Jellinger and Wenning, 2016; Woerman et al., 2018). Accordingly, numerous studies have examined CSF or blood α-synuclein levels for the diagnosis of MSA, but the results have been inconsistent. Interestingly, α-synuclein alone did not differ significantly between the two groups in our study; nevertheless, it was retained as one of the core variables for model construction. Because the vast majority of MSA-C patients have no familial predisposition and the family history of some patients was unclear or missing, we did not include family history as a variable in the prediction model.
The performance evaluation and external clinical validation of this model demonstrated good reliability and accuracy, with satisfactory AUCs of 0.929 and 0.917 in the training and testing sets, respectively. The difference between the two sets was small, indicating good and stable discrimination. Moreover, DCA indicated that the model had a high overall net clinical benefit across different threshold probabilities, suggesting that decisions guided by the model would benefit patients in most cases.
Figure 2. The decision curve shows that using the identification model to identify MSA-C yields more benefit than treating either all or no patients. If a patient has a personal threshold probability of 60% (i.e., the patient would choose the corresponding treatment if the probability of MSA-C were 60% or higher), the net benefit of deciding with the model is 0.453. (C) Application example of the identification model. A 52-year-old man with suspected ataxia was admitted to the Department of Neurology, West China Hospital. After the value of each marker was entered, the model gave a probability of MSA-C of 0.79. Subsequent comprehensive clinical evaluation, neuroimaging, and genetic testing confirmed the model's prediction.
Determining the model's real-world clinical applicability was also essential, so we enrolled 25 suspected ataxia patients as an independent validation cohort. The model's results were compared with comprehensive assessments of these 25 individuals, including family history, clinical manifestations, neuroimaging features, and genetic sequencing results. The model showed relatively high predictive accuracy, suggesting promise for clinical practice. However, four patients were misclassified (two false positives and two false negatives), including one ataxia patient with an undefined cause and one SCA patient.
Notably, the five core variables, some of which did not differ significantly as single biomarkers in univariate analysis, were retained together by LASSO as a combination that performed well for differential identification. LASSO helped screen potential predictors while maintaining objectivity, comprehensiveness, and accuracy, balancing the number of variables against the sample size. The inconsistency between the univariate and multivariate analyses might result from the sample size, the number of variables, the joint effect of multiple markers, or other factors. In future work, we therefore need to incorporate more variables and enroll more ataxia patients with other probable causes to improve the performance of the model.
At present, the prediction model cannot establish a causal relationship between the markers and the pathogenesis of the disease, but it does show a statistical association between the markers and the disease, which may provide clues for further basic research. In our study, five core variables were integrated in a multiparameter combination. Biologically, bilirubin is related to oxidative stress: it helps defend against increased oxidative stress, and some studies have suggested that low bilirubin levels and oxidative stress may occur in some neuroinflammatory and neurodegenerative diseases (Ilzecka and Stelmasiak, 2003; Vitek, 2013). A previous study showed that TBIL and IBIL were lower in MSA patients than in healthy controls (Zhou et al., 2016). AST and eGFR are indicators of liver and kidney function, respectively, and their roles in neurodegenerative diseases have also been reported (Nam et al., 2019; Nho et al., 2019; Palma et al., 2020). In fact, previous studies have suggested that chronic diseases such as diabetes mellitus (DM), hypertension, and depression may be associated with an increased risk of developing PD (Ascherio and Schwarzschild, 2016). However, no similar study has been reported for MSA. Metabolic and hormonal disturbances may therefore be a topic of interest for further research on MSA-C.
Furthermore, we translated the prediction model into a visual, interactive web application that can also be used on mobile devices. Dragging a slider changes the corresponding parameter, and the point total, which represents the probability of a diagnosis of MSA-C, is displayed automatically. The short time needed to measure the required factors, its ease of use, and its capacity for continuous optimization make the application accessible and convenient for users.
However, this study has several limitations. Because MSA-C is uncommon, the number of participants from our single center was small and might not accurately represent MSA-C patients as a whole. Because MSA-C patients were enrolled solely on the basis of clinical diagnostic criteria, without postmortem confirmation, some selection bias may have been introduced. The panel of candidate biomarkers was limited; additional biomarkers combined with neuroimaging features or other objective markers might further improve the differential diagnosis of MSA-C. Owing to the low prevalence of SCA, only patients with SCA1, SCA2, SCA3, and SCA6 were included, and they were analyzed as a single group. Although SCA3 patients made up the majority of the controls, heterogeneity among subtypes might have influenced the between-group comparisons and, in turn, the performance and generalizability of the model. The impact of SCA subtype diversity could be analyzed further to optimize the model. Other causes of ataxia, such as sporadic adult-onset ataxia, could also be included as disease controls to improve the specificity of the model for MSA-C. We therefore intend to add and analyze more variables from diverse sources to differentiate MSA-C more accurately and efficiently from other diseases and to refine the model. The model also needs to be validated in a larger population, followed by consistent further development to expand its usability and reliability. With longitudinal measurement of the candidate biomarkers, the model might also help monitor and predict disease progression.

CONCLUSION
To our knowledge, this is the first study to establish a clinical prediction model, based on demographic and laboratory variables selected by LASSO regression (AO, DBIL, AST, eGFR, and α-synuclein), for better differentiation between MSA-C and SCA, and the model performed well in our study population. After further improvement and validation in a larger population, we anticipate that it could be applied clinically as an auxiliary tool to assist in the differential diagnosis of MSA-C and to advance related health care management.

DATA AVAILABILITY STATEMENT
The datasets analyzed in this article have been anonymized to protect patient privacy and are not publicly available. Requests to access the datasets should be directed to the corresponding author by email.

ETHICS STATEMENT
The protocol of this study was approved by the West China Hospital, Sichuan University Medical Ethics Committee. Written informed consent was obtained from all participants.

AUTHOR CONTRIBUTIONS
SG and MW designed the research and wrote the manuscript. BZ and YuZ were responsible for the recruitment of patients with ataxia and neurological testing. YA and ZM were responsible for the detection of candidate biomarkers. YaZ and MZ were responsible for collecting and organizing data. DY was responsible for neuroimaging assessment. BY supervised the experiment and revised the manuscript. All authors contributed to the article and approved the submitted version.