Discrimination between Alzheimer’s Disease and Late Onset Bipolar Disorder Using Multivariate Analysis

Background Late onset bipolar disorder (LOBD) is often difficult to distinguish from degenerative dementias, such as Alzheimer’s disease (AD), due to comorbidities and common cognitive symptoms. Moreover, LOBD prevalence in the elderly population is not negligible and it is increasing. Both pathologies share pathophysiological neuroinflammation features. Improvements in differential diagnosis of LOBD and AD will help to select the best personalized treatment. Objective The aim of this study is to assess the relative significance of clinical observations, neuropsychological tests, and specific blood plasma biomarkers (inflammatory and neurotrophic), separately and combined, in the differential diagnosis of LOBD versus AD. This was done by evaluating the accuracy achieved by classification-based computer-aided diagnosis (CAD) systems built on these variables. Materials A sample of healthy controls (HC) (n = 26), AD patients (n = 37), and LOBD patients (n = 32) was recruited at the Alava University Hospital. Clinical observations, neuropsychological tests, and plasma biomarkers were measured at recruitment time. Methods We applied multivariate machine learning classification methods to discriminate subjects from the HC, AD, and LOBD populations in the study. We analyzed, for each classification contrast, feature sets combining clinical observations, neuropsychological measures, and biological markers, including inflammation biomarkers. We also analyzed reduced feature sets containing the variables with significant differences as determined by Welch’s t-test. A battery of classifier architectures was applied, encompassing linear and non-linear Support Vector Machines (SVM), Random Forests (RF), and Classification and Regression Trees (CART), and their performance was evaluated in a leave-one-out (LOO) cross-validation scheme. Post hoc analysis of the Gini index in CART classifiers provided a measure of each variable’s importance.
Results Welch’s t-test found one biomarker (malondialdehyde) with significant differences (p < 0.001) in the LOBD vs. AD contrast. Classification results with the best features are as follows: discrimination of HC vs. AD patients reaches accuracy 97.21% and AUC 98.17%. Discrimination of LOBD vs. AD patients reaches accuracy 90.26% and AUC 89.57%. Discrimination of HC vs. LOBD patients achieves accuracy 95.76% and AUC 88.46%. Conclusion It is feasible to build CAD systems for differential diagnosis of LOBD and AD on the basis of a reduced set of clinical variables. Clinical observations provide the greatest discrimination. Neuropsychological tests are improved by the addition of biomarkers, and both contribute significantly to improving the overall predictive performance.


INTRODUCTION
Bipolar disorder (BD) is a chronic mood disorder associated with cognitive, affective, and functional impairment. It often appears in youth (around age 20 years) or even earlier, and its age of onset may be determined by environmental conditions (Bauer et al., 2014b, 2015a; Martinez-Cengotitabengoa et al., 2014). Dementia syndrome arising after a lifetime history of bipolarity (Lebert et al., 2008; Ng et al., 2008) does not match the criteria of Alzheimer's disease (AD) (Forcada et al., 2014). On the other hand, late onset (i.e., age > 50 years) BD (LOBD) (Depp and Jeste, 2004; Prabhakar and Balon, 2010; Besga et al., 2011; Carlino et al., 2013; Po-Han et al., 2015) may be difficult to differentiate from the behavioral impairment associated with AD because of overlapping symptoms and neuropathology. Though AD and LOBD are considered distinct and unrelated clinical entities, in recent years there has been a trend to question whether the two disorders are linked, based on the overlapping symptoms and the increasingly successful use of well-established BD treatments, such as lithium, to treat dementia (Takeshi et al., 2006).

Common Traits Between LOBD and AD
Most studies focus on the differences and commonalities between BD and schizophrenia (García-Bueno et al., 2014) or depression (Azorin et al., 2015); however, some recent studies report comparisons between BD and AD patients (Berridge, 2013), motivated either by late onset or by the aging of the BD population. Inflammation and oxidative stress have been identified as common pathophysiological processes underlying AD (Akiyama et al., 2000; Kamer et al., 2008; Sardi et al., 2011) and LOBD (Goldstein et al., 2009; Konradi et al., 2012; Leboyer et al., 2012; Lee et al., 2013; Bauer et al., 2014a; Hope et al., 2015), as well as many other neuropsychological illnesses, such as depression and mania (Brydon et al., 2009; Dickerson et al., 2013; Castanon et al., 2014; Singhal et al., 2014). These disorders seem to be epigenetically linked to decreased transcriptional activity. It has been reported that the frontal cortex of both LOBD and AD patients exhibits an altered epigenetic regulation related to neuroinflammation, synaptic integrity, and neuroprotection (Rao et al., 2012). Oxidative stress contributes to the pathogenesis of both diseases through similar mechanisms of neuroinflammation, excitotoxicity, and upregulated brain metabolism (Rao et al., 2010, 2011). Mood and cognition impairment are considered core problems in LOBD and AD, respectively. However, in recent years, clinical features of AD, as well as cognitive deficits in LOBD, have received more attention (Ng et al., 2008). Increased agitation and aggression, together with cognitive decline and loss of independence, in AD can easily be confused with LOBD (Zahodne et al., 2015). The following psychiatric symptoms have been reported in AD in common with the profile observed in LOBD: agitation, euphoria, disinhibition, overactivity without agitation, aggression, affective lability, dysphoria, apathy, impaired self-regulation, and psychosis (Albert and Blacker, 2006).

Description of the Study
The study was registered as an observational trial in the ISRCTN registry. It involved nearly one hundred subjects aged over 64 years at recruitment, including healthy controls and patients with a diagnosis of AD or BD. The study included neuroimaging data, neuropsychological tests, and blood sample biomarkers. Classification results based on neuroimaging data have been reported elsewhere (Besga et al., 2012), showing that features extracted from fractional anisotropy coefficients of diffusion-weighted images provided very high classification performance between LOBD and AD patients. The hypothesis explored in the work reported here is the feasibility of AD and LOBD discrimination using multivariate machine learning-based computer-aided diagnosis (CAD) tools on a reduced set of clinical, cognitive, and biological biomarker variables. We evaluated the classification performance achieved using various feature sets composed of combinations of variable categories, as well as a feature selection based on Welch's t-test. Predictive CAD systems have been proposed to improve diagnostic accuracy, complementing the neuropsychological assessments carried out by expert clinicians (Sigut et al., 2007; Graña et al., 2011; Savio et al., 2011; Westman et al., 2011; Termenon et al., 2013). Accurate diagnosis is crucial to mitigate the negative effects of inappropriate treatments.

MATERIALS AND METHODS
Multivariate analysis methods (Westman et al., 2011) make it possible to assess the joint significance of groups of biomarker measures. They have been successfully applied to high-dimensional neuroimaging data (Fung and Stoeckel, 2007; Salas-Gonzalez et al., 2009). The cross-validation classification accuracy, sensitivity, specificity, and area under the ROC curve (AUC) achieved provide the significance value for each combination of variables considered as classification features. Approaches applying feature extraction by functional transformations of the data, such as orthogonal partial least squares (Ramirez et al., 2010; Westman et al., 2011), have achieved high classification performance. However, these transformations do not allow the contribution of each variable to classification success to be back-projected. In order to reason about disease mechanisms and the quality of biomarkers, we follow a feature selection approach, where variables are selected according to their expected contribution to classification success. Moreover, we report variable importance computed on the basis of the contribution of each variable to the construction of a specific CART classifier, as well as the statistical significance of variable differences between groups according to Welch's t-test.

Subjects
Patients included in the study were referred to the psychiatric unit at Alava University Hospital, Vitoria, from its catchment area for clinical assessment of memory complaints. The BD patients were in the euthymic state. No patient had a previous BD diagnosis. These patients were all living in the community. Selected subjects underwent a standard protocol, including clinical, cognitive, and neuropsychological evaluations. Ninety-five elderly subjects were included in the present study; Table 1 presents demographic details of the cohort. Sample size was conditioned by the availability of funding to carry out MRI neuroimaging and biochemical tests. Reports on neuroimaging results are published elsewhere (Besga et al., 2011). The LOBD group fulfills the DSM-IV criteria and the AD group fulfills the NINDS-ADRDA criteria for probable AD. Subjects with psychiatric disorders (e.g., major depression) or other conditions (e.g., brain tumors) were not considered for this study. The exclusion criteria were ongoing infections, fever, allergies, or the presence of other serious medical conditions (autoimmune, cardiac, pulmonary, endocrine, and chronic infectious diseases, and neoplasms). Neither the patients nor the healthy control subjects had received immunosuppressive drugs or vaccinations in the 6 months prior to inclusion in the study, or anti-inflammatory analgesics in the 2 days prior to the extraction of the blood sample. The ethics committee of the Alava University Hospital, Spain, approved this study. All patients gave their written consent to participate in the study, which was conducted according to the provisions of the Helsinki declaration. After written informed consent was obtained, venous blood samples (10 mL) were collected from the volunteers, after which all the mood scales and cognitive tests were performed.

Variable Description
For each subject in the study, we have measured the following 3 categories of variables.

Neuropsychological Variables (NEURO)
Cognitive performance has been assessed with a battery of neuropsychological tests covering the following cognitive domains: executive function, learning and memory, and attention. The index for each cognitive domain is the mean of the z-scores of the tests covering that domain.
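The domain index described above can be illustrated with a minimal sketch. It assumes that z-scores are computed across the subjects of the sample using the population standard deviation; the text does not state the reference population, so this is an assumption, and the test names and scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of raw test scores to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def domain_index(test_scores):
    """Average the per-test z-scores of each subject into one domain index.

    test_scores maps a test name to the list of raw scores, one per subject.
    """
    standardized = [z_scores(scores) for scores in test_scores.values()]
    # zip(*...) regroups the values so that each tuple belongs to one subject
    return [mean(subject) for subject in zip(*standardized)]

# Hypothetical raw scores for two memory tests and four subjects
memory_tests = {
    "test_a": [10.0, 12.0, 14.0, 16.0],
    "test_b": [30.0, 20.0, 40.0, 50.0],
}
memory_index = domain_index(memory_tests)
```

Averaging z-scores rather than raw scores keeps tests with different scales from dominating the index.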

Clinical Observations (CLIN)
The Neuropsychiatric Inventory (NPI) (Cummings, 1997) was developed to provide a means of assessing neuropsychiatric symptoms and psychopathology of patients with Alzheimer's disease and other neurodegenerative disorders. The NPI assesses 10 (10-item NPI) or 12 (12-item NPI) behavioral domains common in dementia. These include Hallucinations, Delusions, Agitation/aggression, Dysphoria/depression, Anxiety, Irritability, Disinhibition, Euphoria, Apathy, Aberrant motor behavior, Sleep and night-time behavior change (12-item version only), and Appetite and eating change (12-item version only). Each NPI domain is scored for frequency, severity, and associated caregiver distress, based on a standardized interview administered by the clinician.

Functional Assessment
Patients were functionally assessed by the Functional Assessment Staging procedure (FAST) (Reisberg, 1988). Patients with greater functional impairment show greater cognitive loss. FAST ranks patients in 16 stages. Stage 1 marks subjects without difficulties, while Stage 7(f) marks patients unable to hold up their head. The last eleven stages are subdivisions of the late stages 6 and 7. FAST was administered by the clinician leading the study.

Population Comparison
We used Welch's t-test to assess the statistical significance of the differences in each variable between groups. It is a two-sample test of the hypothesis that two populations have equal means. Welch's t-test is the adaptation of Student's t-test to the case in which the two population samples may have different variances. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
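As an illustration of the test just described, the following sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library only. The sample values are hypothetical and are not data from the study; the p-value would then be obtained from a Student's t distribution with the returned degrees of freedom (in practice, for example, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t_test(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the unbiased sample variance (n - 1 denominator)
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical biomarker values for two groups (not data from the study)
group_a = [2.1, 2.5, 1.9, 2.8, 2.4]
group_b = [3.0, 3.6, 2.9, 3.8, 3.3, 3.5]
t_stat, dof = welch_t_test(group_a, group_b)
```

Unlike Student's t-test, the variances are not pooled, so the degrees of freedom are generally not an integer and depend on how unequal the two sample variances are.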

Classification Algorithms
Classification experiments were carried out in the Python programming language, using the classifier implementations provided by the scikit-learn package.
We have applied Support Vector Machines (SVM) (Vapnik, 1998), CART Decision Trees, and Random Forests (RF). Briefly described, an SVM builds a discriminating function with optimal generalization properties, which is a hyperplane built on the basis of the support vectors at the boundaries between classes. The kernel trick [e.g., radial basis function (RBF)] makes it possible to deal with classes that are not linearly separable. Parameter tuning (e.g., Gaussian function width) is performed independently at each cross-validation fold when carrying out an assessment of classifier performance. SVM and its libSVM implementation (Chang and Lin, 2011) have become a standard classifier in the neuroscience community (Burges, 1998; Tao et al., 2006; Fung and Stoeckel, 2007). CART Decision Trees (Breiman et al., 1984; Quinlan, 1993) are built by recursive partitions of the data space. A univariate (single attribute) split is defined at each tree node using some criterion (e.g., mutual information, gain ratio, Gini impurity index). Each tree leaf corresponds to the a posteriori class distribution of the training data samples falling in that leaf. The Random Forests (RF) (Breiman, 2001) algorithm is an ensemble of classifiers, which has been successfully applied in a wide variety of classification tasks (Barandiaran et al., 2010). An RF is a collection of decorrelated, randomly generated decision tree predictors, in which each tree casts a unit vote to decide the most popular class of input x. A bootstrapped training dataset is used to grow each individual tree. RF model parameters are the number of trees, their maximum depth, and the ratio of dimensionality reduction at each node. Finally, to assess variable importance, we have computed for each variable the average Gini impurity index (Breiman et al., 1984) of all the nodes in a CART classifier where this variable is used for the split, normalizing it such that the most important variable has value 1.
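The variable importance measure described at the end of the paragraph above can be sketched as follows. This is a minimal reading of the description, not the authors' code: the list of tree nodes is hypothetical, and each node is represented only by its split variable and the class counts of the training samples reaching it. Note that this differs from the impurity-decrease importance that scikit-learn exposes as `feature_importances_`.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k^2) of a node given its per-class sample counts."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def variable_importance(split_nodes):
    """Average node impurity per split variable, scaled so the maximum is 1.

    split_nodes is a list of (variable_name, class_counts) pairs, one per
    internal tree node.
    """
    per_variable = {}
    for name, counts in split_nodes:
        per_variable.setdefault(name, []).append(gini_impurity(counts))
    averaged = {name: sum(v) / len(v) for name, v in per_variable.items()}
    top = max(averaged.values())
    return {name: value / top for name, value in averaged.items()}

# Hypothetical internal nodes of a small tree: (split variable, [n_class0, n_class1])
nodes = [
    ("FAST", [32, 37]),
    ("MDA", [20, 5]),
    ("memory", [4, 12]),
    ("FAST", [10, 10]),
]
importance = variable_importance(nodes)
```

A variable that is never used for a split does not appear in the result, which corresponds to an importance of zero.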

Experimental Design
All variables are normalized by computing their z-scores prior to the classification experiments. In order to reduce circularity effects, variable normalization was carried out independently at each cross-validation fold. In order to evaluate the effect of neuropsychological measures (NEURO), biological markers (BIO), and clinical variables (CLIN) in differentiating HC from individuals with AD and LOBD, we have applied multivariate machine learning classification methods to each of the possible contrasts in the study: (1) healthy controls versus Alzheimer's disease patients (HC vs. AD), (2) healthy controls versus bipolar disorder patients (HC vs. LOBD), and (3) bipolar disorder versus Alzheimer's disease patients (LOBD vs. AD). We evaluated the performance of the classifier using a leave-one-out (LOO) cross-validation algorithm, applying a 3 × 2-fold cross-validation grid search to tune the classifier parameters. To quantify the results, we measured accuracy, sensitivity, specificity, and area under the ROC curve (AUC).

Figure 1 shows a plot of the feature importance of the variables considered in the study for the LOBD vs. AD classification contrast. The most important variable in all experiments is FAST. Overall, the feature importance values are in agreement with the statistically significant differences presented in Table 2, providing a more precise ranking. Cognitive tests are the second most important feature in the discrimination of HC vs. AD, whereas the clinical variables are more important for discriminating HC vs. LOBD, with the memory domain ranking high. For the critical discrimination of LOBD vs. AD, the clinical variables are the most informative; however, the biological marker MDA ranks third, while the cognitive memory domain ranks fifth. This is the only instance of a biological marker with a high importance ranking.
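The leave-one-out scheme with fold-wise normalization described in the experimental design can be sketched in plain Python. This is a simplified illustration under stated assumptions: a nearest-centroid rule stands in for the SVM, CART, and RF classifiers, the inner 3 × 2-fold grid search is omitted, AUC is not computed, and the data are hypothetical. With scikit-learn, the same structure would typically combine `LeaveOneOut`, a `Pipeline` containing `StandardScaler`, and `GridSearchCV`.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Column-wise mean and standard deviation estimated on training rows only."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]

def transform(row, means, stds):
    """Apply the z-score transformation learned on the training fold."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def nearest_centroid_predict(train_rows, train_labels, row):
    """Stand-in classifier: assign the label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(c) for c in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out(rows, labels):
    """Return LOO predictions; normalization is refit inside every fold."""
    predictions = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        means, stds = fit_scaler(train_rows)  # held-out subject is excluded
        scaled_train = [transform(r, means, stds) for r in train_rows]
        scaled_test = transform(rows[i], means, stds)
        predictions.append(
            nearest_centroid_predict(scaled_train, train_labels, scaled_test)
        )
    return predictions

def accuracy_sensitivity_specificity(labels, predictions, positive=1):
    """Accuracy, true positive rate, and true negative rate for a binary contrast."""
    pairs = list(zip(labels, predictions))
    tp = sum(y == positive and p == positive for y, p in pairs)
    tn = sum(y != positive and p != positive for y, p in pairs)
    n_pos = sum(y == positive for y in labels)
    n_neg = len(labels) - n_pos
    return (tp + tn) / len(labels), tp / n_pos, tn / n_neg

# Hypothetical two-feature data set (not data from the study)
X = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [1.1, 10.5],
     [3.0, 20.0], [3.2, 21.0], [2.9, 19.0], [3.1, 20.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = leave_one_out(X, y)
acc, sens, spec = accuracy_sensitivity_specificity(y, y_pred)
```

Estimating the mean and standard deviation on the training subjects alone keeps the held-out subject from influencing its own preprocessing, which is the circularity effect the design aims to reduce.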

Classification
Classification performance results are presented in Tables 3 and 4, which report accuracy and AUC, respectively, for each classifier, combination of variables, and classification contrast. Comparing classifier results, CART provides the best results, though the improvement relative to the other classifiers does not achieve statistical significance (F-test, p > 0.01). This may be due to the fact that the small sample size penalizes the construction of large classifiers, which are overparameterized. Comparison of results according to variable category shows that clinical variables (CLIN) provide the best results or contribute to them. On the other hand, the biological biomarkers (BIO) are the ones that contribute least to classification performance. Considering the contrast LOBD vs. AD, the clinical variables provide the best results, though for some classifiers, such as RBF SVM, the neuropsychological variables contribute to improving the results. The best results are obtained using the set of variables selected according to their significance in Welch's t-test (denoted CLIN + NEURO + BIO-Wt in the tables).

DISCUSSION
This study was designed to investigate the feasibility of discriminating between AD and LOBD (Lebert et al., 2008; Carlino et al., 2013; Grande et al., 2014) using a wide range of clinical, neuropsychological, and biological (inflammatory, oxido-nitrosative, and neurotrophic) measures. Previous studies have attempted to discriminate between subjects with AD and HC and between LOBD and HC, but to our knowledge there is no other study comparing LOBD patients with AD patients, where the differential diagnosis is more difficult (Aprahamian et al., 2014). We have included the HC vs. AD and HC vs. LOBD contrasts in Tables 2-4 to assess whether the biomarkers are also useful to discriminate them. We find that the results obtained are in accordance with the literature. We have found that the clinical variables carry most of the diagnostic value; however, classification performance can be improved by also considering the neuropsychological variables and biological markers.

Clinical Variables
It has been observed that cognitive deficits affect the functionality and global prognosis of LOBD patients (Kawas et al., 2003), as occurs in patients with dementia. Besides cognitive performance, behavioral disorders are also closely related to the overall functionality of the patients. Non-cognitive symptoms have to be considered, as they may help the discrimination between LOBD and AD. Our results in Figure 1 show that agitation, euphoria, and disinhibition are the non-cognitive neuropsychological variables with the greatest discrimination power in the AD vs. LOBD contrast. Nevertheless, the clinical variable that differentiates most strongly between LOBD and AD is overall patient behavioral functionality measured by FAST.

Neuropsychological Variables
Neuropsychological assessment is typically used for both descriptive and diagnostic purposes. When used diagnostically, tests provide information about how likely it is that a particular individual has or will have a cognitive disorder. In relation to BD, various studies have revealed cognitive impairment as part of its clinical expression. In fact, some authors have suggested that having been diagnosed with BD is a significant predictor of cognitive decline over time and that cognitive dysfunction increases in the long term (Lewandowski et al., 2011; Torrent et al., 2012). Although there are limited data on the cognitive profile of LOBD (Carlino et al., 2013; Grande et al., 2014), cognitive deficits affecting memory, attention, and executive function have been reported (Robinson et al., 2006; Osher et al., 2011; Aprahamian et al., 2014). Accordingly, when comparing LOBD patients with HC, we found that all these variables have statistically significant differences, as shown in Table 2. Similar cognitive degradation is well known in AD (Kawas et al., 2003; Albert and Blacker, 2006), and it is confirmed by the results in Table 2. The classification experiments confirm that the set of neuropsychological variables (NEURO) is useful to discriminate AD patients from controls, achieving high accuracy (93.65%) and AUC (92.88%). However, they are much less effective in discriminating LOBD patients from controls and from AD patients. The results of Welch's t-test in Table 2 and the variable importance results in Figure 1 confirm that the memory cognitive domain is essential in clinical practice for the detection and diagnosis of AD (Weintraub et al., 2012). Accordingly, Figure 1 shows that memory domain tests have a high importance for AD vs. LOBD classification. In summary, we found that memory is a key factor in the differential diagnosis between LOBD and AD.

Blood Biomarkers
Besides the similarity of some symptoms, AD and LOBD share pathophysiological features that might hinder differential diagnosis. Peripheral markers related to inflammation, oxidative stress, and neurotrophins have been related to clinical symptoms, cognitive decline, and illness severity in BD (Barbosa et al., 2012; Martinez-Cengotitabengoa et al., 2014), as well as in AD (Berridge, 2013). In our study, all blood biomarkers, except IL1, were lower in the plasma of the LOBD group than in the AD group, although only MDA levels showed a statistically significant difference in Welch's t-test. This finding agrees with a significant decrease in BDNF and IL-6 in BD patients at a later stage compared to the early stage, while, inversely, TNFα shows a significant increase at the later stage of BD (Kauer-Sant' Anna et al., 2009; Grande et al., 2014). All these findings may suggest that the group of LOBD patients has more inflammation. No discriminant variable has been found in the collection of biological biomarkers (BIO) in our classification experiments. There are numerous reports of inflammation and excess oxidation within the brain of patients, but outside the CNS, the evidence is less definite and the results of studies are often contradictory. It has been suggested that inflammation and oxidative stress do not cause AD or LOBD by themselves, but that during aging they probably reinforce many interdependent factors related to these complex neuropsychiatric disorders (Forcada et al., 2014). It is well known that brain aging involves complex structural and molecular processes that produce an imbalance between protective and degenerative factors, predisposing the brain to a higher risk of acquiring neurodegenerative diseases (Lewandowski et al., 2011). Nevertheless, the inclusion of MDA in the Welch's t-test features produces a great improvement in classification performance, reaching accuracy 90.26% and AUC 89.57%. This is surprising, because inflammation is a common effect, not a differential effect.

Limitations
The sample is not well balanced: the AD, LOBD, and HC groups differ in size, and women are markedly overrepresented in the LOBD group.
Elderly patients often suffer from multimorbidity, which is a confounding factor for blood plasma biomarkers.

CONCLUSION
We have found that a small set of variables, including an oxidative stress biomarker (i.e., MDA), allows good discrimination of LOBD and AD. Besides the potential construction of a CAD system upon larger databases, these findings could help in identifying new therapeutic routes for treatment and diagnosis.

AUTHOR CONTRIBUTIONS
AB, IG, AG-P, and MG have made substantial contributions to the conception or design of the work; AB, JM, JL, DC, BA, AS, MG, and EE contributed to the acquisition, analysis, or interpretation of data for the work; all authors contributed to drafting the work and revising it critically for important intellectual content; all authors gave final approval of the version to be published; all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.