Accuracy of Support-Vector Machines for Diagnosis of Alzheimer's Disease, Using Volume of Brain Obtained by Structural MRI at Siriraj Hospital

Background: The determination of brain volumes using visual ratings is associated with an inherently low accuracy for the diagnosis of Alzheimer's disease (AD). A support-vector machine (SVM) is one of the machine learning techniques, which may be utilized as a classifier for various classification problems. This study exploratorily investigated the accuracy of SVM classification models for AD subjects using brain volume and various clinical data as features. Methods: The study was designed as a retrospective chart review. A total of 201 eligible subjects were recruited from the Memory Clinic at Siriraj Hospital, Thailand. Eighteen cases were excluded due to incomplete MRI data. Subjects were randomly assigned to a training group (AD = 46, normal = 46) and testing group (AD = 45, normal = 46) for SVM modeling and validation, respectively. The results in terms of accuracy and a receiver operating characteristic curve analysis are reported. Results: The highest accuracy for brain volumetry (62.64%) was found using the hippocampus as a single feature. A combination of clinical parameters as features provided accuracy ranging between 83 and 90%. However, a combination of brain volumetry and clinical parameters as features to the SVM models did not improve the accuracy of the result. Conclusions: In our study, the use of brain volumetry as SVM features provided low classification accuracy with the highest accuracy of 62.64% using the hippocampus volume alone. In contrast, the use of clinical parameters [Thai mental state examination score, controlled oral word association tests (animals; and letters K, S, and P), learning memory, clock-drawing test, and construction-praxis] as features for SVM models provided good accuracy between 83 and 90%.


INTRODUCTION
Alzheimer's disease (AD) is a common condition that is diagnosed in ∼5-7% of the general population (1). Current treatment can improve the quality of life of both Alzheimer's patients and their relatives and caregivers (2). Consequently, a tool that provides high sensitivity and specificity is needed for the accurate diagnosis of this disease.
The current diagnostic tool for AD is the use of clinical criteria, such as DSM-V (3). Despite imaging studies not being included in such criteria, many clinicians have noticed that the size of the brain volume obtained from structural imaging appears to be associated with AD to some extent. Although visual rating of the brain volume is generally used to guide a diagnosis of AD, its interpretation differs vastly among clinicians, and the method lacks specificity and sensitivity (4). It would be beneficial if there was a reliable tool that could interpret imaging results accurately and consistently.
A support-vector machine (SVM), a mathematical function, is designed to classify complex data. This function has the ability to learn the distribution of data and provide a proper classification line (or optimal hyperplane) that is not restricted to a linear fashion (Figures 1, 2).
Several studies have investigated SVM as a diagnostic tool for AD, and a number have shown good levels of accuracy (5)(6)(7)(8). However, those results were based on international imaging data obtained from the Alzheimer's Disease Neuroimaging Initiative, which included a different population from the Thai cohort used in the current study. Hence, this study focused on the accuracy of SVM as a diagnostic tool for the Thai population. Moreover, we investigated clinical data in order to determine if it is possible to further increase the accuracy of SVM.

Study Design
This study was a retrospective, chart review and was approved by the Siriraj Institutional Review Board. Everybody had signed informed consent as part of data registration at the Memory Clinic at Siriraj Hospital. The subjects were recruited by clinical coordinators from the Memory Clinic at Siriraj Hospital. Some normal subjects had brain magnetic resonance imaging scan, which was supported by the Thailand Research Fund. All were living in the community. Their identities (name and identification number) were hidden from the staff.
For the present study, the inclusion criteria consisted of the following: aged 60 years or older, a diagnosis of AD, in accordance with the (9) criteria, and the performance of an MRI brain scan within 1 year of the AD diagnosis. The exclusion criteria were having an untreated psychiatric condition; having a history of stroke (either ischemic or hemorrhagic), CNS infection, drug abuse, epilepsy, severe head trauma, and/or repetitive head trauma; and having been diagnosed with another type of dementia (such as frontotemporal dementia, Parkinson's disease with dementia, Lewy body dementia, vascular dementia, and other secondary dementia).

Subjects
A total number of 201 subjects in this study consists of 101 AD subjects recruited from the memory clinic at Siriraj Hospital, and 100 normal subjects were enlisted from volunteers from the dementia and disability project (10) and caregivers from the memory clinic. These normal controls did not meet the criteria of dementia or mild cognitive impairment. Eighteen subjects were removed due to incomplete MRI data necessary for further analysis. Eligible 183 subjects then were randomly divided into the training group (N = 92 with AD = 46, normal = 46) and testing group (N = 91 with AD = 45, normal = 46) for the SVM study using brain volumetry alone. However, due to the subjects' incomplete clinical data, 73 subjects were removed resulting to 55 subjects for the training group (AD = 28, normal = 27) and 55 subjects for the testing group (AD = 27, normal = 28) of the SVM study using a combination of clinical data and brain volumetry. The randomization technique was used for assigning subjects into groups in order to minimize selection bias and ensures against the accidental uncontrolled bias (11) (Figure 3).

Clinical Data
The Thai mental state examination (TMSE) (12) is a cognitive assessment modified from the Mini mental status examination. Thai mental state examination consists of seven subdomains: orientation, immediate memory (registration), calculation, attention, language, picture copy, and recalled memory. The total score is 30. The score of <24 is suggestive of having dementia. The higher the TMSE score, the better the cognitive function.
The Neuropsychiatric Inventory (NPI-Q) (13) is used to evaluate neuropsychiatric symptoms of all subjects. The NPI-Q is an informant-based instrument that measures the presence and severity of 12 neuropsychiatric symptoms in patients with dementia and normal controls, and to measure informant distress according to individual neuropsychiatric symptom of patients with dementia.
The cognitive data: the examination scores of TMSE and other neuropsychological assessments, namely, Logical Memory (LM), Controlled Oral Word Association Test-animals (COWAanimals), Controlled Oral Word Association Test-letters K, S, P (COWA-KSP), Clock drawing, and Construction-praxis were used as clinical parameters for further SVM model developments. Neuropsychological evaluation is part of clinical criteria to determine if subjects have dementia.

Analysis
The MR images were processed using FreeSurfer to seek for brain volumes (14). Brain MR image analysis was performed using the Freesurfer software suite to extract the brain cortical and subcortical brain volume (15).
Because the individuals had different cranial volumes, all specific regions of the brain volume were calculated relative to the whole brain volume, using the following normalization formula: Specific volume of the brain × 100/intracranial volume The specific volumes used to develop SVM models comprised of both sides of the hippocampus, nucleus accumbens, amygdala, caudate, thalamus, total white matter volume, total gray matter volume, ventricle volume, and a combination of those values.
The WEKA software suite (available at https://www.cs. waikato.ac.nz/ml/weka/) was utilized in this study for SVM classification. Radial basis function was used to seek for the highest accuracy as suggested in a prior study (15). The SVM models then were trained and tested using the data from the training and testing groups (11). The leave-one-out validation technique was applied due to the small data set. The C and gamma values were adjusted to maximize the accuracy and area under the curve of the ROC for all SVM models as shown in Table 3.
The demographic data were analyzed and compared by descriptive statistics, using the unpaired t-test or Chi-square-test, according to each variable's type. We considered any p-value < 0.05 as statistically significant. The results were reported as true positive rate (TP rate), false positive rate (FP rate), accuracy, and the area under the curve (AUC) in a receiver operating characteristic curve (ROC) analysis.

RESULTS
An analysis of the demographic data ( Table 1) revealed an average age of 73.09 years (SD = 7.28) for the AD group, and a lower average of 69.72 years for the normal controlled group, with a significant difference between those averages. Education levels and baseline TMSE scores also differed between the two groups. Even if there are some baseline differences, the randomization method we used with SVM and the leave one out method have something to help to compromise these baseline differences.
As to the descriptive statistical analysis of the brain volumetry of the two groups, only the ventricle volume demonstrated a significant difference, being higher for the AD group ( Table 2). Other parameters showed no significant differences.
From the SVM modeling performance analysis results in Table 3, two models with equally highest accuracy (90.74%) were the model of clinical parameters (TMSE, LM, COWAanimals, COWA-KSP, Clock drawing, and Construction-praxis scores) (item 16 in Table 3) and the model of all brain volumetry combined with the scores of TMSE, LM, COWA-animals, COWA-KSP, Clock drawing, and Construction-praxis scores (   The values below are calculated from "specific brain volume × 100/intracranial volume". *p < 0.05 = statistical significant; AD, Alzheimer's disease. models were found with these parameters: hippocampus with TMSE, LM, COWA-animals, COWA-KSP, clock drawing, and Construction-praxis (AUC = 0.927, Table 3, item 17). We also performed analyses on various combinations of features, but those revealed lower accuracies and lower AUC values as shown in Table 3.

DISCUSSION
Our study with SVM classification revealed that utilizing the scores of TMSE, LM, COWA-animals, COWA-KSP, Clock drawing, and Construction-praxis can best differentiate AD from normal controlled subjects. Neuropsychiatric symptoms alone could not provide accurate results. The demographic data in this study showed that the individuals in the AD group had a slightly older mean age (73.09 years) than those in the normal control group (69.72 years). Those with AD also had a lower formal educational level, as indicated by their comparatively smaller number of years of schooling. Previous research revealed that individuals with a lower number of years of formal education might have a smaller cognitive reserve. As a consequence, the incidence of AD has been reported to be higher in those individuals with lower education (16). Previous studies showed that digital health data, cognitive performance such as memory, and neuropsychiatric symptoms can help identify those with dementia from normal subjects (17)(18)(19)(20)(21). Some research groups (19) have suggested that a diagnosis of dementia can be made from health recording data.
Despite using the highest accuracy that the SVM could provide, brain volumetry alone provided suboptimal accuracy for the diagnosis of AD. The highest accuracy for brain volumetry was found with the hippocampus region, which reached an accuracy of 62.64%. The inclusion of other regions of the brain did not seem to increase the accuracy level any further. This also reflects previous research findings that the hippocampus volume is associated with AD (22). Given that the mean baseline TMSE score of the AD group was not low, one possibility is that the SVM failed to classify AD accurately using hippocampal volume alone in our study because our AD subjects tended to have a mild-to-moderate degree of severity of the disease. Consequently, the difference in the hippocampal volumes of each group was not pronounced. Individuals in Asian counties are known to have a high prevalence of cerebral small vessel disease, which increases with age (23). This cerebral small vessel disease can cause smaller hippocampal volume in subjects with normal cognition. However, in a recent review of 111 studies, the majority of the studies assessed Alzheimer's disease compared with healthy controls, using AD Neuroimaging Initiative data, support vector machines, and only T1-weighted sequences (24). Accuracy was highest for differentiating Alzheimer's disease from healthy controls and poor for differentiating healthy controls vs. mild cognitive impairment vs. Alzheimer's disease.
Our study with SVM classification suggested that brain volumetry alone seems to be a suboptimal parameter for AD diagnosis; a different situation is found with the clinical parameters. COWA had a high degree of accuracy, especially COWA-KSP (note that KSP come from the letters " " in the Thai language; in English-speaking countries, the letters "F, A, and S" or "C, F, and L" are normally used). These results are consistent with the knowledge that individuals with AD also have executive function impairment of varying severities (25). Word fluency relies on the executive functions that will enable an individual to produce a number of words quickly in a limited time. It follows that COWA might be adapted as a good screening tool for the diagnosis of AD. A combination of other neuropsychiatric tests also afforded a very high degree of accuracy. However, utilizing neuropsychiatric symptoms to aid the diagnosis of AD resulted in poor ROC value in the tested data. This indicated that neuropsychiatric symptoms alone could not differentiate AD from norms or other dementia.
In the previous study, both structural T1-weighted MRI brain studies and neuropsychological measures of individuals were used to train and optimize an artificial intelligence classifier to diagnose mild-AD patients (26). Similar to our study, the classifier was able to distinguish between the two groups before AD definite diagnosis using a combination of MRI brain studies

LIMITATIONS AND FUTURE STUDY
An important limitation was that the severity of the AD of our subjects seemed to be low to moderate. Although, the hippocampus alone was rather a good choice for use as a classification parameter, our study failed to demonstrate that it was. The low education of individuals in our study may play some role on brain volume. The second limitation was that this study did not utilize age-matched group due to the limited number of subject recruitments and the small number of subjects for the SVM training and testing groups. Generalization of the results to clinical use should be done with caution. The third limitation was that, we used dementia subjects from the memory clinic together with normal subjects from the community survey or community study, which could lead to selection bias. When applied to the broader general population, our method of using the SVM might not produce the same degree of accuracy. Future studies should include a larger sample size in both the training and testing groups. It is known that the Asian population has a prevalence of small vessel disease in the brain. The burden of cerebral small vessel disease can be included to predict the diagnosis of dementia in the future. Exploration with other biomarkers to predict the diagnosis of Alzheimer's disease or prodromal Alzheimer's disease in a Thai cohort could also be done.

CONCLUSIONS
Based on data drawn from the Memory Clinic at Siriraj Hospital, clinical parameters (including TMSE, COWAanimals, COWA-KSP, LM, clock drawing, and Constructionpraxis) provided good accuracy (83-90%) using SVM as a classifier. COWA-KSP alone might be the easiest tool to utilize in clinical situations, and it had an accuracy of 83.33% when using the SVM. Our study failed to demonstrate a good degree of accuracy when using brain volumetry alone. The most accurate results were found using the hippocampus alone as a classifier that revealed an accuracy of 62.64%.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by IRB Faculty of Medicine Siriraj Hospital. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
VS: research planning, primary investigators of the awarding grants, conducting the field study, data analysis and interpretation of the results, and preparing manuscript. YV and AK: assisted data analysis and interpretation of the results. AR, SC, and NA: conducted field study, assessment of cognition and neuropsychiatricproblems, and preparing the data for statistical analysis. All authors read and approved the final manuscript.