AUTHOR=Ion-Mărgineanu Adrian, Kocevar Gabriel, Stamile Claudio, Sima Diana M., Durand-Dubief Françoise, Van Huffel Sabine, Sappey-Marinier Dominique TITLE=Machine Learning Approach for Classifying Multiple Sclerosis Courses by Combining Clinical Data with Lesion Loads and Magnetic Resonance Metabolic Features JOURNAL=Frontiers in Neuroscience VOLUME=Volume 11 - 2017 YEAR=2017 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2017.00398 DOI=10.3389/fnins.2017.00398 ISSN=1662-453X ABSTRACT=Purpose. The purpose of this study is classifying multiple sclerosis (MS) patients in the four clinical forms as defined by the McDonald criteria using machine learning algorithms trained on clinical data combined with lesion loads and magnetic resonance metabolic features. Materials and Methods. Eighty-seven MS patients (12 Clinically Isolated Syndrome (CIS), 30 Relapse Remitting (RR), 17 Primary Progressive (PP) and 28 Secondary Progressive (SP)) and eighteen healthy controls were included in this study. Longitudinal data available for each MS patient included clinical (e.g. age, disease duration, Expanded Disability Status Scale), conventional magnetic resonance imaging and spectroscopic imaging. We extract N-acetyl-aspartate (NAA), Choline (Cho), and Creatine (Cre) concentrations, and we compute three features for each spectroscopic grid by averaging metabolite ratios (NAA/Cho, NAA/Cre, Cho/Cre) over good quality voxels. We built linear mixed-effects models to test for statistically significant differences between MS forms. We test nine binary classification tasks on clinical data, lesion loads, and metabolic features, using a leave-one-patient-out cross-validation method based on 100 random patient-based bootstrap selections. We compute F1-scores and BAR values after tuning Linear Discriminant Analysis (LDA), Support Vector Machines with Gaussian kernel (SVM-rbf), and Random Forests. Results.
Statistically significant differences were found between the disease starting points of each MS form using four different response variables: Lesion Load, NAA/Cre, NAA/Cho, and Cho/Cre ratios. Training SVM-rbf on clinical and lesion loads yields F1-scores of 71-72% for CIS vs. RR and CIS vs. RR+SP, respectively. For RR vs. PP we obtained good classification results (maximum F1-score of 85%) after training LDA on clinical and metabolic features, while for RR vs. SP we obtained slightly higher classification results (maximum F1-score of 87%) after training LDA and SVM-rbf on clinical, lesion loads and metabolic features. Conclusions. Our results suggest that metabolic features are better at differentiating between relapsing-remitting and primary progressive forms, while lesion loads are better at differentiating between relapsing-remitting and secondary progressive forms. Therefore, combining clinical data with magnetic resonance lesion loads and metabolic features can improve the discrimination between relapsing-remitting and progressive forms.
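
The evaluation scheme named in the abstract (leave-one-patient-out cross-validation scored with F1 and BAR) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `leave_one_patient_out` and `f1_and_bar` are hypothetical, BAR is assumed here to mean balanced accuracy (the mean of sensitivity and specificity), which the abstract does not define, and the 100 patient-based bootstrap selections and the classifier tuning are omitted.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_idx, test_idx) for each patient.

    All scans of the held-out patient go to the test fold, so longitudinal
    scans of one patient never appear in both training and test data.
    """
    scans_by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        scans_by_patient[pid].append(idx)
    for pid, test_idx in scans_by_patient.items():
        train_idx = [i for i, p in enumerate(patient_ids) if p != pid]
        yield pid, train_idx, test_idx


def f1_and_bar(y_true, y_pred, positive=1):
    """Return (F1, BAR) for a binary task.

    Assumption: BAR is balanced accuracy, (sensitivity + specificity) / 2.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return f1, (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Made-up example: three patients with 2, 1, and 3 scans respectively.
    ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
    for pid, train_idx, test_idx in leave_one_patient_out(ids):
        print(pid, train_idx, test_idx)
    print(f1_and_bar([1, 1, 0, 0], [1, 0, 0, 0]))
```

Splitting by patient rather than by scan matters for longitudinal data of this kind: if scans from the same patient appeared on both sides of a split, the scores would likely be optimistic.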