<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurol.</journal-id>
<journal-title>Frontiers in Neurology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurol.</abbrev-journal-title>
<issn pub-type="epub">1664-2295</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fneur.2021.681140</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neurology</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Using Base-ml to Learn Classification of Common Vestibular Disorders on DizzyReg Registry Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Vivar</surname> <given-names>Gerome</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1220679/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Strobl</surname> <given-names>Ralf</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/502958/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Grill</surname> <given-names>Eva</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/893175/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Navab</surname> <given-names>Nassir</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1309579/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zwergal</surname> <given-names>Andreas</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/101213/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ahmadi</surname> <given-names>Seyed-Ahmad</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/842684/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>German Center for Vertigo and Balance Disorders, University Hospital Munich, Ludwig-Maximilians-University</institution>, <addr-line>Munich</addr-line>, <country>Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>Computer Aided Medical Procedures, Department of Informatics, Technical University Munich</institution>, <addr-line>Munich</addr-line>, <country>Germany</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Biometry and Epidemiology, Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University</institution>, <addr-line>Munich</addr-line>, <country>Germany</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Neurology, University Hospital Munich, Ludwig-Maximilians-University</institution>, <addr-line>Munich</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Carey David Balaban, University of Pittsburgh, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Marcos Rossi-Izquierdo, Lucus Augusti University Hospital, Spain; Denise Utsch Gon&#x000E7;alves, Federal University of Minas Gerais, Brazil; Marty Slade, Yale University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Seyed-Ahmad Ahmadi <email>ahmadi&#x00040;cs.tum.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Neuro-Otology, a section of the journal Frontiers in Neurology</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;These authors share senior authorship</p></fn></author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>08</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>681140</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>06</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Vivar, Strobl, Grill, Navab, Zwergal and Ahmadi.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Vivar, Strobl, Grill, Navab, Zwergal and Ahmadi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p><bold>Background:</bold> Multivariable analyses (MVA) and machine learning (ML) applied to large datasets have a high potential to provide clinical decision support in neuro-otology and to reveal further avenues for vestibular research. To this end, we built base-ml, a comprehensive MVA/ML software tool, and applied it to three increasingly difficult clinical objectives in the differentiation of common vestibular disorders, using data from a large prospective clinical patient registry (DizzyReg).</p>
<p><bold>Methods:</bold> Base-ml features a full MVA/ML pipeline for the classification of multimodal patient data, comprising tools for data loading and pre-processing; a stringent scheme for nested and stratified cross-validation, including hyper-parameter optimization; a set of 11 classifiers, ranging from commonly used algorithms like logistic regression and random forests to artificial neural network models, including a graph-based deep learning model which we recently proposed; a multi-faceted evaluation of classification metrics; and tools from the domain of &#x0201C;Explainable AI&#x0201D; that illustrate the input distribution and provide a statistical analysis of the most important features identified by multiple classifiers.</p>
<p><bold>Results:</bold> In the first clinical task, classification of bilateral vestibular failure (<italic>N</italic> = 66) vs. functional dizziness (<italic>N</italic> = 346) was possible with a classification accuracy of up to 92.5% (Random Forest). In the second task, primary functional dizziness (<italic>N</italic> = 151) vs. secondary functional dizziness (following an organic vestibular syndrome) (<italic>N</italic> = 204) was classifiable with an accuracy ranging from 56.5 to 64.2% (k-nearest neighbors/logistic regression). The third task compared four episodic disorders: benign paroxysmal positional vertigo (<italic>N</italic> = 134), vestibular paroxysmia (<italic>N</italic> = 49), Meni&#x000E8;re disease (<italic>N</italic> = 142), and vestibular migraine (<italic>N</italic> = 215). Classification accuracy ranged between 25.9 and 50.4% (Na&#x000EF;ve Bayes/Support Vector Machine). Recent (graph-) deep learning models classified well in all three tasks, but not significantly better than more traditional ML methods. Classifiers reliably identified clinically relevant features as most important toward classification.</p>
<p><bold>Conclusion:</bold> The three clinical tasks yielded classification results that correlate with the clinical intuition regarding the difficulty of diagnosis. It is favorable to apply an array of MVA/ML algorithms rather than a single one, to avoid under-estimation of classification accuracy. Base-ml provides a systematic benchmarking of classifiers, with a standardized output of MVA/ML performance on clinical tasks. To alleviate re-implementation efforts, we provide base-ml as an open-source tool for the community.</p></abstract>
<kwd-group>
<kwd>chronic vestibular disorders</kwd>
<kwd>classification</kwd>
<kwd>machine learning</kwd>
<kwd>multivariable statistics</kwd>
<kwd>clinical decision support (cdss)</kwd>
<kwd>episodic vestibular symptoms</kwd>
</kwd-group>
<contract-num rid="cn001">01 EO 0901 (DSGZ)</contract-num>
<contract-sponsor id="cn001">Bundesministerium f&#x000FC;r Bildung und Forschung<named-content content-type="fundref-id">10.13039/501100002347</named-content></contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="4"/>
<equation-count count="2"/>
<ref-count count="70"/>
<page-count count="17"/>
<word-count count="12042"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Multivariable statistical analysis (MVA), and modern machine learning (ML) methods have the potential to serve as clinical decision support systems (CDSS) (<xref ref-type="bibr" rid="B1">1</xref>&#x02013;<xref ref-type="bibr" rid="B3">3</xref>), including the computer-aided diagnosis (CADx) of vestibular disorders (<xref ref-type="bibr" rid="B4">4</xref>&#x02013;<xref ref-type="bibr" rid="B8">8</xref>). In combination with large datasets and multi-site cohorts, MVA/ML classification algorithms allow for investigating interactions between patient variables, which is why recent works advocate that these methods should be used more widely in neuro-otology and vestibular neuroscience (<xref ref-type="bibr" rid="B9">9</xref>). However, there is a wide variety of MVA/ML methods available, and recent advances in deep learning (DL) with artificial neural networks (ANN) (<xref ref-type="bibr" rid="B10">10</xref>) add to the complexity of the field.</p>
<p>In this work, we followed three clinical scenarios in the differential diagnosis of vestibular disorders, and defined three respective classification problems with increasing difficulty. We applied a wide variety of MVA/ML/DL methods to investigate the suitability of automated classification for these clinical questions, and to compare the algorithmic outcomes with clinical expert intuition, both from the perspective of supposed task difficulty, and from the perspective of how the algorithms weighted feature importances toward diagnostic classification. For validation, we took advantage of the DizzyReg dataset, a large prospective registry of patients with vestibular disorders (<xref ref-type="bibr" rid="B11">11</xref>). The dataset is multimodal and contains three main categories of variables, namely patient characteristics, symptom characteristics, and quantitative parameters from vestibular function tests.</p>
<p>The first classification problem addresses two groups of patients, suffering either from bilateral damage to peripheral vestibular afferents (i.e., bilateral vestibular failure), or from functional dizziness without evidence for relevant structural or functional vestibular deficits. Both syndromes present with the chief complaint of persistent dizziness. However, additional symptom features (e.g., triggers, extent of concomitant anxiety and discomfort) may vary considerably. We expected that machine learning could reliably differentiate both disorders based on patient characteristics (e.g., different age spectra), symptom characteristics, and vestibular function tests (e.g., head impulse test or caloric testing).</p>
<p>The second classification task asks whether patients with primary functional dizziness (based on psychological triggers and stressors) can be distinguished from patients with secondary functional dizziness following a preceding organic vestibular disorder (such as acute unilateral vestibulopathy, or benign paroxysmal positional vertigo) (<xref ref-type="bibr" rid="B8">8</xref>). This setting is more complex, as patient and symptom characteristics may be similar, while the vestibular function tests may differ.</p>
<p>The third problem addresses the differentiation of four episodic vestibular disorders, namely benign paroxysmal positional vertigo (BPPV), vestibular paroxysmia (VP), Meni&#x000E8;re disease (MD) and vestibular migraine (VM). This multi-class problem is expected to be the most complex, because the demographic characteristics of patients and the spectrum of symptoms can be diverse and may overlap (e.g., between MD and VM), and vestibular function tests may be normal (e.g., in VP or VM).</p>
<p>To investigate classification on these three clinical objectives, we developed base-ml, a comprehensive test-bench for initial ML experimentation on clinical data. With this tool, we aim to provide clinical experts with a better intuitive feeling for the range of ML outcomes that can be expected on the given data. For better transparency, several methods can and should be investigated at the same time, subject to a comparable data pre-processing and cross-validation strategy. To this end, we compare several linear, non-linear and neural-network based ML algorithms, along with a novel graph deep learning method that we recently proposed (<xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B12">12</xref>, <xref ref-type="bibr" rid="B13">13</xref>). Following insights from multiple classification experiments for diagnostic decision support in our research over the last few years (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B14">14</xref>), we also provide a multi-faceted analysis of algorithm outcomes, including an examination of class imbalance, multiple classification metrics, patient feature distributions, and feature importances as rated by the classifiers. To alleviate the implementation burden for multi-algorithm comparison and multivariate evaluation, we provide base-ml as an open-source tool<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> to the vestibular research community, as a starting point for further studies in this direction.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and Methods</title>
<sec>
<title>DizzyReg Registry and Dataset</title>
<p>The objective of the DizzyReg patient registry is to provide a basis for epidemiological and clinical research on common and rare vertigo syndromes, to examine determinants of functioning and quality of life of patients, to identify candidate patients for future clinical research, to integrate information from the different instrument-based measurements into one data source, and to help understand the etiology of vestibular disorders.</p>
<p>The DizzyReg patient registry is an ongoing prospective clinical patient registry which collects all information currently stored in electronic health records and medical discharge letters, to create a comprehensive clinical database of patient characteristics, symptoms, diagnostic procedures, diagnoses, therapy, and outcomes in patients with vertigo or dizziness (<xref ref-type="bibr" rid="B11">11</xref>). The study population includes patients with symptoms of vertigo and dizziness referred to the specialized out-patient center for vertigo and balance disorders. Recruitment into the registry commenced in December 2015 at the German Center for Vertigo and Balance Disorders (DSGZ), Munich University Hospital of the Ludwig-Maximilians-Universit&#x000E4;t. Inclusion criteria are symptoms of vertigo and dizziness, age 18 years and above, signed informed consent, and sufficient knowledge of German.</p>
<p>Questionnaires were issued on the first day of presentation to the study center to assess lifestyle and sociodemographic factors as well as the self-reported perception of vertigo symptoms, attack duration, and the time since first occurrence. Lifestyle and sociodemographic factors assessed by questionnaire include age, gender, education, physical activity, alcohol consumption, smoking, and sleep quality. The types of symptoms recorded include vertigo, dizziness, postural instability, problems while walking, blurred vision, double vision, impaired vision, nausea, and vomiting. Concomitant otological or neurological symptoms are documented, with a focus on otological symptoms, i.e., hearing loss, tinnitus, aural fullness, pressure, hyperacusis, and neurological symptoms, i.e., headache, type of headache, photo-/phonophobia, double vision, and other symptoms (ataxia, sensory loss, paresis, aphasia).</p>
<p>The evolution of symptoms was reconstructed from the frequency and duration of attacks. All aspects of history taking in the DizzyReg follow established concepts such as &#x0201C;So stoned&#x0201D; (<xref ref-type="bibr" rid="B15">15</xref>), the &#x0201C;Five Keys&#x0201D; (<xref ref-type="bibr" rid="B16">16</xref>) and the &#x0201C;Eight questions&#x0201D; (<xref ref-type="bibr" rid="B17">17</xref>). Frequency or time of onset of symptoms was included as a categorical variable with the following categories: &#x0201C;less than 3 months,&#x0201D; &#x0201C;3 months to 2 years,&#x0201D; &#x0201C;more than 2 years,&#x0201D; &#x0201C;more than 5 years,&#x0201D; and &#x0201C;more than 10 years.&#x0201D; The duration of symptoms is registered in the categories &#x0201C;seconds to minutes,&#x0201D; &#x0201C;minutes to hours,&#x0201D; &#x0201C;hours to days,&#x0201D; &#x0201C;days to weeks,&#x0201D; &#x0201C;weeks to months,&#x0201D; and &#x0201C;continuous.&#x0201D;</p>
<p>The registry further collects information on symptoms, quality of life (EQ5D) and functioning (DHI and VAP) in a few standardized questionnaires. Information on triggers is gathered by the respective categories of the Dizziness Handicap Inventory and by elements of the Vertigo Activity and Participation Questionnaire (VAP) (e.g., head movement, position change, physical activity, etc.).</p>
<sec>
<title>DHI</title>
<p>The Dizziness Handicap Inventory (DHI) is a well-known and widely used measure to assess self-perceived limitations posed by vertigo and dizziness (<xref ref-type="bibr" rid="B18">18</xref>). A total of 25 questions are used to evaluate functional, physical, and emotional aspects of disability. A total score ranging from 0 to 100 is derived from the sum of the responses (0 = no, 2 = sometimes, 4 = yes).</p>
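<p>As a minimal illustration of this scoring rule (our own sketch, not registry code; the function name is invented):</p>

```python
# DHI total score, as described above: 25 items, each answered
# "no" (0 points), "sometimes" (2 points), or "yes" (4 points);
# the total is the sum over all items and ranges from 0 to 100.
DHI_POINTS = {"no": 0, "sometimes": 2, "yes": 4}

def dhi_total(responses):
    """responses: a list of exactly 25 answers ('no'/'sometimes'/'yes')."""
    if len(responses) != 25:
        raise ValueError("DHI expects exactly 25 item responses")
    return sum(DHI_POINTS[r] for r in responses)
```

<p>For example, a patient answering &#x0201C;sometimes&#x0201D; on all 25 items would score 50 of 100 points.</p>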
</sec>
<sec>
<title>Quality of Life</title>
<p>Health-related quality of life was assessed with the generic EuroQol five-dimensional questionnaire (EQ-5D-3L). This is subdivided into five health state dimensions, namely mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, with each dimension assessed on three levels: no problems, some problems, extreme problems. These health states were converted into EQ5D scores using the German time trade-off scoring algorithm (<xref ref-type="bibr" rid="B19">19</xref>). The resulting total EQ5D score ranges from 0 to 1, with higher scores indicating better quality of life.</p>
</sec>
<sec>
<title>Vertigo Activity and Participation Questionnaire (VAP)</title>
<p>Functioning and participation were assessed based on the Vertigo Activity and Participation Questionnaire (VAP). The VAP is specifically designed for persons with vertigo and dizziness and can be used across different countries (<xref ref-type="bibr" rid="B20">20</xref>&#x02013;<xref ref-type="bibr" rid="B22">22</xref>). The VAP measures functioning and participation on two scales consisting of six items each. Using weights derived from Rasch analysis, the first scale has a range of 0&#x02013;23 points and the second of 0&#x02013;20 points, with higher scores indicating more restrictions.</p>
<p>Data protection clearance and institutional review board approval have been obtained (Nr. 414-15).</p>
</sec>
</sec>
<sec>
<title>Classification Tasks and Cohorts</title>
<p>As mentioned in the introduction, three classification problems with increasing complexity were tested: (1) bilateral vestibular failure vs. functional dizziness; (2) primary vs. secondary functional dizziness; (3) BPPV vs. VP vs. MD vs. VM. <xref ref-type="table" rid="T1">Table 1</xref> provides information about the group cohorts for each task.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Clinical tasks with respective classes of chronic/episodic vestibular disorders, and respective cohort details.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th valign="top" align="left"><bold>Diagnosis abbreviation</bold></th>
<th valign="top" align="center"><bold><italic>N</italic></bold></th>
<th valign="top" align="center"><bold>Age mean (s.d.)</bold></th>
<th valign="top" align="center"><bold>EQ5D</bold></th>
<th valign="top" align="center"><bold>DHI</bold></th>
<th valign="top" align="center"><bold>Female/Male</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="7"><bold>Task 1</bold></td>
</tr>
<tr>
<td valign="top" align="left">Bilateral vestibular failure</td>
<td valign="top" align="left">BVF</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center">65.0 (17.0)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">46.2 (22.6)</td>
<td valign="top" align="center">27/39</td>
</tr>
<tr>
<td valign="top" align="left">Functional dizziness</td>
<td valign="top" align="left">FD</td>
<td valign="top" align="center">346</td>
<td valign="top" align="center">47.2 (14.5)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">43.3 (18.4)</td>
<td valign="top" align="center">178/168</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>Task 2</bold></td>
</tr>
<tr>
<td valign="top" align="left">Functional dizziness (Secondary)</td>
<td valign="top" align="left">FDS</td>
<td valign="top" align="center">204</td>
<td valign="top" align="center">52.1 (14.7)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">48.0 (18.8)</td>
<td valign="top" align="center">130/74</td>
</tr>
<tr>
<td valign="top" align="left">Functional dizziness (Primary)</td>
<td valign="top" align="left">FDP</td>
<td valign="top" align="center">151</td>
<td valign="top" align="center">45.4 (14.6)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">42.6 (17.6)</td>
<td valign="top" align="center">77/74</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>Task 3</bold></td>
</tr>
<tr>
<td valign="top" align="left">Benign Parox. Pos. Vertigo</td>
<td valign="top" align="left">BPPV</td>
<td valign="top" align="center">134</td>
<td valign="top" align="center">57.0 (12.1)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">45.0 (19.6)</td>
<td valign="top" align="center">88/46</td>
</tr>
<tr>
<td valign="top" align="left">Meni&#x000E8;re disease</td>
<td valign="top" align="left">MD</td>
<td valign="top" align="center">142</td>
<td valign="top" align="center">53.4 (13.3)</td>
<td valign="top" align="center">0.9 (0.2)</td>
<td valign="top" align="center">43.9 (19.8)</td>
<td valign="top" align="center">78/64</td>
</tr>
<tr>
<td valign="top" align="left">Vestibular migraine</td>
<td valign="top" align="left">VM</td>
<td valign="top" align="center">215</td>
<td valign="top" align="center">44.5 (14.0)</td>
<td valign="top" align="center">0.8 (0.2)</td>
<td valign="top" align="center">41.8 (18.6)</td>
<td valign="top" align="center">145/70</td>
</tr>
<tr>
<td valign="top" align="left">Vestibular paroxysmia</td>
<td valign="top" align="left">VP</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">51.6 (14.2)</td>
<td valign="top" align="center">0.9 (0.2)</td>
<td valign="top" align="center">38.8 (22.5)</td>
<td valign="top" align="center">20/29</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Classification Pipeline</title>
<p>A typical machine learning pipeline comprises several steps that interplay toward a high-accuracy prediction (<xref ref-type="bibr" rid="B23">23</xref>). After data import, a set of pre-processing routines is applied to the patient features, before the data is split into several folds for training and testing with one or several classification algorithms. The classifier performance is evaluated using several quantitative metrics, and finally presented and explained to a clinical expert on vestibular disorders for critical review. <xref ref-type="fig" rid="F1">Figure 1</xref> presents an overview of our methodological pipeline in this work.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Components and methods of the classification workflow applied to vestibular data in DizzyReg. Raw tabular data is pre-processed and split into 10 folds for stratified cross-validation and for estimation of prospective classification performance. Various linear, non-linear and neural classifiers are repeatedly trained on all folds in the data, and the evaluation is performed with various classification metrics. The metrics, along with model explanations, are presented to experts in the form of a report panel, so that they can review the classification outcome and model performance. All pipeline components are implemented in base-ml, a comprehensive software tool which we provide open-source to the vestibular community as a starting point for similar studies. Implemented in Python, and centered around scikit-learn, it comprises various modules for data science, machine learning, descriptive statistics, explainable AI and visualization. Details on base-ml are described in section Base-ml Framework.</p></caption>
<graphic xlink:href="fneur-12-681140-g0001.tif"/>
</fig>
<sec>
<title>Pre-processing</title>
<p>Multimodal medical datasets commonly pose several challenges for CADx algorithms, including noisy or missing patient features with spurious outliers (<xref ref-type="bibr" rid="B24">24</xref>&#x02013;<xref ref-type="bibr" rid="B26">26</xref>), a mixture of categorical and continuous variables (<xref ref-type="bibr" rid="B27">27</xref>), and different statistical distributions of variables (<xref ref-type="bibr" rid="B23">23</xref>). To account for outliers and different data ranges in DizzyReg variables with continuous distributions, we perform a 90% winsorization, which sets extreme values to the 5th and 95th percentiles, before applying a z-transformation (<xref ref-type="bibr" rid="B27">27</xref>), which normalizes all variables into a comparable zero-mean and unit-variance data range. Categorical variables are binarized where possible, or represented in the form of a one-hot encoding (a.k.a. one-of-K encoding), which creates a binary column for each category and sparsely represents the categories with a value of 1 in the respective column and 0 in all other columns. To account for missing values, we perform a mean-imputation (<xref ref-type="bibr" rid="B24">24</xref>) if &#x0003C;50% of values are missing in the population; otherwise the feature is omitted from the patient representation.</p>
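<p>The pre-processing steps above can be sketched as follows (an illustrative re-implementation with pandas/NumPy, not the exact base-ml code; the column names and thresholds shown are hypothetical examples of the scheme described):</p>

```python
# Sketch of the described pre-processing: 90% winsorization, z-transformation,
# one-hot encoding of categorical variables, and mean-imputation with a
# 50%-missingness cutoff.
import numpy as np
import pandas as pd

def preprocess(df, continuous_cols, categorical_cols):
    out = {}
    for col in continuous_cols:
        x = df[col].astype(float)
        if x.isna().mean() >= 0.5:           # omit features with >=50% missing
            continue
        x = x.fillna(x.mean())               # mean-imputation
        lo, hi = np.nanpercentile(x, [5, 95])
        x = x.clip(lo, hi)                   # 90% winsorization
        out[col] = (x - x.mean()) / x.std()  # z-transformation
    parts = [pd.DataFrame(out)]
    for col in categorical_cols:
        parts.append(pd.get_dummies(df[col], prefix=col))  # one-hot encoding
    return pd.concat(parts, axis=1)

# Hypothetical mini-dataset with an outlier and a missing value
demo = pd.DataFrame({"age": [30, 40, 50, np.nan, 1000],
                     "duration": ["seconds", "minutes", "hours",
                                  "minutes", "hours"]})
X = preprocess(demo, ["age"], ["duration"])
```

<p>After this step, every patient is represented by a numerical vector of comparable feature ranges, suitable as input for all classifiers in the pipeline.</p>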
</sec>
<sec>
<title>Data Splitting</title>
<p>In predictive statistics, and in particular in the machine learning community, it is common to assess prediction performance via held-out test datasets, which are often randomly sampled and kept separate from the training dataset until the time of pseudo-prospective evaluation (<xref ref-type="bibr" rid="B27">27</xref>). Sampling a single test set could result in a biased selection and thus in an overly optimistic or pessimistic test evaluation. To avoid this, it is advisable to evaluate with multiple test sets, which are sampled either through random shuffling or through a k-fold splitting. Following common recommendations, we set k to 10 in this work (<xref ref-type="bibr" rid="B28">28</xref>). This yields exactly one prediction for each subject in DizzyReg, and exactly ten estimates of the prospective classification performance of each classifier. As recommended by Kohavi (<xref ref-type="bibr" rid="B29">29</xref>), we additionally apply a stratified cross-validation to make sure that each fold has approximately the same percentage of subjects from each class, which is important especially in the case of class imbalance in the dataset. To ensure that individual classifiers are trained with a suitable parametrization, we additionally perform hyper-parameter optimization using random search, in a nested cross-validation setup (for details, see <xref ref-type="app" rid="A3">Appendix C</xref>).</p>
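<p>The nested, stratified splitting scheme can be sketched as follows (an illustrative scikit-learn setup on synthetic data, not the exact base-ml configuration; the classifier and search space shown are examples):</p>

```python
# Stratified 10-fold cross-validation on the outside, with a nested random
# search over hyper-parameters inside each training fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (StratifiedKFold, RandomizedSearchCV,
                                     cross_val_predict)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

inner = RandomizedSearchCV(                     # hyper-parameter optimization
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100],
                         "max_depth": [2, 4, None]},
    n_iter=4, cv=3, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# cross_val_predict yields exactly one prediction per subject, as in the text
y_pred = cross_val_predict(inner, X, y, cv=outer)
accuracy = (y_pred == y).mean()
```

<p>Here, each of the 10 outer folds serves once as a test set, while the inner random search tunes hyper-parameters on the remaining training folds only, so no test subject influences model selection.</p>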
</sec>
<sec>
<title>Classification Algorithms and Metrics</title>
<p>Intuitively, ML classifiers try to assign class labels to samples (e.g., patients, represented as multivariable numerical vectors) by fitting separation boundaries between classes in a high-dimensional space. Mathematically, these boundaries are expressed in the form of a classification function <italic>y</italic> &#x0003D; <italic>f</italic>(<italic>x</italic>), which separates the statistical distributions of classes <italic>C</italic> in the input space <italic>X</italic>. The past decades of ML research have yielded a diverse set of mathematical models for separation boundaries, and algorithms to fit them to a set of training data <italic>X</italic>, including linear regression boundaries, rule-based, instance-based, tree-based, kernel-based or Bayesian methods (<xref ref-type="bibr" rid="B23">23</xref>), as well as the recent renaissance of artificial neural networks and deep learning (<xref ref-type="bibr" rid="B10">10</xref>). Importantly, no single method is guaranteed to perform best on all datasets (<xref ref-type="bibr" rid="B30">30</xref>), which is why it is advisable to test multiple algorithms and have their performances compared and critically reviewed by a domain expert, instead of deciding on a single algorithm a priori. Therefore, as described in the introduction, we compare several linear, non-linear and neural-network based ML algorithms, along with a novel graph deep learning method that we recently proposed (<xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B12">12</xref>, <xref ref-type="bibr" rid="B13">13</xref>). Details on all classifier models and their parametrization are given in section Overview of Selected Classification Algorithms. We quantitatively evaluate classification performance with three metrics: the area under the receiver-operating-characteristic curve (ROC-AUC), as well as accuracy and f1-score, defined as follows (TP/TN/FP/FN denote true or false positives or negatives):</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext>Accuracy</mml:mtext></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mo>;</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;f</mml:mtext><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mtext>score</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:mfrac><mml:mo>;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac><mml:mo>;</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
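<p>The three metrics can be computed directly from their definitions, or with scikit-learn (which base-ml builds on); a small sketch with hypothetical predictions:</p>

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical predictions of a binary classifier on 8 subjects
y_true  = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred  = [1, 1, 0, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]  # predicted probabilities

TP = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
TN = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
FP = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
FN = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (TP + TN) / len(y_true)        # (TP + TN) / N
prec, rec = TP / (TP + FP), TP / (TP + FN)
f1 = 2 * prec * rec / (prec + rec)        # harmonic mean of Prec and Rec
auc = roc_auc_score(y_true, y_score)      # ROC-AUC uses scores, not labels
```

<p>Note that ROC-AUC is computed from the continuous classifier scores rather than from the thresholded labels, which makes it insensitive to the choice of decision threshold.</p>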
</sec>
<sec>
<title>Model Explanation</title>
<p>A necessary tradeoff in predictive statistics and ML is the choice between model accuracy and model interpretability (<xref ref-type="bibr" rid="B31">31</xref>). While linear methods like logistic regression are typically more interpretable, non-linear models, depending on their complexity, are often likened to black boxes. By now, however, &#x0201C;Explainable AI&#x0201D; is a dedicated branch of ML research, and numerous model-specific and model-agnostic methods are available that can partially explain ML prediction outcomes (<xref ref-type="bibr" rid="B32">32</xref>). Two common ways to explain model performance are to analyze the distribution of input samples (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B33">33</xref>) and to analyze feature importance (<xref ref-type="bibr" rid="B34">34</xref>), especially in a clinical setting (<xref ref-type="bibr" rid="B35">35</xref>).</p>
<p>First, we perform a non-linear mapping of the <italic>d</italic>-dimensional input distribution after pre-processing onto the 2D plane, and visualize whether class distributions are already apparent in the input data, or whether the input data distribution has unexpected or undesired properties; this technique has proven elucidating in our previous research, e.g., in the mapping of posturography data (<xref ref-type="bibr" rid="B4">4</xref>). To this end, we utilize &#x0201C;Uniform Manifold Approximation and Projection&#x0201D; (UMAP) (<xref ref-type="bibr" rid="B33">33</xref>), a topology-preserving manifold learning technique for visualization and general non-linear dimensionality reduction.</p>
<p>Second, we analyze which patient features contributed most to classification outcomes, which is a clinically interesting aspect of classifiers. We obtain the &#x0201C;feature importances&#x0201D; for non-ANN-based models and &#x0201C;feature attributions&#x0201D; for ANN-based models. For linear classifiers (see section Linear Classifiers), these can be obtained from the model coefficients (<xref ref-type="bibr" rid="B27">27</xref>). For non-linear classifiers (see section Non-linear Classifiers), such as tree-based models, we obtain their feature importance using the Gini-impurity criterion (<xref ref-type="bibr" rid="B36">36</xref>). For neural-network-based models such as MLP and MGMC (see section Neural Network and Deep Learning Classifiers), we use the Integrated Gradients algorithm (<xref ref-type="bibr" rid="B37">37</xref>) and calculate the feature importance by taking the feature attributions of every sample in the training dataset toward its respective ground-truth class label. Obviously, not every classification algorithm yields the same feature importance ranking. It has been argued that a combination of several feature importance rankings can provide more reliable and trustworthy estimates (<xref ref-type="bibr" rid="B34">34</xref>). Therefore, for our report to the expert, we aim at presenting a single table with the top 10 most important features for the given classification problem. To merge the feature importance rankings of the different classifiers into a single list, we propose and apply a heuristic for Relative Aggregation of Feature Importance (RAFI), which comprises the following three steps. First, we take the absolute values of all feature importances, to account for algorithms with negative weights (e.g., negative coefficients in linear regression). Second, we normalize the range of importance scores across different classifiers, by expressing each score as a percentage of the respective classifier&#x00027;s total importance. 
Third, we aggregate all normalized global importances by summation, and report the top 10 most important features across all classifiers to the experts for review. In detail, for each feature &#x003C6;<sub><italic>i</italic></sub> (<italic>i &#x003F5;</italic> [1, &#x02026;, <italic>d</italic>]), and across <italic>F</italic> different classifiers, each with feature importances <italic>I</italic><sub><italic>j</italic></sub>(&#x003C6;<sub><italic>i</italic></sub>) (<italic>j &#x003F5;</italic> [1, &#x02026;, <italic>F</italic>]), we calculate the global feature importance <italic>I</italic><sub>0</sub>(&#x003C6;<sub><italic>i</italic></sub>) as follows:</p>
<disp-formula id="E2"><mml:math id="M2"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003C6;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>F</mml:mi></mml:munderover><mml:mrow><mml:mfrac><mml:mrow><mml:mi>a</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003C6;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mstyle mathsize='140%' displaystyle='true'><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:munderover ><mml:mi>a</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003C6;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x000A0;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
</sec>
</sec>
<sec>
<title>Overview of Selected Classification Algorithms</title>
<p>In this work, we apply and compare the outcomes for a total of 11 classification methods, which we chose to represent a wide range of algorithmic approaches. This collection is larger than what is typically encountered in CDSS research, as mentioned, to provide the expert with a better intuitive feeling for the range of outcomes that can be expected on the given data. The algorithms are grouped into three general categories: linear, non-linear, and ANN-based classifiers. Since explaining the inner workings of all methods in detail is out of scope for this work, each algorithm will be outlined only briefly in the following, with its most important parametrizations (if any), and a reference to explanatory material for the interested reader.</p>
<sec>
<title>Linear Classifiers</title>
<p>As linear classifiers, we apply <italic>Linear Discriminant Analysis (LDA), Logistic Regression (LR)</italic> and <italic>Support Vector Classifiers (SVC)</italic>. All three methods try to fit a set of linear hyperplanes between the <italic>d</italic>-dimensional distributions of the classes. <italic>LDA</italic> [(<xref ref-type="bibr" rid="B19">19</xref>), chapter 4.3] models the distribution of each class with a Gaussian and calculates the probability of belonging to a class as the maximum posterior probability in a Bayesian manner. We apply LDA in its default parametrization, without additional regularization such as shrinkage. <italic>LR</italic> [(<xref ref-type="bibr" rid="B19">19</xref>), chapter 4.4] directly learns the posterior distribution of the target class and models it using a sigmoid-activated linear function. We apply LR with simple L2 regularization to avoid overfitting the parameters of the model on the training set. <italic>SVC</italic> (<xref ref-type="bibr" rid="B38">38</xref>) is a support-vector machine (SVM) with a linear kernel, which learns a hyperplane that maximizes the margin between the classes, giving slack to key samples (&#x0201C;support vectors&#x0201D;) to account for class overlap in the joint distribution. To avoid overfitting, we apply a standard squared L2 penalty term with a regularization parameter of 0.25.</p>
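<p>In scikit-learn terms, the three linear classifiers can be instantiated roughly as follows; the argument names are scikit-learn&#x00027;s, and mapping the regularization parameter 0.25 to the C argument of LinearSVC is our assumption, not code quoted from base-ml:</p>

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),     # default parametrization, no shrinkage
    "LR": LogisticRegression(penalty="l2"),  # L2-regularized logistic regression
    "SVC": LinearSVC(penalty="l2", C=0.25),  # squared hinge loss by default
}

# Fit on synthetic data to illustrate the shared scikit-learn interface.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
preds = {name: clf.fit(X, y).predict(X) for name, clf in classifiers.items()}
```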
</sec>
<sec>
<title>Non-linear Classifiers</title>
<sec>
<title>Gaussian Na&#x000EF;ve Bayes (GNB)</title>
<p>GNB [(<xref ref-type="bibr" rid="B19">19</xref>), chapter 6.6.3] is a variant of Na&#x000EF;ve Bayes (NB) that allows continuous input features, under the assumption of Gaussian distribution and mutual independence. Class posterior probabilities for new samples are calculated using Bayes&#x00027; rule. We parametrize <italic>GNB</italic> to estimate class prior probabilities directly from the training data, rather than imposing them a priori.</p>
</sec>
<sec>
<title>Gaussian Process Classifier (GP)</title>
<p>GPs (<xref ref-type="bibr" rid="B39">39</xref>) are a Bayesian alternative to kernel methods like non-linear SVMs. In classification, a GP models and approximates the class posterior probability as a Gaussian distribution. We set the initial kernel for GP fitting to a zero-mean, unit-variance radial basis function (RBF), which is then refined during fitting to the training data.</p>
</sec>
<sec>
<title>K-Nearest Neighbors Classifier (KNN)</title>
<p>KNN [(<xref ref-type="bibr" rid="B19">19</xref>), chapter 2.3.2] classification is an instance-based method, where a sample&#x00027;s class is determined by the majority class label vote of the sample&#x00027;s k-nearest neighbors. We compute similarity as Euclidean distance between two patients&#x00027; feature vectors, and we use 10 nearest neighbors in the training set to predict the class label of a test input.</p>
</sec>
<sec>
<title>Decision Tree Classifier (DT)</title>
<p>DTs (<xref ref-type="bibr" rid="B36">36</xref>) are a form of rule-based classifier. A tree represents a hierarchical set of rules or decisions, each decision splitting the feature space along a single feature dimension, using an optimal splitting threshold calculated from information-theoretic criteria. Each new sample is passed down the tree, following the splitting rules, until it reaches a leaf, in which a class distribution and majority class are stored. In this work, we use trees with Gini impurity as the splitting criterion, and we allow trees to expand up to a maximum depth of five.</p>
</sec>
<sec>
<title>Random Forest Classifier (RF)</title>
<p>RF (<xref ref-type="bibr" rid="B40">40</xref>) are an ensemble of multiple decision trees, where each tree is trained using a random subset of training data and a random subset of features. Due to the randomization, the individual trees are highly uncorrelated. Therefore, the ensemble output, which is calculated as an average vote from all trees, weighted by their confidences, is highly robust against various data challenges, such as high dimensional input spaces, noisy data, or highly different data distributions across variables. In this work, we use an ensemble of 10 trees, each with a maximum depth of 5 decision levels.</p>
</sec>
<sec>
<title>Adaptive Boosting Classifier (AB)</title>
<p>AB (<xref ref-type="bibr" rid="B41">41</xref>), similar to RF, is another ensemble method that combines multiple &#x0201C;weak&#x0201D; classifiers in order to form a much &#x0201C;stronger&#x0201D; classifier. A key difference is the boosting mechanism, i.e., the ensemble iteratively adds new weak classifiers, which are trained with a higher weight on those input instances that are still being misclassified. In this work, we use decision stumps (i.e., decision trees with a depth of 1) as the weak base classifiers, and we allow the maximum number of classifiers to reach up to 50.</p>
</sec>
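<p>For reference, the six non-linear classifiers above can be sketched in scikit-learn as follows, parametrized as described; the argument names are scikit-learn&#x00027;s, and this is our reconstruction, not code quoted from base-ml:</p>

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

nonlinear = {
    "GNB": GaussianNB(),  # class priors estimated from the training data
    "GP": GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)),
    "KNN": KNeighborsClassifier(n_neighbors=10, metric="euclidean"),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=5),
    "RF": RandomForestClassifier(n_estimators=10, max_depth=5),
    "AB": AdaBoostClassifier(n_estimators=50),  # depth-1 decision stumps by default
}

# All six share the same fit/predict interface:
X, y = make_classification(n_samples=100, n_features=8, random_state=0)
preds = {name: clf.fit(X, y).predict(X) for name, clf in nonlinear.items()}
```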
</sec>
<sec>
<title>Neural Network and Deep Learning Classifiers</title>
<sec>
<title>Multi-Layer Perceptron (MLP)</title>
<p>MLPs [(<xref ref-type="bibr" rid="B19">19</xref>), chapter 11] treat input features as activated neurons, followed by one or several fully connected layers (so-called hidden layers) of artificial neurons, which weight and sum incoming neuronal connections before applying a non-linear activation function. The network weights are estimated using the backpropagation algorithm. In this work, we parametrize an ANN with two hidden layers (64 and 32 neurons) and protect every layer against overfitting by applying dropout (<italic>p</italic> = 0.3) (<xref ref-type="bibr" rid="B42">42</xref>), followed by batch normalization (<xref ref-type="bibr" rid="B43">43</xref>).</p>
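<p>A minimal PyTorch sketch of this architecture; the choice of ReLU activation is our assumption, while the layer sizes, the dropout rate, and the dropout-then-batch-normalization ordering follow the description above:</p>

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two hidden layers (64 and 32 neurons), each protected against
    overfitting by dropout (p=0.3) followed by batch normalization."""

    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Dropout(p=0.3), nn.BatchNorm1d(64),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Dropout(p=0.3), nn.BatchNorm1d(32),
            nn.Linear(32, n_classes),  # raw logits; pair with a cross-entropy loss
        )

    def forward(self, x):
        return self.net(x)

model = MLP(n_features=20, n_classes=2)
logits = model(torch.randn(8, 20))  # one logit vector per sample in the batch
```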
</sec>
<sec>
<title>Multi-Graph Geometric Matrix Completion (MGMC)</title>
<p>MGMC (<xref ref-type="bibr" rid="B13">13</xref>) is a graph-based neural network (GNN) model which we proposed recently, as an extension to our previously published geometric matrix completion approach for multimodal CADx (<xref ref-type="bibr" rid="B12">12</xref>). It models the classification problem as a transductive geometric matrix completion problem. Importantly, MGMC is designed to deal with the common problem of missing values in large medical datasets (<xref ref-type="bibr" rid="B25">25</xref>), by simultaneously learning an optimal imputation of missing values, along with the optimal classification of patients. MGMC models the patients as nodes in a graph, and computes the edges in the graph through a similarity metric between patients. The similarity is based on a few meta-features (e.g., sex, age, genetic markers, etc.), which allows MGMC to span a graph between patients akin to a social network. In previous works, GNNs have shown promising results and represent a complementary approach in the field of CADx. In this work, we compute multiple patient graphs, each based on similarity measures of a single meta-feature, namely gender (same gender), age (age difference &#x000B1; 6 years), EQ5D score (score difference of &#x000B1; 0.06), and DHI score (score difference of &#x000B1; 11). As advanced model parameters, we use five timesteps for the recurrent graph convolutional network, Chebyshev Polynomials of order five, and a single hidden layer before the output (16, 32, or 64 neurons, depending on the classification task).</p>
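<p>The construction of one patient graph per meta-feature can be sketched as follows; the function interface and the toy cohort values are illustrative, and this is not the MGMC implementation itself:</p>

```python
import numpy as np

def meta_feature_graph(values, threshold):
    """Adjacency matrix connecting patients whose meta-feature values differ
    by at most `threshold` (use 0 for "must be identical", e.g., gender)."""
    v = np.asarray(values, dtype=float).reshape(-1, 1)
    A = (np.abs(v - v.T) <= threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

# Toy cohort of five patients (values are illustrative, not DizzyReg data)
age = [30, 34, 62, 65, 41]
eq5d = [0.80, 0.85, 0.78, 0.84, 0.50]

graphs = {
    "age": meta_feature_graph(age, threshold=6),      # age difference within 6 years
    "eq5d": meta_feature_graph(eq5d, threshold=0.06), # EQ5D difference within 0.06
}
```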
</sec>
</sec>
</sec>
<sec>
<title>Statistical Methods</title>
<p>The most important features detected by RAFI (cf. section Classification Pipeline) are presented for expert review and interpretation. Each of these features is compared across patient classes via hypothesis tests, to provide a first indication of whether there are significant differences across groups. For continuous variables, and in the case of two classes, we first test each variable for normal distribution in each patient group with a Shapiro-Wilk test (<xref ref-type="bibr" rid="B44">44</xref>). If so, we apply an unpaired two-tailed <italic>t</italic>-test (<xref ref-type="bibr" rid="B27">27</xref>); if not, we apply a Mann-Whitney U test (<xref ref-type="bibr" rid="B45">45</xref>). For more than two classes, we apply a one-way ANOVA test (<xref ref-type="bibr" rid="B27">27</xref>), or a Kruskal-Wallis test (<xref ref-type="bibr" rid="B46">46</xref>) as the non-parametric alternative, and report the group-level <italic>p</italic>-value. For categorical variables, we apply a Chi-squared independence test (<xref ref-type="bibr" rid="B47">47</xref>). We report <italic>p</italic>-values for hypothesis tests on all variables, and assume significance at an alpha-level of <italic>p</italic> &#x0003C; 0.05.</p>
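<p>The test-selection cascade for a continuous feature can be sketched with SciPy; the function name and interface are ours, and categorical features would instead go through scipy.stats.chi2_contingency:</p>

```python
import numpy as np
from scipy import stats

def compare_groups(groups, alpha=0.05):
    """Hypothesis test for one continuous feature, given a list of
    per-class value arrays, following the cascade described above."""
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    if len(groups) == 2:
        # unpaired two-tailed t-test, or Mann-Whitney U if non-normal
        return stats.ttest_ind(*groups) if normal else stats.mannwhitneyu(*groups)
    # one-way ANOVA, or Kruskal-Wallis as the non-parametric alternative
    return stats.f_oneway(*groups) if normal else stats.kruskal(*groups)

# Two synthetic groups with a clear mean difference (illustrative only)
rng = np.random.default_rng(0)
res = compare_groups([rng.normal(0.0, 1.0, 50), rng.normal(0.8, 1.0, 50)])
significant = res.pvalue < 0.05  # alpha level used in this work
```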
</sec>
<sec>
<title>Base-ml Framework</title>
<p>As described in the previous sections Classification Pipeline-Statistical Methods, numerous methods are necessary to implement a full data science and machine learning pipeline, for a multimodal clinical problem like vestibular classification, and in a multi-site dataset like DizzyReg. Naturally, re-implementing this stack of methods is a time-consuming effort, which should ideally be avoided across research groups. To facilitate future classification experiments similar to this work, and to provide the community with a starting point, we developed base-ml, an open-source Python package<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> provided by the German Center of Vertigo and Balance Disorders. The package enables a rapid evaluation of machine learning models for prototyping or research. As illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref> (lower panel), it is built around scikit-learn (<xref ref-type="bibr" rid="B48">48</xref>) as a backbone, which is a reference toolkit for state-of-the-art machine learning and data science. 
We complement scikit-learn with various Python modules: <italic>pandas</italic> (<xref ref-type="bibr" rid="B49">49</xref>) for data IO and analysis; <italic>scipy</italic> and <italic>numpy</italic> (<xref ref-type="bibr" rid="B50">50</xref>) for fast linear algebra on array-shaped data; <italic>PyTorch</italic> (<xref ref-type="bibr" rid="B51">51</xref>) for implementation of ANNs and more advanced deep learning models like MGMC; <italic>skorch</italic><xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> for integration of PyTorch models into the scikit-learn ecosystem; the Captum<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> library for model interpretability and understanding, which we use for calculation of feature importance in ANNs using Integrated Gradients (<xref ref-type="bibr" rid="B37">37</xref>); UMAP (<xref ref-type="bibr" rid="B33">33</xref>) for non-linear 2D mapping and visualization of the patients&#x00027; input distribution; <italic>statsmodels</italic> (<xref ref-type="bibr" rid="B52">52</xref>) and <italic>pingouin</italic> (<xref ref-type="bibr" rid="B53">53</xref>), two Python libraries for descriptive statistics and hypothesis testing; and <italic>matplotlib</italic> for plotting and scientific visualization. Importantly, using skorch, we enable potential adopters of base-ml to integrate both inductive and transductive neural training workflows and even deep learning models into a comparative benchmark with more traditional ML methods. Skorch combines the ease of use of scikit-learn training workflows with PyTorch&#x00027;s GPU-enabled neural network models. In addition, with base-ml, one can easily evaluate graph-based neural network models.</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<p>The following sections reproduce the classification reports produced by base-ml on the three clinical tasks described in the introduction. It is important to note that base-ml is not restricted to vestibular classification scenarios. As a sanity check for base-ml, regarding classification outcomes and comparability to baseline results in the literature, we perform two additional experiments. Those two base-ml experiments are performed on non-vestibular datasets, i.e., one artificially generated dataset, and one Alzheimer&#x00027;s disease classification dataset, which has been widely studied in the literature. To keep the main body of this manuscript dedicated to vestibular analysis, we report on non-vestibular results in the <xref ref-type="app" rid="A1">Appendix</xref>.</p>
<sec>
<title>Results on Task 1 (Bilateral Vestibular Failure vs. Functional Dizziness)</title>
<p>The results panel for this classification task, as produced by the base-ml framework, is visible in <xref ref-type="fig" rid="F2">Figure 2</xref>. The boxplots with metrics illustrate a wide range of classification performances for all classifiers, with an accuracy over the 10 folds between 78.7% &#x000B1; 6.4% (AdaBoost) and 93.0% &#x000B1; 3.5% (RF), an f1-score between 0.683 &#x000B1; 0.144 (DecisionTree) and 0.848 &#x000B1; 0.091 (GaussianProcess), and an average ROC-AUC between 0.727 &#x000B1; 0.145 (DecisionTree) and 0.937 &#x000B1; 0.050 (GaussianProcess), followed closely by a ROC-AUC of 0.921 &#x000B1; 0.056 (RF). Quantitatively, the Gaussian Process classifier is the top-performing model on this task, and slightly outperforms the best-performing neural network model MGMC (mean accuracy/f1-score/ROC-AUC: 90.8%/0.782/0.893). In fact, on this task, even one of the best linear models, LR, performs better than MGMC and almost as well as RF (mean accuracy/f1-score/ROC-AUC: 91.3%/0.831/0.917). The confusion matrices reveal that the group with functional dizziness was detected with a very high sensitivity between 95% (LR) and 98% (MGMC/RF), compared to a much lower sensitivity between 53% (MGMC) and 71% (LR) for patients with bilateral vestibular failure. Notably, hyper-parameter optimization had a positive effect on the outcomes of Task 1, and the average accuracy of all classifiers increased from 87.0 to 89.6% after parameter tuning.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Results panel produced by base-ml for Task 1. It comprises: boxplots for the three classification metrics, accuracy, f1-score and ROC-AUC; a pie chart to highlight potential class imbalances for this task; the UMAP 2D embedding of all patients&#x00027; input feature vectors; and a more detailed overview of classification outcomes in form of confusion matrices, for three classifiers, LR, MGMC and RF.</p></caption>
<graphic xlink:href="fneur-12-681140-g0002.tif"/>
</fig>
<p>Regarding class imbalance, which is important to consider in context with classification performance, the pie chart (cf. <xref ref-type="fig" rid="F2">Figure 2</xref>, bottom left) shows that BVF is strongly under-represented in this DizzyReg subset, at 66 vs. 346 patient samples (16.0% of patients). Finally, the UMAP embedding shows that the functional dizziness subjects (colored in yellow) are already clustered and topologically separated from the BVF subjects (colored in purple) at the level of normalized input data. This underlines that the patients have clearly separate characteristics at a feature level, and classifiers have a good chance of fitting decision boundaries between the two groups. The UMAP plot reveals another interesting point, namely that the input data is clearly separated into two clusters, the implications of which are discussed below.</p>
<p>The base-ml output also produces <xref ref-type="table" rid="T2">Table 2</xref>, with feature importance scores aggregated with the RAFI heuristic (cf. section Classification Pipeline). Among the top ten features, six features are related to (Video-) Head Impulse Testing (HIT/vHIT; HIT left/right abnormal, vHIT normal result, vHIT gain left/right) or caloric testing, all of which are also statistically significantly different between the two groups at a level of <italic>p</italic> &#x0003C; 0.001. The most important feature is patient age, also with a significantly different expression between the two groups (63.8 &#x000B1; 15.6 vs. 47.3 &#x000B1; 14.1 years, <italic>p</italic> &#x0003C; 0.0001). The remaining three features are related to subjective judgement of disability by patients, namely the depression score in EQ5D (<italic>p</italic> &#x0003C; 0.001), a perceived handicap in DHI (<italic>p</italic> &#x0003C; 0.01), and the actual perceived health condition (<italic>p</italic> = 0.133).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Top 10 most important features in Task 1, aggregated over multiple classifiers.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Rank</bold></th>
<th valign="top" align="left"><bold>Feature</bold></th>
<th valign="top" align="left"><bold>Feature Type</bold></th>
<th valign="top" align="center"><bold>Bilateral vestibular failure</bold></th>
<th valign="top" align="center"><bold>Functional dizziness</bold></th>
<th valign="top" align="center"><bold><italic>P</italic>-Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Age (yrs)</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">63.83 &#x000B1; 15.64</td>
<td valign="top" align="center">47.33 &#x000B1; 14.12</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">HIT: right, abnormal</td>
<td valign="top" align="left">Neurological investigation P1</td>
<td valign="top" align="center">77.40%</td>
<td valign="top" align="center">3.40%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">HIT: left, abnormal</td>
<td valign="top" align="left">Neurological investigation P1</td>
<td valign="top" align="center">77.40%</td>
<td valign="top" align="center">2.30%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">vHIT: normal result</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">14.30%</td>
<td valign="top" align="center">92.20%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">vHIT: gain left</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">0.8 &#x000B1; 0.04</td>
<td valign="top" align="center">0.97 &#x000B1; 0.12</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">EQ5D: fear, depression</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">28.60%</td>
<td valign="top" align="center">66.40%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">Caloric: normal result</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">31.90%</td>
<td valign="top" align="center">91.80%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">vHIT: gain right</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">0.71 &#x000B1; 0.09</td>
<td valign="top" align="center">0.92 &#x000B1; 0.15</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">DHI: Q21, perceived handicap</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">81.20%</td>
<td valign="top" align="center">92.60%</td>
<td valign="top" align="center">&#x0003C;0.01</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">LIFEQ: Q7, Actual perceived health condition</td>
<td valign="top" align="left">LIFEQ</td>
<td valign="top" align="center">62.51 &#x000B1; 18.48</td>
<td valign="top" align="center">58.11 &#x000B1; 18.9</td>
<td valign="top" align="center">0.133</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Results on Task 2 (Primary vs. Secondary Functional Dizziness)</title>
<p>Compared to task 1, the performance of the 11 classifiers on task 2 is more homogeneous (cf. <xref ref-type="fig" rid="F3">Figure 3</xref>), i.e., all classifiers operate within a similar accuracy range between 55.2% (DecisionTree) and 62.8% (GaussianProcess), an f1-score range between 0.498 (MLP) and 0.596 (SVC), and a ROC-AUC range between 0.571 (DecisionTree) and 0.689 (SVC). Overall, this classification task is dominated by the linear classification algorithm SVC and the non-linear GaussianProcess classifier, while the DecisionTree and the neural network classifier MLP/ANN are the worst-performing algorithms in terms of accuracy and f1-score. The graph neural network method MGMC and RF had an accuracy of 60.6 and 62.2%, respectively, both close to the average accuracy of all classifiers (60.4%). The confusion matrices reveal that LR and RF have an equally high sensitivity for secondary functional dizziness (77%), compared to MGMC (65%), but a comparably lower sensitivity for primary functional dizziness (LR/RF: 42%, MGMC: 54%). Notably, hyper-parameter optimization had very little effect on the outcomes of Task 2, as the average accuracy of all classifiers stayed at 60.4% both with and without parameter tuning.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Results panel produced by base-ml for Task 2.</p></caption>
<graphic xlink:href="fneur-12-681140-g0003.tif"/>
</fig>
<p>Again, the lower classification performance could partly be due to class imbalance, i.e., a slight underrepresentation of primary functional dizziness in this DizzyReg subset (42.5% primary vs. 57.5% secondary); however, the class imbalance is not as severe as in task 1. The UMAP feature embedding shows that after pre-processing, two clearly separated clusters emerge in the topology of the data. Again, the source of this data separation is not clear and will be discussed further below. However, in the smaller cluster, most of the patients are from the group with secondary functional dizziness (purple points), while the larger cluster contains a mix of both groups, and this mix is not clearly separable by data topology alone. The classification algorithms can still achieve a certain level of data separability in high-dimensional space, but it is noteworthy that the UMAP embedding reflects that task 2 is more challenging than task 1, even before the classifiers are applied.</p>
<p>The top 10 most important features for task 2 (cf. <xref ref-type="table" rid="T3">Table 3</xref>) are largely different from task 1. Expectedly, a normal caloric result (rank 1), the vHIT gain left/right (ranks 4 and 2), and an abnormal HIT result on the right (rank 9) differ between the two groups. Patients with primary functional dizziness are younger (rank 3) and tend to drink more alcohol (&#x02265;1 drink in the last week, rank 6). One item from the DHI plays an important role for separation, related to problems turning over while in bed (rank 7), and another life quality factor, LIFEQ Q7, i.e., the actual perceived health condition, is relevant as well (rank 8). The duration of vertigo is important as well, in particular whether the duration is between 20 and 60 min (rank 5). Finally, the depression/fear score in the EQ5D questionnaire is relevant (rank 10). All features except EQ5D fear/depression and LIFEQ Q7 are significantly different between the two groups. It is important to note, though, that multivariable classifiers do not need to depend on univariate feature significance. In high-dimensional space, these two univariately non-significant features may still contribute to a better separation boundary.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Top 10 most important features in Task 2, aggregated over multiple classifiers.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Rank</bold></th>
<th valign="top" align="left"><bold>Feature</bold></th>
<th valign="top" align="left"><bold>Feature type</bold></th>
<th valign="top" align="center"><bold>Functional dizziness (secondary)</bold></th>
<th valign="top" align="center"><bold>Functional dizziness (primary)</bold></th>
<th valign="top" align="center"><bold><italic>P</italic>-Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Caloric: normal result</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">73.10%</td>
<td valign="top" align="center">96.20%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">vHIT: gain right</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">0.87 &#x000B1; 0.18</td>
<td valign="top" align="center">0.92 &#x000B1; 0.19</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Age (yrs)</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">51.79 &#x000B1; 13.91</td>
<td valign="top" align="center">45.61 &#x000B1; 14.21</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">vHIT: gain left</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">0.92 &#x000B1; 0.13</td>
<td valign="top" align="center">0.97 &#x000B1; 0.12</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Vertigo time: 20&#x02013;60 min</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">13.20%</td>
<td valign="top" align="center">5.30%</td>
<td valign="top" align="center">&#x0003C;0.05</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">&#x0003E;= 1 alcoholic drink last week</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">43.60%</td>
<td valign="top" align="center">58.30%</td>
<td valign="top" align="center">&#x0003C;0.01</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">DHI: Q13, problems turning over in bed</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">43.80%</td>
<td valign="top" align="center">25.70%</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">LIFEQ: Q7, Actual perceived health condition</td>
<td valign="top" align="left">LIFEQ</td>
<td valign="top" align="center">57.28 &#x000B1; 19.61</td>
<td valign="top" align="center">59.34 &#x000B1; 18.53</td>
<td valign="top" align="center">0.111</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">HIT: right, abnormal</td>
<td valign="top" align="left">Neurological investigation P1</td>
<td valign="top" align="center">13.70%</td>
<td valign="top" align="center">1.40%</td>
<td valign="top" align="center">&#x0003C;0.0005</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">EQ5D: fear, depression</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">60.0%</td>
<td valign="top" align="center">70.0%</td>
<td valign="top" align="center">0.069</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Results on Task 3 (BPPV vs. VP vs. MD vs. VM)</title>
<p>Already at first glance (cf. <xref ref-type="fig" rid="F4">Figure 4</xref>), and as clinical intuition suggested, task 3 is the most challenging of the three classification tasks. Compared to the average classifier accuracy of task 1 (89.6%) and task 2 (60.4%), the accuracy on task 3 is much lower (48.0%). Individually, the classifiers have an accuracy range between 40.6% (DecisionTree) and 54.3% (LDA), an f1-score range between 0.269 (DecisionTree) and 0.461 (LDA), and a ROC-AUC range between 0.564 (DecisionTree) and 0.764 (LDA). Overall on task 3, linear classifiers, and LDA in particular, classify with the highest accuracy. The RF classifier, on the other hand, achieves only average performance on task 3 (accuracy/f1-score/ROC-AUC: 48.5%/0.372/0.702), in comparison to tasks 1 and 2. The confusion matrices reveal that the disorders VM, BPPV, MD, and VP are classified with decreasing sensitivity (e.g., for LR approximately: 70, 50, 40, and 20%). On task 3, hyper-parameter optimization had a much greater effect on classifier outcomes than in tasks 1 and 2, i.e., after parameter tuning, the average classification accuracy of all models increased from 44.2 to 48.0%.</p>
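The reported metrics can in principle be computed with scikit-learn, on which base-ml is built; a minimal sketch on synthetic four-class data follows (the dataset, parameter grid, and model are illustrative and not the paper's exact pipeline).

```python
# Sketch (synthetic data, illustrative grid): nested evaluation of a tuned
# classifier with the three metrics reported in the text.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate

# Four imbalanced classes, loosely mirroring a multi-class task.
X, y = make_classification(n_samples=300, n_features=20, n_classes=4,
                           n_informative=8, weights=[0.4, 0.25, 0.26, 0.09],
                           random_state=0)

# Inner loop tunes the regularization strength; the outer loop estimates
# accuracy, macro-averaged f1, and one-vs-rest ROC-AUC on held-out folds.
tuned = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
scores = cross_validate(tuned, X, y, cv=5,
                        scoring={"acc": "accuracy",
                                 "f1": "f1_macro",
                                 "auc": "roc_auc_ovr"})
```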
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Results panel produced by base-ml for Task 3.</p></caption>
<graphic xlink:href="fneur-12-681140-g0004.tif"/>
</fig>
<p>Class imbalance probably plays a role here as well, as this ordering almost coincides with the class representation in the dataset (VM: 39.8%, BPPV: 24.8%, MD: 26.3%, VP: 9.1%). Looking at the UMAP embedding, the same separation of the data cloud into two clusters is clearly visible, and the four episodic vestibular disorders are not clearly separable visually within the two clusters, which again anticipates the difficulty of the classification task.</p>
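The relation between class representation and per-class sensitivity can be read directly off a confusion matrix; the sketch below uses hypothetical predictions chosen to mirror the approximate LR sensitivities quoted above (70/50/40/20%), not actual study outputs.

```python
# Sketch: per-class sensitivity (recall) from a confusion matrix, normalized
# by the true-class counts, to compare against class representation.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["VM", "BPPV", "MD", "VP"]
y_true = ["VM"] * 40 + ["BPPV"] * 25 + ["MD"] * 26 + ["VP"] * 9   # imbalance as in task 3
y_pred = (["VM"] * 28 + ["BPPV"] * 6 + ["MD"] * 6                 # hypothetical predictions
          + ["VM"] * 8 + ["BPPV"] * 13 + ["MD"] * 4
          + ["VM"] * 9 + ["BPPV"] * 6 + ["MD"] * 11
          + ["VM"] * 4 + ["MD"] * 3 + ["VP"] * 2)

cm = confusion_matrix(y_true, y_pred, labels=classes)   # rows = true classes
sensitivity = cm.diagonal() / cm.sum(axis=1)            # recall per class
```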
<p>Regarding the 10 most important features (cf. <xref ref-type="table" rid="T4">Table 4</xref>), mean patient age ranks at the top (BPPV patients oldest, VM patients youngest). Second most important is a vertigo duration of &#x0003C;2 min (most frequent in BPPV and VP). Expectedly, several features are related to body relocation, e.g., problems getting into, out of, or turning over in bed (DHI Q13, rank 3; VAP Q2, rank 4), bending over (DHI Q25, rank 7), or vertical climbing (VAP Q7, rank 10). Accompanying headache ranks in 6th position and is indicative of VM. Only one apparative feature is relevant for task 3 (normal caloric test, rank 5), with MD being the only group with markedly abnormal results.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Top 10 most important features in Task 3, aggregated over multiple classifiers.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Rank</bold></th>
<th valign="top" align="left"><bold>Feature</bold></th>
<th valign="top" align="left"><bold>Feature type</bold></th>
<th valign="top" align="center"><bold>BPPV</bold></th>
<th valign="top" align="center"><bold>MD</bold></th>
<th valign="top" align="center"><bold>VM</bold></th>
<th valign="top" align="center"><bold>VP</bold></th>
<th valign="top" align="center"><bold><italic>P</italic>-Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Age (yrs)</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">56.6 &#x000B1; 11.4</td>
<td valign="top" align="center">53.3 &#x000B1; 13.0</td>
<td valign="top" align="center">44.7 &#x000B1; 13.3</td>
<td valign="top" align="center">51.6 &#x000B1; 13.6</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Vertigo time: &#x0003C;2 min</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">44.80%</td>
<td valign="top" align="center">12.70%</td>
<td valign="top" align="center">17.20%</td>
<td valign="top" align="center">71.40%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">DHI: Q13, problems turning over in bed</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">87.90%</td>
<td valign="top" align="center">47.50%</td>
<td valign="top" align="center">44.20%</td>
<td valign="top" align="center">34.70%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">VAP: Q2, problems getting into/out of bed or turning over in bed</td>
<td valign="top" align="left">VAP</td>
<td valign="top" align="center">93.30%</td>
<td valign="top" align="center">68.60%</td>
<td valign="top" align="center">58.50%</td>
<td valign="top" align="center">49.00%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Caloric: normal result</td>
<td valign="top" align="left">Apparative tests</td>
<td valign="top" align="center">85.90%</td>
<td valign="top" align="center">49.50%</td>
<td valign="top" align="center">84.80%</td>
<td valign="top" align="center">100.00%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">Accompanying headache</td>
<td valign="top" align="left">Questionnaire</td>
<td valign="top" align="center">16.80%</td>
<td valign="top" align="center">19.00%</td>
<td valign="top" align="center">53.50%</td>
<td valign="top" align="center">15.00%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">DHI: Q25, bending over increases problems</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">76.10%</td>
<td valign="top" align="center">60.30%</td>
<td valign="top" align="center">61.20%</td>
<td valign="top" align="center">61.20%</td>
<td valign="top" align="center">&#x0003C;0.05</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">DHI: Q6, restricted participation in social activities</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">71.40%</td>
<td valign="top" align="center">82.90%</td>
<td valign="top" align="center">75.50%</td>
<td valign="top" align="center">65.30%</td>
<td valign="top" align="center">&#x0003C;0.05</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">DHI: Q22, increased stress on family/friend relationships</td>
<td valign="top" align="left">DHI</td>
<td valign="top" align="center">23.10%</td>
<td valign="top" align="center">48.90%</td>
<td valign="top" align="center">45.60%</td>
<td valign="top" align="center">38.80%</td>
<td valign="top" align="center">&#x0003C;0.0001</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">VAP: Q7, Vertical climbing (stairs/lift)</td>
<td valign="top" align="left">VAP</td>
<td valign="top" align="center">60.00%</td>
<td valign="top" align="center">64.90%</td>
<td valign="top" align="center">62.00%</td>
<td valign="top" align="center">45.70%</td>
<td valign="top" align="center">0.139</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>In this paper, we have described several approaches for multivariable analysis and machine learning classification of three different patient cohorts from the vestibular registry dataset DizzyReg, i.e., functional dizziness vs. bilateral vestibular failure, primary vs. secondary functional dizziness, and BPPV vs. Meni&#x000E8;re&#x00027;s disease vs. vestibular migraine vs. vestibular paroxysmia. Clinically, the three tasks were rated as increasingly difficult, and the machine learning classifier performances reflected this grading, with an average accuracy of 87.0, 60.5, and 44.3%, respectively. Using the results produced by base-ml, we put these accuracy scores into context with class imbalance, input feature embeddings, confusion matrices and sensitivity scores, as well as tables with the top 10 most important features, aggregated over several classifiers using the proposed RAFI heuristic. In the following, we discuss these results from both a technical and a clinical perspective.</p>
<sec>
<title>Technical Aspects</title>
<p>The results of the three classification experiments highlight several important points. We believe the results make it apparent that it is beneficial to run and benchmark several classification algorithms, ideally from different categories, such as linear, non-linear, and neural models. Even a supposedly easy task from a medical perspective does not necessarily lead to a matching classifier performance, depending on which model is used (e.g., 78% classification accuracy in task 1 with Na&#x000EF;ve Bayes); hence, an a-priori selection could result in too pessimistic an assessment of the classification potential of machine learning. Therefore, a wide range of methods in one comprehensive framework might benefit research groups that are new to the field of ML on clinical data. Further, linear models should always be tested along with non-linear and neural network models, as the best linear model (e.g., in task 1, SVC with mean accuracy/f1-score/ROC-AUC: 91.7%/0.819/0.926) may match or even outperform more complex models, especially if the task has a wide, rather than long, data matrix, or if the classes are clearly separable.</p>
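Such a benchmark across model categories can be sketched in a few lines of scikit-learn (synthetic data; the model list is illustrative and smaller than the one shipped with base-ml):

```python
# Sketch: benchmarking classifiers from different categories (linear, kernel,
# probabilistic, ensemble, neural) on one dataset with 5-fold CV accuracy.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, random_state=0)
models = {
    "LDA": LinearDiscriminantAnalysis(),                     # linear
    "SVC": make_pipeline(StandardScaler(), SVC()),           # kernel (non-linear)
    "NaiveBayes": GaussianNB(),                              # probabilistic
    "RandomForest": RandomForestClassifier(random_state=0),  # ensemble
    "MLP": make_pipeline(StandardScaler(),                   # neural network
                         MLPClassifier(max_iter=500, random_state=0)),
}
results = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
```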
<p>Analyzing classifier performance purely through quantitative metrics provides only a narrow view, however. Our analysis reports additionally provide plots on class imbalance, input data distribution, and confusion matrices, all of which offer different insights into the experiment. Class representation in the dataset correlated with the sensitivity for each class in all three experiments, which the confusion matrices highlighted. The input data distribution additionally revealed that the DizzyReg data in our study had a fundamental separation into two clusters (cf. UMAP embeddings in <xref ref-type="fig" rid="F2">Figures 2</xref>&#x02013;<xref ref-type="fig" rid="F4">4</xref>). At least in task 1, this did not prevent classification outcomes from matching clinical intuition; for future ML-based studies, however, this separation would need to be investigated further. Counteracting such a data separation, e.g., with input data transforms (<xref ref-type="bibr" rid="B54">54</xref>), or with more advanced techniques like domain adaptation (<xref ref-type="bibr" rid="B55">55</xref>), could improve classification results further. As such, the results obtained through the base-ml tool not only provide information about which machine learning models to pursue further, but also indicate starting points for optimizing the input data with classical data science and statistical methods. For clinicians, an important part of the results is the set of most important features selected by the classifiers, which we present in an aggregated form using the proposed RAFI heuristic. These features will be discussed in more detail and put into a clinical context in section Clinical Implications.</p>
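The general idea of aggregating feature importances across classifiers can be sketched generically as mean-rank aggregation. This is a plain illustration of the concept, not necessarily the RAFI heuristic itself, whose exact definition is given elsewhere in the article; data and model choices are illustrative.

```python
# Generic sketch: rank features per model (0 = most important), average the
# ranks across models, and report the 10 features with the best mean rank.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)
importances = {
    "LR": np.abs(LogisticRegression(max_iter=1000).fit(X, y).coef_[0]),
    "RF": RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
}
# argsort of argsort turns raw importances into per-model rank positions.
ranks = np.stack([np.argsort(np.argsort(-imp)) for imp in importances.values()])
mean_rank = ranks.mean(axis=0)
top10 = np.argsort(mean_rank)[:10]   # feature indices, best mean rank first
```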
<p>The methods presented in this work and comprised in the base-ml tool have several noteworthy limitations. In general, base-ml is intended as a first screening tool for ML experiments, rather than as a complete ML solution that leads to a trained model for prospective studies and/or deployment. It has been shown previously that hyper-parameter optimization using nested cross-validation can lead to significant improvements of classification performance (<xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B12">12</xref>, <xref ref-type="bibr" rid="B13">13</xref>). In our study, while hyper-parameter tuning had no noticeable effect on task 2, there were noticeable improvements in the average classification outcomes across all models in tasks 1 and 3. Further, not only the models themselves have hyper-parameters; every part of the ML pipeline in base-ml could be individually optimized further. This could include alternative input normalization strategies [e.g., power transforms (<xref ref-type="bibr" rid="B54">54</xref>, <xref ref-type="bibr" rid="B56">56</xref>)], imputation methods [e.g., kNN imputation or multiple imputation by chained equations, MICE (<xref ref-type="bibr" rid="B57">57</xref>, <xref ref-type="bibr" rid="B58">58</xref>)], or the inclusion of feature selection methods (e.g., based on univariate hypothesis testing), all of which are important for optimal classifier performance (<xref ref-type="bibr" rid="B9">9</xref>). A default treatment in our experiments, for example, is to discard variables that were recorded for &#x0003C;50% of the population. In clinical practice, however, some variables may be missing because the corresponding examinations or apparative tests were not ordered by the physician, perhaps due to time, cost, lack of indication, or expected inefficacy toward diagnosis. In that case, individual rules for variable rejection, imputation, and/or normalization may be necessary. 
For base-ml, we chose to avoid such in-depth treatment in favor of ease of use at the exploratory stage. However, base-ml is built on top of scikit-learn and already provides an interface to modern deep learning methods through skorch, and to explainable AI solutions through Captum. This makes it easy to include many further methods for feature selection, imputation, and normalization, as well as further explainable AI algorithms for classification (<xref ref-type="bibr" rid="B32">32</xref>). However, at a certain level of complexity that aims at deployment rather than exploration, it is advisable to consider more in-depth analyses and tools, ideally in close collaboration with data science and ML experts, and potentially starting off from insights obtained with base-ml. A particularly interesting avenue is the current research direction of Automated Machine Learning (AutoML), which aims at an end-to-end optimization of the entire classification pipeline (<xref ref-type="bibr" rid="B59">59</xref>). Importantly though, small to medium-size datasets might not provide enough data samples to train such complex pipelines. Until more cross-institutional vestibular registry datasets like DizzyReg come into existence, with sufficient data to apply AutoML, the methods which we wrapped in base-ml and presented in this work provide a solid starting point for ML-based analysis. As such, and for the time being, we believe these tools to be a valuable contribution to the vestibular research community.</p>
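The pre-processing alternatives named above compose naturally in scikit-learn; the sketch below (synthetic skewed data) applies the 50% missingness cut-off described in the text, followed by kNN or MICE-style imputation and a power transform. The imputer and transform choices are illustrative, not base-ml's defaults.

```python
# Sketch: discard sparsely recorded variables, impute, then normalize.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 6))            # skewed, as clinical scores often are
X[rng.random(X.shape) < 0.2] = np.nan       # ~20% of values missing at random
X[:, 5] = np.nan
X[:60, 5] = rng.lognormal(size=60)          # column 5 recorded for only 60% of patients

# Drop variables recorded for <50% of the population (here, all are kept).
keep = np.mean(~np.isnan(X), axis=0) >= 0.5
X = X[:, keep]

# Either kNN imputation or MICE-style iterative imputation ...
X_knn = KNNImputer(n_neighbors=5).fit_transform(X)
X_mice = IterativeImputer(random_state=0).fit_transform(X)
# ... followed by a power transform toward gaussianity (zero mean, unit variance).
X_norm = PowerTransformer().fit_transform(X_knn)
```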
</sec>
<sec>
<title>Clinical Implications</title>
<p>Clinical reasoning in the diagnostic differentiation of common vestibular disorders is based on a &#x0201C;mental aggregation&#x0201D; of information from patient characteristics (such as age and gender), symptom characteristics (namely quality, duration, triggers, accompanying symptoms), clinical examination (e.g., positioning maneuvers), and quantitative tests of vestibular function (such as vHIT, calorics) (<xref ref-type="bibr" rid="B16">16</xref>). It is an open and relevant question whether ML-based methods are able to identify features from a multimodal vestibular patient registry that resemble this clinical thinking and feature weighting. In the current study, we tested three clinical scenarios of different complexity on the DizzyReg database to further address this issue.</p>
<p>The first classification task represented two groups of patients suffering from chronic dizziness of almost diametrical etiology. In bilateral vestibular failure, imbalance can be directly assigned to an organic damage of the vestibular afferents, which is accompanied by a low degree of balance-related anxiety (<xref ref-type="bibr" rid="B60">60</xref>, <xref ref-type="bibr" rid="B61">61</xref>), while in functional dizziness the vestibular system is physiologically intact, but the subjective perception of balance is severely disturbed due to fearful introspection (<xref ref-type="bibr" rid="B62">62</xref>). It can be expected that ML-based algorithms will predominantly select as most important those features that represent either measurements of vestibular function or scales for anxiety and perceived disability. Indeed, the top 10 important features exactly meet this assumption, with six of them reflecting low- and high-frequency function of the vestibulo-ocular reflex (HIT left/right normal, vHIT gain left/right, bilateral vHIT normal, caloric response normal), and three further features reflecting health-related quality of life, depression, and fear. Furthermore, age was an important differential feature, which is in good accordance with the fact that bilateral vestibular failure appears more frequently in older patients and functional dizziness in younger and middle-aged patients.</p>
<p>In the second classification task, two groups of patients with functional dizziness were compared, who were presumably very similar in their symptomatic presentation but differed in the evolution of their symptoms: patients with primary functional dizziness, where chronic psychological stress or anxiety is the driving force, and patients with secondary functional dizziness, which develops after a preceding somatic vestibular disorder (e.g., BPPV) due to altered balance perception and strategies of postural control (<xref ref-type="bibr" rid="B8">8</xref>). Accordingly, the top 10 features for classification included vestibular function tests (such as vHIT gain left/right and a normal caloric response). The subtle differences between groups may speak for a partially recovered acute unilateral vestibulopathy or MD as some of the causes underlying secondary functional dizziness. Furthermore, symptom provocation by position changes in bed may point to BPPV as another vestibular disorder triggering secondary functional dizziness. These findings agree with previous literature (<xref ref-type="bibr" rid="B8">8</xref>). Interestingly, patients with primary functional dizziness had higher fear and depression scores, which may indicate a more intense psychological symptom burden. Indeed, previous studies have shown a psychiatric comorbidity in 75% of patients with primary functional dizziness vs. 42% with secondary functional dizziness (<xref ref-type="bibr" rid="B63">63</xref>). The more frequent consumption of alcohol in primary functional dizziness may also indicate that these patients subjectively profit from its relaxing effects to a greater extent than patients with secondary functional dizziness, who have some degree of vestibular deficit, which may be exacerbated by alcohol (e.g., partially compensated unilateral vestibulopathy or vestibular migraine).</p>
<p>The third classification task was designed to differentiate common episodic vestibular disorders like BPPV, MD, vestibular migraine, and vestibular paroxysmia. Expectedly, a set of features was most indicative of BPPV, namely short attack duration and provocation by position changes. MD, as compared to the other vestibular disorders, was associated with the highest rate of pathological vestibular function tests (abnormal caloric test). It is well-known that long-standing MD can cause vestibular function deficits (<xref ref-type="bibr" rid="B64">64</xref>), while this is less frequent in vestibular migraine (<xref ref-type="bibr" rid="B65">65</xref>). The latter was associated with the highest frequency of headache and the youngest mean patient age, in accordance with the literature (<xref ref-type="bibr" rid="B66">66</xref>). Vestibular paroxysmia was mostly defined by a short symptom duration. The overall moderate accuracy for classification of the four episodic vestibular disorders can be explained by several factors: (i) one methodological explanation could be that this was a multi-class task, which is more challenging; (ii) despite the exhaustive history taking and examination details recorded for patients in DizzyReg, it is possible that not all relevant information is included. For example, systematic audiological test results are only available for patients with Meni&#x000E8;re&#x00027;s disease and vestibular migraine, but not for BPPV or vestibular paroxysmia. Therefore, audiological test results could not be generally included as a variable in the third classification task; (iii) there are potential overlaps of symptom characteristics and features. 
A prominent example is an overlap syndrome of MD and vestibular migraine, which could point toward a common pathophysiology (<xref ref-type="bibr" rid="B67">67</xref>); (iv) although the guidelines of the &#x0201C;International Classification of Vestibular Disorders (ICVD)&#x0201D; of the Barany Society give clear criteria for diagnosis, mostly based on history taking, complex clinical constellations such as overlapping syndromes or atypical presentations appear regularly in the practice of a tertiary referral center, which may cause some difficulties in clear-cut classification. Limited classification accuracy may be partly explained by this selection bias, and further testing in primary care settings will be needed; (v) given the difficulty of task 3, the low ML classification performance is neither surprising nor a sign of failure of ML classification approaches. Instead, our results suggest that ML algorithms, even given considerable data to learn from, may not automatically be able to solve difficult clinical tasks. The wide range of tuned ML algorithm performances presented by base-ml can reveal such difficulty better than a narrow selection of ML results without tuning; (vi) previous studies suggest that expert consensus may not always be unanimous, which may indicate the difficulty of patient diagnosis despite clear guidelines and diagnostic criteria. For example, the authors of (<xref ref-type="bibr" rid="B68">68</xref>) tried to validate diagnostic classifications through multi-rater agreement between several experienced otoneurological raters, and acceptable consensus was achieved in only 62% of the patients. This indicates that some diagnostic inaccuracy persists in the clinical setting, despite established international classification criteria, which could be taken as a further argument to augment clinical decision making with ML-based support systems.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusion</title>
<p>Analysis of large multimodal datasets by novel ML/MVA-methods may contribute to clinical decision making in neuro-otology. Important features for classification can be identified and aligned with expert experience and diagnostic guidelines. The optimal ML/MVA-method depends on the classification task and data structure. Base-ml provides an innovative open source toolbox to test different methods and clinical tasks in parallel. The multi-faceted presentation of results and explainable AI features, including an identification of clinically relevant features and their statistical analysis, enables clinicians to better understand ML/MVA outcomes, and identify avenues for further investigation. Future research needs to be extended to larger multicenter datasets and new data sources to improve the performance of automated diagnostic support tools.</p>
</sec>
<sec sec-type="data-availability-statement" id="s6">
<title>Data Availability Statement</title>
<p>The data analyzed in this study were obtained from the DSGZ DizzyReg registry; the following licenses/restrictions apply: the DSGZ provides application forms that must be completed before the data in the DizzyReg may be accessed. Please contact the DSGZ for more details on the application process. Requests to access these datasets should be directed to Ralf Strobl, <email>ralf.strobl&#x00040;med.uni-muenchen.de</email>.</p>
</sec>
<sec id="s7">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by Institutional Review Board University Hospital Munich Ludwig Maximilian University Munich, Germany. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>GV, AZ, RS, and S-AA contributed to conception and design of the study and wrote the first draft of the manuscript. NN and EG contributed to study refinement. RS and GV organized the database. GV and S-AA developed base-ml and performed the data and statistical analyses. S-AA, AZ, NN, and EG provided funding. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dagliati</surname> <given-names>A</given-names></name> <name><surname>Tibollo</surname> <given-names>V</given-names></name> <name><surname>Sacchi</surname> <given-names>L</given-names></name> <name><surname>Malovini</surname> <given-names>A</given-names></name> <name><surname>Limongelli</surname> <given-names>I</given-names></name> <name><surname>Gabetta</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Big data as a driver for clinical decision support systems: a learning health systems perspective</article-title>. <source>Front Digit Humanit.</source> (<year>2018</year>) <volume>5</volume>:<fpage>8</fpage>. <pub-id pub-id-type="doi">10.3389/fdigh.2018.00008</pub-id></citation></ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dash</surname> <given-names>S</given-names></name> <name><surname>Shakyawar</surname> <given-names>SK</given-names></name> <name><surname>Sharma</surname> <given-names>M</given-names></name> <name><surname>Kaushik</surname> <given-names>S</given-names></name></person-group>. <article-title>Big data in healthcare: management, analysis and future prospects</article-title>. <source>J Big Data.</source> (<year>2019</year>) <volume>6</volume>:<fpage>54</fpage>. <pub-id pub-id-type="doi">10.1186/s40537-019-0217-0</pub-id><pub-id pub-id-type="pmid">32244343</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gamache</surname> <given-names>R</given-names></name> <name><surname>Kharrazi</surname> <given-names>H</given-names></name> <name><surname>Weiner</surname> <given-names>J</given-names></name></person-group>. <article-title>Public and population health informatics: the bridging of big data to benefit communities</article-title>. <source>Yearb Med Inform.</source> (<year>2018</year>) <volume>27</volume>:<fpage>199</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1055/s-0038-1667081</pub-id><pub-id pub-id-type="pmid">30157524</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahmadi</surname> <given-names>S-A</given-names></name> <name><surname>Vivar</surname> <given-names>G</given-names></name> <name><surname>Frei</surname> <given-names>J</given-names></name> <name><surname>Nowoshilow</surname> <given-names>S</given-names></name> <name><surname>Bardins</surname> <given-names>S</given-names></name> <name><surname>Brandt</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>Towards computerized diagnosis of neurological stance disorders: data mining and machine learning of posturography and sway</article-title>. <source>J Neurol.</source> (<year>2019</year>) <volume>266</volume>:<fpage>108</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-019-09458-y</pub-id><pub-id pub-id-type="pmid">31286203</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pradhan</surname> <given-names>C</given-names></name> <name><surname>Wuehr</surname> <given-names>M</given-names></name> <name><surname>Akrami</surname> <given-names>F</given-names></name> <name><surname>Neuhaeusser</surname> <given-names>M</given-names></name> <name><surname>Huth</surname> <given-names>S</given-names></name> <name><surname>Brandt</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>Automated classification of neurological disorders of gait using spatio-temporal gait parameters</article-title>. <source>J Electromyogr Kinesiol.</source> (<year>2015</year>) <volume>25</volume>:<fpage>413</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1016/j.jelekin.2015.01.004</pub-id><pub-id pub-id-type="pmid">25725811</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahmadi</surname> <given-names>S-A</given-names></name> <name><surname>Vivar</surname> <given-names>G</given-names></name> <name><surname>Navab</surname> <given-names>N</given-names></name> <name><surname>M&#x000F6;hwald</surname> <given-names>K</given-names></name> <name><surname>Maier</surname> <given-names>A</given-names></name> <name><surname>Hadzhikolev</surname> <given-names>H</given-names></name> <etal/></person-group>. <article-title>Modern machine-learning can support diagnostic differentiation of central and peripheral acute vestibular disorders</article-title>. <source>J Neurol.</source> (<year>2020</year>) <volume>267</volume>:<fpage>143</fpage>&#x02013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-020-09931-z</pub-id><pub-id pub-id-type="pmid">32529578</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Groezinger</surname> <given-names>M</given-names></name> <name><surname>Huppert</surname> <given-names>D</given-names></name> <name><surname>Strobl</surname> <given-names>R</given-names></name> <name><surname>Grill</surname> <given-names>E</given-names></name></person-group>. <article-title>Development and validation of a classification algorithm to diagnose and differentiate spontaneous episodic vertigo syndromes: results from the DizzyReg patient registry</article-title>. <source>J Neurol.</source> (<year>2020</year>) <volume>267</volume>:<fpage>160</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-020-10061-9</pub-id><pub-id pub-id-type="pmid">33241443</pub-id></citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Habs</surname> <given-names>M</given-names></name> <name><surname>Strobl</surname> <given-names>R</given-names></name> <name><surname>Grill</surname> <given-names>E</given-names></name> <name><surname>Dieterich</surname> <given-names>M</given-names></name> <name><surname>Becker-Bense</surname> <given-names>S</given-names></name></person-group>. <article-title>Primary or secondary chronic functional dizziness: does it make a difference? A DizzyReg study in 356 patients</article-title>. <source>J Neurol.</source> (<year>2020</year>) <volume>267</volume>:<fpage>212</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-020-10150-9</pub-id><pub-id pub-id-type="pmid">32852579</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>PF</given-names></name> <name><surname>Zheng</surname> <given-names>Y</given-names></name></person-group>. <article-title>Applications of multivariate statistical and data mining analyses to the search for biomarkers of sensorineural hearing loss, tinnitus, and vestibular dysfunction</article-title>. <source>Front Neurol.</source> (<year>2021</year>) <volume>12</volume>:<fpage>627294</fpage>. <pub-id pub-id-type="doi">10.3389/fneur.2021.627294</pub-id><pub-id pub-id-type="pmid">33746881</pub-id></citation></ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y</given-names></name> <name><surname>Bengio</surname> <given-names>Y</given-names></name> <name><surname>Hinton</surname> <given-names>G</given-names></name></person-group>. <article-title>Deep learning</article-title>. <source>Nature.</source> (<year>2015</year>) <volume>521</volume>:<fpage>436</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grill</surname> <given-names>E</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>T</given-names></name> <name><surname>Becker-Bense</surname> <given-names>S</given-names></name> <name><surname>G&#x000FC;rkov</surname> <given-names>R</given-names></name> <name><surname>Heinen</surname> <given-names>F</given-names></name> <name><surname>Huppert</surname> <given-names>D</given-names></name> <etal/></person-group>. <article-title>DizzyReg: the prospective patient registry of the German center for vertigo and balance disorders</article-title>. <source>J Neurol.</source> (<year>2017</year>) <volume>264</volume>:<fpage>34</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-017-8438-7</pub-id><pub-id pub-id-type="pmid">28271410</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vivar</surname> <given-names>G</given-names></name> <name><surname>Zwergal</surname> <given-names>A</given-names></name> <name><surname>Navab</surname> <given-names>N</given-names></name> <name><surname>Ahmadi</surname> <given-names>S-A</given-names></name></person-group>. <article-title>Multi-modal disease classification in incomplete datasets using geometric matrix completion</article-title>, In: <person-group person-group-type="editor"><name><surname>Stoyanov</surname> <given-names>D</given-names></name> <name><surname>Taylor</surname> <given-names>Z</given-names></name> <name><surname>Ferrante</surname> <given-names>E</given-names></name> <name><surname>Dalca</surname> <given-names>AV</given-names></name> <name><surname>Martel</surname> <given-names>A</given-names></name> <name><surname>Maier-Hein</surname> <given-names>L</given-names></name> <etal/></person-group>. editors. <source>Graphs in Biomedical Image Analysis Integrating Medical Imaging Non-Imaging Modalities</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name> (<year>2018</year>). p. <fpage>24</fpage>&#x02013;<lpage>31</lpage>.</citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vivar</surname> <given-names>G</given-names></name> <name><surname>Kazi</surname> <given-names>A</given-names></name> <name><surname>Burwinkel</surname> <given-names>H</given-names></name> <name><surname>Zwergal</surname> <given-names>A</given-names></name> <name><surname>Navab</surname> <given-names>N</given-names></name> <name><surname>Ahmadi</surname> <given-names>S-A</given-names></name></person-group>. <article-title>Simultaneous imputation and classification using multigraph geometric matrix completion (MGMC): application to neurodegenerative disease classification</article-title>. <source>Artif Intell Med.</source> (<year>2021</year>) <volume>117</volume>:<fpage>102097</fpage>. <pub-id pub-id-type="doi">10.1016/j.artmed.2021.102097</pub-id><pub-id pub-id-type="pmid">34127236</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vivar</surname> <given-names>G</given-names></name> <name><surname>Mullakaeva</surname> <given-names>K</given-names></name> <name><surname>Zwergal</surname> <given-names>A</given-names></name> <name><surname>Navab</surname> <given-names>N</given-names></name> <name><surname>Ahmadi</surname> <given-names>S-A</given-names></name></person-group>. <article-title>Peri-diagnostic decision support through cost-efficient feature acquisition at test-time</article-title>, In: <person-group person-group-type="editor"><name><surname>Martel</surname> <given-names>AL</given-names></name> <name><surname>Abolmaesumi</surname> <given-names>P</given-names></name> <name><surname>Stoyanov</surname> <given-names>D</given-names></name> <name><surname>Mateus</surname> <given-names>D</given-names></name> <name><surname>Zuluaga</surname> <given-names>MA</given-names></name> <name><surname>Zhou</surname> <given-names>SK</given-names></name> <etal/></person-group>. editors. <source>Medical Image Computing Computer Assisted Intervention &#x02013; MICCAI 2020 Lecture Notes in Computer Science</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name> (<year>2020</year>). p. <fpage>572</fpage>&#x02013;<lpage>81</lpage>.</citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wuyts</surname> <given-names>FL</given-names></name> <name><surname>Van Rompaey</surname> <given-names>V</given-names></name> <name><surname>Maes</surname> <given-names>LK</given-names></name></person-group>. <article-title>&#x0201C;SO STONED&#x0201D;: common sense approach of the dizzy patient</article-title>. <source>Front Surg.</source> (<year>2016</year>) <volume>3</volume>:<fpage>32</fpage>. <pub-id pub-id-type="doi">10.3389/fsurg.2016.00032</pub-id><pub-id pub-id-type="pmid">27313999</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brandt</surname> <given-names>T</given-names></name> <name><surname>Strupp</surname> <given-names>M</given-names></name> <name><surname>Dieterich</surname> <given-names>M</given-names></name></person-group>. <article-title>Five keys for diagnosing most vertigo, dizziness, and imbalance syndromes: an expert opinion</article-title>. <source>J Neurol.</source> (<year>2014</year>) <volume>261</volume>:<fpage>229</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-013-7190-x</pub-id><pub-id pub-id-type="pmid">24292642</pub-id></citation></ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strobl</surname> <given-names>R</given-names></name> <name><surname>Gr&#x000F6;zinger</surname> <given-names>M</given-names></name> <name><surname>Zwergal</surname> <given-names>A</given-names></name> <name><surname>Huppert</surname> <given-names>D</given-names></name> <name><surname>Filippopulos</surname> <given-names>F</given-names></name> <name><surname>Grill</surname> <given-names>E</given-names></name></person-group>. <article-title>A set of eight key questions helps to classify common vestibular disorders&#x02014;results from the DizzyReg patient registry</article-title>. <source>Front Neurol.</source> (<year>2021</year>) <volume>12</volume>:<fpage>670944</fpage>. <pub-id pub-id-type="doi">10.3389/fneur.2021.670944</pub-id><pub-id pub-id-type="pmid">33995265</pub-id></citation></ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacobson</surname> <given-names>GP</given-names></name> <name><surname>Newman</surname> <given-names>CW</given-names></name></person-group>. <article-title>The development of the dizziness handicap inventory</article-title>. <source>Arch Otolaryngol Head Neck Surg.</source> (<year>1990</year>) <volume>116</volume>:<fpage>424</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1001/archotol.1990.01870040046011</pub-id><pub-id pub-id-type="pmid">2317323</pub-id></citation></ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Greiner</surname> <given-names>W</given-names></name> <name><surname>Weijnen</surname> <given-names>T</given-names></name> <name><surname>Nieuwenhuizen</surname> <given-names>M</given-names></name> <name><surname>Oppe</surname> <given-names>S</given-names></name> <name><surname>Badia</surname> <given-names>X</given-names></name> <name><surname>Busschbach</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>A single European currency for EQ-5D health states</article-title>. <source>Eur J Health Econ.</source> (<year>2003</year>) <volume>4</volume>:<fpage>222</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1007/s10198-003-0182-5</pub-id><pub-id pub-id-type="pmid">15609189</pub-id></citation></ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alghwiri</surname> <given-names>AA</given-names></name> <name><surname>Whitney</surname> <given-names>SL</given-names></name> <name><surname>Baker</surname> <given-names>CE</given-names></name> <name><surname>Sparto</surname> <given-names>PJ</given-names></name> <name><surname>Marchetti</surname> <given-names>GF</given-names></name> <name><surname>Rogers</surname> <given-names>JC</given-names></name> <etal/></person-group>. <article-title>The development and validation of the vestibular activities and participation measure</article-title>. <source>Arch Phys Med Rehabil.</source> (<year>2012</year>) <volume>93</volume>:<fpage>1822</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.apmr.2012.03.017</pub-id><pub-id pub-id-type="pmid">22465405</pub-id></citation></ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grill</surname> <given-names>E</given-names></name> <name><surname>Furman</surname> <given-names>JM</given-names></name> <name><surname>Alghwiri</surname> <given-names>AA</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>M</given-names></name> <name><surname>Whitney</surname> <given-names>SL</given-names></name></person-group>. <article-title>Using core sets of the international classification of functioning, disability and health (ICF) to measure disability in vestibular disorders: study protocol</article-title>. <source>J Vestib Res.</source> (<year>2013</year>) <volume>23</volume>:<fpage>297</fpage>&#x02013;<lpage>303</lpage>. <pub-id pub-id-type="doi">10.3233/VES-130487</pub-id><pub-id pub-id-type="pmid">24447970</pub-id></citation></ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mueller</surname> <given-names>M</given-names></name> <name><surname>Whitney</surname> <given-names>SL</given-names></name> <name><surname>Alghwiri</surname> <given-names>A</given-names></name> <name><surname>Alshebber</surname> <given-names>K</given-names></name> <name><surname>Strobl</surname> <given-names>R</given-names></name> <name><surname>Alghadir</surname> <given-names>A</given-names></name> <etal/></person-group>. <article-title>Subscales of the vestibular activities and participation questionnaire could be applied across cultures</article-title>. <source>J Clin Epidemiol.</source> (<year>2015</year>) <volume>68</volume>:<fpage>211</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.jclinepi.2014.10.004</pub-id><pub-id pub-id-type="pmid">25500318</pub-id></citation></ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bishop</surname> <given-names>CM</given-names></name></person-group>. <source>Pattern Recognition and Machine Learning</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2006</year>).</citation></ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jerez</surname> <given-names>JM</given-names></name> <name><surname>Molina</surname> <given-names>I</given-names></name> <name><surname>Garc&#x000ED;a-Laencina</surname> <given-names>PJ</given-names></name> <name><surname>Alba</surname> <given-names>E</given-names></name> <name><surname>Ribelles</surname> <given-names>N</given-names></name> <name><surname>Mart&#x000ED;n</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Missing data imputation using statistical and machine learning methods in a real breast cancer problem</article-title>. <source>Artif Intell Med.</source> (<year>2010</year>) <volume>50</volume>:<fpage>105</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/j.artmed.2010.05.002</pub-id><pub-id pub-id-type="pmid">20638252</pub-id></citation></ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Little</surname> <given-names>RJ</given-names></name> <name><surname>D&#x00027;Agostino</surname> <given-names>R</given-names></name> <name><surname>Cohen</surname> <given-names>ML</given-names></name> <name><surname>Dickersin</surname> <given-names>K</given-names></name> <name><surname>Emerson</surname> <given-names>SS</given-names></name> <name><surname>Farrar</surname> <given-names>JT</given-names></name> <etal/></person-group>. <article-title>The prevention and treatment of missing data in clinical trials</article-title>. <source>N Engl J Med.</source> (<year>2012</year>) <volume>367</volume>:<fpage>1355</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMsr1203730</pub-id></citation></ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pesonen</surname> <given-names>E</given-names></name> <name><surname>Eskelinen</surname> <given-names>M</given-names></name> <name><surname>Juhola</surname> <given-names>M</given-names></name></person-group>. <article-title>Treatment of missing data values in a neural network based decision support system for acute abdominal pain</article-title>. <source>Artif Intell Med.</source> (<year>1998</year>) <volume>13</volume>:<fpage>139</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1016/S0933-3657(98)00027-X</pub-id><pub-id pub-id-type="pmid">9698150</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hastie</surname> <given-names>T</given-names></name> <name><surname>Tibshirani</surname> <given-names>R</given-names></name> <name><surname>Friedman</surname> <given-names>J</given-names></name></person-group>. <source>The Elements of Statistical Learning</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer New York</publisher-name> (<year>2009</year>).</citation></ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Molinaro</surname> <given-names>AM</given-names></name> <name><surname>Simon</surname> <given-names>R</given-names></name> <name><surname>Pfeiffer</surname> <given-names>RM</given-names></name></person-group>. <article-title>Prediction error estimation: a comparison of resampling methods</article-title>. <source>Bioinformatics.</source> (<year>2005</year>) <volume>21</volume>:<fpage>3301</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bti499</pub-id><pub-id pub-id-type="pmid">15905277</pub-id></citation></ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kohavi</surname> <given-names>R</given-names></name></person-group>. <article-title>A study of cross-validation and bootstrap for accuracy estimation and model selection</article-title>. In: <source>Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2</source>. <publisher-loc>Montreal, QC</publisher-loc> (<year>1995</year>). p. <fpage>1137</fpage>&#x02013;<lpage>43</lpage>.</citation></ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolpert</surname> <given-names>DH</given-names></name> <name><surname>Macready</surname> <given-names>WG</given-names></name></person-group>. <article-title>No free lunch theorems for optimization</article-title>. <source>IEEE Trans Evol Computat.</source> (<year>1997</year>) <volume>1</volume>:<fpage>67</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1109/4235.585893</pub-id></citation></ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L</given-names></name></person-group>. <article-title>Statistical modeling: the two cultures (with comments and a rejoinder by the author)</article-title>. <source>Statist Sci.</source> (<year>2001</year>) <volume>16</volume>:<fpage>199</fpage>&#x02013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1214/ss/1009213726</pub-id></citation></ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Molnar</surname> <given-names>C</given-names></name></person-group>. <source>Interpretable Machine Learning: A Guide for Making Black Box Models Explainable</source>. (<year>2019</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://christophm.github.io/interpretable-ml-book/">https://christophm.github.io/interpretable-ml-book/</ext-link> (accessed July 11, 2021).</citation></ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>McInnes</surname> <given-names>L</given-names></name> <name><surname>Healy</surname> <given-names>J</given-names></name> <name><surname>Melville</surname> <given-names>J</given-names></name></person-group>. <source>UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction</source>. (<year>2020</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1802.03426">http://arxiv.org/abs/1802.03426</ext-link> (accessed March 2, 2021).</citation></ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saarela</surname> <given-names>M</given-names></name> <name><surname>Jauhiainen</surname> <given-names>S</given-names></name></person-group>. <article-title>Comparison of feature importance measures as explanations for classification models</article-title>. <source>SN Appl Sci.</source> (<year>2021</year>) <volume>3</volume>:<fpage>272</fpage>. <pub-id pub-id-type="doi">10.1007/s42452-021-04148-9</pub-id></citation></ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valko</surname> <given-names>M</given-names></name> <name><surname>Hauskrecht</surname> <given-names>M</given-names></name></person-group>. <article-title>Feature importance analysis for patient management decisions</article-title>. <source>Stud Health Technol Inform.</source> (<year>2010</year>) <volume>160</volume>:<fpage>861</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="pmid">20841808</pub-id></citation></ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L</given-names></name> <name><surname>Friedman</surname> <given-names>JH</given-names></name> <name><surname>Olshen</surname> <given-names>RA</given-names></name> <name><surname>Stone</surname> <given-names>CJ</given-names></name></person-group>. <source>Classification and Regression Trees</source>. <edition>1st ed.</edition> <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>Chapman and Hall/CRC</publisher-name> (<year>1984</year>).</citation></ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sundararajan</surname> <given-names>M</given-names></name> <name><surname>Taly</surname> <given-names>A</given-names></name> <name><surname>Yan</surname> <given-names>Q</given-names></name></person-group>. <article-title>Axiomatic attribution for deep networks</article-title>. In: <source>Proceedings of the 34th International Conference on Machine Learning - Volume 70</source>. <publisher-loc>Sydney, NSW</publisher-loc>: <publisher-name>ICML&#x00027;17</publisher-name> (<year>2017</year>). p. <fpage>3319</fpage>&#x02013;<lpage>28</lpage>.</citation></ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cortes</surname> <given-names>C</given-names></name> <name><surname>Vapnik</surname> <given-names>V</given-names></name></person-group>. <article-title>Support-vector networks</article-title>. <source>Mach Learn.</source> (<year>1995</year>) <volume>20</volume>:<fpage>273</fpage>&#x02013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1007/BF00994018</pub-id></citation></ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rasmussen</surname> <given-names>CE</given-names></name> <name><surname>Williams</surname> <given-names>CKI</given-names></name></person-group>. <source>Gaussian Processes for Machine Learning</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name> (<year>2008</year>).</citation></ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Criminisi</surname> <given-names>A</given-names></name> <name><surname>Konukoglu</surname> <given-names>E</given-names></name> <name><surname>Shotton</surname> <given-names>J</given-names></name></person-group>. <article-title>Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning</article-title>. <source>Microsoft Tech Rep.</source> (<year>2011</year>). <pub-id pub-id-type="doi">10.1561/9781601985415</pub-id></citation></ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freund</surname> <given-names>Y</given-names></name> <name><surname>Schapire</surname> <given-names>RE</given-names></name></person-group>. <article-title>A decision-theoretic generalization of on-line learning and an application to boosting</article-title>. <source>J Comp Syst Sci.</source> (<year>1997</year>) <volume>55</volume>:<fpage>119</fpage>&#x02013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.1006/jcss.1997.1504</pub-id></citation></ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>N</given-names></name> <name><surname>Hinton</surname> <given-names>G</given-names></name> <name><surname>Krizhevsky</surname> <given-names>A</given-names></name> <name><surname>Sutskever</surname> <given-names>I</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R</given-names></name></person-group>. <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>. <source>J Mach Learn Res.</source> (<year>2014</year>) <volume>15</volume>:<fpage>1929</fpage>&#x02013;<lpage>58</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://dl.acm.org/doi/10.5555/2627435.2670313">https://dl.acm.org/doi/10.5555/2627435.2670313</ext-link></citation></ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ioffe</surname> <given-names>S</given-names></name> <name><surname>Szegedy</surname> <given-names>C</given-names></name></person-group>. <article-title>Batch normalization: accelerating deep network training by reducing internal covariate shift</article-title> (<year>2015</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1502.03167">http://arxiv.org/abs/1502.03167</ext-link> (accessed March 3, 2021).</citation></ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shapiro</surname> <given-names>SS</given-names></name> <name><surname>Wilk</surname> <given-names>MB</given-names></name></person-group>. <article-title>An analysis of variance test for normality (complete samples)</article-title>. <source>Biometrika.</source> (<year>1965</year>) <volume>52</volume>:<fpage>591</fpage>&#x02013;<lpage>611</lpage>. <pub-id pub-id-type="doi">10.1093/biomet/52.3-4.591</pub-id></citation></ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mann</surname> <given-names>HB</given-names></name> <name><surname>Whitney</surname> <given-names>DR</given-names></name></person-group>. <article-title>On a test of whether one of two random variables is stochastically larger than the other</article-title>. <source>Ann Math Statist.</source> (<year>1947</year>) <volume>18</volume>:<fpage>50</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177730491</pub-id></citation></ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kruskal</surname> <given-names>WH</given-names></name> <name><surname>Wallis</surname> <given-names>WA</given-names></name></person-group>. <article-title>Use of ranks in one-criterion variance analysis</article-title>. <source>J Am Stat Assoc.</source> (<year>1952</year>) <volume>47</volume>:<fpage>583</fpage>&#x02013;<lpage>621</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1952.10483441</pub-id></citation></ref>
<ref id="B47">
<label>47.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cressie</surname> <given-names>N</given-names></name> <name><surname>Read</surname> <given-names>TRC</given-names></name></person-group>. <article-title>Multinomial goodness-of-fit tests</article-title>. <source>J R Stat Soc Series B.</source> (<year>1984</year>) <volume>46</volume>:<fpage>440</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1111/j.2517-6161.1984.tb01318.x</pub-id></citation></ref>
<ref id="B48">
<label>48.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname> <given-names>F</given-names></name> <name><surname>Varoquaux</surname> <given-names>G</given-names></name> <name><surname>Gramfort</surname> <given-names>A</given-names></name> <name><surname>Michel</surname> <given-names>V</given-names></name> <name><surname>Thirion</surname> <given-names>B</given-names></name> <name><surname>Grisel</surname> <given-names>O</given-names></name> <etal/></person-group>. <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J Mach Learn Res.</source> (<year>2011</year>) <volume>12</volume>:<fpage>2825</fpage>&#x02013;<lpage>30</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://dl.acm.org/doi/10.5555/1953048.2078195">https://dl.acm.org/doi/10.5555/1953048.2078195</ext-link></citation></ref>
<ref id="B49">
<label>49.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Reback</surname> <given-names>J</given-names></name> <name><surname>McKinney</surname> <given-names>W</given-names></name> Jbrockmendel <name><surname>Bossche</surname> <given-names>JVD</given-names></name> <name><surname>Augspurger</surname> <given-names>T</given-names></name> <name><surname>Cloud</surname> <given-names>P</given-names></name> <etal/></person-group>. <source>Pandas-dev/pandas: Pandas 1.2.3. Zenodo</source>. (<year>2021</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3509134">https://zenodo.org/record/3509134</ext-link></citation></ref>
<ref id="B50">
<label>50.</label>
<citation citation-type="journal"><person-group person-group-type="author"><collab>SciPy 1.0 Contributors</collab> <name><surname>Virtanen</surname> <given-names>P</given-names></name> <name><surname>Gommers</surname> <given-names>R</given-names></name> <name><surname>Oliphant</surname> <given-names>TE</given-names></name> <name><surname>Haberland</surname> <given-names>M</given-names></name> <name><surname>Reddy</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>SciPy 1.0: fundamental algorithms for scientific computing in Python</article-title>. <source>Nat Methods.</source> (<year>2020</year>) <volume>17</volume>:<fpage>261</fpage>&#x02013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0686-2</pub-id><pub-id pub-id-type="pmid">32094914</pub-id></citation></ref>
<ref id="B51">
<label>51.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A</given-names></name> <name><surname>Gross</surname> <given-names>S</given-names></name> <name><surname>Massa</surname> <given-names>F</given-names></name> <name><surname>Lerer</surname> <given-names>A</given-names></name> <name><surname>Bradbury</surname> <given-names>J</given-names></name> <name><surname>Chanan</surname> <given-names>G</given-names></name> <etal/></person-group>. <article-title>PyTorch: an imperative style, high-performance deep learning library</article-title>. In: <person-group person-group-type="editor"><name><surname>Wallach</surname> <given-names>H</given-names></name> <name><surname>Larochelle</surname> <given-names>H</given-names></name> <name><surname>Beygelzimer</surname> <given-names>A</given-names></name> <name><surname>Alch&#x000E9;-Buc</surname> <given-names>F</given-names></name> <name><surname>Fox</surname> <given-names>E</given-names></name> <name><surname>Garnett</surname> <given-names>R</given-names></name></person-group>, editors. <source>Advances in Neural Information Processing Systems 32.</source> <publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name> (<year>2019</year>). p. <fpage>8026</fpage>&#x02013;<lpage>37</lpage>.</citation></ref>
<ref id="B52">
<label>52.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Seabold</surname> <given-names>S</given-names></name> <name><surname>Perktold</surname> <given-names>J</given-names></name></person-group>. <article-title>Statsmodels: econometric and statistical modeling with python</article-title>. In: <source>9th Python in Science Conference</source>. <publisher-loc>Austin, TX</publisher-loc> (<year>2010</year>).</citation></ref>
<ref id="B53">
<label>53.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vallat</surname> <given-names>R</given-names></name></person-group>. <article-title>Pingouin: statistics in Python</article-title>. <source>JOSS.</source> (<year>2018</year>) <volume>3</volume>:<fpage>1026</fpage>. <pub-id pub-id-type="doi">10.21105/joss.01026</pub-id></citation></ref>
<ref id="B54">
<label>54.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Box</surname> <given-names>GEP</given-names></name> <name><surname>Cox</surname> <given-names>DR</given-names></name></person-group>. <article-title>An analysis of transformations</article-title>. <source>J Royal Statis Soc Series B.</source> (<year>1964</year>) <volume>26</volume>:<fpage>211</fpage>&#x02013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1111/j.2517-6161.1964.tb00553.x</pub-id></citation></ref>
<ref id="B55">
<label>55.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname> <given-names>G</given-names></name> <name><surname>Cook</surname> <given-names>DJ</given-names></name></person-group>. <article-title>A survey of unsupervised deep domain adaptation</article-title>. <source>ACM Trans Intell Syst Technol.</source> (<year>2020</year>) <volume>11</volume>:<fpage>1</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1145/3400066</pub-id></citation></ref>
<ref id="B56">
<label>56.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeo</surname> <given-names>I-K</given-names></name> <name><surname>Johnson</surname> <given-names>RA</given-names></name></person-group>. <article-title>A new family of power transformations to improve normality or symmetry</article-title>. <source>Biometrika.</source> (<year>2000</year>) <volume>87</volume>:<fpage>954</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1093/biomet/87.4.954</pub-id></citation></ref>
<ref id="B57">
<label>57.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mandel</surname> <given-names>JSP</given-names></name></person-group>. <article-title>A comparison of six methods for missing data imputation</article-title>. <source>J Biom Biostat.</source> (<year>2015</year>) <volume>06</volume>:<fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.4172/2155-6180.1000224</pub-id><pub-id pub-id-type="pmid">30671242</pub-id></citation></ref>
<ref id="B58">
<label>58.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Azur</surname> <given-names>MJ</given-names></name> <name><surname>Stuart</surname> <given-names>EA</given-names></name> <name><surname>Frangakis</surname> <given-names>C</given-names></name> <name><surname>Leaf</surname> <given-names>PJ</given-names></name></person-group>. <article-title>Multiple imputation by chained equations: what is it and how does it work? Multiple imputation by chained equations</article-title>. <source>Int J Methods Psychiatr Res.</source> (<year>2011</year>) <volume>20</volume>:<fpage>40</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1002/mpr.329</pub-id><pub-id pub-id-type="pmid">21499542</pub-id></citation></ref>
<ref id="B59">
<label>59.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>X</given-names></name> <name><surname>Zhao</surname> <given-names>K</given-names></name> <name><surname>Chu</surname> <given-names>X</given-names></name></person-group>. <article-title>AutoML: a survey of the state-of-the-art</article-title>. <source>Knowledg Based Syst.</source> (<year>2021</year>) <volume>212</volume>:<fpage>106622</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2020.106622</pub-id></citation></ref>
<ref id="B60">
<label>60.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strupp</surname> <given-names>M</given-names></name> <name><surname>Kim</surname> <given-names>J-S</given-names></name> <name><surname>Murofushi</surname> <given-names>T</given-names></name> <name><surname>Straumann</surname> <given-names>D</given-names></name> <name><surname>Jen</surname> <given-names>JC</given-names></name> <name><surname>Rosengren</surname> <given-names>SM</given-names></name> <etal/></person-group>. <article-title>Bilateral vestibulopathy: diagnostic criteria consensus document of the classification committee of the B&#x000E1;r&#x000E1;ny Society</article-title>. <source>VES.</source> (<year>2017</year>) <volume>27</volume>:<fpage>177</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.3233/VES-170619</pub-id><pub-id pub-id-type="pmid">29081426</pub-id></citation></ref>
<ref id="B61">
<label>61.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Decker</surname> <given-names>J</given-names></name> <name><surname>Limburg</surname> <given-names>K</given-names></name> <name><surname>Henningsen</surname> <given-names>P</given-names></name> <name><surname>Lahmann</surname> <given-names>C</given-names></name> <name><surname>Brandt</surname> <given-names>T</given-names></name> <name><surname>Dieterich</surname> <given-names>M</given-names></name></person-group>. <article-title>Intact vestibular function is relevant for anxiety related to vertigo</article-title>. <source>J Neurol.</source> (<year>2019</year>) <volume>266</volume>:<fpage>89</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1007/s00415-019-09351-8</pub-id><pub-id pub-id-type="pmid">31073714</pub-id></citation></ref>
<ref id="B62">
<label>62.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dieterich</surname> <given-names>M</given-names></name> <name><surname>Staab</surname> <given-names>JP</given-names></name></person-group>. <article-title>Functional dizziness: from phobic postural vertigo and chronic subjective dizziness to persistent postural-perceptual dizziness</article-title>. <source>Curr Opin Neurol.</source> (<year>2017</year>) <volume>30</volume>:<fpage>107</fpage>&#x02013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1097/WCO.0000000000000417</pub-id></citation>
</ref>
<ref id="B63">
<label>63.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lahmann</surname> <given-names>C</given-names></name> <name><surname>Henningsen</surname> <given-names>P</given-names></name> <name><surname>Brandt</surname> <given-names>T</given-names></name> <name><surname>Strupp</surname> <given-names>M</given-names></name> <name><surname>Jahn</surname> <given-names>K</given-names></name> <name><surname>Dieterich</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Psychiatric comorbidity and psychosocial impairment among patients with vertigo and dizziness</article-title>. <source>J Neurol Neurosurg Psychiatry.</source> (<year>2015</year>) <volume>86</volume>:<fpage>302</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1136/jnnp-2014-307601</pub-id><pub-id pub-id-type="pmid">24963122</pub-id></citation></ref>
<ref id="B64">
<label>64.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huppert</surname> <given-names>D</given-names></name> <name><surname>Strupp</surname> <given-names>M</given-names></name> <name><surname>Brandt</surname> <given-names>T</given-names></name></person-group>. <article-title>Long-term course of Meni&#x000E8;re&#x00027;s disease revisited</article-title>. <source>Acta Oto Laryngol.</source> (<year>2010</year>) <volume>130</volume>:<fpage>644</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.3109/00016480903382808</pub-id><pub-id pub-id-type="pmid">20001444</pub-id></citation></ref>
<ref id="B65">
<label>65.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Radtke</surname> <given-names>A</given-names></name> <name><surname>von Brevern</surname> <given-names>M</given-names></name> <name><surname>Neuhauser</surname> <given-names>H</given-names></name> <name><surname>Hottenrott</surname> <given-names>T</given-names></name> <name><surname>Lempert</surname> <given-names>T</given-names></name></person-group>. <article-title>Vestibular migraine: long-term follow-up of clinical symptoms and vestibulo-cochlear findings</article-title>. <source>Neurology.</source> (<year>2012</year>) <volume>79</volume>:<fpage>1607</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.0b013e31826e264f</pub-id><pub-id pub-id-type="pmid">23019266</pub-id></citation></ref>
<ref id="B66">
<label>66.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lempert</surname> <given-names>T</given-names></name> <name><surname>Olesen</surname> <given-names>J</given-names></name> <name><surname>Furman</surname> <given-names>J</given-names></name> <name><surname>Waterston</surname> <given-names>J</given-names></name> <name><surname>Seemungal</surname> <given-names>B</given-names></name> <name><surname>Carey</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Vestibular migraine: diagnostic criteria</article-title>. <source>J Vest Res.</source> (<year>2012</year>) <volume>22</volume>:<fpage>167</fpage>&#x02013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.3233/VES-2012-0453</pub-id></citation></ref>
<ref id="B67">
<label>67.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopez-Escamez</surname> <given-names>JA</given-names></name> <name><surname>Dlugaiczyk</surname> <given-names>J</given-names></name> <name><surname>Jacobs</surname> <given-names>J</given-names></name> <name><surname>Lempert</surname> <given-names>T</given-names></name> <name><surname>Teggi</surname> <given-names>R</given-names></name> <name><surname>von Brevern</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Accompanying symptoms overlap during attacks in Meni&#x000E8;re&#x00027;s disease and vestibular migraine</article-title>. <source>Front Neurol.</source> (<year>2014</year>) <volume>5</volume>:<fpage>265</fpage>. <pub-id pub-id-type="doi">10.3389/fneur.2014.00265</pub-id></citation></ref>
<ref id="B68">
<label>68.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soto-Varela</surname> <given-names>A</given-names></name> <name><surname>Ar&#x000E1;n-Gonz&#x000E1;lez</surname> <given-names>I</given-names></name> <name><surname>L&#x000F3;pez-Esc&#x000E1;mez</surname> <given-names>JA</given-names></name> <name><surname>Morera-P&#x000E9;rez</surname> <given-names>C</given-names></name> <name><surname>Oliva-Dom&#x000ED;nguez</surname> <given-names>M</given-names></name> <name><surname>P&#x000E9;rez-Fern&#x000E1;ndez</surname> <given-names>N</given-names></name> <etal/></person-group>. <article-title>Peripheral vertigo classification of the otoneurology committee of the spanish otorhinolaryngology society: diagnostic agreement and update (Version 2-2011)</article-title>. <source>Acta Otorrinolaringol.</source> (<year>2012</year>) <volume>63</volume>:<fpage>125</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.otoeng.2012.03.011</pub-id><pub-id pub-id-type="pmid">22169589</pub-id></citation></ref>
<ref id="B69">
<label>69.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Marinescu</surname> <given-names>RV</given-names></name> <name><surname>Oxtoby</surname> <given-names>NP</given-names></name> <name><surname>Young</surname> <given-names>AL</given-names></name> <name><surname>Bron</surname> <given-names>EE</given-names></name> <name><surname>Toga</surname> <given-names>AW</given-names></name> <name><surname>Weiner</surname> <given-names>MW</given-names></name> <etal/></person-group>. <article-title>TADPOLE challenge: prediction of longitudinal evolution in Alzheimer&#x00027;s disease</article-title> (<year>2018</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1805.03909">http://arxiv.org/abs/1805.03909</ext-link> (accessed June 18, 2021).</citation></ref>
<ref id="B70">
<label>70.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gray</surname> <given-names>KR</given-names></name> <name><surname>Aljabar</surname> <given-names>P</given-names></name> <name><surname>Heckemann</surname> <given-names>RA</given-names></name> <name><surname>Hammers</surname> <given-names>A</given-names></name> <name><surname>Rueckert</surname> <given-names>D</given-names></name></person-group>. <article-title>Random forest-based similarity measures for multi-modal classification of Alzheimer&#x00027;s disease</article-title>. <source>NeuroImage.</source> (<year>2013</year>) <volume>65</volume>:<fpage>167</fpage>&#x02013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2012.09.065</pub-id><pub-id pub-id-type="pmid">23041336</pub-id></citation></ref>
</ref-list>
<app-group>
<title>Appendix</title>
<app id="A1">
<sec>
<title>Appendix A. Supplementary Experiments on TADPOLE Dataset</title>
<sec>
<title>Data Description</title>
<p>TADPOLE (<xref ref-type="bibr" rid="B69">69</xref>) is an ADNI-based dataset consisting of imaging-derived and non-imaging features. The task is to classify whether observations at the baseline timepoint come from healthy normal controls (NC), patients with mild cognitive impairment (MCI), or patients with Alzheimer&#x00027;s disease (AD). The dataset consists of 813 instances (229 NC, 396 MCI, and 188 AD). Imaging features were computed using standard ADNI feature-extraction pipelines.</p>
</sec>
<sec>
<title>Results and Discussion</title>
<p>We evaluated all models on this dataset as a supplementary experiment to understand the strengths and limitations of our proposed model. For our purposes we only consider the F1-score, as this metric is more robust to the class imbalance present in TADPOLE. We observe that the best-performing models are the hyperparameter-optimized tree-based models, such as Random Forest and AdaBoostClassifier. Neural-network-based models such as MLPClassifier and MGMC yield comparable results but do not outperform the other models. The confusion matrices (<xref ref-type="fig" rid="F5">Figure A1</xref>) show that the biggest source of error in most models is distinguishing patients with diagnosed AD from patients with MCI. Likewise, the confusion matrices reveal that models almost never mistake healthy controls for AD patients and vice versa. Overall, almost all models perform comparably, except for notable misclassification rates in KNeighborsClassifier and GaussianProcessClassifier. Our classification results of &#x0007E;0.6&#x02013;0.7 F1-score are in line with recent literature, e.g., our previous comparison of MGMC with regular machine learning classifiers [cf. results in (<xref ref-type="bibr" rid="B13">13</xref>), not yet computed with base-ml], or RF-based AD classification by Gray et al. (<xref ref-type="bibr" rid="B70">70</xref>).</p>
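<p>As a minimal sketch (with labels invented purely for demonstration, not taken from TADPOLE), the macro-averaged F1-score and the confusion matrix used in this comparison can be computed with scikit-learn as follows:</p>

```python
from sklearn.metrics import f1_score, confusion_matrix

# Illustrative 3-class labels (0 = NC, 1 = MCI, 2 = AD); these arrays
# are made up for demonstration only.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 2, 2]

# Macro F1 averages the per-class F1-scores with equal weight, which
# makes it more robust to class imbalance than plain accuracy.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Rows = true class, columns = predicted class; off-diagonal entries
# in the MCI/AD block correspond to the main confusion discussed above.
cm = confusion_matrix(y_true, y_pred)
```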
</sec>
</sec>
</app>
<app id="A2">
<sec>
<title>Appendix B. Supplementary Experiments on Generated Dataset</title>
<sec>
<title>Data Description</title>
<p>To further illustrate the utility of base-ml, we created a synthetic dataset for a binary classification task. Using Scikit-learn (<xref ref-type="bibr" rid="B48">48</xref>) (the built-in function <monospace>make_classification</monospace>), we generated 5,000 samples with 20 features, of which 10 are informative and the remaining 10 are uninformative. It is important to note that, by design, this classification task has a non-linear separation boundary between the two classes and can therefore not be solved with high accuracy by linear classifiers.</p>
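<p>For reference, such a dataset can be generated along these lines; the exact keyword settings (e.g., <monospace>n_redundant=0</monospace> so that the remaining 10 features are pure noise, and the random seed) are assumptions, not the authors&#x00027; exact call:</p>

```python
from sklearn.datasets import make_classification

# 5,000 samples, 20 features: 10 informative, 10 uninformative.
# n_redundant=0 and n_repeated=0 (assumed here) ensure the remaining
# 10 features carry no class signal.
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=10,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    random_state=0,
)
```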
</sec>
<sec>
<title>Results and Discussion</title>
<p>As can be seen in <xref ref-type="fig" rid="F6">Figure A2</xref>, most non-linear models, i.e., the neural-network-based models and properly tuned tree-based models such as Random Forest, yield comparable performance. MGMC and Random Forest perform nearly identically and achieve the highest accuracies among all models. As expected, linear models such as Logistic Regression and Linear Discriminant Analysis obtained the lowest classification accuracy. Overall, we observe that base-ml properly reflects the statistical properties and the difficulty of this artificial classification problem. The source data distributions are not simply separable by topology mapping (see UMAP embedding), and the separation is only resolved by suitable, properly tuned non-linear models. This characteristic would not have been detected by an analysis limited to linear models or to less suited non-linear models (for this dataset, e.g., the Decision Tree Classifier or AdaBoost Classifier).</p>
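<p>The underlying phenomenon, that a linear model cannot fit a curved class boundary while a tuned non-linear model can, is easy to reproduce on a toy dataset (here <monospace>make_moons</monospace> rather than the dataset above, so this sketch is illustrative only):</p>

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two interleaving half-moons: a deliberately non-linear boundary.
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The linear model is limited to a straight decision boundary, while
# the tree ensemble can follow the curved one.
linear_acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
forest_acc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
```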
<fig id="F5" position="float">
<label>Figure A1</label>
<caption><p>Results panel produced by base-ml for TADPOLE.</p></caption>
<graphic xlink:href="fneur-12-681140-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure A2</label>
<caption><p>Results panel produced by base-ml for Synthetic data.</p></caption>
<graphic xlink:href="fneur-12-681140-g0006.tif"/>
</fig>
</sec>
</sec>
</app>
<app id="A3">
<sec>
<title>Appendix C. Implementation Details: Hyperparameter Search Ranges</title>
<p>To obtain a more comparable analysis, we selected the best hyperparameters on the validation set before reporting performance metrics on a held-out test set (nested cross-validation). We do this by randomly searching the hyperparameter space for 100 iterations for every model and selecting the hyperparameters that yield the best validation-set classification performance.</p>
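<p>The nested procedure above can be sketched with scikit-learn as follows; the dataset and the small search space are placeholders, and the iteration count is reduced from 100 for brevity:</p>

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Inner loop: random draws from the hyperparameter space, scored on
# validation folds (the paper uses 100 iterations; 10 here for speed).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(5, 50),
        "max_depth": [3, None],
    },
    n_iter=10,
    cv=3,
    random_state=0,
)

# Outer loop: the tuned model is re-fit and evaluated on held-out
# test folds, yielding nested cross-validation estimates.
scores = cross_val_score(search, X, y, cv=3)
```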
<p>For Logistic Regression we used the following hyperparameters (C: randint(1, 11); penalty: {&#x0201C;elasticnet&#x0201D;}, solver: {&#x0201C;saga&#x0201D;}, l1-ratio: uniform(0, 1));</p>
<p>Random Forest (max_depth: {3, None}; max_features: randint(1, 11); min_samples_split: randint(2, 11); bootstrap: {True, False}; criterion: {&#x0201C;gini&#x0201D;, &#x0201C;entropy&#x0201D;}; n_estimators: randint(5, 50));</p>
<p>K-Neighbors Classifier (n_neighbors: randint(3, 100); weights: {&#x0201C;uniform&#x0201D;, &#x0201C;distance&#x0201D;});</p>
<p>SVC (C: log_uniform(1e&#x02212;6, 1e&#x0002B;6); gamma: log_uniform(1e&#x02212;6, 1e&#x0002B;6); degree: randint(1, 8); kernel: {&#x0201C;linear&#x0201D;, &#x0201C;poly&#x0201D;, &#x0201C;rbf&#x0201D;});</p>
<p>Decision trees (max_depth: {3, None}; max_features: randint(1, 11); min_samples_split: randint(2, 11); criterion: {&#x0201C;gini&#x0201D;, &#x0201C;entropy&#x0201D;});</p>
<p>Gaussian Process Classifier (kernel: {1&#x0002A;RBF(), 1&#x0002A;DotProduct(), 1&#x0002A;Matern(), 1&#x0002A;RationalQuadratic(), 1&#x0002A;WhiteKernel()});</p>
<p>AdaBoostClassifier: (n_estimators: {50, 60, 70, 80, 90, 100}, learning-rate: {0.5, 0.8, 1.0, 1.3});</p>
<p>GaussianNB (var_smoothing: logspace(0, 9, num=100));</p>
<p>Linear Discriminant Analysis (solver: {&#x0201C;svd&#x0201D;, &#x0201C;lsqr&#x0201D;, &#x0201C;eigen&#x0201D;}; shrinkage: numpy.arange(0, 1, 0.01));</p>
<p>MLP Classifier (learning-rate: {1e&#x02212;1, 1e&#x02212;2, 1e&#x02212;3, 1e&#x02212;4}; hidden-units: {32, 64, 128}, dropout probability: {0.0, 0.1, 0.2, 0.3, 0.4, 0.5});</p>
<p>MGMC ([cross-entropy, Frobenius-norm, Dirichlet-norm weighting]: uniform(0.001, 1000)).</p>
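<p>Written out as scikit-learn expects them, the SVC ranges above map to sampling distributions like these (parameter names follow <monospace>sklearn.svm.SVC</monospace>; only the ranges are taken from the listing, the rest is an illustrative assumption):</p>

```python
from scipy.stats import loguniform, randint

# SVC search space from the listing above: log-uniform distributions
# for C and gamma, an integer range for degree, a categorical kernel.
svc_space = {
    "C": loguniform(1e-6, 1e6),
    "gamma": loguniform(1e-6, 1e6),
    "degree": randint(1, 8),   # randint(1, 8) draws integers 1..7
    "kernel": ["linear", "poly", "rbf"],
}

c_sample = svc_space["C"].rvs(random_state=0)
degree_sample = svc_space["degree"].rvs(random_state=0)
```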
</sec>
</app>
</app-group>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Base-ml source code and documentation: <ext-link ext-link-type="uri" xlink:href="https://github.com/pydsgz/base-ml">https://github.com/pydsgz/base-ml</ext-link>.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Skorch source code and documentation: <ext-link ext-link-type="uri" xlink:href="https://github.com/skorch-dev/skorch">https://github.com/skorch-dev/skorch</ext-link>.</p></fn>
<fn id="fn0003"><p><sup>3</sup>Captum source code and documentation: <ext-link ext-link-type="uri" xlink:href="https://github.com/pytorch/captum">https://github.com/pytorch/captum</ext-link></p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This study was supported by the German Federal Ministry of Education and Research (BMBF) in connection with the foundation of the German Center for Vertigo and Balance Disorders (DSGZ) (grant number 01 EO 0901).</p>
</fn>
</fn-group>
</back>
</article>
