<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurol.</journal-id>
<journal-title>Frontiers in Neurology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurol.</abbrev-journal-title>
<issn pub-type="epub">1664-2295</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fneur.2021.678484</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neurology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Ranking the Predictive Power of Clinical and Biological Features Associated With Disease Progression in Huntington&#x00027;s Disease</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ghazaleh</surname> <given-names>Naghmeh</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/352871/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Houghton</surname> <given-names>Richard</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1299245/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Palermo</surname> <given-names>Giuseppe</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Schobel</surname> <given-names>Scott A.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wijeratne</surname> <given-names>Peter A.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1191335/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Long</surname> <given-names>Jeffrey D.</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/84273/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>F. Hoffmann-La Roche Ltd.</institution>, <addr-line>Basel</addr-line>, <country>Switzerland</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Computer Science, Centre for Medical Imaging Computing, University College London</institution>, <addr-line>London</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Neurodegenerative Disease, Huntington&#x00027;s Disease Research Centre, Queen Square Institute of Neurology, University College London</institution>, <addr-line>London</addr-line>, <country>United Kingdom</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Psychiatry, University of Iowa</institution>, <addr-line>Iowa City, IA</addr-line>, <country>United States</country></aff>
<aff id="aff5"><sup>5</sup><institution>Department of Biostatistics, University of Iowa</institution>, <addr-line>Iowa City, IA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Emilia Mabel Gatto, Sanatorio de la Trinidad Mitre, Argentina</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Zhong Pei, Sun Yat-Sen University, China; Yi-Ting Hsu, China Medical University, Taiwan</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Jeffrey D. Long <email>jeffrey-long&#x00040;uiowa.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Movement Disorders, a section of the journal Frontiers in Neurology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>05</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>678484</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>04</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Ghazaleh, Houghton, Palermo, Schobel, Wijeratne and Long.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ghazaleh, Houghton, Palermo, Schobel, Wijeratne and Long</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Huntington&#x00027;s disease (HD) is characterised by a triad of cognitive, behavioural, and motor symptoms which lead to functional decline and loss of independence. With potential disease-modifying therapies in development, there is interest in accurately measuring HD progression and characterising prognostic variables to improve efficiency of clinical trials. Using the large, prospective Enroll-HD cohort, we investigated the relative contribution and ranking of potential prognostic variables in patients with manifest HD. A random forest regression model was trained to predict change of clinical outcomes based on the variables, which were ranked based on their contribution to the prediction. The highest-ranked variables included novel predictors of progression&#x02014;being accompanied at clinical visit, cognitive impairment, age at diagnosis and tetrabenazine or antipsychotics use&#x02014;in addition to established predictors, cytosine adenine guanine (CAG) repeat length and CAG-age product. The novel prognostic variables improved the ability of the model to predict clinical outcomes and may be candidates for statistical control in HD clinical studies.</p></abstract>
<kwd-group>
<kwd>Huntington&#x00027;s disease</kwd>
<kwd>disease progression</kwd>
<kwd>prognostic variables</kwd>
<kwd>machine learning</kwd>
<kwd>random forest</kwd>
</kwd-group>
<contract-sponsor id="cn001">F. Hoffmann-La Roche<named-content content-type="fundref-id">10.13039/100007013</named-content></contract-sponsor>
<counts>
<fig-count count="2"/>
<table-count count="5"/>
<equation-count count="0"/>
<ref-count count="30"/>
<page-count count="8"/>
<word-count count="5189"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Huntington&#x00027;s disease (HD) is a rare, genetic, neurodegenerative disease caused by a cytosine adenine guanine (CAG) repeat expansion variant of the huntingtin gene (<italic>HTT</italic>) (<xref ref-type="bibr" rid="B1">1</xref>) and is characterised by a triad of cognitive, behavioural, and motor symptoms (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B3">3</xref>). Disease onset, defined as the onset of motor signs and symptoms as measured by a Diagnostic Confidence Level of 4 (<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B4">4</xref>), typically occurs in the prime of life, between the ages of 30 and 50 years (<xref ref-type="bibr" rid="B2">2</xref>). HD is associated with increasing disability, worsening of function and loss of independence, leading to death within approximately 15 years of onset (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B5">5</xref>). Motor and cognitive symptoms deteriorate steadily as the disease progresses (<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B6">6</xref>&#x02013;<xref ref-type="bibr" rid="B9">9</xref>), while behavioural symptoms tend to be episodic (<xref ref-type="bibr" rid="B10">10</xref>).</p>
<p>With potential disease-modifying therapies for HD in clinical development (<xref ref-type="bibr" rid="B11">11</xref>), there is interest in measuring disease progression and characterising prognostic variables in order to improve the efficiency and accuracy of clinical trials (<xref ref-type="bibr" rid="B12">12</xref>). Prognostic variables can be used to identify a patient population through an enrichment strategy to reduce interpatient variability in clinical trials or alternatively to enrich for faster progressors, and to eventually inform the optimum time to start treatment (<xref ref-type="bibr" rid="B12">12</xref>). Statistically controlling for prognostic baseline variables may also be important in non-randomised (e.g., open-label) studies as they could confound the relationship between treatment exposure and outcomes. Additionally, when testing hypotheses in randomised studies, including prognostic variables as covariates in the analysis will usually increase the probability of detecting a treatment effect, because the covariates account for outcome variability that would otherwise be attributed to random error.</p>
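<p>The gain from covariate adjustment can be illustrated with a small simulation. The sketch below is not part of the analysis reported here: the sample size, treatment effect, covariate effect and function names are all hypothetical and chosen only for illustration. It simulates a two-arm randomised trial with one prognostic baseline covariate and compares the within-arm standard deviation of the outcome before and after removing the covariate contribution. A smaller residual standard deviation gives a smaller standard error for the treatment effect and therefore, for a fixed sample size, a higher probability of detecting it.</p>
<preformat><![CDATA[
```python
import random
import statistics


def simulate_trial(n=2000, effect=1.0, beta=2.0, seed=0):
    """Simulate a two-arm randomised trial with one prognostic covariate.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    arm = [i % 2 for i in range(n)]      # 0 = control, 1 = treated
    rng.shuffle(arm)                     # random allocation, equal arm sizes
    covariate = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [effect * a + beta * c + rng.gauss(0, 1)
               for a, c in zip(arm, covariate)]
    return arm, covariate, outcome


def centre_within_arm(values, arm):
    """Subtract each arm's mean so that only within-arm variability remains."""
    means = {}
    for group in set(arm):
        means[group] = statistics.fmean(
            [v for v, g in zip(values, arm) if g == group])
    return [v - means[g] for v, g in zip(values, arm)]


def residual_sd(arm, covariate, outcome):
    """Return (unadjusted, covariate-adjusted) within-arm SD of the outcome."""
    y = centre_within_arm(outcome, arm)
    x = centre_within_arm(covariate, arm)
    # Pooled within-arm least-squares slope of outcome on covariate.
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    adjusted = [b - slope * a for a, b in zip(x, y)]
    return statistics.pstdev(y), statistics.pstdev(adjusted)


arm, covariate, outcome = simulate_trial()
unadjusted_sd, adjusted_sd = residual_sd(arm, covariate, outcome)
print(f"within-arm SD: unadjusted {unadjusted_sd:.2f}, "
      f"adjusted {adjusted_sd:.2f}")
```
]]></preformat>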
<p>Large prospective cohort studies have shown that manifestations of progression, that is, clinical signs and symptoms of HD, as well as known biological predictors of progression such as CAG repeat length and CAG-age product (CAP) score, can predict clinical progression or motor onset (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B13">13</xref>). However, no study has systematically ranked the importance of predictors of progression in a manifest HD population (i.e., after the onset of unequivocal motor symptoms).</p>
<p>Random forest (RF) regression models permit interrogation of large, complex clinical datasets to capture non-linear associations between multidimensional predictive variables and clinical outcomes with high predictive accuracy (<xref ref-type="bibr" rid="B14">14</xref>, <xref ref-type="bibr" rid="B15">15</xref>). RF approaches are well-suited to classification and regression problems, such as identifying variables with predictive potential for disease progression from clinical datasets. Here, we use modern machine learning methods to examine a large number of HD variables to identify the most important predictors of progression on five clinical outcomes: total functional capacity (TFC), a measure of function; Stroop word reading (SWR), a measure of attention and psychomotor processing speed; symbol digit modalities test (SDMT), a measure of executive function, visuo-spatial working memory, attention and processing speed; total motor score (TMS), a measure of motor function; and the composite Unified HD Rating Scale (cUHDRS), an equally weighted composite outcome measure of the TFC, TMS, SDMT, and SWR that was developed based on an early manifest HD population (<xref ref-type="bibr" rid="B16">16</xref>). The large prospective Enroll-HD cohort (NCT01574053) is used to investigate the relative contribution and ranking of potential prognostic variables to predict clinical progression in a clinical trial-like manifest HD population.</p></sec>
<sec sec-type="results" id="s2">
<title>Results</title>
<p>The analysis included 1,608 individuals meeting typical criteria for clinical trials in manifest HD and with CAG repeats between 36 and 64 (filtering criteria shown in <xref ref-type="table" rid="T1">Table 1</xref>). Patient demographics are shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Attrition table showing the number of patients remaining after applying the filter for each inclusion criterion.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Total initial population in the dataset: 15,301</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1) Age at baseline = 25&#x02013;65 years; diagnosis age &#x0003E;20 years: <bold>6,432</bold></td>
</tr>
<tr>
<td valign="top" align="left">2) Manifest only (HDCAT = 3): <bold>6,025</bold></td>
</tr>
<tr>
<td valign="top" align="left">3) DCL = 4: <bold>5,694</bold></td>
</tr>
<tr>
<td valign="top" align="left">4) IS at baseline 100 &#x02265; IS &#x0003E; 70: <bold>4,277</bold></td>
</tr>
<tr>
<td valign="top" align="left">5) At least 2 years&#x00027; follow-up scores for all four outcomes: <bold>1,608</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Totals are for subjects with complete data on all outcome scores</italic>.</p>
<p><italic>DCL, diagnostic confidence level; HDCAT, HD category; IS, Independence Scale</italic>.</p>
<p><italic>The bold values indicate the number of patients included after applying each inclusion criterion</italic>.</p>
</table-wrap-foot>
</table-wrap>
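<p>The sequential filtering summarised in Table 1 can be sketched as follows. This is an illustrative sketch only, not the code used for the analysis: the field names (age, age_at_diagnosis, hdcat, dcl, independence_scale, followup_years) are hypothetical and are not the Enroll-HD variable names, the follow-up criterion is simplified to a single number of years, and the three records are made up.</p>
<preformat><![CDATA[
```python
def attrition(records):
    """Apply the Table 1 inclusion criteria in order and count who remains.

    Field names are hypothetical, not the Enroll-HD variable names.
    """
    steps = [
        ("Age 25-65 years and diagnosis age > 20 years",
         lambda r: 25 <= r["age"] <= 65 and r["age_at_diagnosis"] > 20),
        ("Manifest only (HDCAT = 3)", lambda r: r["hdcat"] == 3),
        ("DCL = 4", lambda r: r["dcl"] == 4),
        ("70 < IS <= 100", lambda r: 70 < r["independence_scale"] <= 100),
        # Simplification: the study required follow-up scores on all outcomes.
        ("At least 2 years of follow-up",
         lambda r: r["followup_years"] >= 2),
    ]
    remaining = list(records)
    counts = [("Initial population", len(remaining))]
    for label, keep in steps:
        remaining = [r for r in remaining if keep(r)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Three made-up participants, for illustration only.
toy_records = [
    {"age": 48, "age_at_diagnosis": 45, "hdcat": 3, "dcl": 4,
     "independence_scale": 85, "followup_years": 3},
    {"age": 70, "age_at_diagnosis": 66, "hdcat": 3, "dcl": 4,
     "independence_scale": 90, "followup_years": 4},
    {"age": 52, "age_at_diagnosis": 50, "hdcat": 3, "dcl": 4,
     "independence_scale": 65, "followup_years": 2},
]
included, counts = attrition(toy_records)
for label, n in counts:
    print(f"{label}: {n}")
```
]]></preformat>
<p>Because each filter is applied to the output of the previous one, the counts can only decrease from one step to the next, as they do in Table 1.</p>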
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Patient demographics.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Demographic</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 1,608</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Age, years, mean (SD)</td>
<td valign="top" align="center">49.60 (9.32)</td>
</tr>
<tr>
<td valign="top" align="left">Male sex, <italic>n</italic> (%)</td>
<td valign="top" align="center">819 (50.9)</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Region</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
</tr>
<tr>
<td valign="top" align="left">Australasia</td>
<td valign="top" align="center">69 (4.3)</td>
</tr>
<tr>
<td valign="top" align="left">Europe</td>
<td valign="top" align="center">1,068 (66.4)</td>
</tr>
<tr>
<td valign="top" align="left">Latin America</td>
<td valign="top" align="center">9 (0.6)</td>
</tr>
<tr>
<td valign="top" align="left">North America</td>
<td valign="top" align="center">462 (28.7)</td>
</tr>
<tr>
<td valign="top" align="left">CAG repeat length, mean (SD)</td>
<td valign="top" align="center">43.93 (3.04)</td>
</tr>
<tr>
<td valign="top" align="left">CAP score, mean (SD)</td>
<td valign="top" align="center">488.18 (82.35)</td>
</tr>
<tr>
<td valign="top" align="left">Shoulson and Fahn Stage at baseline, <italic>n</italic> (%)</td>
<td valign="top" align="center">Stage I: 716 (44.53)<break/> Stage II: 731 (45.49)<break/> Stage III: 159 (9.89)<break/> Stage IV: 2 (0.12)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>CAG, cytosine adenine guanine; CAP, CAG-age product; SD, standard deviation</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The highest-ranked variables predictive of disease progression for each outcome are shown in <xref ref-type="fig" rid="F1">Figure 1</xref> and the top 10 variables for each outcome are shown in <xref ref-type="table" rid="T3">Table 3</xref>. CAP was found to be the most predictive variable for all outcomes and CAG repeat length was ranked as the second most important variable for all outcomes. Other prognostic variables associated with faster progression that ranked in the top 10 for at least three of the five outcomes were: age at diagnosis (all but TFC), being accompanied to clinic visits (for all outcomes), history of cognitive impairment (all but SWR), tetrabenazine use (for all outcomes) and antipsychotics use (all but TMS). The effect of these variables on disease progression trajectory as measured by the cUHDRS is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Rankings of predictors of clinical progression as measured by <bold>(A)</bold> cUHDRS; <bold>(B)</bold> TMS; <bold>(C)</bold> SDMT; <bold>(D)</bold> SWR; <bold>(E)</bold> TFC. Boxplots are shown with the upper box edge representing the 75<sup>th</sup> percentile and the &#x0201C;whisker&#x0201D; extending to 1.5 times the IQR. A circle is an outlier, defined as a ranking that extends beyond a whisker. BMI, body mass index; CAG, cytosine adenine guanine; CAP, CAG-age product; cUHDRS, composite Unified HD Rating Scale; ENT, ear, nose, throat; IQR, interquartile range; MH, mental health; SDMT, symbol digit modalities test; SWR, Stroop word reading; TFC, total functional capacity; TMS, total motor score.</p></caption>
<graphic xlink:href="fneur-12-678484-g0001.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Top 10 predictive variables for each outcome.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Rank</bold></th>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>Outcome</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="left"><bold>cUHDRS</bold></th>
<th valign="top" align="left"><bold>TMS</bold></th>
<th valign="top" align="left"><bold>TFC</bold></th>
<th valign="top" align="left"><bold>SDMT</bold></th>
<th valign="top" align="left"><bold>SWR</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Baseline CAP</td>
<td valign="top" align="left">Baseline CAP</td>
<td valign="top" align="left">Baseline CAP</td>
<td valign="top" align="left">Baseline CAP</td>
<td valign="top" align="left">Baseline CAP</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">CAG repeats (affected allele)</td>
<td valign="top" align="left">CAG repeats (affected allele)</td>
<td valign="top" align="left">CAG repeats (affected allele)</td>
<td valign="top" align="left">CAG repeats (affected allele)</td>
<td valign="top" align="left">CAG repeats (affected allele)</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Accompanied to clinical visit</td>
<td valign="top" align="left">Accompanied to clinical visit</td>
<td valign="top" align="left">Accompanied to clinical visit</td>
<td valign="top" align="left">Tetrabenazine use</td>
<td valign="top" align="left">Accompanied to clinical visit</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Tetrabenazine use</td>
<td valign="top" align="left">Tetrabenazine use</td>
<td valign="top" align="left">Unaccompanied to clinical visit</td>
<td valign="top" align="left">Antipsychotics use</td>
<td valign="top" align="left">Antipsychotics use</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Antipsychotics use</td>
<td valign="top" align="left">Significant cognitive impairment or dementia</td>
<td valign="top" align="left">Tetrabenazine use</td>
<td valign="top" align="left">Significant cognitive impairment or dementia</td>
<td valign="top" align="left">Unaccompanied to clinical visit</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">Unaccompanied to clinical visit</td>
<td valign="top" align="left">Age at diagnosis</td>
<td valign="top" align="left">History of apathy</td>
<td valign="top" align="left">Accompanied to clinical visit</td>
<td valign="top" align="left">Tetrabenazine use</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">Significant cognitive impairment or dementia</td>
<td valign="top" align="left">BMI</td>
<td valign="top" align="left">Antipsychotics use</td>
<td valign="top" align="left">Age at diagnosis</td>
<td valign="top" align="left">Age at diagnosis</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">History of apathy</td>
<td valign="top" align="left">Unaccompanied to clinical visit</td>
<td valign="top" align="left">Speech therapy</td>
<td valign="top" align="left">History of drug abuse</td>
<td valign="top" align="left">Anti-epileptics use</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">Age at diagnosis</td>
<td valign="top" align="left">Age</td>
<td valign="top" align="left">History of perseverative obsessive MH behaviours</td>
<td valign="top" align="left">Region&#x02014;North America</td>
<td valign="top" align="left">Occupational therapy</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">Mix of accompanied and unaccompanied to clinical visit</td>
<td valign="top" align="left">Mix of accompanied and unaccompanied to clinical visit</td>
<td valign="top" align="left">Significant cognitive impairment or dementia</td>
<td valign="top" align="left">Comorbidities&#x02014;musculoskeletal</td>
<td valign="top" align="left">Education&#x02014;upper secondary</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>BMI, body mass index; CAG, cytosine adenine guanine; CAP, CAG-age product; cUHDRS, composite Unified HD Rating Scale; MH, mental health; SDMT, symbol digit modalities test; SWR, Stroop word reading; TFC, total functional capacity; TMS, total motor score</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Effect of the highest-ranked variables on clinical progression trajectory as measured by cUHDRS. <bold>(A)</bold> Cognitive impairment; <bold>(B)</bold> Antipsychotics use; <bold>(C)</bold> Tetrabenazine use; <bold>(D)</bold> Being accompanied at clinic visit. cUHDRS, composite Unified HD Rating Scale.</p></caption>
<graphic xlink:href="fneur-12-678484-g0002.tif"/>
</fig>
<p>The common variables among the top 10 most important features for all outcomes were: CAP score, CAG repeats, accompanied or unaccompanied at clinic visit, tetrabenazine use, antipsychotics use and having severe cognitive impairment.</p>
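<p>The overlap between the five top 10 lists can be tallied directly from Table 3. The sketch below is offered as a reader aid rather than as part of the analysis: the variable labels are transcribed from the table (with plain hyphens in place of dashes), and accompanied, unaccompanied and mixed attendance are kept as separate entries, as they are in the table.</p>
<preformat><![CDATA[
```python
from collections import Counter

# Top 10 lists transcribed from Table 3 (rank order preserved).
TOP10 = {
    "cUHDRS": [
        "Baseline CAP", "CAG repeats (affected allele)",
        "Accompanied to clinical visit", "Tetrabenazine use",
        "Antipsychotics use", "Unaccompanied to clinical visit",
        "Significant cognitive impairment or dementia", "History of apathy",
        "Age at diagnosis",
        "Mix of accompanied and unaccompanied to clinical visit",
    ],
    "TMS": [
        "Baseline CAP", "CAG repeats (affected allele)",
        "Accompanied to clinical visit", "Tetrabenazine use",
        "Significant cognitive impairment or dementia", "Age at diagnosis",
        "BMI", "Unaccompanied to clinical visit", "Age",
        "Mix of accompanied and unaccompanied to clinical visit",
    ],
    "TFC": [
        "Baseline CAP", "CAG repeats (affected allele)",
        "Accompanied to clinical visit", "Unaccompanied to clinical visit",
        "Tetrabenazine use", "History of apathy", "Antipsychotics use",
        "Speech therapy", "History of perseverative obsessive MH behaviours",
        "Significant cognitive impairment or dementia",
    ],
    "SDMT": [
        "Baseline CAP", "CAG repeats (affected allele)", "Tetrabenazine use",
        "Antipsychotics use", "Significant cognitive impairment or dementia",
        "Accompanied to clinical visit", "Age at diagnosis",
        "History of drug abuse", "Region - North America",
        "Comorbidities - musculoskeletal",
    ],
    "SWR": [
        "Baseline CAP", "CAG repeats (affected allele)",
        "Accompanied to clinical visit", "Antipsychotics use",
        "Unaccompanied to clinical visit", "Tetrabenazine use",
        "Age at diagnosis", "Anti-epileptics use", "Occupational therapy",
        "Education - upper secondary",
    ],
}

# Each variable appears at most once per list, so the count is the number
# of outcomes for which it ranks in the top 10.
counts = Counter(v for ranked in TOP10.values() for v in ranked)
in_all_outcomes = sorted(v for v, n in counts.items() if n == len(TOP10))
in_at_least_three = sorted(v for v, n in counts.items() if n >= 3)

for variable, n in counts.most_common():
    print(f"{n}/{len(TOP10)}  {variable}")
```
]]></preformat>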
<p>Unadjusted R<sup>2</sup> values were calculated for the RF models including only CAG and CAP score and compared with those for the model including all the features (<xref ref-type="table" rid="T4">Table 4</xref>). Adding the remaining features to CAP and CAG increased the proportion of outcome variance explained by 17 percentage points for cUHDRS and by approximately 15 percentage points on average for the other outcomes. The slight improvement in model fit with CAP, CAG and age compared with the model built with the shared top 10 features, seen for some outcomes, could be due to the different cross-validation sets, and also to the very high contributions of CAP and CAG to the model fit.</p>
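<p>The gains quoted above follow from the R<sup>2</sup> values in Table 4 by simple subtraction. The short sketch below reproduces that arithmetic; the dictionary keys are shorthand labels introduced here for the four models and are not names used in the analysis.</p>
<preformat><![CDATA[
```python
# R-squared values (%) transcribed from Table 4.
R2 = {
    "all features": {"cUHDRS": 41, "SDMT": 28, "TFC": 30, "SWR": 26, "TMS": 36},
    "shared top 10": {"cUHDRS": 31, "SDMT": 18, "TFC": 20, "SWR": 14, "TMS": 24},
    "CAP, CAG, age": {"cUHDRS": 29, "SDMT": 19, "TFC": 20, "SWR": 16, "TMS": 25},
    "CAP and CAG": {"cUHDRS": 24, "SDMT": 14, "TFC": 15, "SWR": 12, "TMS": 20},
}

# Gain in variance explained (percentage points) from adding all other
# features to the model containing CAP and CAG only.
gain = {
    outcome: R2["all features"][outcome] - R2["CAP and CAG"][outcome]
    for outcome in R2["all features"]
}
others = [g for outcome, g in gain.items() if outcome != "cUHDRS"]
mean_gain_others = sum(others) / len(others)

print(gain)
print(mean_gain_others)
```
]]></preformat>
<p>The differences are expressed in percentage points of variance explained, not as relative changes.</p>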
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Comparison of model performance with all the features (1), with the discovered top-ranking features (2) and with only established prognostic features (3 and 4).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center" colspan="5" style="border-bottom: thin solid #000000;"><bold>Outcome variable</bold></th>
</tr>
<tr>
<th/>
<th/>
<th valign="top" align="center"><bold>cUHDRS</bold></th>
<th valign="top" align="center"><bold>SDMT</bold></th>
<th valign="top" align="center"><bold>TFC</bold></th>
<th valign="top" align="center"><bold>SWR</bold></th>
<th valign="top" align="center"><bold>TMS</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">R<sup>2</sup> with all the features</td>
<td valign="top" align="center">41%</td>
<td valign="top" align="center">28%</td>
<td valign="top" align="center">30%</td>
<td valign="top" align="center">26%</td>
<td valign="top" align="center">36%</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">R<sup>2</sup> with the shared top 10 features<xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></td>
<td valign="top" align="center">31%</td>
<td valign="top" align="center">18%</td>
<td valign="top" align="center">20%</td>
<td valign="top" align="center">14%</td>
<td valign="top" align="center">24%</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">R<sup>2</sup> with CAP, CAG and age at baseline</td>
<td valign="top" align="center">29%</td>
<td valign="top" align="center">19%</td>
<td valign="top" align="center">20%</td>
<td valign="top" align="center">16%</td>
<td valign="top" align="center">25%</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">R<sup>2</sup> with CAP and CAG</td>
<td valign="top" align="center">24%</td>
<td valign="top" align="center">14%</td>
<td valign="top" align="center">15%</td>
<td valign="top" align="center">12%</td>
<td valign="top" align="center">20%</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Difference between 4 and 1</td>
<td valign="top" align="center">17%</td>
<td valign="top" align="center">14%</td>
<td valign="top" align="center">15%</td>
<td valign="top" align="center">14%</td>
<td valign="top" align="center">16%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>&#x0002A;</label><p><italic>Shared features among the individual top 10 important features of each outcome: CAP, CAG, accompanied or alone at visit, tetrabenazine use, antipsychotics use and cognitive impairment</italic>.</p></fn>
<p><italic>CAG, cytosine adenine guanine; CAP, CAG-age product; cUHDRS, composite Unified HD Rating Scale; SDMT, symbol digit modalities test; SWR, Stroop word reading; TFC, total functional capacity; TMS, total motor score</italic>.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec sec-type="discussion" id="s3">
<title>Discussion</title>
<p>This analysis used real-world data from the large Enroll-HD registry and a machine learning algorithm to identify novel predictors of HD progression with significant impact on the slope of clinical decline observed over a 2-year follow-up period. The two most important predictors identified were CAP score and CAG repeat length, in agreement with previous studies (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B13">13</xref>). In addition, several strong predictors were identified that have either not been previously studied (being accompanied to a visit) or have had inconsistent effects in other studies (cognitive impairment, use of tetrabenazine or antipsychotics) (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B17">17</xref>&#x02013;<xref ref-type="bibr" rid="B19">19</xref>). The novel variables identified were predictive of progression over multiple clinical domains, measured by motor, cognitive and functional endpoints, as well as the composite endpoint. Using all predictors in addition to known prognostic variables improved the ability of the model to predict clinical outcomes (see video abstract in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Materials</xref>).</p>
<p>Some of the features tested, which may have been expected to rank highly as prognostic variables based on previous studies in premanifest HD (i.e., prior to the onset of unequivocal motor symptoms)&#x02014;including smoking, alcohol intake and body mass index (BMI) (<xref ref-type="bibr" rid="B20">20</xref>&#x02013;<xref ref-type="bibr" rid="B22">22</xref>)&#x02014;were not found to be important predictors of progression. These results may not be directly comparable to the current study, which was carried out in a manifest HD population. It is also known that self-reporting of smoking and alcohol use is unreliable in the general population, as revealed by studies using advances in DNA methylation measurement to assess substance use status (<xref ref-type="bibr" rid="B23">23</xref>). We found BMI to be a weak discriminating factor among patients with different values of change in outcome. A further potential explanation for the disparity between our findings and previous studies could be that the prognostic value of these variables may be dependent on disease stage. In the current study, the population was relatively progressed, and it is possible that other variables associated with the disease could outweigh environmental variables.</p>
<p>It should be noted that we used an RF algorithm with the setting that prevents bias in the ranking based on the data structure. It nevertheless remains possible for a feature to rank highly because it is collinear with another feature that is a strong predictor of the outcome. This could be addressed by calculating the conditional importance of the features, which is computationally very demanding (<xref ref-type="bibr" rid="B24">24</xref>).</p>
<p>Additionally, the observed associations are based on observational data and are therefore not indicative of causal relationships, due to measured and unmeasured potential confounding factors. For example, being accompanied to clinic visits may affect the clinical outcome scores measured by virtue of the companion&#x00027;s additional report, which informs the clinical rating. Alternatively, the association may arise because healthier participants are able to continue attending visits alone, whereas those who are on worse clinical trajectories need additional emotional or practical assistance (e.g., driving) to complete visits. Similarly, antipsychotics may be used to treat motor symptoms in HD, and therefore may be expected to reduce TMS without influencing overall disease progression. Cognitive outcome measures may also be related to variables including dementia and severe cognitive impairment. RF approaches have good performance in modelling complex, multidimensional disease-specific datasets (like Enroll-HD) (<xref ref-type="bibr" rid="B25">25</xref>). In this application, an RF approach was used to find novel associations (e.g., identify variables with predictive potential for disease progression), and the results do not imply causality (i.e., the aetiological role of the variable during disease progression) (<xref ref-type="bibr" rid="B26">26</xref>). Nevertheless, by virtue of the strength of the associations observed, some of these features may be important to control for in analyses of observational studies and may have implications for companion participation in interventional trials.</p>
<p>The current study focused on clinical variables only and did not include imaging or fluid biomarkers, which previous studies have suggested may be predictive of disease progression (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B27">27</xref>, <xref ref-type="bibr" rid="B28">28</xref>). This limitation was due to the nature of the currently available HD databases. In this study, we used the Enroll-HD database, which provides a sufficiently large sample size for the analysis but does not include imaging or biofluid data as part of the main study. Imaging databases such as PREDICT-HD (NCT00051324) are available, but do not provide the comprehensive range of clinical variables that is available in Enroll-HD, such as medication history. The available biofluid databases, such as the HD-CSF study (<xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B29">29</xref>), are too small to be informative on the scale of the current analysis.</p>
<p>A further limitation is that the results described here are based on a selected cohort intended to reflect the inclusion criteria of ongoing clinical trials, and therefore may not be representative of the wider HD population, including younger patients (juvenile-onset HD), elderly patients (&#x0003E;65 years), late-stage patients (&#x0003E;Stage III) or premanifest patients. Further research is needed to determine the wider applicability of these results to these populations.</p>
<p>This study made use of a supervised RF regression model to identify putative and novel predictors of disease progression in HD. Identifying prognostic variables usually requires large sample sizes to optimise predictive accuracy, which may be a limitation for rare conditions such as HD. Since 2012, the Enroll-HD registry, which includes over 19,000 participants from 177 sites in 20 countries, has made a large, high-quality dataset available to researchers to advance the understanding of HD. The power of RF modelling is particularly relevant in HD, where the need for an improved understanding of this multidomain disease and for efficient trial design is evident.</p>
<p>To overcome known methodological limitations, RF approaches are being developed to harness the full potential of long-term registry data in clinical risk prediction and may, in future, accelerate disease risk and course prediction in HD. Given the dynamic nature of disease, the recently published RF Survival, Longitudinal and Multivariate model was developed to evaluate the temporal nature of variables (such as rate of change of variables) (<xref ref-type="bibr" rid="B30">30</xref>). Such approaches will further refine identification of clinically meaningful predictive variables not only for risk of disease progression as a static entity, but risk of disease progression over time and clinical course. Such temporal approaches will prove useful for future studies in HD, where disease course is highly variable.</p>
<p>In summary, the RF approach described here using the Enroll-HD dataset has identified novel prognostic variables which may be important candidates for statistical control in clinical trials and observational studies in HD.</p></sec>
<sec sec-type="methods" id="s4">
<title>Methods</title>
<sec>
<title>Data Source</title>
<p>Data from the Enroll-HD database were used for this study. Enroll-HD is a global platform designed to facilitate clinical research in HD. Core variables are collected annually from all research participants as part of this multicentre, longitudinal, observational study. Data are monitored for quality and accuracy using a risk-based monitoring approach. All sites are required to obtain and maintain local ethical approval. The study began recruiting in 2012, and as of data released in 2018, includes over 19,000 total participants and more than 8,000 patients with manifest HD. The second version of the fourth periodic dataset release (PDS4 version 2.0) was used, which has a data cut-off date of 31 October 2018 and was made available in August 2019.</p></sec>
<sec>
<title>Patient Population</title>
<p>The study population is purposely limited to individuals meeting typical criteria for clinical trials in manifest HD, using the filtering criteria shown in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<p>The primary population of interest was individuals with manifest HD aged 25&#x02013;65 years, inclusive. Patients with juvenile-onset HD (first symptom onset at age &#x0003C;20 years) were excluded. Participants were required to have an Independence Scale score &#x0003E;70 at baseline and at least two subsequent annual visits with clinical information recorded. The rationale for these criteria is that the typical duration of clinical studies in this population is 2 years.</p></sec>
<sec>
<title>Analytical Approach</title>
<p>A total of 102 prognostic variables (<xref ref-type="table" rid="T5">Table 5</xref>) were considered for each participant, including demographics, clinical characteristics, comorbidities, symptoms, as well as pharmacological and non-pharmacological treatments at baseline. The predicted variables were the estimated changes in outcome measures over time. Estimated linear change was calculated for five outcome measures of HD which have known sensitivity to detect clinical change (change from baseline was measured out to 2 years): TFC, SWR, SDMT, TMS, and cUHDRS. The slope was estimated using a linear mixed model with fixed and random effects for both intercept and slope, and the individual-specific slope was computed as the sum of the fixed slope and that individual&#x00027;s random slope. Follow-up assessments up to 2 years that fell within a &#x000B1;90-day window around planned annual visits were included.</p>
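<p>As a minimal illustration of two steps in this paragraph (not the authors&#x00027; code, which used R), the following Python sketch shows how follow-up assessments might be filtered to a &#x000B1;90-day window around planned annual visits, and how an individual-specific slope is formed from the fixed slope and a participant&#x00027;s random slope. The function names, the assumed 365-day spacing of planned visits and all numeric values are hypothetical and chosen only for illustration.</p>
<preformat><![CDATA[
```python
# Illustrative sketch only: names, the 365-day visit spacing and all
# numbers below are assumptions, not taken from the study code.

PLANNED_VISIT_DAYS = (365, 730)  # assumed planned annual visits (years 1 and 2)
WINDOW_DAYS = 90                 # +/- 90-day window stated in the text


def in_visit_window(day_since_baseline):
    """Return True if an assessment lies within +/-90 days of a planned visit."""
    return any(abs(day_since_baseline - planned) <= WINDOW_DAYS
               for planned in PLANNED_VISIT_DAYS)


def individual_slope(fixed_slope, random_slope):
    """Individual-specific slope = fixed (population) slope + random slope."""
    return fixed_slope + random_slope


# Hypothetical assessment days (since baseline) for one participant.
assessment_days = [200, 350, 460, 700, 900]
kept = [day for day in assessment_days if in_visit_window(day)]
# kept == [350, 700]; the other days are more than 90 days from days 365 and 730
```
]]></preformat>
<p>In practice the fixed and random slopes would come from the fitted mixed model; the sketch only makes the windowing rule and the summation explicit.</p>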
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>List of candidate prognostic variables included in analyses.</p></caption>
<table frame="hsides" rules="groups">
<tbody><tr>
<td valign="top" align="left">Age</td>
<td valign="top" align="left">Marital status&#x02014;separated</td>
<td valign="top" align="left">Nutrition&#x02014;homeopathic</td>
</tr>
<tr>
<td valign="top" align="left">Age at diagnosis</td>
<td valign="top" align="left">Marital status&#x02014;single</td>
<td valign="top" align="left">Nutrition&#x02014;aromatherapies</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;motor</td>
<td valign="top" align="left">Residence&#x02014;rural</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;physical therapy</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;cognitive</td>
<td valign="top" align="left">Residence&#x02014;village</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;occupational therapy</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;psychiatric</td>
<td valign="top" align="left">Residence&#x02014;town</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;psychotherapy</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;oculomotor</td>
<td valign="top" align="left">Residence&#x02014;city</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;counselling</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;other</td>
<td valign="top" align="left">Residence&#x02014;unknown</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;speech therapy</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;mixed</td>
<td valign="top" align="left">Region&#x02014;Australasia</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;swallowing therapy</td>
</tr>
<tr>
<td valign="top" align="left">Rater&#x00027;s judgement of initial major symptom&#x02014;unknown or missing</td>
<td valign="top" align="left">Region&#x02014;Europe</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;music therapy</td>
</tr>
<tr>
<td valign="top" align="left">CAG repeats (affected allele)</td>
<td valign="top" align="left">Region&#x02014;Latin America</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;relaxation therapy</td>
</tr>
<tr>
<td valign="top" align="left">CAG repeats (unaffected allele)</td>
<td valign="top" align="left">Region&#x02014;North America</td>
<td valign="top" align="left">Non-pharmacological therapies&#x02014;acupuncture</td>
</tr>
<tr>
<td valign="top" align="left">Baseline CAP score&#x02014;age &#x000D7; (affected CAG repeats &#x02212; 33.66)</td>
<td valign="top" align="left">BMI</td>
<td valign="top" align="left">Accompanied to clinic visit<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Male sex</td>
<td valign="top" align="left">Comorbidities&#x02014;renal</td>
<td valign="top" align="left">Unaccompanied to clinic visit<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Female sex</td>
<td valign="top" align="left">Comorbidities&#x02014;gynaecological</td>
<td valign="top" align="left">Mix of accompanied and unaccompanied to clinic visit<xref ref-type="table-fn" rid="TN2"><sup>&#x0002A;</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;black</td>
<td valign="top" align="left">Comorbidities&#x02014;reproductive</td>
<td valign="top" align="left">History of irritability</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;Native American</td>
<td valign="top" align="left">Comorbidities&#x02014;dermatological</td>
<td valign="top" align="left">History of depression</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;Asian</td>
<td valign="top" align="left">Comorbidities&#x02014;musculoskeletal</td>
<td valign="top" align="left">History of violence/aggression</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;Caucasian</td>
<td valign="top" align="left">Comorbidities&#x02014;neurological</td>
<td valign="top" align="left">History of apathy</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;Hispanic/Latin American</td>
<td valign="top" align="left">Comorbidities&#x02014;metabolic</td>
<td valign="top" align="left">History of perseverative obsessive MH behaviours</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;mixed</td>
<td valign="top" align="left">Comorbidities&#x02014;psychiatric</td>
<td valign="top" align="left">History of psychosis (hallucinations MH or delusions)</td>
</tr>
<tr>
<td valign="top" align="left">Race&#x02014;other</td>
<td valign="top" align="left">Comorbidities&#x02014;ENT</td>
<td valign="top" align="left">Family history of psychotic illness in 1<sup>st</sup> degree relative</td>
</tr>
<tr>
<td valign="top" align="left">Previous alcohol problems</td>
<td valign="top" align="left">Comorbidities&#x02014;gastrointestinal</td>
<td valign="top" align="left">Significant cognitive impairment or dementia</td>
</tr>
<tr>
<td valign="top" align="left">Ever smoked</td>
<td valign="top" align="left">Comorbidities&#x02014;allergy: immunological</td>
<td valign="top" align="left">History of motor symptoms MH compatible with HD</td>
</tr>
<tr>
<td valign="top" align="left">Currently smoke</td>
<td valign="top" align="left">Comorbidities&#x02014;pulmonary</td>
<td valign="top" align="left">Drugs for anti-depression</td>
</tr>
<tr>
<td valign="top" align="left">Ever abused drugs</td>
<td valign="top" align="left">Comorbidities&#x02014;ophthalmological</td>
<td valign="top" align="left">Lipid-modifying agents, plain and combinations</td>
</tr>
<tr>
<td valign="top" align="left">Current drug abuse</td>
<td valign="top" align="left">Comorbidities&#x02014;cardiovascular</td>
<td valign="top" align="left">Thyroid therapies</td>
</tr>
<tr>
<td valign="top" align="left">Currently drink alcohol</td>
<td valign="top" align="left">Comorbidities&#x02014;hepatobiliary</td>
<td valign="top" align="left">Antipsychotics use</td>
</tr>
<tr>
<td valign="top" align="left">Father affected by HD</td>
<td valign="top" align="left">Comorbidities&#x02014;haematological/lymphatic</td>
<td valign="top" align="left">Anxiolytics, hypnotics and sedatives use</td>
</tr>
<tr>
<td valign="top" align="left">Mother affected by HD</td>
<td valign="top" align="left">Comorbidities&#x02014;none</td>
<td valign="top" align="left">Analgesics use</td>
</tr>
<tr>
<td valign="top" align="left">Inheritance unknown</td>
<td valign="top" align="left">Comorbidities&#x02014;other</td>
<td valign="top" align="left">Tetrabenazine use</td>
</tr>
<tr>
<td valign="top" align="left">Education&#x02014;bachelor&#x00027;s degree or higher</td>
<td valign="top" align="left">Nutrition&#x02014;vitamin</td>
<td valign="top" align="left">ACE inhibitors (plain and combination)</td>
</tr>
<tr>
<td valign="top" align="left">Education&#x02014;post secondary but not university degree</td>
<td valign="top" align="left">Nutrition&#x02014;herbs</td>
<td valign="top" align="left">Anti-epileptics use</td>
</tr>
<tr>
<td valign="top" align="left">Education&#x02014;upper secondary or lower</td>
<td valign="top" align="left">Nutrition&#x02014;teas</td>
<td valign="top" align="left">Proton pump inhibitors use</td>
</tr>
<tr>
<td valign="top" align="left">Marital status&#x02014;married</td>
<td valign="top" align="left">Nutrition&#x02014;other</td>
<td valign="top" align="left">Dopaminergic therapies use</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>ACE, angiotensin-converting enzyme; CAG, cytosine adenine guanine; CAP, CAG-age product; ENT, ear, nose, throat; HD, Huntington&#x00027;s disease; MH, mental health</italic>.</p>
<fn id="TN2"><label>&#x0002A;</label><p><italic>Accompaniment to clinic visits was coded as a trichotomous variable (accompanied, unaccompanied, or a mix of both). Accompanied means the patient was accompanied at every Enroll-HD study visit throughout follow-up (maximum three visits); Unaccompanied means they came alone to all visits; Mix of accompanied and unaccompanied means they sometimes came alone, but not always. This is the only candidate predictive variable based on some post-baseline data</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
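<p>Table 5 lists the baseline CAP score as the product of age and the affected-allele CAG repeat length minus 33.66. A minimal Python sketch of that arithmetic is given below; the function name and the example participant (age 45 years, 43 CAG repeats) are hypothetical.</p>
<preformat><![CDATA[
```python
# Illustrative sketch of the CAP score arithmetic listed in Table 5.
# The function name and example values are assumptions for illustration.

CAG_OFFSET = 33.66  # constant shown in Table 5


def cap_score(age_years, affected_cag_repeats):
    """CAG-age product: age multiplied by (affected CAG repeats - 33.66)."""
    return age_years * (affected_cag_repeats - CAG_OFFSET)


# Hypothetical participant: 45 years old with 43 repeats on the affected allele.
example_cap = cap_score(45, 43)  # approximately 420.3
```
]]></preformat>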
<p>An RF regression model with 1,000 trees was trained to rank the features on their ability to predict the estimated linear change of each clinical outcome. The model randomly selects a subset of 34 variables (one third of all available) for splitting at each node within each tree. The training was repeated 100 times, each time on a 75% random sample of the data. In each round, permutation importance of each feature for prediction of the outcome was calculated and used for the ranking of the features. The median of the rankings of these 100 models was used for the final ranking of the feature importance.</p>
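<p>The aggregation step described above (ranking features within each training round, then taking the median rank across rounds) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors&#x00027; R implementation: the importance values and feature names are invented, and the forest fitting itself is omitted.</p>
<preformat><![CDATA[
```python
# Illustrative sketch: median-of-ranks aggregation over repeated rounds.
# Importance values and feature names are hypothetical; in the study each
# round's values would be permutation importances from a fitted forest.
import random
import statistics

N_FEATURES = 102
MTRY = N_FEATURES // 3  # 34 candidate variables per split (one third of 102)


def subsample(rows, fraction=0.75, rng=random):
    """Draw a random sample (without replacement) of the given fraction of rows."""
    return rng.sample(rows, int(len(rows) * fraction))


def rank_features(importances):
    """Map each feature to its rank within one round (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def median_ranking(rounds):
    """Return features ordered by median rank, plus the median ranks themselves."""
    per_round = [rank_features(importances) for importances in rounds]
    medians = {name: statistics.median(ranks[name] for ranks in per_round)
               for name in per_round[0]}
    return sorted(medians, key=medians.get), medians


# Three hypothetical rounds (the study used 100).
rounds = [
    {"cap": 0.9, "age": 0.5, "bmi": 0.1},
    {"cap": 0.8, "age": 0.3, "bmi": 0.4},
    {"cap": 0.7, "age": 0.6, "bmi": 0.2},
]
final_order, median_ranks = median_ranking(rounds)
# final_order == ["cap", "age", "bmi"]
```
]]></preformat>
<p>Using the median rather than the mean rank makes the final ordering less sensitive to a single round in which a feature ranks unusually high or low.</p>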
<p>The R<sup>2</sup> measure (the percentage of the slope variance that is explained by the model) was calculated for the following models predicting each outcome&#x02014;a model trained with CAP score and CAG only, a model trained with CAP score, CAG and age, a model trained with the above-mentioned shared top 10 ranked features and a model trained with all features.</p>
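<p>For reference, the usual definition of R<sup>2</sup> as one minus the ratio of the residual to the total sum of squares can be written as a short Python sketch. The source does not state exactly how R<sup>2</sup> was computed (for example, on which subset of the data), so this is the standard formula rather than a reproduction of the analysis; the function name and example values are hypothetical.</p>
<preformat><![CDATA[
```python
# Illustrative sketch of the standard R-squared formula (an assumption about
# the definition used; the study code is not shown in the article).


def r_squared(observed, predicted):
    """Proportion of variance in observed slopes explained by the predictions."""
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical slopes: perfect predictions give 1.0, and predicting the mean
# for every participant gives 0.0.
observed_slopes = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observed_slopes, [1.0, 2.0, 3.0, 4.0])
mean_only = r_squared(observed_slopes, [2.5, 2.5, 2.5, 2.5])
```
]]></preformat>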
<p>The analysis was performed using R version 3.5.2, with <monospace>lmer</monospace>() from the <monospace>lme4</monospace> package for the linear mixed-effects model, and <monospace>cforest</monospace>() from the <monospace>party</monospace> package for the RF regression model.</p></sec></sec>
<sec sec-type="data-availability-statement" id="s5">
<title>Data Availability Statement</title>
<p>The data analysed in this study were obtained from Enroll-HD (<ext-link ext-link-type="uri" xlink:href="https://enroll-hd.org/">https://enroll-hd.org/</ext-link>). The following licenses/restrictions apply: to access data you must be a researcher employed by a recognized academic institution, company or non-profit organisation and apply for an Enroll-HD access account. Requests to access these datasets should be directed to <ext-link ext-link-type="uri" xlink:href="https://enroll-hd.org/for-researchers/become-a-qualified-researcher/">https://enroll-hd.org/for-researchers/become-a-qualified-researcher/</ext-link>.</p></sec>
<sec id="s6">
<title>Code Availability</title>
<p>All code and algorithms used in the analysis are available upon request.</p></sec>
<sec id="s7">
<title>Author Contributions</title>
<p>NG contributed to the conception and design of the study, and acquisition and interpretation of data for the work. SS contributed to conception of the study, and acquisition and interpretation of data for the work. GP and PW contributed to the design of the study. JL contributed to the conception and design of the study. RH contributed to conception and design of the study, and acquisition and analysis of data for the work. All authors drafted or substantively revised a significant portion of the manuscript or figures, and approved the final version for submission.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>NG, RH, SS, and GP are employees of F. Hoffmann-La Roche Ltd. JL is a paid Advisory Board member for F. Hoffmann-La Roche Ltd and uniQure biopharma B.V, and a paid consultant for Vaccinex Inc, Wave Life Sciences USA Inc, Genentech Inc and Triplet Inc. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from F. Hoffmann-La Roche Ltd. The funder was involved in the study design, analysis, interpretation of data and the decision to submit it for publication.</p></sec>
</body>
<back>
<ack>
<p>Enroll-HD is a clinical research platform and longitudinal observational study for Huntington&#x00027;s disease (HD) families intended to accelerate progress towards therapeutics; it is sponsored by CHDI Foundation, a non-profit biomedical research organisation exclusively dedicated to collaboratively developing therapeutics for HD. Enroll-HD would not be possible without the vital contribution of the research participants and their families.</p>
<p>The authors thank all the people who participated in this study.</p>
</ack>
<sec sec-type="supplementary-material" id="s8">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fneur.2021.678484/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fneur.2021.678484/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Video_1.MP4" id="SM1" mimetype="video/mp4" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>GP</given-names></name> <name><surname>Dorsey</surname> <given-names>R</given-names></name> <name><surname>Gusella</surname> <given-names>JF</given-names></name> <name><surname>Hayden</surname> <given-names>MR</given-names></name> <name><surname>Kay</surname> <given-names>C</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <etal/></person-group>. <article-title>Huntington disease</article-title>. <source>Nat Rev Dis Primers.</source> (<year>2015</year>) <volume>1</volume>:<fpage>15005</fpage>. <pub-id pub-id-type="doi">10.1038/nrdp.2015.5</pub-id></citation></ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roos</surname> <given-names>RA</given-names></name></person-group>. <article-title>Huntington&#x00027;s disease: a clinical review</article-title>. <source>Orphanet J Rare Dis.</source> (<year>2010</year>) <volume>5</volume>:<fpage>40</fpage>. <pub-id pub-id-type="doi">10.1186/1750-1172-5-40</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ross</surname> <given-names>CA</given-names></name> <name><surname>Aylward</surname> <given-names>EH</given-names></name> <name><surname>Wild</surname> <given-names>EJ</given-names></name> <name><surname>Langbehn</surname> <given-names>DR</given-names></name> <name><surname>Long</surname> <given-names>JD</given-names></name> <name><surname>Warner</surname> <given-names>JH</given-names></name> <etal/></person-group>. <article-title>Huntington disease: natural history, biomarkers and prospects for therapeutics</article-title>. <source>Nat Rev Neurol.</source> (<year>2014</year>) <volume>10</volume>:<fpage>204</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1038/nrneurol.2014.24</pub-id><pub-id pub-id-type="pmid">24614516</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reilmann</surname> <given-names>R</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <name><surname>Ross</surname> <given-names>CA</given-names></name></person-group>. <article-title>Diagnostic criteria for Huntington&#x00027;s disease based on natural history</article-title>. <source>Mov Disord.</source> (<year>2014</year>) <volume>29</volume>:<fpage>1335</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1002/mds.26011</pub-id><pub-id pub-id-type="pmid">25164527</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keum</surname> <given-names>JW</given-names></name> <name><surname>Shin</surname> <given-names>A</given-names></name> <name><surname>Gillis</surname> <given-names>T</given-names></name> <name><surname>Mysore</surname> <given-names>JS</given-names></name> <name><surname>Abu Elneel</surname> <given-names>K</given-names></name> <name><surname>Lucente</surname> <given-names>D</given-names></name> <etal/></person-group>. <article-title>The HTT CAG-expansion mutation determines age at death but not disease duration in Huntington disease</article-title>. <source>Am J Hum Genet.</source> (<year>2016</year>) <volume>98</volume>:<fpage>287</fpage>&#x02013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2015.12.018</pub-id><pub-id pub-id-type="pmid">26849111</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paulsen</surname> <given-names>JS</given-names></name> <name><surname>Long</surname> <given-names>JD</given-names></name> <name><surname>Ross</surname> <given-names>CA</given-names></name> <name><surname>Harrington</surname> <given-names>DL</given-names></name> <name><surname>Erwin</surname> <given-names>CJ</given-names></name> <name><surname>Williams</surname> <given-names>JK</given-names></name> <etal/></person-group>. <article-title>Prediction of manifest Huntington&#x00027;s disease with clinical and imaging measures: a prospective observational study</article-title>. <source>Lancet Neurol.</source> (<year>2014</year>) <volume>13</volume>:<fpage>1193</fpage>&#x02013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(14)70238-8</pub-id><pub-id pub-id-type="pmid">25453459</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tabrizi</surname> <given-names>SJ</given-names></name> <name><surname>Scahill</surname> <given-names>RI</given-names></name> <name><surname>Owen</surname> <given-names>G</given-names></name> <name><surname>Durr</surname> <given-names>A</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <name><surname>Roos</surname> <given-names>RA</given-names></name> <etal/></person-group>. <article-title>Predictors of phenotypic progression and disease onset in premanifest and early-stage Huntington&#x00027;s disease in the TRACK-HD study: analysis of 36-month observational data</article-title>. <source>Lancet Neurol.</source> (<year>2013</year>) <volume>12</volume>:<fpage>637</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(13)70088-7</pub-id><pub-id pub-id-type="pmid">23664844</pub-id></citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paulsen</surname> <given-names>JS</given-names></name></person-group>. <article-title>Cognitive impairment in Huntington disease: diagnosis and treatment</article-title>. <source>Curr Neurol Neurosci Rep.</source> (<year>2011</year>) <volume>11</volume>:<fpage>474</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1007/s11910-011-0215-x</pub-id><pub-id pub-id-type="pmid">21861097</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paulsen</surname> <given-names>JS</given-names></name> <name><surname>Long</surname> <given-names>JD</given-names></name> <name><surname>Johnson</surname> <given-names>HJ</given-names></name> <name><surname>Aylward</surname> <given-names>EH</given-names></name> <name><surname>Ross</surname> <given-names>CA</given-names></name> <name><surname>Williams</surname> <given-names>JK</given-names></name> <etal/></person-group>. <article-title>Clinical and biomarker changes in premanifest Huntington disease show trial feasibility: a decade of the PREDICT-HD study</article-title>. <source>Front Aging Neurosci.</source> (<year>2014</year>) <volume>6</volume>:<fpage>78</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2014.00078</pub-id><pub-id pub-id-type="pmid">24795630</pub-id></citation></ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenblatt</surname> <given-names>A</given-names></name></person-group>. <article-title>Neuropsychiatry of Huntington&#x00027;s disease</article-title>. <source>Dialogues Clin Neurosci.</source> (<year>2007</year>) <volume>9</volume>:<fpage>191</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.31887/DCNS.2007.9.2/arosenblatt</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wild</surname> <given-names>EJ</given-names></name> <name><surname>Tabrizi</surname> <given-names>SJ</given-names></name></person-group>. <article-title>Therapies targeting DNA and RNA in Huntington&#x00027;s disease</article-title>. <source>Lancet Neurol.</source> (<year>2017</year>) <volume>16</volume>:<fpage>837</fpage>&#x02013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(17)30280-6</pub-id><pub-id pub-id-type="pmid">28920889</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frost</surname> <given-names>C</given-names></name> <name><surname>Mulick</surname> <given-names>A</given-names></name> <name><surname>Scahill</surname> <given-names>RI</given-names></name> <name><surname>Owen</surname> <given-names>G</given-names></name> <name><surname>Aylward</surname> <given-names>E</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <etal/></person-group>. <article-title>Design optimization for clinical trials in early-stage manifest Huntington&#x00027;s disease</article-title>. <source>Mov Disord.</source> (<year>2017</year>) <volume>32</volume>:<fpage>1610</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1002/mds.27122</pub-id><pub-id pub-id-type="pmid">28906031</pub-id></citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Langbehn</surname> <given-names>DR</given-names></name> <name><surname>Stout</surname> <given-names>JC</given-names></name> <name><surname>Gregory</surname> <given-names>S</given-names></name> <name><surname>Mills</surname> <given-names>JA</given-names></name> <name><surname>Durr</surname> <given-names>A</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <etal/></person-group>. <article-title>Association of CAG repeats with long-term progression in Huntington disease</article-title>. <source>JAMA Neurol.</source> (<year>2019</year>) <volume>76</volume>:<fpage>1375</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1001/jamaneurol.2019.2368</pub-id><pub-id pub-id-type="pmid">31403680</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Epifanio</surname> <given-names>I</given-names></name></person-group>. <article-title>Intervention in prediction measure: a new approach to assessing variable importance for random forests</article-title>. <source>BMC Bioinformatics.</source> (<year>2017</year>) <volume>18</volume>:<fpage>230</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1650-8</pub-id><pub-id pub-id-type="pmid">28464827</pub-id></citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rigatti</surname> <given-names>SJ</given-names></name></person-group>. <article-title>Random forest</article-title>. <source>J Insur Med.</source> (<year>2017</year>) <volume>47</volume>:<fpage>31</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.17849/insm-47-01-31-39.1</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schobel</surname> <given-names>SA</given-names></name> <name><surname>Palermo</surname> <given-names>G</given-names></name> <name><surname>Auinger</surname> <given-names>P</given-names></name> <name><surname>Long</surname> <given-names>JD</given-names></name> <name><surname>Ma</surname> <given-names>S</given-names></name> <name><surname>Khwaja</surname> <given-names>OS</given-names></name> <etal/></person-group>. <article-title>Motor, cognitive, and functional declines contribute to a single progressive factor in early HD</article-title>. <source>Neurology.</source> (<year>2017</year>) <volume>89</volume>:<fpage>2495</fpage>&#x02013;<lpage>502</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.0000000000004743</pub-id><pub-id pub-id-type="pmid">29142089</pub-id></citation></ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keogh</surname> <given-names>R</given-names></name> <name><surname>Frost</surname> <given-names>C</given-names></name> <name><surname>Owen</surname> <given-names>G</given-names></name> <name><surname>Daniel</surname> <given-names>RM</given-names></name> <name><surname>Langbehn</surname> <given-names>DR</given-names></name> <name><surname>Leavitt</surname> <given-names>B</given-names></name> <etal/></person-group>. <article-title>Medication use in early-HD participants in TRACK-HD: an investigation of its effects on clinical performance</article-title>. <source>PLoS Curr.</source> (<year>2016</year>) <volume>8</volume>. <pub-id pub-id-type="doi">10.1371/currents.hd.8060298fac1801b01ccea6acc00f97cb</pub-id><pub-id pub-id-type="pmid">26819833</pub-id></citation></ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dorsey</surname> <given-names>ER</given-names></name> <name><surname>Brocht</surname> <given-names>AF</given-names></name> <name><surname>Nichols</surname> <given-names>PE</given-names></name> <name><surname>Darwin</surname> <given-names>KC</given-names></name> <name><surname>Anderson</surname> <given-names>KE</given-names></name> <name><surname>Beck</surname> <given-names>CA</given-names></name> <etal/></person-group>. <article-title>Depressed mood and suicidality in individuals exposed to tetrabenazine in a large Huntington disease observational study</article-title>. <source>J Huntingtons Dis.</source> (<year>2013</year>) <volume>2</volume>:<fpage>509</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.3233/JHD-130071</pub-id><pub-id pub-id-type="pmid">25062735</pub-id></citation></ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>JL</given-names></name> <name><surname>Killoran</surname> <given-names>A</given-names></name> <name><surname>Nopoulos</surname> <given-names>PC</given-names></name> <name><surname>Chabal</surname> <given-names>CC</given-names></name> <name><surname>Moser</surname> <given-names>DJ</given-names></name> <name><surname>Kamholz</surname> <given-names>JA</given-names></name></person-group>. <article-title>Evaluating depression and suicidality in tetrabenazine users with Huntington disease</article-title>. <source>Neurology.</source> (<year>2018</year>) <volume>91</volume>:<fpage>e202</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.0000000000005817</pub-id><pub-id pub-id-type="pmid">29925548</pub-id></citation></ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>JL</given-names></name> <name><surname>Kamholz</surname> <given-names>JA</given-names></name> <name><surname>Moser</surname> <given-names>DJ</given-names></name> <name><surname>Feely</surname> <given-names>SM</given-names></name> <name><surname>Paulsen</surname> <given-names>JS</given-names></name> <name><surname>Nopoulos</surname> <given-names>PC</given-names></name></person-group>. <article-title>Substance abuse may hasten motor onset of Huntington disease: evaluating the Enroll-HD database</article-title>. <source>Neurology.</source> (<year>2017</year>) <volume>88</volume>:<fpage>909</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.0000000000003661</pub-id><pub-id pub-id-type="pmid">28148631</pub-id></citation></ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schultz</surname> <given-names>JL</given-names></name> <name><surname>Harshman</surname> <given-names>LA</given-names></name> <name><surname>Langbehn</surname> <given-names>DR</given-names></name> <name><surname>Nopoulos</surname> <given-names>PC</given-names></name></person-group>. <article-title>Hypertension is associated with an earlier age of onset of Huntington&#x00027;s disease</article-title>. <source>Mov Disord.</source> (<year>2020</year>) <volume>35</volume>:<fpage>1558</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1002/mds.28062</pub-id><pub-id pub-id-type="pmid">32339315</pub-id></citation></ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Burg</surname> <given-names>JMM</given-names></name> <name><surname>Gardiner</surname> <given-names>SL</given-names></name> <name><surname>Ludolph</surname> <given-names>AC</given-names></name> <name><surname>Landwehrmeyer</surname> <given-names>GB</given-names></name> <name><surname>Roos</surname> <given-names>RAC</given-names></name> <name><surname>Aziz</surname> <given-names>NA</given-names></name></person-group>. <article-title>Body weight is a robust predictor of clinical progression in Huntington disease</article-title>. <source>Ann Neurol.</source> (<year>2017</year>) <volume>82</volume>:<fpage>479</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1002/ana.25007</pub-id><pub-id pub-id-type="pmid">28779551</pub-id></citation></ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Philibert</surname> <given-names>R</given-names></name> <name><surname>Dogan</surname> <given-names>M</given-names></name> <name><surname>Noel</surname> <given-names>A</given-names></name> <name><surname>Miller</surname> <given-names>S</given-names></name> <name><surname>Krukow</surname> <given-names>B</given-names></name> <name><surname>Papworth</surname> <given-names>E</given-names></name> <etal/></person-group>. <article-title>Dose response and prediction characteristics of a methylation sensitive digital PCR assay for cigarette consumption in adults</article-title>. <source>Front Genet.</source> (<year>2018</year>) <volume>9</volume>:<fpage>137</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2018.00137</pub-id><pub-id pub-id-type="pmid">29740475</pub-id></citation></ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strobl</surname> <given-names>C</given-names></name> <name><surname>Boulesteix</surname> <given-names>AL</given-names></name> <name><surname>Zeileis</surname> <given-names>A</given-names></name> <name><surname>Hothorn</surname> <given-names>T</given-names></name></person-group>. <article-title>Bias in random forest variable importance measures: illustrations, sources and a solution</article-title>. <source>BMC Bioinformatics.</source> (<year>2007</year>) <volume>8</volume>:<fpage>25</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-8-25</pub-id><pub-id pub-id-type="pmid">17254353</pub-id></citation></ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mariani</surname> <given-names>MC</given-names></name> <name><surname>Tweneboah</surname> <given-names>OK</given-names></name> <name><surname>Bhuiyan</surname> <given-names>MAM</given-names></name></person-group>. <article-title>Supervised machine learning models applied to disease diagnosis and prognosis</article-title>. <source>AIMS Public Health.</source> (<year>2019</year>) <volume>6</volume>:<fpage>405</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.3934/publichealth.2019.4.405</pub-id><pub-id pub-id-type="pmid">31909063</pub-id></citation></ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Obermeyer</surname> <given-names>Z</given-names></name> <name><surname>Emanuel</surname> <given-names>EJ</given-names></name></person-group>. <article-title>Predicting the future&#x02014;big data, machine learning, and clinical medicine</article-title>. <source>N Engl J Med.</source> (<year>2016</year>) <volume>375</volume>:<fpage>1216</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMp1606181</pub-id><pub-id pub-id-type="pmid">27682033</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Byrne</surname> <given-names>LM</given-names></name> <name><surname>Rodrigues</surname> <given-names>FB</given-names></name> <name><surname>Blennow</surname> <given-names>K</given-names></name> <name><surname>Durr</surname> <given-names>A</given-names></name> <name><surname>Leavitt</surname> <given-names>BR</given-names></name> <name><surname>Roos</surname> <given-names>RAC</given-names></name> <etal/></person-group>. <article-title>Neurofilament light protein in blood as a potential biomarker of neurodegeneration in Huntington&#x00027;s disease: a retrospective cohort analysis</article-title>. <source>Lancet Neurol.</source> (<year>2017</year>) <volume>16</volume>:<fpage>601</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(17)30124-2</pub-id><pub-id pub-id-type="pmid">28601473</pub-id></citation></ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rodrigues</surname> <given-names>FB</given-names></name> <name><surname>Byrne</surname> <given-names>LM</given-names></name> <name><surname>Tortelli</surname> <given-names>R</given-names></name> <name><surname>Johnson</surname> <given-names>EB</given-names></name> <name><surname>Wijeratne</surname> <given-names>PA</given-names></name> <name><surname>Arridge</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Mutant huntingtin and neurofilament light have distinct longitudinal dynamics in Huntington&#x00027;s disease</article-title>. <source>Sci Transl Med.</source> (<year>2020</year>) <volume>12</volume>:<fpage>eabc2888</fpage>. <pub-id pub-id-type="doi">10.1126/scitranslmed.abc2888</pub-id><pub-id pub-id-type="pmid">33328328</pub-id></citation></ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Byrne</surname> <given-names>LM</given-names></name> <name><surname>Rodrigues</surname> <given-names>FB</given-names></name> <name><surname>Johnson</surname> <given-names>EB</given-names></name> <name><surname>Wijeratne</surname> <given-names>PA</given-names></name> <name><surname>De Vita</surname> <given-names>E</given-names></name> <name><surname>Alexander</surname> <given-names>DC</given-names></name> <etal/></person-group>. <article-title>Evaluation of mutant huntingtin and neurofilament proteins as potential markers in Huntington&#x00027;s disease</article-title>. <source>Sci Transl Med.</source> (<year>2018</year>) <volume>10</volume>:<fpage>eaat7108</fpage>. <pub-id pub-id-type="doi">10.1126/scitranslmed.aat7108</pub-id><pub-id pub-id-type="pmid">30209243</pub-id></citation></ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wongvibulsin</surname> <given-names>S</given-names></name> <name><surname>Wu</surname> <given-names>KC</given-names></name> <name><surname>Zeger</surname> <given-names>SL</given-names></name></person-group>. <article-title>Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis</article-title>. <source>BMC Med Res Methodol.</source> (<year>2019</year>) <volume>20</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1186/s12874-019-0863-0</pub-id><pub-id pub-id-type="pmid">31888507</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This study was funded by F. Hoffmann-La Roche Ltd. The authors thank Matt Gooding and Caroline Sproat of MediTech Media, UK for providing medical writing support, which was funded by F. Hoffmann-La Roche Basel Ltd, Switzerland in accordance with Good Publication Practice (GPP3) guidelines (<ext-link ext-link-type="uri" xlink:href="http://www.ismpp.org/gpp3">http://www.ismpp.org/gpp3</ext-link>). PW was supported by a UKRI Medical Research Council Skills Development Fellowship (MR/T027770/1).</p>
</fn>
</fn-group>
</back>
</article> 