<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="discussion">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2022.979465</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Opinion</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The ABC recommendations for validation of supervised machine learning results in biomedical sciences</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chicco</surname> <given-names>Davide</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/115230/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Jurman</surname> <given-names>Giuseppe</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/246296/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Institute of Health Policy Management and Evaluation, University of Toronto</institution>, <addr-line>Toronto, ON</addr-line>, <country>Canada</country></aff>
<aff id="aff2"><sup>2</sup><institution>Data Science for Health Unit, Fondazione Bruno Kessler</institution>, <addr-line>Trento</addr-line>, <country>Italy</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Sujata Dash, North Orissa University, India</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ashish Luhach, Papua New Guinea University of Technology, Papua New Guinea</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Davide Chicco <email>davidechicco&#x00040;davidechicco.it</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Medicine and Public Health, a section of the journal Frontiers in Big Data</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;ORCID: Davide Chicco <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-9655-7142">orcid.org/0000-0001-9655-7142</ext-link></p></fn>
<fn fn-type="other" id="fn003"><p>Giuseppe Jurman <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-2705-5728">orcid.org/0000-0002-2705-5728</ext-link></p></fn></author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>979465</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>06</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>09</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Chicco and Jurman.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Chicco and Jurman</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions> 
<kwd-group>
<kwd>supervised machine learning</kwd>
<kwd>computational validation</kwd>
<kwd>recommendations</kwd>
<kwd>data mining</kwd>
<kwd>best practices in machine learning</kwd>
</kwd-group>
<counts>
<fig-count count="1"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="67"/>
<page-count count="6"/>
<word-count count="4540"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Supervised machine learning has become pervasive in the biomedical sciences (Larra&#x000F1;aga et al., <xref ref-type="bibr" rid="B39">2006</xref>; Tarca et al., <xref ref-type="bibr" rid="B57">2007</xref>), and its validation has acquired a key role in all these scientific fields. We therefore read with great interest the article by Walsh et al. (<xref ref-type="bibr" rid="B61">2021</xref>), which reported a list of recommendations, called DOME, for properly validating results achieved with supervised machine learning. In the past, several studies have already listed common best practices and recommendations for the proper usage of machine learning (Bhaskar et al., <xref ref-type="bibr" rid="B5">2006</xref>; Domingos, <xref ref-type="bibr" rid="B25">2012</xref>; Chicco, <xref ref-type="bibr" rid="B13">2017</xref>; Cearns et al., <xref ref-type="bibr" rid="B11">2019</xref>; Stevens et al., <xref ref-type="bibr" rid="B55">2020</xref>; Artrith et al., <xref ref-type="bibr" rid="B2">2021</xref>; Cabitza and Campagner, <xref ref-type="bibr" rid="B10">2021</xref>; Larson et al., <xref ref-type="bibr" rid="B40">2021</xref>; Whalen et al., <xref ref-type="bibr" rid="B62">2021</xref>; Lee et al., <xref ref-type="bibr" rid="B41">2022</xref>) and computational statistics (Benjamin et al., <xref ref-type="bibr" rid="B4">2018</xref>; Makin and de Xivry, <xref ref-type="bibr" rid="B42">2019</xref>), but the comment by Walsh et al. (<xref ref-type="bibr" rid="B61">2021</xref>) has the merit of highlighting the importance of computational validation, a key step perhaps even more important than the design of the machine learning algorithm itself.</p>
<p>Although interesting and complete, that article describes numerous steps and aspects in a way that we find complicated, especially for beginners. We believe that the 21 questions in Box 1 of the DOME article (Walsh et al., <xref ref-type="bibr" rid="B61">2021</xref>) can be adequate for a data mining expert, but they might scare and discourage an inexperienced practitioner. For example, the recommendations about <italic>meta-predictions</italic> and about hyper-parameter optimization might not be understandable to a machine learning beginner or to a wet lab biologist. And that should not be a problem: a robust machine learning analysis can, in fact, be performed without using meta-predictions or hyper-parameters at all. Faced with so many guidelines, some of them quite complex, a beginner might even decide to abandon the computational intelligence analysis altogether, to avoid making any mistake in their scientific project. Moreover, the authors of DOME (Walsh et al., <xref ref-type="bibr" rid="B61">2021</xref>) present the 21 questions of their Box 1 as having the same level of importance. In contrast, we think that three key aspects of computational validation are pivotal and, if verified correctly, can be sufficient. We therefore believe that a practitioner would do better to focus all their attention and energy on accurately following these three recommendations.</p>
<p>We therefore wrote this note to propose our own recommendations for the computational validation of supervised machine learning results in the biomedical sciences: just three, explained easily and clearly, that alone can pave the way for a successful machine learning validation phase. We designed these simple quick tips from the experience we gained on dozens of biomedical projects involving machine learning phases. We call these recommendations ABC to highlight their essential role in any computational validation (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>ABC recommendations checklist. An overview of our ABC recommendations, to keep in mind for any machine learning study.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-05-979465-g0001.tif"/>
</fig>
</sec>
<sec id="s2">
<title>2. The ABC recommendations</title>
<sec>
<title>(A) Always divide the dataset carefully into separate training set and test set</title>
<p>This rule must become your obsession: verify and double-check that no data element is shared by both the training set and the test set. They must be completely independent.</p>
<p>You can then do anything you want on the training set, including hyper-parameter optimization, but make sure you do not touch the test set. Leave the test set alone until your supervised machine learning model training has finished (and its hyper-parameters are optimized, if any). If you have enough data, consider also allocating a subset of it (for example, 10% of the data elements, randomly selected) as a holdout set (Skocik et al., <xref ref-type="bibr" rid="B53">2016</xref>), to use as an alternative test set to confirm your findings and to avoid over-validation (Wainberg et al., <xref ref-type="bibr" rid="B60">2016</xref>).</p>
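<p>As a minimal sketch of this rule, the code below partitions a dataset's indices into disjoint training, test, and holdout subsets. The <monospace>split_dataset</monospace> helper and its fractions are our own illustrative choices, not part of any standard library; the invariant to double-check is that the three subsets share no element.</p>

```python
# A minimal sketch of an 80/10/10 train/test/holdout split.
# The dataset here is just a list of indices; in a real project these
# would index patients or samples.
import random

def split_dataset(n_elements, train_frac=0.8, test_frac=0.1, seed=42):
    """Randomly partition indices 0..n_elements-1 into three disjoint sets."""
    indices = list(range(n_elements))
    random.Random(seed).shuffle(indices)
    n_train = int(n_elements * train_frac)
    n_test = int(n_elements * test_frac)
    train = set(indices[:n_train])
    test = set(indices[n_train:n_train + n_test])
    holdout = set(indices[n_train + n_test:])
    return train, test, holdout

train, test, holdout = split_dataset(100)
# The three subsets must be completely independent: verify it explicitly.
assert not (train & test) and not (train & holdout) and not (test & holdout)
```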
<p>This important separation will allow you to avoid <italic>data snooping</italic> (White, <xref ref-type="bibr" rid="B63">2000</xref>; Smith, <xref ref-type="bibr" rid="B54">2021</xref>), a common mistake in many studies involving computational intelligence (Jensen, <xref ref-type="bibr" rid="B34">2000</xref>; Sewell, <xref ref-type="bibr" rid="B51">2021</xref>). Data snooping, also known as <italic>data dredging</italic> and called &#x0201C;the dark side of data mining&#x0201D; (Jensen, <xref ref-type="bibr" rid="B34">2000</xref>), happens when some data elements of the training set are also present in the test set, and therefore over-optimistically inflate the results obtained by the trained machine learning model on the test set. Sometimes this problem can occur even when different data elements of the same patients (for example, radiography images in digital pathology) are shared between the training set and the test set; this variant is usually called <italic>data leakage</italic> (Bussola et al., <xref ref-type="bibr" rid="B9">2021</xref>). This mistake is dangerous for every machine learning study, because it can give the illusion of success to an unaware researcher. In this situation, you need to keep in mind the famous quote by Richard Feynman: &#x0201C;The first principle is that you must not fool yourself, and you are the easiest person to fool&#x0201D; (Chicco, <xref ref-type="bibr" rid="B13">2017</xref>).</p>
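<p>The patient-level variant of the rule can be sketched as follows: the hypothetical <monospace>split_by_patient</monospace> helper below splits on patient identifiers rather than on individual records, so that no patient's data elements straddle the training set and the test set.</p>

```python
# A minimal sketch of a patient-level split that prevents data leakage:
# all records of the same patient end up on the same side of the split.
import random

def split_by_patient(records, test_frac=0.2, seed=0):
    """records: list of (patient_id, data) pairs. Split on patient IDs,
    not on individual records, so no patient straddles the two sets."""
    patient_ids = sorted({pid for pid, _ in records})
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, int(len(patient_ids) * test_frac))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test

# Example: two radiography images per patient.
records = [(p, f"image_{p}_{i}") for p in range(10) for i in range(2)]
train_recs, test_recs = split_by_patient(records)
# No patient appears on both sides of the split.
assert {p for p, _ in train_recs}.isdisjoint({p for p, _ in test_recs})
```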
<p>Data snooping does exactly that: it makes you fool yourself into believing you obtained excellent results, when in fact the machine learning performance was flawed. Once you have made sure the training set and the test set are independent of each other, you can use traditional cross-validation methods such as <italic>k</italic>-fold cross-validation, leave-one-out cross-validation, and nested cross-validation (Yadav and Shukla, <xref ref-type="bibr" rid="B66">2016</xref>), or bootstrap validation (Efron, <xref ref-type="bibr" rid="B28">1992</xref>; Efron and Tibshirani, <xref ref-type="bibr" rid="B29">1994</xref>), to mitigate over-fitting (Dietterich, <xref ref-type="bibr" rid="B24">1995</xref>; Chicco, <xref ref-type="bibr" rid="B13">2017</xref>). Moreover, over-fitting can be tackled through calibration methods such as calibration curves (Austin et al., <xref ref-type="bibr" rid="B3">2022</xref>) or calibration-in-the-large (Crowson et al., <xref ref-type="bibr" rid="B22">2016</xref>), which can also help measure the robustness of the model performance.</p>
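<p>As an illustration, <italic>k</italic>-fold cross-validation can be sketched from scratch: each element appears in exactly one test fold, and the train/test separation of rule (A) holds within every fold. The <monospace>k_fold_indices</monospace> generator below is a hypothetical helper written for illustration only.</p>

```python
# A minimal from-scratch sketch of k-fold cross-validation indices:
# each element appears in the test fold exactly once across the k rounds.
import random

def k_fold_indices(n_elements, k=5, seed=1):
    indices = list(range(n_elements))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_fold = folds[i]
        train_folds = [x for j in range(k) if j != i for x in folds[j]]
        yield train_folds, test_fold

n = 100
seen_in_test = []
for train_idx, test_idx in k_fold_indices(n, k=5):
    assert set(train_idx).isdisjoint(test_idx)  # rule (A) holds per fold
    seen_in_test.extend(test_idx)
assert sorted(seen_in_test) == list(range(n))   # each element tested once
```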
<p>Moreover, note that sometimes splitting the dataset into two subsets (training set and test set) might not be enough (Picard and Berk, <xref ref-type="bibr" rid="B46">1990</xref>). Even for shallow machine learning models, a correct splitting methodology should be enforced: see, for instance, the Data Analysis Protocol strategy introduced by the MAQC/SEQC initiatives led by the US Food and Drug Administration (FDA) (MAQC Consortium, <xref ref-type="bibr" rid="B43">2010</xref>; Zhang et al., <xref ref-type="bibr" rid="B67">2015</xref>). And when there are hyper-parameters to optimize (Feurer and Hutter, <xref ref-type="bibr" rid="B30">2019</xref>), such as the number of hidden layers and the number of hidden units in artificial neural networks, it is advisable to split the dataset into three subsets: training set, validation set, and test set (Chicco, <xref ref-type="bibr" rid="B13">2017</xref>). In the scientific literature, the names <italic>validation set</italic> and <italic>test set</italic> are sometimes used interchangeably; in this report, we call <italic>validation set</italic> the part of the dataset employed to evaluate the algorithm configuration with a particular hyper-parameter value, and <italic>test set</italic> the portion of the dataset to keep untouched and use only at the end to verify the algorithm with the optimal hyper-parameter configuration.</p>
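<p>A minimal sketch of this three-way workflow, assuming scikit-learn is available (the synthetic data, the hyper-parameter grid, and the choice of logistic regression are our own illustrative assumptions): the validation set is used to choose the hyper-parameter, while the test set is touched exactly once at the very end.</p>

```python
# A minimal sketch of the train/validation/test workflow, assuming
# scikit-learn: tune on the validation set, report once on the test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
# 60% training, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

best_C, best_val_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:           # hyper-parameter grid
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)     # validation set: tuning only
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)  # test set: used exactly once
```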
</sec>
<sec>
<title>(B) Broadly use multiple rates to evaluate your results</title>
<p>Evaluate your results with various rates, and definitely include the Matthews correlation coefficient (MCC) (Matthews, <xref ref-type="bibr" rid="B44">1975</xref>) for binary classifications (Chicco and Jurman, <xref ref-type="bibr" rid="B15">2020</xref>; Chicco et al., <xref ref-type="bibr" rid="B18">2021a</xref>) and the coefficient of determination (<italic>R</italic><sup>2</sup>) (Wright, <xref ref-type="bibr" rid="B65">1921</xref>) for regression analyses (Chicco et al., <xref ref-type="bibr" rid="B19">2021b</xref>). Moreover, make sure you include at least accuracy, <italic>F</italic><sub>1</sub> score, sensitivity, specificity, precision, negative predictive value, Cohen&#x00027;s Kappa, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and of the precision-recall (PR) curve for binary classifications. For regression analyses, make sure you incorporate at least mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), and symmetric mean absolute percentage error (SMAPE), in addition to the already-mentioned <italic>R</italic><sup>2</sup>. We recap our suggestions in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Recap of the suggested metrics for evaluating results of binary classifications and regression analyses.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Analysis type</bold></th>
<th valign="top" align="left"><bold>Always include</bold></th>
<th valign="top" align="left"><bold>We suggest to include</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Binary classification</td>
<td valign="top" align="left">MCC</td>
<td valign="top" align="left">TPR, TNR, PPV, NPV, accuracy, F<sub>1</sub> score, Cohen&#x00027;s Kappa, ROC AUC, and PR AUC</td>
</tr>
<tr>
<td valign="top" align="left">Regression analysis</td>
<td valign="top" align="left">R<sup>2</sup></td>
<td valign="top" align="left">SMAPE, MAPE, MAE, MSE, and RMSE</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The formulas of the binary classification rates can be found in Chicco and Jurman (<xref ref-type="bibr" rid="B15">2020</xref>) and Chicco et al. (<xref ref-type="bibr" rid="B18">2021a</xref>,<xref ref-type="bibr" rid="B20">c</xref>) and the formulas of the regression analysis rates can be found in Chicco et al. (<xref ref-type="bibr" rid="B19">2021b</xref>).</p>
</table-wrap-foot>
</table-wrap>
<p>It is necessary to include all these scores because each of them provides a distinct, useful piece of information about your supervised machine learning results. The more statistics you include, the more chances you have to spot any possible flaw in your predictions. All these rates work like dashboard indicator lamps in a car: if something somewhere in your machine (learning) did not work out the way it was supposed to, a lamp (rate) will inform you about it.</p>
<p>The Matthews correlation coefficient, in particular, plays a fundamental role in binary classification evaluation: it produces a high score only if the classifier correctly predicted most of the positive elements and most of the negative elements, and only if most of its positive predictions and most of its negative predictions were correct (Chicco and Jurman, <xref ref-type="bibr" rid="B15">2020</xref>, <xref ref-type="bibr" rid="B16">2022</xref>; Chicco et al., <xref ref-type="bibr" rid="B17">2021</xref>; Chicco et al., <xref ref-type="bibr" rid="B18">2021a</xref>). That is, a high MCC corresponds to a high score for all four basic rates of a 2 &#x000D7; 2 confusion matrix: sensitivity, specificity, precision, and negative predictive value (Chicco et al., <xref ref-type="bibr" rid="B18">2021a</xref>). Because of its efficacy, the MCC has been employed as the standard metric in several scientific projects. For example, the US FDA used the MCC as the main evaluation rate in the MicroArray II/Sequencing Quality Control (MAQC/SEQC) projects (MAQC Consortium, <xref ref-type="bibr" rid="B43">2010</xref>; SEQC/MAQC-III Consortium, <xref ref-type="bibr" rid="B50">2014</xref>).</p>
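<p>For illustration, the MCC can be computed directly from the four entries of the 2 &#x000D7; 2 confusion matrix; the <monospace>matthews_corrcoef</monospace> function below is a from-scratch sketch of the standard formula, not a library routine.</p>

```python
# A minimal sketch of the Matthews correlation coefficient, computed
# from the four entries of the 2x2 confusion matrix.
import math

def matthews_corrcoef(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(matthews_corrcoef(y_true, y_pred), 3))  # 0.333
```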
<p>Regarding the assessment of regression analyses, the coefficient of determination (<italic>R</italic><sup>2</sup>) is the only rate that produces a high score exclusively when the predictive algorithm was able to correctly predict most of the elements of each data class, considering their distribution (Chicco et al., <xref ref-type="bibr" rid="B19">2021b</xref>). Additionally, <italic>R</italic><sup>2</sup> allows the comparison of models applied to datasets having different scales (Chicco et al., <xref ref-type="bibr" rid="B19">2021b</xref>). Because of its effectiveness, the coefficient of determination has been employed as the standard evaluation metric for several international scientific projects, such as the Overhead Geopose DrivenData Challenge (DrivenData.org, <xref ref-type="bibr" rid="B26">2022</xref>) and the Breast Cancer Prognosis DREAM Education Challenge (Bionetworks, <xref ref-type="bibr" rid="B6">2021</xref>).</p>
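<p>The suggested regression rates can likewise be computed directly from their definitions; the <monospace>regression_rates</monospace> helper below is a from-scratch sketch covering <italic>R</italic><sup>2</sup>, MAE, RMSE, and SMAPE (here in its variant averaging 2|p &#x02212; t| / (|t| + |p|)).</p>

```python
# A minimal from-scratch sketch of the suggested regression rates.
import math

def regression_rates(y_true, y_pred):
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
        "SMAPE": sum(2 * abs(p - t) / (abs(t) + abs(p))
                     for t, p in zip(y_true, y_pred)) / n,
    }

rates = regression_rates([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0])
print(rates["R2"], rates["MAE"])  # 0.6 0.5
```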
</sec>
<sec>
<title>(C) Confirm your findings with external data, if possible</title>
<p>If you can, verify your discoveries with data coming from a different source and, ideally, of a different data type than the main dataset. Obtaining, on an external dataset coming from another scientific research centre, the same results you achieved on the main original dataset would be a strong confirmation of your scientific findings. Moreover, if this external data were of a data type different from the original data, it would further increase the level of independence between the two datasets, and confirm your scientific outcomes even more strongly.</p>
<p>In a bioinformatics study, for example, Kustra and Zagdanski (<xref ref-type="bibr" rid="B38">2008</xref>) employed a data fusion approach to cluster microarray gene expression data and associate the derived clusters to Gene Ontology annotations (Gene Ontology Consortium, <xref ref-type="bibr" rid="B32">2019</xref>). For validating their results, instead of using a different microarray dataset, the authors decided to take advantage of an external database of a different data type: a protein&#x02013;protein interaction database called the General Repository for Interaction Data Sets (GRID) (Breitkreutz et al., <xref ref-type="bibr" rid="B8">2003</xref>). This way, the authors found in external data a strong confirmation of the results they had obtained on the original data, and were therefore able to claim in their manuscript&#x00027;s conclusions that their study outcomes were robust and reliable.</p>
<p>Moving from bioinformatics to health informatics, a call for external data validation has recently been raised in machine learning and computational statistics applied to heart failure prediction as well (Shin et al., <xref ref-type="bibr" rid="B52">2021</xref>).</p>
<p>That being said, we are aware that obtaining compatible additional data and integrating them might be difficult for some biomedical studies, but we still encourage all machine learning practitioners to try to collect confirmatory data for their analyses anyway. In some cases, plenty of public datasets are freely available and can be downloaded and integrated easily.</p>
<p>Bioinformaticians working on gene expression analysis, for example, can take advantage of the thousands of different datasets available on the Gene Expression Omnibus (GEO) (Edgar et al., <xref ref-type="bibr" rid="B27">2002</xref>). Dozens of compatible datasets for a particular cancer type can be found by specifying the microarray platform, for example through the recently released <monospace>geoCancerPrognosticDatasetsRetriever</monospace> (Alameer and Chicco, <xref ref-type="bibr" rid="B1">2022</xref>) bioinformatics tool. Researchers can take advantage of these compatible datasets (for example, built on the GPL570 Affymetrix platform) to verify their findings, after applying some quality-control and preprocessing steps such as batch correction (Chen et al., <xref ref-type="bibr" rid="B12">2011</xref>) and data normalization, if needed.</p>
<p>Moreover, public data repositories for biomedical domains, such as ophthalmology images (Khan et al., <xref ref-type="bibr" rid="B36">2021</xref>), cancer images (Clark et al., <xref ref-type="bibr" rid="B21">2013</xref>), or neuroblastoma electronic health records (Chicco et al., <xref ref-type="bibr" rid="B14">in press</xref>), can provide additional datasets that can be used as validation cohorts. Additional public datasets can be found on the University of California Irvine Machine Learning Repository (University of California Irvine, <xref ref-type="bibr" rid="B58">1987</xref>), on the DREAM Challenges platform (Kueffner et al., <xref ref-type="bibr" rid="B37">2019</xref>; Sage Bionetworks, <xref ref-type="bibr" rid="B49">2022</xref>), or on Kaggle (Kaggle, <xref ref-type="bibr" rid="B35">2022</xref>), for example.</p>
<p>When using external data, one aspect to keep in mind is checking and correcting issues like dataset shift (Finlayson et al., <xref ref-type="bibr" rid="B31">2021</xref>) and model underspecification (D&#x00027;Amour et al., <xref ref-type="bibr" rid="B23">2020</xref>), which might jeopardize the coherence of the learning pipeline when moving from training to testing and validation.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s3">
<title>3. Discussion</title>
<p>Computational intelligence makes computers able to identify trends in data that otherwise would be difficult or impossible to notice by humans. With the spread of new technologies and electronic devices able to save and store large amounts of data, data mining has become a ubiquitous tool in numerous scientific studies, especially in biomedical informatics. In these studies, the validation of the results obtained through supervised machine learning has become a crucial phase, especially because of the high risk of achieving over-optimistic, inflated results, that can even lead to false discoveries (Ioannidis, <xref ref-type="bibr" rid="B33">2005</xref>).</p>
<p>In the past, several studies proposed rules and guidelines to develop more effective and efficient predictive models in medical informatics and computational epidemiology (Steyerberg and Vergouwe, <xref ref-type="bibr" rid="B56">2014</xref>, Riley et al., <xref ref-type="bibr" rid="B48">2016</xref>, <xref ref-type="bibr" rid="B47">2021</xref>; Bonnett et al., <xref ref-type="bibr" rid="B7">2019</xref>; Wolff et al., <xref ref-type="bibr" rid="B64">2019</xref>; Navarro et al., <xref ref-type="bibr" rid="B45">2021</xref>; Van Calster et al., <xref ref-type="bibr" rid="B59">2021</xref>). Most of them, however, provided complicated lists of steps and tips that might be hard for machine learning practitioners, especially beginners, to follow.</p>
<p>In this context, the article by Walsh et al. (<xref ref-type="bibr" rid="B61">2021</xref>) plays its part by thoroughly describing the DOME recommendations and steps for validating supervised machine learning results, but in our opinion it suffers from excessive complexity and might be difficult for beginners to follow. In this note, we proposed our own simple, easy, essential ABC tips to keep in mind when validating results obtained with data mining methods.</p>
<p>We believe our ABC recommendations can be an effective tool for all machine learning practitioners, beginners and experienced ones alike, and can pave the way to stronger, more robust, and more reliable scientific results in all the biomedical sciences.</p>
</sec>
<sec id="s4">
<title>Author contributions</title>
<p>DC conceived the study and wrote most of the article. GJ reviewed and contributed to the article.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s5">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alameer</surname> <given-names>A.</given-names></name> <name><surname>Chicco</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>geoCancerPrognosticDatasetsRetriever, a bioinformatics tool to easily identify cancer prognostic datasets on Gene Expression Omnibus (GEO)</article-title>. <source>Bioinformatics</source> <volume>2021</volume>:<fpage>btab852</fpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btab852</pub-id><pub-id pub-id-type="pmid">34935889</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Artrith</surname> <given-names>N.</given-names></name> <name><surname>Butler</surname> <given-names>K. T.</given-names></name> <name><surname>Coudert</surname> <given-names>F. -X.</given-names></name> <name><surname>Han</surname> <given-names>S.</given-names></name> <name><surname>Isayev</surname> <given-names>O.</given-names></name> <name><surname>Jain</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Best practices in machine learning for chemistry</article-title>. <source>Nat. Chem.</source> <volume>13</volume>, <fpage>505</fpage>&#x02013;<lpage>508</lpage>. <pub-id pub-id-type="doi">10.1038/s41557-021-00716-z</pub-id><pub-id pub-id-type="pmid">34059804</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Austin</surname> <given-names>P. C.</given-names></name> <name><surname>Putter</surname> <given-names>H.</given-names></name> <name><surname>Giardiello</surname> <given-names>D.</given-names></name> <name><surname>van Klaveren</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Graphical calibration curves and the integrated calibration index (ICI) for competing risk models</article-title>. <source>Diagn. Progn. Res.</source> <volume>6</volume>, <fpage>1</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1186/s41512-021-00114-6</pub-id><pub-id pub-id-type="pmid">35039069</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benjamin</surname> <given-names>D. J.</given-names></name> <name><surname>Berger</surname> <given-names>J. O.</given-names></name> <name><surname>Johannesson</surname> <given-names>M.</given-names></name> <name><surname>Nosek</surname> <given-names>B. A.</given-names></name> <name><surname>Wagenmakers</surname> <given-names>E. -J.</given-names></name> <name><surname>Berk</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Redefine statistical significance</article-title>. <source>Nat. Hum. Behav.</source> <volume>2</volume>, <fpage>6</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1038/s41562-017-0189-z</pub-id><pub-id pub-id-type="pmid">30980045</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bhaskar</surname> <given-names>H.</given-names></name> <name><surname>Hoyle</surname> <given-names>D. C.</given-names></name> <name><surname>Singh</surname> <given-names>S.</given-names></name></person-group> (<year>2006</year>). <article-title>Machine learning in bioinformatics: a brief survey and recommendations for practitioners</article-title>. <source>Comput. Biol. Med.</source> <volume>36</volume>, <fpage>1104</fpage>&#x02013;<lpage>1125</lpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2005.09.002</pub-id><pub-id pub-id-type="pmid">16226240</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bionetworks</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <source>Breast Cancer Prognosis DREAM Education Challenge</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.synapse.org/&#x00023;!Synapse:syn8650663/wiki/436447">https://www.synapse.org/&#x00023;!Synapse:syn8650663/wiki/436447</ext-link> (accessed August 12, 2021).</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonnett</surname> <given-names>L. J.</given-names></name> <name><surname>Snell</surname> <given-names>K. I. E.</given-names></name> <name><surname>Collins</surname> <given-names>G. S.</given-names></name> <name><surname>Riley</surname> <given-names>R. D.</given-names></name></person-group> (<year>2019</year>). <article-title>Guide to presenting clinical prediction models for use in clinical settings</article-title>. <source>BMJ</source> <volume>365</volume>:<fpage>l737</fpage>. <pub-id pub-id-type="doi">10.1136/bmj.l737</pub-id><pub-id pub-id-type="pmid">30995987</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breitkreutz</surname> <given-names>B. -J.</given-names></name> <name><surname>Stark</surname> <given-names>C.</given-names></name> <name><surname>Tyers</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>The GRID: the general repository for interaction datasets</article-title>. <source>Genome Biol.</source> <volume>4</volume>:<fpage>R23</fpage>. <pub-id pub-id-type="doi">10.1186/gb-2003-4-2-p1</pub-id><pub-id pub-id-type="pmid">12620108</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bussola</surname> <given-names>N.</given-names></name> <name><surname>Marcolini</surname> <given-names>A.</given-names></name> <name><surname>Maggio</surname> <given-names>V.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name> <name><surname>Furlanello</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>AI slipping on tiles: data leakage in digital pathology,</article-title> in <source>Proceedings of ICPR 2021 &#x02013; The 25th International Conference on Pattern Recognition. ICPR International Workshops and Challenges</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>), <fpage>167</fpage>&#x02013;<lpage>182</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cabitza</surname> <given-names>F.</given-names></name> <name><surname>Campagner</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>The need to separate the wheat from the chaff in medical informatics: introducing a comprehensive checklist for the (self)-assessment of medical AI studies</article-title>. <source>Int. J. Med. Inform.</source> <volume>153</volume>:<fpage>104510</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijmedinf.2021.104510</pub-id><pub-id pub-id-type="pmid">34108105</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cearns</surname> <given-names>M.</given-names></name> <name><surname>Hahn</surname> <given-names>T.</given-names></name> <name><surname>Baune</surname> <given-names>B. T.</given-names></name></person-group> (<year>2019</year>). <article-title>Recommendations and future directions for supervised machine learning in psychiatry</article-title>. <source>Transl. Psychiatry</source> <volume>9</volume>:<fpage>271</fpage>. <pub-id pub-id-type="doi">10.1038/s41398-019-0607-2</pub-id><pub-id pub-id-type="pmid">31641106</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Grennan</surname> <given-names>K.</given-names></name> <name><surname>Badner</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Gershon</surname> <given-names>E.</given-names></name> <name><surname>Jin</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e17238</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0017238</pub-id><pub-id pub-id-type="pmid">21386892</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Ten quick tips for machine learning in computational biology</article-title>. <source>BioData Min.</source> <volume>10</volume>:<fpage>35</fpage>. <pub-id pub-id-type="doi">10.1186/s13040-017-0155-3</pub-id><pub-id pub-id-type="pmid">29234465</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Cerono</surname> <given-names>G.</given-names></name> <name><surname>Cangelosi</surname> <given-names>D.</given-names></name></person-group> (in press). <article-title>A survey on publicly available open datasets of electronic health records (EHRs) of patients with neuroblastoma</article-title>. <source>Data Sci. J</source>. <fpage>1</fpage>&#x02013;<lpage>15</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation</article-title>. <source>BMC Genomics</source> <volume>21</volume>:<fpage>6</fpage>. <pub-id pub-id-type="doi">10.1186/s12864-019-6413-7</pub-id><pub-id pub-id-type="pmid">31898477</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2022</year>). <article-title>An invitation to greater use of Matthews correlation coefficient in robotics and artificial intelligence</article-title>. <source>Front. Robot. AI</source> <volume>9</volume>:<fpage>876814</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2022.876814</pub-id><pub-id pub-id-type="pmid">35402520</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Starovoitov</surname> <given-names>V.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>The benefits of the Matthews correlation coefficient (MCC) over the diagnostic odds ratio (DOR) in binary classification assessment</article-title>. <source>IEEE Access.</source> <volume>9</volume>, <fpage>47112</fpage>&#x02013;<lpage>47124</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3068614</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>T&#x000F6;tsch</surname> <given-names>N.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2021a</year>). <article-title>The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation</article-title>. <source>BioData Min.</source> <volume>14</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.1186/s13040-021-00244-z</pub-id><pub-id pub-id-type="pmid">33541410</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Warrens</surname> <given-names>M. J.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2021b</year>). <article-title>The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation</article-title>. <source>PeerJ Comput. Sci.</source> <volume>7</volume>:<fpage>e623</fpage>. <pub-id pub-id-type="doi">10.7717/peerj-cs.623</pub-id><pub-id pub-id-type="pmid">34307865</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chicco</surname> <given-names>D.</given-names></name> <name><surname>Warrens</surname> <given-names>M. J.</given-names></name> <name><surname>Jurman</surname> <given-names>G.</given-names></name></person-group> (<year>2021c</year>). <article-title>The Matthews correlation coefficient (MCC) is more informative than Cohen&#x00027;s Kappa and Brier score in binary classification assessment</article-title>. <source>IEEE Access.</source> <volume>9</volume>, <fpage>78368</fpage>&#x02013;<lpage>78381</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3084050</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname> <given-names>K.</given-names></name> <name><surname>Vendt</surname> <given-names>B.</given-names></name> <name><surname>Smith</surname> <given-names>K.</given-names></name> <name><surname>Freymann</surname> <given-names>J.</given-names></name> <name><surname>Kirby</surname> <given-names>J.</given-names></name> <name><surname>Koppel</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository</article-title>. <source>J. Digit. Imaging</source> <volume>26</volume>, <fpage>1045</fpage>&#x02013;<lpage>1057</lpage>. <pub-id pub-id-type="doi">10.1007/s10278-013-9622-7</pub-id><pub-id pub-id-type="pmid">23884657</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crowson</surname> <given-names>C. S.</given-names></name> <name><surname>Atkinson</surname> <given-names>E. J.</given-names></name> <name><surname>Therneau</surname> <given-names>T. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Assessing calibration of prognostic risk scores</article-title>. <source>Stat. Methods Med. Res.</source> <volume>25</volume>, <fpage>1692</fpage>&#x02013;<lpage>1706</lpage>. <pub-id pub-id-type="doi">10.1177/0962280213497434</pub-id><pub-id pub-id-type="pmid">28818033</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x00027;Amour</surname> <given-names>A.</given-names></name> <name><surname>Heller</surname> <given-names>K.</given-names></name> <name><surname>Moldovan</surname> <given-names>D.</given-names></name> <name><surname>Adlam</surname> <given-names>B.</given-names></name> <name><surname>Alipanahi</surname> <given-names>B.</given-names></name> <name><surname>Beutel</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Underspecification presents challenges for credibility in modern machine learning</article-title>. <source>arXiv Preprint arXiv:2011.03395</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2011.03395</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dietterich</surname> <given-names>T.</given-names></name></person-group> (<year>1995</year>). <article-title>Overfitting and undercomputing in machine learning</article-title>. <source>ACM Comput. Surveys</source> <volume>27</volume>, <fpage>326</fpage>&#x02013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1145/212094.212114</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Domingos</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>A few useful things to know about machine learning</article-title>. <source>Commun. ACM</source> <volume>55</volume>, <fpage>78</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1145/2347736.2347755</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="web"><person-group person-group-type="author"><collab>DrivenData.org</collab></person-group> (<year>2022</year>). <source>Overhead Geopose Challenge</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.drivendata.org/competitions/78/competition-overhead-geopose/page/372/">https://www.drivendata.org/competitions/78/competition-overhead-geopose/page/372/</ext-link> (accessed August 12, 2021).</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edgar</surname> <given-names>R.</given-names></name> <name><surname>Domrachev</surname> <given-names>M.</given-names></name> <name><surname>Lash</surname> <given-names>A. E.</given-names></name></person-group> (<year>2002</year>). <article-title>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository</article-title>. <source>Nucl. Acids Res.</source> <volume>30</volume>, <fpage>207</fpage>&#x02013;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.1.207</pub-id><pub-id pub-id-type="pmid">11752295</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Efron</surname> <given-names>B.</given-names></name></person-group> (<year>1992</year>). <article-title>Bootstrap methods: another look at the jackknife,</article-title> in <source>Breakthroughs in Statistics</source>, eds <person-group person-group-type="editor"><name><surname>Kotz</surname> <given-names>S.</given-names></name> <name><surname>Johnson</surname> <given-names>N. L.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>569</fpage>&#x02013;<lpage>593</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4612-4380-9_41</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Efron</surname> <given-names>B.</given-names></name> <name><surname>Tibshirani</surname> <given-names>R. J.</given-names></name></person-group> (<year>1994</year>). <source>An Introduction to the Bootstrap</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>CRC Press</publisher-name>. <pub-id pub-id-type="doi">10.1201/9780429246593</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Feurer</surname> <given-names>M.</given-names></name> <name><surname>Hutter</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Hyperparameter optimization,</article-title> in <source>Automated Machine Learning</source>, eds <person-group person-group-type="editor"><name><surname>Hutter</surname> <given-names>F.</given-names></name> <name><surname>Kotthoff</surname> <given-names>L.</given-names></name> <name><surname>Vanschoren</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-05318-5_1</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finlayson</surname> <given-names>S. G.</given-names></name> <name><surname>Subbaswamy</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>K.</given-names></name> <name><surname>Bowers</surname> <given-names>J.</given-names></name> <name><surname>Kupke</surname> <given-names>A.</given-names></name> <name><surname>Zittrain</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>The clinician and dataset shift in artificial intelligence</article-title>. <source>N. Engl. J. Med.</source> <volume>385</volume>, <fpage>283</fpage>&#x02013;<lpage>286</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMc2104626</pub-id><pub-id pub-id-type="pmid">34260843</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><collab>Gene Ontology Consortium</collab></person-group> (<year>2019</year>). <article-title>The Gene Ontology resource: 20 years and still GOing strong</article-title>. <source>Nucl. Acids Res.</source> <volume>47</volume>, <fpage>D330</fpage>&#x02013;<lpage>D338</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky1055</pub-id><pub-id pub-id-type="pmid">30395331</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioannidis</surname> <given-names>J. P.</given-names></name></person-group> (<year>2005</year>). <article-title>Why most published research findings are false</article-title>. <source>PLoS Med.</source> <volume>2</volume>:<fpage>e124</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pmed.0020124</pub-id><pub-id pub-id-type="pmid">16060722</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname> <given-names>D.</given-names></name></person-group> (<year>2000</year>). <article-title>Data snooping, dredging and fishing: the dark side of data mining a SIGKDD99 panel report</article-title>. <source>ACM SIGKDD Explor. Newsl.</source> <volume>1</volume>, <fpage>52</fpage>&#x02013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1145/846183.846195</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="web"><person-group person-group-type="author"><collab>Kaggle</collab></person-group> (<year>2022</year>). <source>Kaggle.com &#x02013; Find Open Datasets</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/datasets">https://www.kaggle.com/datasets</ext-link> (accessed March 27, 2022).</citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname> <given-names>S. M.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Nath</surname> <given-names>S.</given-names></name> <name><surname>Korot</surname> <given-names>E.</given-names></name> <name><surname>Faes</surname> <given-names>L.</given-names></name> <name><surname>Wagner</surname> <given-names>S. K.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability</article-title>. <source>Lancet Digit. Health</source> <volume>3</volume>, <fpage>e51</fpage>&#x02013;<lpage>e66</lpage>. <pub-id pub-id-type="doi">10.1016/S2589-7500(20)30240-5</pub-id><pub-id pub-id-type="pmid">33735069</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kueffner</surname> <given-names>R.</given-names></name> <name><surname>Zach</surname> <given-names>N.</given-names></name> <name><surname>Bronfeld</surname> <given-names>M.</given-names></name> <name><surname>Norel</surname> <given-names>R.</given-names></name> <name><surname>Atassi</surname> <given-names>N.</given-names></name> <name><surname>Balagurusamy</surname> <given-names>V.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach</article-title>. <source>Sci. Rep.</source> <volume>9</volume>:<fpage>690</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-018-36873-4</pub-id><pub-id pub-id-type="pmid">30679616</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kustra</surname> <given-names>R.</given-names></name> <name><surname>Zagdanski</surname> <given-names>A.</given-names></name></person-group> (<year>2008</year>). <article-title>Data-fusion in clustering microarray data: balancing discovery and interpretability</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source> <volume>7</volume>, <fpage>50</fpage>&#x02013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2007.70267</pub-id><pub-id pub-id-type="pmid">20150668</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larra&#x000F1;aga</surname> <given-names>P.</given-names></name> <name><surname>Calvo</surname> <given-names>B.</given-names></name> <name><surname>Santana</surname> <given-names>R.</given-names></name> <name><surname>Bielza</surname> <given-names>C.</given-names></name> <name><surname>Galdiano</surname> <given-names>J.</given-names></name> <name><surname>Inza</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Machine learning in bioinformatics</article-title>. <source>Brief. Bioinform.</source> <volume>7</volume>, <fpage>86</fpage>&#x02013;<lpage>112</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbk007</pub-id><pub-id pub-id-type="pmid">16761367</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larson</surname> <given-names>D. B.</given-names></name> <name><surname>Harvey</surname> <given-names>H.</given-names></name> <name><surname>Rubin</surname> <given-names>D. L.</given-names></name> <name><surname>Irani</surname> <given-names>N.</given-names></name> <name><surname>Justin</surname> <given-names>R. T.</given-names></name> <name><surname>Langlotz</surname> <given-names>C. P.</given-names></name></person-group> (<year>2021</year>). <article-title>Regulatory frameworks for development and evaluation of artificial intelligence&#x02013;based diagnostic imaging algorithms: summary and recommendations</article-title>. <source>J. Amer. Coll. Radiol.</source> <volume>18</volume>, <fpage>413</fpage>&#x02013;<lpage>424</lpage>. <pub-id pub-id-type="doi">10.1016/j.jacr.2020.09.060</pub-id><pub-id pub-id-type="pmid">33096088</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>B. D.</given-names></name> <name><surname>Gitter</surname> <given-names>A.</given-names></name> <name><surname>Greene</surname> <given-names>C. S.</given-names></name> <name><surname>Raschka</surname> <given-names>S.</given-names></name> <name><surname>Maguire</surname> <given-names>F.</given-names></name> <name><surname>Titus</surname> <given-names>A. J.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Ten quick tips for deep learning in biology</article-title>. <source>PLoS Comput. Biol.</source> <volume>18</volume>:<fpage>e1009803</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1009803</pub-id><pub-id pub-id-type="pmid">35324884</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makin</surname> <given-names>T. R.</given-names></name> <name><surname>de Xivry</surname> <given-names>J.-J. O.</given-names></name></person-group> (<year>2019</year>). <article-title>Science forum: ten common statistical mistakes to watch out for when writing or reviewing a manuscript</article-title>. <source>eLife</source> <volume>8</volume>:<fpage>e48175</fpage>. <pub-id pub-id-type="doi">10.7554/eLife.48175</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><collab>MAQC Consortium</collab></person-group> (<year>2010</year>). <article-title>The MicroArray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models</article-title>. <source>Nat. Biotechnol.</source> <volume>28</volume>, <fpage>827</fpage>&#x02013;<lpage>838</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.1665</pub-id><pub-id pub-id-type="pmid">20676074</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matthews</surname> <given-names>B. W.</given-names></name></person-group> (<year>1975</year>). <article-title>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</article-title>. <source>Biochim. Biophys. Acta Prot. Struct.</source> <volume>405</volume>, <fpage>442</fpage>&#x02013;<lpage>451</lpage>. <pub-id pub-id-type="doi">10.1016/0005-2795(75)90109-9</pub-id><pub-id pub-id-type="pmid">1180967</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Navarro</surname> <given-names>C. L. A.</given-names></name> <name><surname>Damen</surname> <given-names>J. A.</given-names></name> <name><surname>Takada</surname> <given-names>T.</given-names></name> <name><surname>Nijman</surname> <given-names>S. W.</given-names></name> <name><surname>Dhiman</surname> <given-names>P.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review</article-title>. <source>BMJ</source> <volume>375</volume>:<fpage>n2281</fpage>. <pub-id pub-id-type="doi">10.1136/bmj.n2281</pub-id><pub-id pub-id-type="pmid">34670780</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Picard</surname> <given-names>R. R.</given-names></name> <name><surname>Berk</surname> <given-names>K. N.</given-names></name></person-group> (<year>1990</year>). <article-title>Data splitting</article-title>. <source>Amer. Stat.</source> <volume>44</volume>, <fpage>140</fpage>&#x02013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1080/00031305.1990.10475704</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riley</surname> <given-names>R. D.</given-names></name> <name><surname>Debray</surname> <given-names>T. P. A.</given-names></name> <name><surname>Collins</surname> <given-names>G. S.</given-names></name> <name><surname>Archer</surname> <given-names>L.</given-names></name> <name><surname>Ensor</surname> <given-names>J.</given-names></name> <name><surname>Smeden</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Minimum sample size for external validation of a clinical prediction model with a binary outcome</article-title>. <source>Stat. Med.</source> <volume>40</volume>, <fpage>4230</fpage>&#x02013;<lpage>4251</lpage>. <pub-id pub-id-type="doi">10.1002/sim.9025</pub-id><pub-id pub-id-type="pmid">34031906</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riley</surname> <given-names>R. D.</given-names></name> <name><surname>Ensor</surname> <given-names>J.</given-names></name> <name><surname>Snell</surname> <given-names>K. I. E.</given-names></name> <name><surname>Debray</surname> <given-names>T. P. A.</given-names></name> <name><surname>Altman</surname> <given-names>D. G.</given-names></name> <name><surname>Moons</surname> <given-names>K. G. M.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges</article-title>. <source>BMJ</source> <volume>353</volume>:<fpage>i3140</fpage>. <pub-id pub-id-type="doi">10.1136/bmj.i3140</pub-id><pub-id pub-id-type="pmid">31239248</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="web"><person-group person-group-type="author"><collab>Sage Bionetworks</collab></person-group> (<year>2022</year>). <source>DREAM Challenges Publications</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://dreamchallenges.org/publications/">https://dreamchallenges.org/publications/</ext-link> (accessed January 17, 2022).</citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><collab>SEQC/MAQC-III Consortium</collab></person-group> (<year>2014</year>). <article-title>A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium</article-title>. <source>Nat. Biotechnol.</source> <volume>32</volume>, <fpage>903</fpage>&#x02013;<lpage>914</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2957</pub-id><pub-id pub-id-type="pmid">25150838</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Sewell</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <source>Data Snooping</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://data-snooping.martinsewell.com">http://data-snooping.martinsewell.com</ext-link> (accessed August 6, 2021).</citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shin</surname> <given-names>S.</given-names></name> <name><surname>Austin</surname> <given-names>P. C.</given-names></name> <name><surname>Ross</surname> <given-names>H. J.</given-names></name> <name><surname>Abdel-Qadir</surname> <given-names>H.</given-names></name> <name><surname>Freitas</surname> <given-names>C.</given-names></name> <name><surname>Tomlinson</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality</article-title>. <source>ESC Heart Fail.</source> <volume>8</volume>, <fpage>106</fpage>&#x02013;<lpage>115</lpage>. <pub-id pub-id-type="doi">10.1002/ehf2.13073</pub-id><pub-id pub-id-type="pmid">33205591</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Skocik</surname> <given-names>M.</given-names></name> <name><surname>Collins</surname> <given-names>J.</given-names></name> <name><surname>Callahan-Flintoft</surname> <given-names>C.</given-names></name> <name><surname>Bowman</surname> <given-names>H.</given-names></name> <name><surname>Wyble</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>I tried a bunch of things: the dangers of unexpected overfitting in classification</article-title>. <source>bioRxiv</source> <volume>2016</volume>:<fpage>078816</fpage>. <pub-id pub-id-type="doi">10.1101/078816</pub-id><pub-id pub-id-type="pmid">33035522</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>M. K.</given-names></name></person-group> (<year>2021</year>). <source>Data snooping</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://web.ma.utexas.edu/users/mks/statmistakes/datasnooping.html">https://web.ma.utexas.edu/users/mks/statmistakes/datasnooping.html</ext-link> (accessed August 5, 2021).</citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>L. M.</given-names></name> <name><surname>Mortazavi</surname> <given-names>B. J.</given-names></name> <name><surname>Deo</surname> <given-names>R. C.</given-names></name> <name><surname>Curtis</surname> <given-names>L.</given-names></name> <name><surname>Kao</surname> <given-names>D. P.</given-names></name></person-group> (<year>2020</year>). <article-title>Recommendations for reporting machine learning analyses in clinical research</article-title>. <source>Circ. Cardiovasc. Qual. Outcomes</source> <volume>13</volume>:<fpage>e006556</fpage>. <pub-id pub-id-type="doi">10.1161/CIRCOUTCOMES.120.006556</pub-id><pub-id pub-id-type="pmid">33079589</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Steyerberg</surname> <given-names>E. W.</given-names></name> <name><surname>Vergouwe</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Towards better clinical prediction models: seven steps for development and an ABCD for validation</article-title>. <source>Eur. Heart J.</source> <volume>35</volume>, <fpage>1925</fpage>&#x02013;<lpage>1931</lpage>. <pub-id pub-id-type="doi">10.1093/eurheartj/ehu207</pub-id><pub-id pub-id-type="pmid">24898551</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tarca</surname> <given-names>A. L.</given-names></name> <name><surname>Carey</surname> <given-names>V. J.</given-names></name> <name><surname>Chen</surname> <given-names>X.-W.</given-names></name> <name><surname>Romero</surname> <given-names>R.</given-names></name> <name><surname>Dr&#x00103;ghici</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Machine learning and its applications to biology</article-title>. <source>PLoS Comput. Biol.</source> <volume>3</volume>:<fpage>e116</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.0030116</pub-id><pub-id pub-id-type="pmid">17604446</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="web"><person-group person-group-type="author"><collab>University of California Irvine</collab></person-group> (<year>1987</year>). <source>Machine Learning Repository</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml">https://archive.ics.uci.edu/ml</ext-link> (accessed January 12, 2021).</citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Calster</surname> <given-names>B.</given-names></name> <name><surname>Wynants</surname> <given-names>L.</given-names></name> <name><surname>Riley</surname> <given-names>R. D.</given-names></name> <name><surname>van Smeden</surname> <given-names>M.</given-names></name> <name><surname>Collins</surname> <given-names>G. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Methodology over metrics: current scientific standards are a disservice to patients and society</article-title>. <source>J. Clin. Epidemiol.</source> <volume>138</volume>, <fpage>219</fpage>&#x02013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1016/j.jclinepi.2021.05.018</pub-id><pub-id pub-id-type="pmid">34077797</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wainberg</surname> <given-names>M.</given-names></name> <name><surname>Alipanahi</surname> <given-names>B.</given-names></name> <name><surname>Frey</surname> <given-names>B. J.</given-names></name></person-group> (<year>2016</year>). <article-title>Are random forests truly the best classifiers?</article-title> <source>J. Mach. Learn. Res.</source> <volume>17</volume>, <fpage>3837</fpage>&#x02013;<lpage>3841</lpage>. <pub-id pub-id-type="doi">10.5555/2946645.3007063</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walsh</surname> <given-names>I.</given-names></name> <name><surname>Fishman</surname> <given-names>D.</given-names></name> <name><surname>Garcia-Gasulla</surname> <given-names>D.</given-names></name> <name><surname>Titma</surname> <given-names>T.</given-names></name> <name><surname>Pollastri</surname> <given-names>G.</given-names></name> <name><surname>Capriotti</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>DOME: recommendations for supervised machine learning validation in biology</article-title>. <source>Nat. Methods</source> <volume>18</volume>, <fpage>1122</fpage>&#x02013;<lpage>1127</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-021-01205-4</pub-id><pub-id pub-id-type="pmid">34556867</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whalen</surname> <given-names>S.</given-names></name> <name><surname>Schreiber</surname> <given-names>J.</given-names></name> <name><surname>Noble</surname> <given-names>W. S.</given-names></name> <name><surname>Pollard</surname> <given-names>K. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Navigating the pitfalls of applying machine learning in genomics</article-title>. <source>Nat. Rev. Genet.</source> <volume>23</volume>, <fpage>169</fpage>&#x02013;<lpage>181</lpage>. <pub-id pub-id-type="doi">10.1038/s41576-021-00434-9</pub-id><pub-id pub-id-type="pmid">34837041</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>White</surname> <given-names>H.</given-names></name></person-group> (<year>2000</year>). <article-title>A reality check for data snooping</article-title>. <source>Econometrica</source> <volume>68</volume>, <fpage>1097</fpage>&#x02013;<lpage>1126</lpage>. <pub-id pub-id-type="doi">10.1111/1468-0262.00152</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolff</surname> <given-names>R. F.</given-names></name> <name><surname>Moons</surname> <given-names>K. G.</given-names></name> <name><surname>Riley</surname> <given-names>R. D.</given-names></name> <name><surname>Whiting</surname> <given-names>P. F.</given-names></name> <name><surname>Westwood</surname> <given-names>M.</given-names></name> <name><surname>Collins</surname> <given-names>G. S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>PROBAST: a tool to assess the risk of bias and applicability of prediction model studies</article-title>. <source>Ann. Intern. Med.</source> <volume>170</volume>, <fpage>51</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.7326/M18-1376</pub-id><pub-id pub-id-type="pmid">30596876</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>S.</given-names></name></person-group> (<year>1921</year>). <article-title>Correlation and causation</article-title>. <source>J. Agric. Res.</source> <volume>20</volume>, <fpage>557</fpage>&#x02013;<lpage>585</lpage>.</citation>
</ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yadav</surname> <given-names>S.</given-names></name> <name><surname>Shukla</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Analysis of <italic>k</italic>-fold cross-validation over hold-out validation on colossal datasets for quality classification,</article-title> in <source>Proceedings of IACC 2016&#x02014;the 6th International Conference on Advanced Computing</source> (<publisher-loc>Bhimavaram</publisher-loc>), <fpage>78</fpage>&#x02013;<lpage>83</lpage>.</citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Hertwig</surname> <given-names>F.</given-names></name> <name><surname>Thierry-Mieg</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Thierry-Mieg</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Comparison of RNA-seq and microarray-based models for clinical endpoint prediction</article-title>. <source>Genome Biol.</source> <volume>16</volume>:<fpage>133</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-015-0694-1</pub-id><pub-id pub-id-type="pmid">26109056</pub-id></citation></ref>
</ref-list> 
</back>
</article> 