<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="brief-report" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2019.00899</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Perspective</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>
On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Waldmann</surname>
<given-names>Patrik</given-names>
</name>
<xref ref-type="author-notes" rid="fn001"><sup>*</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/101425"/>
</contrib>
</contrib-group>
<aff id="aff1"><institution>Department of Animal Breeding and Genetics, The Swedish University of Agricultural Sciences, SLU</institution>, <addr-line>Uppsala</addr-line>, <country>Sweden</country></aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Han Mulder, Wageningen University &amp; Research, Netherlands</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Zhe Zhang, South China Agricultural University, China; Xiangdong Ding, China Agricultural University (CAU), China; Mario Calus, Wageningen University &amp; Research, Netherlands</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Patrik Waldmann, <email xlink:href="mailto:Patrik.Waldmann@slu.se">Patrik.Waldmann@slu.se</email></p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>09</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>899</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>05</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>08</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2019 Waldmann</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Waldmann</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient (<italic>r<sup>2</sup></italic>) as a standardized measure of the predictive accuracy of a model. Based on arguments from bias&#x2013;variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO, and the ALASSO to a simulated example based on 9,723 SNPs and 3,226 individuals, the LASSO was selected as the best model when <italic>r<sup>2</sup></italic> was used as the measure. However, when model selection was based on test MSE and the coefficient of determination <italic>R<sup>2</sup></italic>, the ALASSO proved to be the best method. Hence, use of <italic>r<sup>2</sup></italic> may lead to selection of the wrong model and therefore also to nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose use of the test MSE for model selection and <italic>R<sup>2</sup></italic> as a standardized measure of accuracy.</p>
</abstract>
<kwd-group>
<kwd>genomic selection</kwd>
<kwd>model comparison</kwd>
<kwd>accuracy</kwd>
<kwd>bias&#x2013;variance trade-off</kwd>
<kwd>coefficient of determination</kwd>
</kwd-group>
<counts>
<fig-count count="0"/>
<table-count count="2"/>
<equation-count count="3"/>
<ref-count count="29"/>
<page-count count="4"/>
<word-count count="2476"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>At the heart of classical quantitative genetics is linear model theory (<xref ref-type="bibr" rid="B17">Lynch and Walsh, 1998</xref>). Statistical inference in linear models mostly falls within the ordinary least squares (OLS) and maximum likelihood (ML) frameworks (<xref ref-type="bibr" rid="B1">Casella and Berger, 2002</xref>). The recent transition from pedigree-based classical quantitative genetics to prediction based on genome-wide markers involves some steps where the characteristics of the data complicate statistical inference and may have profound effects on model selection.</p>
<p>One of the most important factors is the number of markers <italic>p</italic> in relation to the number of individuals <italic>n</italic>. If <italic>p</italic> &lt; &lt; <italic>n</italic>, we can set up the linear model <italic>y</italic> = <italic>X &#x3b2;</italic> + <italic>e</italic>, where each individual genotype score (0, 1, or 2) is collected in a matrix <italic>X</italic> (standardized over columns to have mean equal to zero and variance equal to one) and the corresponding phenotypes in a vector <italic>y</italic> (centered to have a mean of zero), and then use standard OLS to obtain unbiased estimates of the regression coefficients of the genetic markers, i.e., <italic>&#x3b2;<sub>OLS</sub></italic> = <italic>(X<sup>T</sup></italic> <italic>X)<sup>-1</sup></italic> <italic>X<sup>T</sup></italic> <italic>y</italic>. Note that this is also the solution to the ML function <inline-formula>
<mml:math display="inline" id="M1">
<mml:mrow><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo></mml:mrow>
</mml:math>
</inline-formula> max <italic>p(y</italic> | <italic>X, &#x3b2;)</italic>. It is straightforward to incorporate dominance and epistasis into <italic>X</italic> using indicator variables. The predicted phenotypes are calculated as <inline-formula>
<mml:math display="inline" id="M2">
<mml:mrow><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>X</mml:mi><mml:mtext>&#xa0;</mml:mtext><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:mrow>
</mml:math></inline-formula> and the residuals as <italic>e</italic> = <italic>y</italic> - <italic>&#x177;</italic>. Based on the residuals, it is possible to calculate the residual sum of squares RSS = <italic>e<sup>T</sup></italic> <italic>e</italic>, the OLS error variance <inline-formula>
<mml:math display="inline" id="M3"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mi>RSS</mml:mi><mml:mo>/</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
</inline-formula>, and the mean squared error:</p>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M4">
<mml:mrow><mml:mi>MSE</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mtext>&#xa0;</mml:mtext><mml:mstyle displaystyle="true"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mi>RSS</mml:mi><mml:mo>/</mml:mo><mml:mi>n</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mstyle></mml:mrow>
</mml:math>
</disp-formula>
<p>We can also obtain the variances (diagonal terms) and covariances (off-diagonal terms) of the regression coefficients as <inline-formula>
<mml:math display="inline" id="M5">
<mml:mrow><mml:mi>COV</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mover accent="true"><mml:mi mathvariant="normal">&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B20">Ravishanker and Dey, 2002</xref>). However, for estimation of the genomic variance <inline-formula>
<mml:math display="inline" id="M6">
<mml:mrow><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow>
</mml:math>
</inline-formula> and the genomic heritability <inline-formula>
<mml:math display="inline" id="M7">
<mml:mrow><mml:msubsup><mml:mi>h</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow>
</mml:math>
</inline-formula> it is necessary to use some random effects model where the covariance structure is based on the outer product <italic>X X<sup>T</sup></italic> instead of the inner product <italic>X<sup>T</sup></italic> <italic>X</italic> (<xref ref-type="bibr" rid="B19">Morota and Gianola, 2014</xref>; <xref ref-type="bibr" rid="B3">de los Campos et al., 2015</xref>). When <italic>p</italic> &lt; &lt; <italic>n</italic>, OLS will give unbiased estimates of the genomic parameters with low variance. However, if <italic>n</italic> is not much larger than <italic>p</italic>, there can be considerable variability in the OLS fit, resulting in overfitting with a very small, or even zero, error variance, and consequently incorrect predictions of future observations. Hence, it is advisable to cast OLS into a supervised statistical learning framework where the data are split into training and test sets, and MSE is evaluated on the test set (<xref ref-type="bibr" rid="B11">Hastie et al., 2009</xref>).</p>
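This workflow can be sketched numerically. The sample sizes, seed, and effect distribution below are hypothetical (not taken from any dataset in this article); the sketch simulates standardized genotype scores, fits OLS on a training set, and evaluates the test MSE of Eq. (1) on a hold-out set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data (hypothetical sizes): n individuals, p markers, p << n.
n, p = 200, 10
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))   # genotype scores 0/1/2
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize columns
beta_true = rng.normal(0.0, 1.0, size=p)       # hypothetical marker effects
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)
y = y - y.mean()                               # center phenotypes

# Train/test split, as advocated in the text.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# OLS: beta = (X'X)^{-1} X'y  (solve is preferred over explicit inversion).
beta_ols = np.linalg.solve(X_tr.T @ X_tr, X_tr.T @ y_tr)

# Test-set mean squared error, Eq. (1): MSE = RSS / n.
resid = y_te - X_te @ beta_ols
mse_test = float(resid @ resid) / len(y_te)
```

Because the test set is held out, `mse_test` estimates the prediction error for future observations rather than the (optimistic) training error.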
</sec>
<sec id="s2">
<title>Regularization</title>
<p>Although the number of genotyped individuals is generally increasing, the experimental setting in genomic prediction is often that <italic>p</italic> &gt; <italic>n</italic> or even <italic>p</italic> &gt; &gt; <italic>n</italic>. This is an example of a high-dimensional statistical problem which leads to certain challenges (<xref ref-type="bibr" rid="B14">Johnstone and Titterington, 2009</xref>; <xref ref-type="bibr" rid="B4">Fan et al., 2014</xref>). Standard OLS is not applicable in this situation, because <italic>X<sup>T</sup></italic> <italic>X</italic> is singular (i.e., does not have an inverse) and the parameters in the regression model cannot be uniquely estimated. One approach to overcome the singularity problem is to use regularization (also known as penalization). An early example of this is ridge regression (RR) (<xref ref-type="bibr" rid="B13">Hoerl and Kennard, 1970</xref>), in which the regression coefficient is estimated using <inline-formula>
<mml:math display="inline" id="M8">
<mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mtext>&#xa0;</mml:mtext><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="normal">&#x3bb;</mml:mi><mml:mtext>&#xa0;</mml:mtext><mml:msub><mml:mi>I</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>y</mml:mi></mml:mrow>
</mml:math>
</inline-formula>
, where <italic>I<sub>p</sub></italic> is an identity matrix and &#x3bb; is a positive penalty parameter that needs to be tuned using training and test data. Note that genomic best linear unbiased prediction (GBLUP) is a form of random effects RR, where <inline-formula>
<mml:math display="inline" id="M9">
<mml:mrow><mml:mi>&#x3bb;</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow>
</mml:math>
</inline-formula>
 and the genomic relationship matrix <italic>G</italic> is calculated based on <italic>XX<sup>T</sup></italic> (<xref ref-type="bibr" rid="B9">Goddard, 2009</xref>; <xref ref-type="bibr" rid="B19">Morota and Gianola, 2014</xref>). There is also a Bayesian rationale for RR, where the regression coefficients follow a normal prior, <inline-formula>
<mml:math display="inline" id="M10">
<mml:mrow><mml:mi>&#x3b2;</mml:mi><mml:mo>&#x223c;</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>&#x3bb;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow>
</mml:math>
</inline-formula>. The RR estimator has some interesting properties. Firstly, both the expectation <inline-formula>
<mml:math display="inline" id="M11">
<mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> and the variance <inline-formula>
<mml:math display="inline" id="M12">
<mml:mrow><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> tend towards zero when &#x3bb; goes to infinity. Secondly, compared with OLS estimates, <inline-formula>
<mml:math display="inline" id="M13">
<mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> is biased, and the variance of the OLS estimator <inline-formula>
<mml:math display="inline" id="M14">
<mml:mrow><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>O</mml:mi><mml:mi>L</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> is always larger than <inline-formula>
<mml:math display="inline" id="M15">
<mml:mrow><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> when &#x3bb; &gt; 0 (<xref ref-type="bibr" rid="B25">van Wieringen, 2018</xref>).</p>
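A small numerical sketch (toy dimensions and &#x3bb; values, chosen only for illustration) shows two of these points: the closed-form RR estimator remains well defined even when <italic>p</italic> &gt; <italic>n</italic> makes <italic>X<sup>T</sup></italic> <italic>X</italic> singular, and the norm of the estimate shrinks toward zero as &#x3bb; grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy p > n setting where OLS fails (X'X is singular) but RR is well defined.
n, p = 50, 200
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(size=n)   # 5 markers with true effects

def ridge(X, y, lam):
    """Closed-form ridge estimator (X'X + lam*I_p)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Shrinkage: the coefficient norm decreases as lambda grows,
# tending toward zero as lambda -> infinity.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.1, 1.0, 10.0, 1000.0)]
```

In practice &#x3bb; would be tuned on hold-out data rather than fixed as here.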
<p>Another interesting feature of RR appears when considering the MSE. In general, for any estimator of a parameter <italic>&#x3b8;</italic>, the mean squared test error can be decomposed following <inline-formula>
<mml:math display="inline" id="M16">
<mml:mrow><mml:mi>MSE</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mover accent="true"><mml:mi>&#x3b8;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mover accent="true"><mml:mi>&#x3b8;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo><mml:mtext>&#xa0;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#xa0;</mml:mtext><mml:mi>BIAS</mml:mi><mml:msup><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mover accent="true"><mml:mi>&#x3b8;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B11">Hastie et al., 2009</xref>). The bias&#x2013;variance decomposition is a way of analyzing the expected test error of a learning algorithm with respect to a particular problem. In order to minimize the test error, a model that simultaneously achieves low variance and low bias needs to be selected. The variance refers to the amount by which the estimate of <italic>&#x3b8;</italic> would change if it were computed using other training datasets. Ideally, the estimate of <italic>&#x3b8;</italic> should vary as little as possible. Bias represents the error that results from approximating a complex problem with a simpler model. Generally, more flexible methods result in less bias, but also lead to higher variance. Hence, there is a bias&#x2013;variance trade-off that needs to be optimized using the test data. For data with an orthonormal design matrix, i.e., <italic>X<sup>T</sup></italic> <italic>X = I<sub>p</sub></italic> <italic>= (X<sup>T</sup></italic> <italic>X)<sup>-1</sup></italic> and <italic>n = p</italic>, it can be mathematically shown that there is a value of &#x3bb; &gt; 0 where <inline-formula>
<mml:math display="inline" id="M17">
<mml:mrow><mml:mi>MSE</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo><mml:mo>&lt;</mml:mo><mml:mi>MSE</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>O</mml:mi><mml:mi>L</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B23">Theobald, 1974</xref>; <xref ref-type="bibr" rid="B5">Farebrother, 1976</xref>).</p>
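This inequality can be checked by simulation in the orthonormal case, where the RR estimate reduces to the OLS estimate divided by (1 + &#x3bb;). The true coefficients, noise level, and &#x3bb; below are arbitrary choices for illustration, not values from the literature.

```python
import numpy as np

rng = np.random.default_rng(3)

# Orthonormal design (X'X = I_p): use the identity basis with n = p.
p = 50
X = np.eye(p)
beta = rng.normal(0.0, 0.5, size=p)   # true coefficients (hypothetical scale)
sigma = 1.0                           # residual standard deviation
lam = 2.0                             # a fixed positive penalty

# Under X'X = I_p, OLS is beta_hat = X'y and RR is beta_hat / (1 + lam).
# Average the coefficient MSE over repeated noise draws.
mse_ols, mse_rr = 0.0, 0.0
reps = 2000
for _ in range(reps):
    y = X @ beta + rng.normal(0.0, sigma, size=p)
    b_ols = X.T @ y
    b_rr = b_ols / (1.0 + lam)
    mse_ols += np.sum((b_ols - beta) ** 2) / reps
    mse_rr += np.sum((b_rr - beta) ** 2) / reps
```

The shrunken estimator trades a little squared bias for a large reduction in variance, so its total coefficient MSE comes out below that of OLS for this &#x3bb;.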
<p>RR can be written as an optimization problem <inline-formula>
<mml:math display="inline" id="M18">
<mml:mrow><mml:mi>min</mml:mi><mml:mo>{</mml:mo><mml:mo>|</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>X</mml:mi><mml:mi>&#x3b2;</mml:mi><mml:msubsup><mml:mo>|</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x3bb;</mml:mi><mml:mo>|</mml:mo><mml:mi>&#x3b2;</mml:mi><mml:msubsup><mml:mo>|</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>}</mml:mo></mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math display="inline" id="M19">
<mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>&#x22c5;</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mo>|</mml:mo><mml:mn>2</mml:mn></mml:msub></mml:mrow>
</mml:math>
</inline-formula> denotes the Euclidean <italic>&#x2113;</italic><sub>2</sub>-norm. The first term is the loss function and the second term the penalty. By changing the penalty into an <italic>&#x2113;</italic><sub>1</sub>-norm, we end up with <inline-formula>
<mml:math display="inline" id="M20">
<mml:mrow><mml:mi>min</mml:mi><mml:mo>{</mml:mo><mml:mo>|</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>X</mml:mi><mml:mi>&#x3b2;</mml:mi><mml:msubsup><mml:mo>|</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x3bb;</mml:mi><mml:mo>|</mml:mo><mml:mi>&#x3b2;</mml:mi><mml:msub><mml:mo>|</mml:mo><mml:mn>1</mml:mn></mml:msub><mml:mo>}</mml:mo></mml:mrow>
</mml:math>
</inline-formula>, which is also known as the LASSO (<xref ref-type="bibr" rid="B24">Tibshirani, 1996</xref>). In contrast to RR, the LASSO sets some regression coefficients exactly to zero and therefore performs variable selection. In general, the LASSO will perform better than RR when a relatively small number of predictors (markers) have relatively large effects on the response (phenotype). Compared with OLS, both the LASSO and RR can yield a reduction in variance at the expense of some increase in bias, and consequently generate lower MSE and better prediction accuracy (<xref ref-type="bibr" rid="B11">Hastie et al., 2009</xref>). Unfortunately, minimization of the LASSO problem does not provide an estimate of the error variance, because this quantity depends on a complex relationship between the signal-to-noise ratio (i.e., the heritability) and the sparsity pattern (i.e., the number of QTLs in relation to the number of markers). In general, it is notoriously difficult to obtain proper error variance estimates with regularization methods in the <italic>p</italic> &gt; <italic>n</italic> situation, because of the biased estimates and the difficulty in calculating correct degrees of freedom (<xref ref-type="bibr" rid="B21">Reid et al., 2016</xref>). The LASSO has been extended in many directions (<xref ref-type="bibr" rid="B26">Vidaurre et al., 2013</xref>; <xref ref-type="bibr" rid="B12">Hastie et al., 2015</xref>). Among the most interesting variants is the adaptive LASSO (ALASSO), where a pre-calculated vector <italic>w</italic> is used to weight the coefficients differently in the penalty, i.e., <inline-formula>
<mml:math display="inline" id="M21">
<mml:mrow><mml:mi>min</mml:mi><mml:mo>{</mml:mo><mml:mo>|</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>X</mml:mi><mml:mi>&#x3b2;</mml:mi><mml:mo>|</mml:mo><mml:msubsup><mml:mo>|</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x3bb;</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>w</mml:mi><mml:mi>&#x3b2;</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mo>|</mml:mo><mml:mn>1</mml:mn></mml:msub><mml:mo>}</mml:mo></mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B29">Zou, 2006</xref>). The weights can be calculated as the absolute values of marginal covariances between the markers and the phenotype. The bias introduced by the shrinkage of <inline-formula>
<mml:math display="inline" id="M22">
<mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover>
</mml:math>
</inline-formula> in RR and LASSO is reduced in ALASSO at the expense of an increase in variance (<xref ref-type="bibr" rid="B8">Giraud, 2015</xref>). The LASSO and ALASSO have shown competitive prediction performance compared with a range of other methods in comparative genomic prediction studies (<xref ref-type="bibr" rid="B16">Li and Sillanp&#xe4;&#xe4;, 2012</xref>; <xref ref-type="bibr" rid="B18">Momen et al., 2018</xref>).</p>
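Both penalties can be sketched with cyclic coordinate descent, a standard LASSO solver (not the glmnet or AUTALASSO implementations used elsewhere in this article). The weighted problem is solved by the usual rescaling trick: a plain LASSO on the columns <italic>X<sub>j</sub></italic>/<italic>w<sub>j</sub></italic>, with the fit divided by <italic>w<sub>j</sub></italic> afterwards. Taking <italic>w<sub>j</sub></italic> inversely proportional to the absolute marginal covariance is one common choice and an assumption of this sketch; weighting conventions vary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sparse p > n problem: 4 markers with large effects, the rest zero.
n, p = 100, 300
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = 2.0
y = X @ beta_true + rng.normal(size=n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for min ||y - X b||_2^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)       # per-column sums of squares
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            r = r + X[:, j] * b[j]      # partial residual without marker j
            b[j] = soft_threshold(X[:, j] @ r, lam / 2.0) / col_ss[j]
            r = r - X[:, j] * b[j]
    return b

lam = 50.0
beta_lasso = lasso_cd(X, y, lam)

# ALASSO via rescaling: the penalty lam * sum_j w_j |b_j| is equivalent to a
# plain LASSO on X_j / w_j, with the resulting coefficients divided by w_j.
# Here w_j = 1 / |marginal covariance| (an assumed convention, see lead-in).
w = 1.0 / np.maximum(np.abs(X.T @ y) / n, 1e-8)
beta_alasso = lasso_cd(X / w, y, lam) / w
```

In this simulation both fits keep the four true markers while zeroing most of the noise markers; the adaptive weights penalize weak-signal markers much more heavily than strong-signal ones.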
</sec>
<sec id="s3">
<title>Model Selection</title>
<p>In order to determine the best model, it is important to find a good estimate of the test error, because the training error will decrease when more variables or parameters are added to the model. There are a number of approaches (e.g., Mallows&#x2019; <italic>C<sub>P</sub></italic>, AIC, and BIC) that attempt to correct the training RSS for model size. However, their use as model selection criteria in regularized models with <italic>p</italic> &gt; <italic>n</italic> data is questionable, since they rely on asymptotic assumptions, for example, that it is possible to obtain correct degrees of freedom and unbiased error variance estimates. In an application of RR to genomic marker data, <xref ref-type="bibr" rid="B28">Whittaker et al. (2000)</xref> suggest optimizing <italic>&#x3bb;</italic> by minimizing <inline-formula>
<mml:math display="inline" id="M23">
<mml:mrow><mml:mi>MSE</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x3b2;</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:mi>RSS</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>n</mml:mi><mml:msubsup><mml:mi mathvariant="normal">&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x3c3;</mml:mi><mml:mi>e</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mi>X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>X</mml:mi><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x3bb;</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">]</mml:mo></mml:mrow>
</mml:math>
</inline-formula>, which is a variant of Mallows&#x2019; <italic>C<sub>P</sub></italic>.</p>
<p>An alternative approach is to use cross-validation (CV). There are several variants of CV, but the general idea is to average MSE over some sets of hold-out test data (<xref ref-type="bibr" rid="B11">Hastie et al., 2009</xref>). In quantitative genetics, it is common to use the Pearson correlation coefficient, <italic>r</italic>, as a model selection criterion, both with and without CV (<xref ref-type="bibr" rid="B10">Gonz&#xe1;lez-Recio et al., 2014</xref>). <xref ref-type="bibr" rid="B2">Daetwyler et al. (2008)</xref> suggest using the expected predictive correlation accuracy:</p>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M24">
<mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>COV</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">]</mml:mo><mml:mi>VAR</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">]</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow>
</mml:math>
</disp-formula>
<p>for model evaluation in genome-enabled prediction. The use of <italic>r</italic><sup>2</sup> for model comparison has been questioned; see, for example, <xref ref-type="bibr" rid="B7">Gianola and Sch&#xf6;n (2016)</xref>. Based on the regularization theory above, it is evident that there are potential problems with <italic>r<sup>2</sup></italic>, because VAR[<italic>y</italic>] is unaffected by the choice of model, whereas VAR[&#x177;] is heavily influenced by the type of model and the level of regularization.</p>
<p>It is also possible to assess the goodness of fit of the models using the coefficient of determination <italic>R<sup>2</sup></italic>. <xref ref-type="bibr" rid="B15">Kv&#xe5;lseth (1985)</xref> identifies eight different variants of this statistic and compares them for different types of models. For linear OLS regression models with an intercept term, the problem seems to be of a minor nature, since the majority of the <italic>R<sup>2</sup></italic> statistics are equivalent. However, for other types of models, such as linear models without intercepts or nonlinear models, the various <italic>R<sup>2</sup></italic> statistics generally yield different values. Although not examined by <xref ref-type="bibr" rid="B15">Kv&#xe5;lseth (1985)</xref>, the same problem applies to regularized models. <xref ref-type="bibr" rid="B15">Kv&#xe5;lseth (1985)</xref> concludes that the best coefficient to use is:</p>
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M25">
<mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo>&#xaf;</mml:mo></mml:mover><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow>
</mml:math>
</disp-formula>
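The practical difference between Eqs. (2) and (3) can be illustrated with two hypothetical predictors of a simulated phenotype: predictor A is perfectly correlated with the genetic signal but heavily shrunken, while predictor B is unshrunken but noisier. Because <italic>r<sup>2</sup></italic> is invariant to rescaling of &#x177; while <italic>R<sup>2</sup></italic> is not, the two measures rank the predictors differently. All quantities below are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 1000
signal = rng.normal(size=n)
y = signal + rng.normal(scale=0.5, size=n)   # test phenotypes

# Two hypothetical competing predictors of y:
#   A: perfectly correlated with the signal but strongly shrunken,
#   B: unshrunken but contaminated with extra prediction noise.
yhat_a = 0.2 * signal
yhat_b = signal + rng.normal(scale=0.4, size=n)

def pearson_r2(y, yhat):
    """Squared Pearson correlation, Eq. (2); invariant to rescaling yhat."""
    return float(np.corrcoef(y, yhat)[0, 1] ** 2)

def coef_det(y, yhat):
    """Coefficient of determination, Eq. (3); penalizes shrinkage and offset."""
    rss = np.sum((y - yhat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - rss / tss)

r2_a, r2_b = pearson_r2(y, yhat_a), pearson_r2(y, yhat_b)
R2_a, R2_b = coef_det(y, yhat_a), coef_det(y, yhat_b)
```

Here the heavily shrunken predictor A wins on <italic>r<sup>2</sup></italic>, yet predictor B has the smaller prediction error and therefore the higher <italic>R<sup>2</sup></italic>, mirroring the LASSO/ALASSO discrepancy discussed in the next section.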
</sec>
<sec id="s4">
<title>Illustration of the Problem With <italic>r</italic><sup>2</sup></title>
<p>In a recent publication (<xref ref-type="bibr" rid="B27">Waldmann et al., 2019</xref>), we presented a novel automatic adaptive LASSO (AUTALASSO) based on the alternating direction method of multipliers (ADMM) optimization algorithm. We also compared the ALASSO, LASSO, and RR on a simulated dataset using the glmnet software (<xref ref-type="bibr" rid="B6">Friedman et al., 2010</xref>). The original simulated data stem from the QTLMAS2010 workshop (<xref ref-type="bibr" rid="B22">Szyd&#x142;owski and Paczynska, 2011</xref>). The total number of individuals is 3,226, structured in a pedigree with five generations. The continuous quantitative trait was created from 37 QTLs, including nine controlled major genes and 28 random minor genes. The controlled QTLs included two pairs of epistatic genes with no individual effects, three maternally imprinted genes, and two additive major genes. The random genes were chosen among the simulated SNPs and their effects were sampled from a truncated normal distribution. In addition to these original data, one dominance locus, one over-dominance locus, and one under-dominance locus were created and added to the phenotype (<xref ref-type="bibr" rid="B27">Waldmann et al., 2019</xref>). The narrow-sense heritability was equal to 0.45. MAF cleaning was performed at the 0.01 level, resulting in a final sample of 9,723 SNPs. Data from individuals 1 to 2,326 were used as training data and data from individuals 2,327 to 3,226 as test (or validation) data. The regularization path in glmnet was run over 100 different <italic>&#x3bb;</italic>-values to estimate the smallest test MSE and largest test <italic>r<sup>2</sup></italic> and <italic>R<sup>2</sup></italic>.</p>
<p>In our previous paper (<xref ref-type="bibr" rid="B27">Waldmann et al., 2019</xref>), we estimated only the MSE and <italic>r<sup>2</sup></italic>, and therefore add <italic>R<sup>2</sup></italic> here. Application of the ALASSO, LASSO, and RR resulted in a test MSE of 64.52, 65.73, and 83.07, respectively. Hence, based on the MSE, it is clear that the ALASSO is the best model. The ALASSO is also favored in terms of <italic>R<sup>2</sup></italic>, with values of 0.449, 0.439, and 0.291, respectively. However, based on <italic>r<sup>2</sup></italic>, the LASSO is the best model, with an estimate of 0.460, compared with ALASSO and RR estimates of 0.455 and 0.300, respectively. Decomposing <italic>r<sup>2</sup></italic> into its parts reveals that the test VAR[<italic>y</italic>] is the same (117.2) for all three methods. However, VAR[&#x177;] differs between the models, increasing from 29.54 for RR to 36.41 for the LASSO and 48.17 for the ALASSO. The COV[<italic>y</italic>,&#x177;] follows the same pattern, but its proportion to VAR[&#x177;] differs between models. These results are summarized in <xref ref-type="table" rid="T1"><bold>Table 1</bold></xref>. Introduction of the weight factor in the ALASSO increases model complexity, which decreases model bias at the expense of increased variance. Most importantly, however, the test MSE is reduced. This is an example of the bias-variance trade-off that is fundamental in statistical learning, and it shows that <italic>r<sup>2</sup></italic> can provide estimates that lead to erroneous model decisions.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Mean squared error (MSE), predictive correlation accuracy (<italic>r<sup>2</sup></italic>), coefficient of determination (<italic>R<sup>2</sup></italic>), covariance between test phenotypes and predicted test phenotypes (COV[y, &#x177;]), and variance of predicted test phenotypes (VAR[&#x177;]) for ridge regression (RR), LASSO and adaptive LASSO (ALASSO), evaluated on the simulated QTLMAS2010 data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top">Method</th>
<th valign="top">MSE</th>
<th valign="top"><italic>r<sup>2</sup></italic></th>
<th valign="top"><italic>R<sup>2</sup></italic></th>
<th valign="top">COV[&#x2005;<italic>y</italic>,&#x177;]</th>
<th valign="top">VAR[&#x177;]</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top">RR</td>
<td valign="top">83.07</td>
<td valign="top">0.300</td>
<td valign="top">0.291</td>
<td valign="top">32.22</td>
<td valign="top">29.54</td>
</tr>
<tr>
<td valign="top">LASSO</td>
<td valign="top">65.73</td>
<td valign="top">0.460</td>
<td valign="top">0.439</td>
<td valign="top">44.30</td>
<td valign="top">36.41</td>
</tr>
<tr>
<td valign="top">ALASSO</td>
<td valign="top">64.52</td>
<td valign="top">0.455</td>
<td valign="top">0.449</td>
<td valign="top">50.68</td>
<td valign="top">48.17</td>
</tr>
</tbody>
</table>
</table-wrap>
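The quantities in <bold>Table 1</bold> are tied together by two standard identities: <italic>r<sup>2</sup></italic> = COV[<italic>y</italic>,&#x177;]<sup>2</sup> / (VAR[<italic>y</italic>]&#xb7;VAR[&#x177;]), and <italic>R<sup>2</sup></italic> = 1 &#x2212; MSE/VAR[<italic>y</italic>], the latter holding up to the <italic>n</italic> vs. <italic>n</italic>&#x2212;1 scaling of the sample variance, since MSE = SSE/<italic>n</italic> and VAR[<italic>y</italic>] &#x2248; SST/<italic>n</italic> on the test set. A short Python check confirms that the reported <italic>r<sup>2</sup></italic> and <italic>R<sup>2</sup></italic> follow from the reported MSE, covariance, and variances:

```python
# Consistency check for the statistics reported in Table 1:
#   r^2 = COV[y, yhat]^2 / (VAR[y] * VAR[yhat])   (squared Pearson correlation)
#   R^2 = 1 - MSE / VAR[y]                        (MSE = SSE/n, VAR[y] ~ SST/n)
var_y = 117.2  # test-set VAR[y], identical for all three methods

table1 = {  # method: (MSE, COV[y, yhat], VAR[yhat])
    "RR":     (83.07, 32.22, 29.54),
    "LASSO":  (65.73, 44.30, 36.41),
    "ALASSO": (64.52, 50.68, 48.17),
}

derived = {}
for method, (mse, cov, var_hat) in table1.items():
    r2 = cov**2 / (var_y * var_hat)
    R2 = 1.0 - mse / var_y
    derived[method] = (round(r2, 3), round(R2, 3))
    print(method, derived[method])
# -> RR (0.3, 0.291), LASSO (0.46, 0.439), ALASSO (0.455, 0.449)
```

The check reproduces the tabulated values and makes the disagreement explicit: the ALASSO wins on MSE and <italic>R<sup>2</sup></italic>, while the LASSO wins on <italic>r<sup>2</sup></italic>.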
<p>Ranking of individuals in terms of breeding values and predicted phenotypes is important in breeding. The order of the 10 best individuals differs not only between the RR, LASSO, and ALASSO, but also within each model depending on whether min MSE or max <italic>r<sup>2</sup></italic> is used to determine the best model (<xref ref-type="table" rid="T2"><bold>Table 2</bold></xref>). How regularization and the variable selection properties of the LASSO and ALASSO affect the statistical properties of rank correlation measures (e.g., Spearman&#x2019;s and Kendall&#x2019;s rank correlation coefficients) is unclear because of the bias-variance trade-off and needs further investigation. For example, a rank correlation measure can be high even if the model is highly biased; the rank statistic may therefore work in the opposite direction of the MSE loss function, leading to conflicting optimization criteria. Hence, if ranking is the goal, it would be necessary to use a model with a rank-based loss function.</p>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Ranking of the 10 best individuals from the simulated QTLMAS2010 data based on <italic>&#x177;</italic> for RR, LASSO, and ALASSO using min MSE and max predictive correlation accuracy (<italic>r<sup>2</sup></italic>) as model selection measures.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
<th valign="top">Rank</th>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
<th valign="top"/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top">Method/selection statistic</td>
<td valign="top">1</td>
<td valign="top">2</td>
<td valign="top">3</td>
<td valign="top">4</td>
<td valign="top">5</td>
<td valign="top">6</td>
<td valign="top">7</td>
<td valign="top">8</td>
<td valign="top">9</td>
<td valign="top">10</td>
</tr>
<tr>
<td valign="top">RR/min[MSE]</td>
<td valign="top">2,586</td>
<td valign="top">2,772</td>
<td valign="top">2,977</td>
<td valign="top">3,050</td>
<td valign="top">3,195</td>
<td valign="top">3,056</td>
<td valign="top">2,756</td>
<td valign="top">2,738</td>
<td valign="top">2,821</td>
<td valign="top">3,184</td>
</tr>
<tr>
<td valign="top">RR/max[<italic>r<sup>2</sup></italic>]</td>
<td valign="top">2,586</td>
<td valign="top">2,772</td>
<td valign="top">3,195</td>
<td valign="top">2,977</td>
<td valign="top">3,050</td>
<td valign="top">3,184</td>
<td valign="top">2,589</td>
<td valign="top">2,821</td>
<td valign="top">2,756</td>
<td valign="top">2,738</td>
</tr>
<tr>
<td valign="top">LASSO/min[MSE]</td>
<td valign="top">2,967</td>
<td valign="top">2,820</td>
<td valign="top">2,586</td>
<td valign="top">2,809</td>
<td valign="top">3,050</td>
<td valign="top">2,977</td>
<td valign="top">3,195</td>
<td valign="top">2,582</td>
<td valign="top">2,688</td>
<td valign="top">2,765</td>
</tr>
<tr>
<td valign="top">LASSO/max[<italic>r<sup>2</sup></italic>]</td>
<td valign="top">2,967</td>
<td valign="top">2,820</td>
<td valign="top">2,809</td>
<td valign="top">2,688</td>
<td valign="top">2,582</td>
<td valign="top">2,586</td>
<td valign="top">3,195</td>
<td valign="top">3,050</td>
<td valign="top">2,977</td>
<td valign="top">2,972</td>
</tr>
<tr>
<td valign="top">ALASSO/min[MSE]</td>
<td valign="top">2,820</td>
<td valign="top">2,582</td>
<td valign="top">2,586</td>
<td valign="top">2,809</td>
<td valign="top">3,050</td>
<td valign="top">2,832</td>
<td valign="top">3,195</td>
<td valign="top">3,006</td>
<td valign="top">2,589</td>
<td valign="top">2,817</td>
</tr>
<tr>
<td valign="top">ALASSO/max[<italic>r<sup>2</sup></italic>]</td>
<td valign="top">2,820</td>
<td valign="top">2,582</td>
<td valign="top">2,809</td>
<td valign="top">2,586</td>
<td valign="top">3,050</td>
<td valign="top">3,195</td>
<td valign="top">2,832</td>
<td valign="top">3,006</td>
<td valign="top">2,817</td>
<td valign="top">2,972</td>
</tr>
</tbody>
</table>
</table-wrap>
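The point that a rank correlation can stay high while the model is heavily biased is easy to demonstrate. The following numpy sketch (an illustration, not part of the analysis above) applies a shrink-and-shift transformation to the predictions; this inflates the MSE while leaving Spearman&#x2019;s rank correlation at its maximum, because the transformation is strictly monotone:

```python
import numpy as np

def ranks(x):
    # Rank values from 1..n (no ties expected for continuous draws)
    order = np.argsort(x)
    r = np.empty_like(order)
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

rng = np.random.default_rng(7)
y = rng.standard_normal(100)

# A heavily biased but monotone "prediction": shrunk toward zero and shifted
y_hat_biased = 0.1 * y + 5.0

mse = np.mean((y - y_hat_biased) ** 2)  # large: the predictions are far off
rho = spearman(y, y_hat_biased)         # 1.0: the ranking is fully preserved
print(f"MSE = {mse:.2f}, Spearman rho = {rho:.2f}")
```

Selecting on the rank statistic here would declare the biased model perfect, while the MSE loss rejects it, which is exactly the conflict described above.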
</sec>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The simulated dataset QTLMAS2010ny012.zip can be found at <uri xlink:href="https://github.com/patwa67/AUTALASSO">https://github.com/patwa67/AUTALASSO</uri>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>The author wrote, read, and approved the final version of the manuscript.</p>
</sec>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>Financial support was provided by the Beijer Laboratory for Animal Science, SLU, Uppsala.</p>
</sec>
<sec id="s8">
<title>Conflict of Interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Casella</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Berger</surname> <given-names>R. L.</given-names>
</name>
</person-group> (<year>2002</year>). <source>Statistical Inference</source>. <edition>2nd edn.</edition> <publisher-loc>Pacific Grove, CA</publisher-loc>: <publisher-name>Duxbury</publisher-name>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Daetwyler</surname> <given-names>H. D.</given-names>
</name>
<name>
<surname>Villanueva</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Woolliams</surname> <given-names>J. A.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Accuracy of predicting the genetic risk of disease using a genome-wide approach</article-title>. <source>PLoS One</source> <volume>3</volume>, <fpage>e3395</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0003395</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de los Campos</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Sorensen</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Gianola</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Genomic heritability: what is it</article-title>? <source>PLoS Genet.</source> <volume>11</volume>, <fpage>e1005048</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1005048</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fan</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Han</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Challenges of big data analysis</article-title>. <source>Nat. Sci. Rev.</source> <volume>1</volume>, <fpage>293</fpage>&#x2013;<lpage>314</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nsr/nwt032</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Farebrother</surname> <given-names>R. W.</given-names>
</name>
</person-group> (<year>1976</year>). <article-title>Further results on the mean square error of ridge regression</article-title>. <source>J. R. Stat. Soc. Series B</source> <volume>38</volume>, <fpage>248</fpage>&#x2013;<lpage>250</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.2517-6161.1976.tb01588.x</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Hastie</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Regularization paths for generalized linear models <italic>via</italic> coordinate descent</article-title>. <source>J. Stat. Softw.</source> <volume>33</volume>, <fpage>1</fpage>&#x2013;<lpage>22</lpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v033.i01</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gianola</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Sch&#xf6;n</surname> <given-names>C. C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Cross-validation without doing cross-validation in genome-enabled prediction</article-title>. <source>G3</source> <volume>6</volume>, <fpage>3107</fpage>&#x2013;<lpage>3128</lpage>. doi: <pub-id pub-id-type="doi">10.1534/g3.116.033381</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Giraud</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2015</year>). <source>Introduction to High-Dimensional Statistics</source>. <edition>1st edn.</edition> <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>CRC Press</publisher-name>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goddard</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Genomic selection: prediction of accuracy and maximisation of long term response</article-title>. <source>Genetica</source> <volume>136</volume>, <fpage>245</fpage>&#x2013;<lpage>257</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10709-008-9308-0</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gonz&#xe1;lez-Recio</surname> <given-names>O.</given-names>
</name>
<name>
<surname>Rosa</surname> <given-names>G. J. M.</given-names>
</name>
<name>
<surname>Gianola</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits</article-title>. <source>Livest. Sci.</source> <volume>166</volume>, <fpage>217</fpage>&#x2013;<lpage>231</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.livsci.2014.05.036</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hastie</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Friedman</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2009</year>). <source>The Elements of Statistical Learning</source>. <edition>2nd edn.</edition> <publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>. doi: <pub-id pub-id-type="doi">10.1007/978-0-387-84858-7</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hastie</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Wainwright</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <source>Statistical Learning with Sparsity: The Lasso and Generalizations</source>. <edition>1st edn.</edition> <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>CRC Press</publisher-name>. doi: <pub-id pub-id-type="doi">10.1201/b18401</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoerl</surname> <given-names>A. E.</given-names>
</name>
<name>
<surname>Kennard</surname> <given-names>R. W.</given-names>
</name>
</person-group> (<year>1970</year>). <article-title>Ridge regression: Biased estimation for nonorthogonal problems</article-title>. <source>Technometrics</source> <volume>12</volume>, <fpage>55</fpage>&#x2013;<lpage>67</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00401706.1970.10488634</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnstone</surname> <given-names>I. M.</given-names>
</name>
<name>
<surname>Titterington</surname> <given-names>D. M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Statistical challenges of high-dimensional data</article-title>. <source>Philos. Trans. R. Soc. A</source> <volume>367</volume>, <fpage>4237</fpage>&#x2013;<lpage>4253</lpage>. doi: <pub-id pub-id-type="doi">10.1098/rsta.2009.0159</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kv&#xe5;lseth</surname> <given-names>T. O.</given-names>
</name>
</person-group> (<year>1985</year>). <article-title>Cautionary note about <italic>R</italic><sup>2</sup></article-title>. <source>Am. Stat.</source> <volume>39</volume>, <fpage>279</fpage>&#x2013;<lpage>285</lpage>. doi: <pub-id pub-id-type="doi">10.1080/00031305.1985.10479448</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Sillanp&#xe4;&#xe4;</surname> <given-names>M. J.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Overview of lasso-related penalized regression methods for quantitative trait mapping and genomic selection</article-title>. <source>Theor. Appl. Genet.</source> <volume>125</volume>, <fpage>419</fpage>&#x2013;<lpage>435</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00122-012-1892-9</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lynch</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Walsh</surname> <given-names>B.</given-names>
</name>
</person-group> (<year>1998</year>). <source>Genetics and Analysis of Quantitative Traits</source>. <publisher-loc>Sunderland, MA</publisher-loc>: <publisher-name>Sinauer</publisher-name>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Momen</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Ayatollahi Mehrgardi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Sheikhi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Kranis</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Tusell</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Morota</surname> <given-names>G.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>Predictive ability of genome-assisted statistical models under various forms of gene action</article-title>. <source>Sci. Rep.</source> <volume>8</volume>, <fpage>12309</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-018-30089-2</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morota</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Gianola</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Kernel-based whole-genome prediction of complex traits: a review</article-title>. <source>Front. Genet.</source> <volume>5</volume>, <fpage>363</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fgene.2014.00363</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ravishanker</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Dey</surname> <given-names>D. K.</given-names>
</name>
</person-group> (<year>2002</year>). <source>A First Course In Linear Model Theory</source>. <edition>1st edn.</edition> <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>Chapman and Hall</publisher-name>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reid</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Friedman</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A study of error variance estimation in lasso regression</article-title>. <source>Stat. Sin.</source> <volume>26</volume>, <fpage>35</fpage>&#x2013;<lpage>67</lpage>. doi: <pub-id pub-id-type="doi">10.5705/ss.2014.042</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szyd&#x142;owski</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Paczynska</surname> <given-names>P.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>QTLMAS 2010: simulated dataset</article-title>. <source>BMC Proc.</source> <volume>5</volume> (<supplement>Suppl 3</supplement>), <fpage>S3</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1753-6561-5-S3-S3</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Theobald</surname> <given-names>C. M.</given-names>
</name>
</person-group> (<year>1974</year>). <article-title>Generalizations of mean square error applied to ridge regression</article-title>. <source>J. R. Stat. Soc. Series B</source> <volume>36</volume>, <fpage>103</fpage>&#x2013;<lpage>106</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.2517-6161.1974.tb00990.x</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Regression shrinkage and selection <italic>via</italic> the lasso</article-title>. <source>J. R. Stat. Soc. Series B</source> <volume>58</volume>, <fpage>267</fpage>&#x2013;<lpage>288</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.2517-6161.1996.tb02080.x</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>van Wieringen</surname> <given-names>W. N.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Lecture notes on ridge regression</article-title>. <source>arXiv.</source> <uri xlink:href="https://arxiv.org/pdf/1509.09169">https://arxiv.org/pdf/1509.09169</uri>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vidaurre</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Bielza</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Larranaga</surname> <given-names>P.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A survey of L1 regression</article-title>. <source>Int. Stat. Rev.</source> <volume>81</volume>, <fpage>361</fpage>&#x2013;<lpage>387</lpage>. doi: <pub-id pub-id-type="doi">10.1111/insr.12023</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waldmann</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Ferencakovic</surname> <given-names>M.</given-names>
</name>
<name>
<surname>M&#xe9;sz&#xe1;ros</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Khayatzadeh</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Curik</surname> <given-names>I.</given-names>
</name>
<name>
<surname>S&#xf6;lkner</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>AUTALASSO: an automatic adaptive LASSO for genome-wide prediction</article-title>. <source>BMC Bioinformatics</source> <volume>20</volume>, <fpage>167</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12859-019-2743-3</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whittaker</surname> <given-names>J. C.</given-names>
</name>
<name>
<surname>Thompson</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Denham</surname> <given-names>M. C.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Marker-assisted selection using ridge regression</article-title>. <source>Genet. Res.</source> <volume>75</volume>, <fpage>249</fpage>&#x2013;<lpage>252</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0016672399004462</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zou</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>The adaptive lasso and its oracle properties</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>101</volume>, <fpage>1418</fpage>&#x2013;<lpage>1429</lpage>. doi: <pub-id pub-id-type="doi">10.1198/016214506000000735</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>