
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology have yielded high-dimensional genomic marker data which can be difficult to incorporate into statistical models. In this paper, we investigated the utility of applying dimensionality reduction (DR) methods as a pre-processing step for GS methods. We compared five DR methods and studied the trend in the prediction accuracies of each method as a function of the number of features retained. The effect of DR methods was studied using three models that involved the main effects of line, environment, marker, and the genotype by environment interactions. The methods were applied on a real data set containing 315 lines phenotyped in nine environments with 26,817 markers each. Regardless of the DR method and prediction model used, only a fraction of features was sufficient to achieve maximum correlation. Our results underline the usefulness of DR methods as a key pre-processing step in GS models to improve computational efficiency in the face of ever-increasing size of genomic data.

Plant breeding techniques have led to significant gains in crop yields for many decades. Improvements to crops were made through phenotypic and pedigree data. The use of molecular markers is a relatively new technique for improving conventional breeding strategies. Most traits that have economic and agronomic importance are quantitative in nature and are controlled by multiple genes of small effect. The advent of high-throughput genotyping and the ever-reducing cost of single nucleotide polymorphism (SNP) assays brought forward the possibility of using dense SNP arrays for the prediction of phenotypic traits, an approach known as genomic selection (GS).

A major challenge of GS lies in estimating the large number of marker effects, which typically far exceeds the number of phenotyped lines.

Environment and genotype by environment (G×E) interactions strongly impact the performance of lines from one environment to the next, and hence accounting for these effects could improve the performance of the prediction models.

Most of these methods deal with the high-dimensional aspect of genomic selection, also known as genomic prediction, through modern shrinkage procedures. Shrinkage methods perform dimensionality reduction as a part of the modeling process. This paper examines the utility of dimensionality reduction (DR) methods as a pre-processing step to genomic prediction. In "small n, large d" problems such as genomic prediction, reducing the marker data before model fitting can substantially lower the computational burden of the downstream analysis.

The primary objective of this work is to study the utility of implementing dimensionality reduction as a pre-processing step in GS. We employed five different DR methods and investigated their ability to improve GS models. Furthermore, we compared their relative reduction abilities and studied the trends in the prediction accuracy as a function of the number of markers retained from the original marker data. To answer these objectives, we created reduced data sets with an increasing number of markers using each DR method, performed genomic prediction for each size, and computed their respective prediction accuracy values. We hypothesize that the prediction accuracy values would plateau beyond a certain size: any further increase in the number of markers in the input data set would not significantly improve the prediction accuracy and could even harm it.

The rest of the paper is organized as follows. First, we present the different DR methods in the materials and methods section. For each method, we also describe their implementation in creating reduced marker data sets. Next, we describe the genomic prediction models and cross-validation schemes used, along with a description of the real data set. Following that, we present the results of the DR for each method, along with comparisons. Finally, we conclude with a discussion and future directions.

Traditional methods and approaches to data analysis prove unsuitable in the face of massive modern data sets. The need of the hour dictates the development of new statistical algorithms that can analyze these large data sets. The objective of these new algorithms is to work within reasonable constraints on computational resources and time while providing reliable solutions to research problems. With ever-increasing access to storage resources and a reduction in the cost of collecting additional features on observational units, the dimension of data sets is constantly increasing. For instance, the advent of high-throughput phenotyping and genotyping technologies in life sciences has led to massive data sets that present unprecedented storage and computational challenges.

High-dimensional data could be classified as "large n," "large d," or both. Dimensionality reduction transforms the original n × d data matrix X, obtaining a "compressed" matrix that captures the most important and relevant information present in X.

It is well known that the Singular Value Decomposition (SVD) obtains the best rank-k approximation of an n × d matrix X. Without loss of generality, assume that n ≥ d; computing the rank-k approximation X_k via the SVD takes O(nd^2) time, which becomes prohibitive for large matrices.

Dimensionality reduction methods can be categorized into three main approaches; the two considered in this paper are feature extraction and feature selection.

The feature extraction method yields a compressed matrix formed by computing the linear combinations of the original features. While this method has been shown to provide reliable approximations to the original data matrix for further computations, there is a prominent issue in working with a combination of features. The linear combinations may not be suitable to make statistical inferences about the original data scale, and there may be no sensible interpretation of the combinations themselves in specific applications. Given this drawback of the feature extraction method, the feature selection approach to dimensionality reduction presents itself as a more suitable choice. The feature selection method involves selecting a small subset of the original features to create a compressed features matrix and avoids the issues related to inference and interpretability. For this reason, we examined the feature selection approach in greater detail by investigating four feature selection based algorithms. Each of these methods presents a fundamentally different approach to feature selection.

Dimensionality reduction methods can also be categorized as deterministic or randomized based on how the lower-dimensional representation is derived. In deterministic methods, features are selected fixedly based on some property of the data, such as the singular values, as in the case of PCA. Features are also often selected based on model fit statistics such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), as in the case of forward selection. Randomized algorithms were proposed as an alternative approach that reduces the computational resource requirement and provides faster solutions than deterministic methods.

In this paper, we focused only on the feature selection and feature extraction approaches to DR. We examined the ability of five methods to reduce the dimensionality of the predictor data set in GS. Specifically, we compared a random projection algorithm, a random sampling algorithm based on approximate statistical leverage scores, a deterministic leverage-score sampling algorithm, hierarchical clustering, and ridge regression.

Summary of the five dimensionality reduction methods used in this paper. The methods are categorized as feature selection/feature extraction and as randomized/deterministic.

In order to understand the need for a random sketching algorithm, let us consider the simple example of linear regression. Suppose we have an n × d full-rank matrix X of predictor variables and a response vector y of length n, with n much larger than d. The ordinary least squares solution is b = (X'X)^(-1) X'y.

Thus, we need the Gram matrix X'X, whose computation takes O(nd^2) time and dominates the cost of solving the least squares problem when n is large.

Random sketching is a popular method to reduce the computational complexity of this problem. Instead of using the full data set (X, y), we solve the least squares problem on a much smaller sketched data set (SX, Sy), where S is an r × n sketching matrix with r much smaller than n, reducing the dominant cost from O(nd^2) to roughly O(rd^2) once the sketch is formed. Our two primary goals for a sketched solution are to ensure that the approximate solution is close to the original and that the computational time is significantly reduced. The careful construction of the sketching matrix S makes both goals achievable.
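As an illustrative sketch of this idea (Python/numpy is used here purely for exposition; the sizes and the plain Gaussian sketching matrix are our own choices, not the implementation used in this study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least squares problem: n observations, d predictors (n >> d).
n, d, r = 2000, 20, 200
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Exact OLS solution: solves min ||X b - y||_2.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian sketch S (r x n): solve the much smaller problem min ||SX b - Sy||_2.
S = rng.standard_normal((r, n)) / np.sqrt(r)
beta_sk, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)

# The sketched residual should be within a small factor of the optimal residual.
res_ols = np.linalg.norm(X @ beta_ols - y)
res_sk = np.linalg.norm(X @ beta_sk - y)
print(res_sk / res_ols)
```

With r much smaller than n, the small problem is far cheaper to solve, while the residual of the sketched solution stays within a modest factor of the optimum.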

Random projection algorithms form one of the major classes of random sketching algorithms. They "uniformize" the non-uniformity structure present in the data by rotating the matrix to a basis in which uniform random sampling of features is nearly optimal. Random projection can be viewed as a dimensionality reduction process that preserves the pairwise distances between observations.

Random projection produces matrices formed by a small number of linear combinations of all of the original features. The linear combinations are formed by pre-multiplying the features with a randomly generated coefficients matrix, hence the name "random" projection. The resulting compressed matrix can be used as a substitute in computations, thereby reducing the computational complexity of the problem at hand. Given below is a simple random projection algorithm:

• Consider an input n × d matrix X with d very large.

• Construct a d × r random projection matrix P with r much smaller than d, for example with independent standard normal entries.

• Obtain the sketched matrix XP of size n × r.
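The distance-preservation property mentioned above can be checked numerically. The following sketch (illustrative sizes and a plain Gaussian projection, chosen by us for exposition) projects 5,000-dimensional points down to 500 dimensions and compares all pairwise distances before and after:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

n, d, r = 30, 5000, 500
X = rng.standard_normal((n, d))

# d x r Gaussian projection, scaled so squared norms are preserved in expectation.
Pi = rng.standard_normal((d, r)) / np.sqrt(r)
Y = X @ Pi  # sketched matrix, n x r

# Ratio of projected to original distance for every pair of rows.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
print(min(ratios), max(ratios))
```

All ratios land close to 1, i.e., the 10x smaller representation nearly preserves the geometry of the original points.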

Schematic for a simple random projection, where the input matrix X is multiplied by a random projection matrix P to obtain the smaller sketched matrix XP.

Let us consider the example of performing the Singular Value Decomposition (SVD) of the n × d matrix X, which takes O(nd^2) time. Instead, if we settle for an approximate SVD computed from the sketched n × r matrix XP, the computation takes only O(nr^2) time even with the simple algorithm presented above. This example illustrates the motivation for using random projection to reduce the dimensionality of large matrices.

Two questions come to mind quite naturally when looking at the above mentioned approach. How do we choose a suitable reduced dimension r, and how do we construct the projection matrix P? Several constructions for P have been proposed:

• P_ij ∼ N(0, 1), where P_ij refers to the element in the i-th row and j-th column of P (a dense Gaussian matrix).

• P_ij drawn uniformly from {+1, −1} (a Rademacher matrix).

• Sparse constructions in which most entries of P are set to zero, reducing the number of arithmetic operations.

• The Fast Johnson-Lindenstrauss Transform (FJLT), which combines a structured orthogonal transform with sparse sampling.

If computing the lower-dimension projection were so computationally expensive that little performance was gained overall, the whole exercise would become futile. Thus, there has been significant research into building projection mappings that implement the random projection efficiently. Sparsification was a popular tool to reduce the number of computations performed during projection, as seen with the Rademacher matrix and FJLT approaches. In this paper, we used the FJLT method, also known as the subsampled randomized Hadamard transform (SRHT), to implement the random projection method, as it supports fast computation and requires a modest amount of storage compared to other methods. Further details about the FJLT method can be found in the Supplementary Material.
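A minimal sketch of the SRHT idea follows (our own simplified illustration, not the implementation used in this study: a plain iterative Walsh-Hadamard transform, zero-padding of the feature dimension to a power of two, and uniform column subsampling):

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform along the last axis (length a power of 2)."""
    a = a.copy()
    n = a.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y
            a[..., i + h:i + 2 * h] = x - y
        h *= 2
    return a

def srht(X, r, rng):
    """Subsampled randomized Hadamard transform reducing the columns of X to r."""
    n, d = X.shape
    d_pad = 1 << (d - 1).bit_length()            # pad columns to a power of two
    Xp = np.zeros((n, d_pad))
    Xp[:, :d] = X
    signs = rng.choice([-1.0, 1.0], size=d_pad)  # random diagonal sign matrix D
    H = fwht(Xp * signs) / np.sqrt(d_pad)        # rows hit by the normalized Hadamard
    cols = rng.choice(d_pad, size=r, replace=False)  # uniform column subsample
    return H[:, cols] * np.sqrt(d_pad / r)

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 1000))
Y = srht(X, 200, rng)
# The sign flips and Hadamard rotation flatten the mass across coordinates,
# so uniform subsampling approximately preserves the row norms.
print(np.linalg.norm(Y, axis=1) / np.linalg.norm(X, axis=1))
```

The structured transform needs no dense random matrix, which is what makes the SRHT attractive in both time and storage.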

Like any feature extraction based DR method, the random projection method is based on linear combinations of all the original features. The newly created features may not be interpretable on the data scale; worse, they may have no practical meaning at all. For this reason, feature selection is an attractive alternative to feature extraction as a means of dimensionality reduction. In feature selection, a subset of the original features is picked using various strategies, which allows for a straightforward interpretation of results. In the next sections, we explore four feature selection algorithms, starting with random sampling, the second of the two randomized sketching approaches.

Random sampling is another randomized approach to forming lower-dimension approximation matrices. While random projection addresses the non-uniformity by "uniformizing" the structure through rotations, random sampling algorithms build an importance probability distribution to address the non-uniformity directly. The random sampling approach involves sampling a small number of features that represent the whole set of features. Leverage scores can be interpreted as a measure of the influence that data points have on related computations and hence can be viewed as a metric defining the non-uniformity in the original matrix. Given an n × d matrix X with leverage scores l_i, the importance sampling probabilities are defined as p_i = l_i / (sum over j of l_j).

The computational bottleneck for using the importance probability distribution lies in its dependence on the computation of an orthogonal basis for the original input matrix. A fast randomized algorithm was later proposed to compute approximate leverage scores instead of the exact statistical leverage scores. This contribution was a key development in the area of random sampling algorithms, and we used this algorithm as the basis for implementing the random sampling algorithm in the genomic prediction problem.

We now describe the random sampling algorithm. Consider an n × d input matrix X with d much larger than n, as in the genomic setting. The leverage score l_i of the i-th column of X is defined as the squared Euclidean norm of the i-th row of the right singular matrix V of X. Noting that VV' equals X^+ X, where X^+ is the Moore-Penrose inverse, we can redefine the statistical leverage scores as the diagonal entries (equivalently, the squared row norms) of X^+ X.
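The equivalence between the two definitions of the leverage scores can be verified directly; a small illustrative numpy check on a wide, marker-like matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# Wide matrix: few lines (rows), many markers (columns).
n, d = 8, 50
X = rng.standard_normal((n, d))

# Column leverage scores from the right singular vectors V.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
lev = np.sum(Vt.T**2, axis=1)          # l_i = squared norm of the i-th row of V

# Equivalent definition via the Moore-Penrose inverse: l_i = (X^+ X)_ii.
lev_pinv = np.diag(np.linalg.pinv(X) @ X)

print(np.allclose(lev, lev_pinv))      # True
print(lev.sum())                       # equals rank(X) = 8
```

The scores sum to the rank of the matrix, so dividing by that sum yields a valid importance sampling distribution over the columns.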

The computational complexity of calculating the leverage scores according to this definition is dominated by two expensive steps: computing the pseudoinverse X^+ and performing the matrix multiplication X^+ X. We apply random projection to overcome both these costs by performing the computations on sketched matrices and finally obtaining approximate leverage scores.

Instead of computing X^+ directly, we find a smaller matrix that approximates it. In the first stage, the input matrix is compressed with a random projection of sketch size r_1, and the pseudoinverse is computed from this much smaller sketch.

While the exact computation involving X^+ takes O(nd^2) time, the first-stage computation scales with r_1, which on its own is still not efficient enough. Therefore, in a second stage, another random projection of sketch size r_2 is applied to the approximate pseudoinverse. The approximate statistical leverage scores are then computed as the squared row norms of the resulting small matrix.

This result can be extended without loss of generality to column sampling. The effect of the sketch sizes r_1 and r_2 was examined through simulation studies. It was found that r_1 does not significantly impact the correlation between approximate and exact leverage scores, but running time increases linearly with r_1. On the other hand, the correlations between approximate and exact leverage scores increase rapidly with increasing r_2, but r_2 does not impact running time. Thus, a combination of small r_1 and large r_2 results in high-quality approximations with a short run time.
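The two-stage idea can be sketched as follows (an illustrative numpy version for a tall matrix with planted high-leverage rows; the Gaussian projections and the particular sketch sizes r1 and r2 are our own choices, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(4)

# Tall matrix; exact row leverage scores need an orthogonal basis, O(n d^2).
n, d = 2000, 20
A = rng.standard_normal((n, d))
A[:100] *= 5.0                          # plant some genuinely high-leverage rows
Q, _ = np.linalg.qr(A)
lev_exact = np.sum(Q**2, axis=1)

# Stage 1: sketch A with Pi1 (r1 x n); the R factor of the small product makes
# A @ inv(R) an approximate orthogonal basis for the column space of A.
r1, r2 = 100, 20
Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
_, R = np.linalg.qr(Pi1 @ A)

# Stage 2: a second projection Pi2 (d x r2) cheapens the row-norm computation.
Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
T = A @ np.linalg.solve(R, Pi2)         # n x r2, without ever forming A @ inv(R)
lev_approx = np.sum(T**2, axis=1)

# Approximate and exact scores should be strongly correlated.
print(np.corrcoef(lev_exact, lev_approx)[0, 1])
```

Increasing r2 tightens the correlation at almost no extra cost, while increasing r1 mainly increases the stage-1 run time, matching the trade-off described above.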

Feature selection is also known as the column subset selection problem (CSSP) in matrix theory and linear algebra. The goal of the CSSP is to choose a small set C of c columns of X such that the projection of X onto those columns is nearly as good as the best low-rank approximation: the error ||X − CC^+ X|| is close to ||X − X_k||, where X_k is the best rank-k approximation obtained from the SVD and c = O(k log k / e^2) is the number of columns in C.

The randomized algorithm gives a 'near-optimal' approximation of the matrix, but may not be computationally as efficient as the deterministic algorithm. When the leverage scores l_i follow a steep enough power-law decay, the deterministic algorithm performs equally well or better than the randomized algorithm. Furthermore, if the leverage scores follow a steep power-law decay, the number of columns chosen by the deterministic algorithm is similar to or fewer than that of its randomized counterpart.

We now summarize the deterministic algorithm, which proceeds as follows:

1. Compute the top-k right singular vectors V_k of the input matrix via the SVD.

2. Calculate the leverage scores from the rows of V_k.

3. Select the columns with the largest leverage scores.

The deterministic sampling algorithm requires the implementation of the SVD to compute the leverage scores. Hence, the time complexity of the algorithm is dominated by the cost of the SVD.
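A minimal sketch of deterministic leverage-score selection (illustrative sizes; the rank parameter k and the number of selected columns c are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)

n, d, k, c = 40, 300, 5, 30
X = rng.standard_normal((n, d))

# Rank-k column leverage scores from the top-k right singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
lev_k = np.sum(Vt[:k].T**2, axis=1)     # one score per column; scores sum to k

# Deterministic selection: keep the c columns with the largest scores.
top_cols = np.argsort(lev_k)[::-1][:c]
X_reduced = X[:, top_cols]
print(X_reduced.shape)
```

Unlike the randomized variant, repeated runs always return the same columns, so no averaging over replicates is needed.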

The deterministic sampling algorithm implementation mimics the random sampling implementation with an ordered sampling approach instead of a randomized sampling, based on the leverage scores. In this work, we also evaluated clustering and penalized regression approaches for DR. Since these methods are well established and widely popular, we do not present them in great detail.

Clustering is the process of grouping a set of objects in such a way that objects in the same group, called a cluster, are more similar to each other than to objects in different groups. Grouping objects when the data are labeled is a trivial task and is often referred to as supervised classification.

The general scheme of clustering is to start with a measure of similarity between objects and then group the objects so that members of the same cluster are more similar to each other than to members of other clusters. Clustering algorithms are commonly divided into partitional and hierarchical approaches.

Partitional clustering divides the set into non-overlapping subsets (clusters) such that each object is present in only one cluster. Typically, partitional methods produce clusters by optimizing some criterion function.

Hierarchical clustering is the process of creating a set of nested clusters arranged into a tree or dendrogram structure. Hierarchical clustering does not require the number of clusters to be specified in advance. Agglomerative hierarchical clustering merges the closest pair of clusters at each step, where closeness is defined by a linkage metric such as single, complete, or average linkage, or Ward's minimum variance method.

Single-linkage is not preferred due to its tendency to produce elongated clusters. Complete-linkage is avoided because of its inability to retain large clusters. Between average-linkage and Ward's minimum variance method, there is no real distinguishing factor. Since Ward's method compares naturally to the objective function of k-means, we picked Ward's method as our metric of choice for the hierarchical clustering approach.

Hierarchical clustering creates a nested clustering structure, often represented by a dendrogram, which allows the user to create any number of clusters by choosing the appropriate height at which to cut the dendrogram. One of our objectives was to study the trend in prediction accuracy as a function of the input data sizes; thus, we are interested in creating reduced data sets of different sizes. To obtain a reduced data set with k features, we cut the dendrogram to form k clusters and select one feature from each cluster.
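As an illustration of this scheme (the study itself used an R package; the following Python sketch relies on scipy's Ward linkage and is not the paper's implementation):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)

n, d, k = 50, 200, 10
X = rng.standard_normal((n, d))

# Cluster the features (columns), so each "observation" passed to linkage
# is one marker across all lines.
Z = linkage(X.T, method="ward")

# Cut the dendrogram so that exactly k clusters are formed.
labels = fcluster(Z, t=k, criterion="maxclust")

# Sample one feature at random from each cluster to build the reduced data set.
chosen = [rng.choice(np.where(labels == c)[0]) for c in range(1, k + 1)]
X_reduced = X[:, chosen]
print(X_reduced.shape)
```

Because the within-cluster draw is random, repeating the sampling (as in the 100 replicates described below) averages out the choice of representative marker per cluster.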

Variable selection is the process of choosing a subset of the explanatory variables to explain a response variable. Variable selection helps in making models easier to interpret, reducing noise introduced by redundant variables, and reducing the size of the data set for faster computations. When the number of variables is very large, traditional subset selection methods such as "best" subset selection are computationally infeasible. Step-wise selection methods were proposed as an alternative to reduce the computational load. A major drawback of the traditional and step-wise subset selection methods is the discrete nature of the variable selection, i.e., the variables are either retained or discarded. This leads to unstable variable selection, where a small change in the data can lead to a large change in the subset selected.

Shrinkage methods were developed to address the shortcomings of the subset selection methods. These methods are also known as regularization or penalized methods. They work on the principle of imposing a constraint term that penalizes model complexity. Shrinkage methods help in variable selection as well as in improving the model's prediction performance through the bias-variance trade-off. In other words, shrinkage methods may provide solutions that have lower variance and higher bias, ultimately leading to better prediction accuracy in terms of the mean squared error (MSE).

In this paper, we investigate the shrinkage methods as a tool for variable selection. We use the coefficients of the predictors, obtained from the shrinkage methods, as a form of ranking of the variables for selection purposes.

Ridge regression uses all the predictors and shrinks their coefficients through an L2 penalty, but does not set any of them exactly equal to zero. Hence, none of the predictors are removed from the final model. On the other hand, LASSO is a shrinkage method that applies an L1 penalty to the regression coefficients. Due to the nature of the L1 penalty, LASSO performs both shrinkage and automatic variable selection, setting some coefficients exactly to zero.

The disadvantages presented by the LASSO algorithm, especially in the context of genomic data (for example, it selects at most n variables when d > n and tends to keep only one marker from a group of correlated markers), make it unsuitable for this study. Elastic net would be the ideal shrinkage algorithm for dimensionality reduction in practice. Unfortunately, it does not provide control over the number of variables selected in the final model. To answer all the objectives of the study, we needed fine control over the number of variables selected by each reduction method so that the DR approaches could be compared to one another. Ridge regression performs shrinkage on the coefficients associated with the variables but does not perform variable selection of any kind. Thus, we picked ridge regression as the shrinkage method of choice for this study.

In ridge regression, the penalty parameter has to be estimated separately. There are several methods for estimating the most appropriate penalty parameter, with cross-validation being the most common.
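The shrinkage effect of the penalty parameter can be illustrated with the closed-form ridge estimate (an illustrative numpy sketch; the penalty grid is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(7)

n, d = 100, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Larger penalties shrink the coefficient vector toward zero, but no
# coefficient is ever set exactly to zero.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)
```

The norm of the coefficient vector decreases monotonically in the penalty, which is the bias-variance trade-off described above: more penalty, more bias, less variance.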

In this paper, we used five different DR methods as a pre-processing step to genomic prediction models. We implemented the methods to reduce the dimensionality of the genomic data and thereby the computational resource requirements such as memory, storage, and time. In this section, we present details about the implementation of each of the five DR methods on the genomic data to create data sets of differing sizes, allowing us to evaluate the trends in prediction accuracy as a function of dimensionality.

We implemented the random projection method using the FJLT described above. The implementation can be summarized as follows:

1. Compute projection matrices with a predefined number of columns

2. Multiply the projection matrix with original marker data to obtain the reduced matrix

3. Use the reduced matrix to compute the genetic relationship matrix as G = XX'/m, where X here denotes the reduced matrix and m its number of columns.

4. Repeat steps 1–3 100 times to remove bias in the prediction results.
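The steps above can be sketched as follows (an illustrative numpy version; the scaling G = XX'/m for the relationship matrix, the plain Gaussian projection in place of the FJLT, the sizes, and the use of 50 rather than 100 repetitions are all our own simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

n, d, r, reps = 30, 2000, 200, 50
X = rng.standard_normal((n, d))

# Full-data genetic relationship matrix under the assumed scaling G = XX'/d.
G_full = X @ X.T / d

# Average the relationship matrices built from repeated random projections.
G_avg = np.zeros((n, n))
for _ in range(reps):
    Pi = rng.standard_normal((d, r)) / np.sqrt(r)   # scaled so E[Pi Pi'] = I
    Xr = X @ Pi                                     # step 2: reduced matrix
    G_avg += Xr @ Xr.T / d                          # step 3: relationship matrix
G_avg /= reps                                       # step 4: average over reps

err = np.linalg.norm(G_avg - G_full) / np.linalg.norm(G_full)
print(err)
```

Because the projection is unbiased in expectation, the averaged relationship matrix converges to the full-data G as the number of repetitions grows, which is the rationale for the 100 replicates.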

Implementation of the random projection algorithm for dimensionality reduction of the genomic data set in the genomic prediction problem.

We used the statistical leverage scores to calculate the importance probability distribution required for the random sampling algorithm. We followed the two-stage approximation algorithm described earlier; the implementation can be summarized as follows:

1. Compute the approximate leverage scores as defined in the previous section.

2. Use the approximate leverage scores to define the importance sampling distribution for the columns of the input marker matrix

3. Randomly sample a predefined number of columns according to the importance sampling distribution.

4. Use the reduced matrix as the input for the genomic prediction models.

5. Repeat steps 3–4 100 times to remove sampling bias from the prediction accuracy results.

The advantage of this random sampling algorithm is the computation of the approximate leverage scores instead of the exact scores, effectively reducing computation time.

This section describes our approach to applying dimensionality reduction to genomic data sets using clustering. We used an R package for agglomerative hierarchical clustering; the implementation can be summarized as follows:

1. Perform agglomerative hierarchical clustering of the marker data using Ward's minimum variance method.

2. Form a predefined number of clusters (k) by cutting the dendrogram at the appropriate height.

3. Sample one feature randomly from each of the k clusters.

4. The sampled features form the reduced data set of size k.

5. Repeat the sampling from the clusters and the following model implementation 100 times to remove sampling bias in the prediction accuracy results.

The deterministic sampling algorithm implementation mimics the random sampling implementation without the randomized sampling based on the leverage scores. The deterministic sampling algorithm can be summarized as follows:

1. Compute the approximate leverage scores as defined in the previous section.

2. Arrange the leverage scores in decreasing order.

3. Pick the top columns, by leverage score, to form reduced matrices of the required sizes.

4. Use these reduced matrices as the input for the genomic prediction models.

We used an R package to fit the ridge regression model. The implementation can be summarized as follows:

1. Implement a ridge regression model with the standardized features corresponding to the marker information.

2. Use the magnitudes of the coefficients estimated from the ridge regression as a measure of the importance of the features.

3. Order the features by the absolute values of their respective regression coefficients.

4. Pick the top features to form the reduced data sets of the required sizes.
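The ranking procedure above can be sketched as follows (an illustrative numpy version with planted signal features; the closed-form ridge fit, the fixed penalty, and ranking by absolute coefficient size are our own choices for exposition):

```python
import numpy as np

rng = np.random.default_rng(9)

# Only the first five (standardized) features carry signal.
n, d, k = 300, 60, 5
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:5] = [5.0, 4.0, 3.0, 2.0, 1.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

# Ridge fit (closed form) and importance ranking by absolute coefficient size.
lam = 1.0
coef = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
ranking = np.argsort(np.abs(coef))[::-1]

# Keep the top-k features as the reduced data set.
X_reduced = X[:, ranking[:k]]
print(sorted(ranking[:k]))   # the planted signal features
```

Because ridge keeps all coefficients non-zero, the cutoff k gives exactly the fine-grained control over the reduced data set size that motivated choosing ridge over LASSO and elastic net.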

All the methods were applied to a chickpea data set collected by the International Chickpea Screening Nursery (ICSN) of ICRISAT.

The original data set contained 315 lines phenotyped in nine environments, giving a total of 2,835 phenotypic yield observations. All of the 315 lines had corresponding genomic data with 26,817 markers each. After cleaning the data as described in the Supplementary Material, 306 lines and 14,928 markers remained for analysis.

In this work, we used linear mixed models that include the main effects of environment and line, the genomic main effect, and the genotype by environment interaction.

Let the phenotypic trait be represented by y_ijk for the k-th replicate of the j-th line in the i-th environment. Let E_i (i = 1, ..., 9) denote the main effect of environment, L_j (j = 1, ..., 306) the main effect of line, g_j the genomic effect of line j, ge_ij the genotype by environment interaction, and e_ijk the error term, assumed independent and identically distributed N(0, s^2). The three models corresponding to the three scenarios mentioned above are given by:

E + L: y_ijk = mu + E_i + L_j + e_ijk

G + E: y_ijk = mu + E_i + g_j + e_ijk

G + E + G×E: y_ijk = mu + E_i + g_j + ge_ij + e_ijk

where the genomic effects g and the interaction effects ge are assigned multivariate normal distributions with variance components s^2_g and s^2_e, respectively. The genetic relationship matrix is G = XX'/p, where X is the centered and standardized marker matrix and p is the number of markers.

Using the dimensionality reduction methods, we reduce the size of the marker matrix (X); thus the dimensionality reduction methods affect only the G + E and G + E + G×E models, while the E + L model, which uses no marker information, serves as a baseline.

Three different cross-validation schemes were implemented to assess the predictive ability of the models in different scenarios. These scenarios are of interest to breeders since they mimic situations faced in breeding programs. The performance of the prediction models was assessed by measuring the Pearson correlation between the observed and predicted phenotypic values.

We investigated five DR methods in this paper. For each method, we performed feature selection or feature extraction to create reduced-dimensional marker data sets of 26 different sizes based on the number of markers present. The number of markers ranged from 200 to 14,928 (the full marker data set). We fixed the set of sizes across all methods so that the dimensionality reduction methods and their prediction ability could be compared at each size. For the three randomized dimensionality reduction methods, 100 data sets were generated at each size, and the prediction results from the 100 data sets were averaged and reported as the prediction accuracy for that size. This was done to overcome any bias that the random selection process may have introduced. For each reduced data set, we implemented the three prediction models (E + L, G + E, and G + E + G×E) under the three cross-validation schemes.

This study focused on two objectives. The primary objective was to evaluate the merit of using dimensionality reduction methods as a pre-processing step in genomic prediction. We used five dimensionality reduction methods to present dimensionality reduction as an effective pre-processing step in genomic prediction. We compared their reduction capabilities to find which methods work better for genomic prediction across different prediction models and cross-validation schemes. Second, we studied the trends in prediction accuracy of reduced data sets as a function of their size.

The results from these models are summarized in the figures and table below.

Prediction accuracy for the seed yield trait of a chickpea population consisting of 306 genotypes tested in nine environments for the G + E model under the three cross-validation schemes (CV0, CV1, CV2) across 26 different genomic information sizes. Standard errors are depicted at each size.

Prediction accuracy for the seed yield trait of a chickpea population consisting of 306 genotypes tested in nine environments for the G + E + GxE model under the three cross-validation schemes (CV0, CV1, CV2) across 26 different genomic information sizes. Standard errors are depicted at each size.

Irrespective of the model and CV scheme, all dimensionality reduction methods required only a fraction of the total input markers to obtain maximum correlation. In addition, we observed a plateauing of correlation values as the number of markers selected increased for all methods. Thus, the number of markers required to achieve the maximum correlation may be an inappropriate measure of a method's reduction capability. Instead, we considered 95% of the maximum correlation as our metric for evaluating the reduction methods. For instance, for the CV1 scheme in the G + E + G×E model, the random projection algorithm achieved its maximum correlation of 0.195 with all 14,928 input markers, but reached 95% of that maximum (0.185) with just 3,000 input markers. This significantly reduces the number of input markers for similar correlation values. In fact, for all the dimensionality reduction methods, fewer than 40% of the input markers were required to achieve 95% of the maximum correlation value. These results are summarized in the table below.

Number of markers selected by each dimensionality reduction (DR) method to obtain 95% of the highest correlation for the three prediction models (G + E + GxE, G + E, E + L) under the three cross-validation schemes (CV0, CV1, CV2).

| Pred. Model | CV | DR Method | # Cols | Correlation |
|---|---|---|---|---|
| G + E + GxE | CV0 | Clustering | 200 | 0.105 |
| | | DetSampling | 800 | 0.132 |
| | | RanProj | 400 | 0.104 |
| | | RanSampling | 400 | 0.104 |
| | | Ridge | 200 | 0.127 |
| | CV1 | Clustering | 4,500 | 0.185 |
| | | DetSampling | 1,200 | 0.199 |
| | | RanProj | 3,000 | 0.185 |
| | | RanSampling | 4,000 | 0.189 |
| | | Ridge | 3,000 | 0.202 |
| | CV2 | Clustering | 4,500 | 0.188 |
| | | DetSampling | 1,200 | 0.205 |
| | | RanProj | 3,000 | 0.189 |
| | | RanSampling | 3,000 | 0.191 |
| | | Ridge | 2,500 | 0.197 |
| G + E | CV0 | Clustering | 200 | 0.113 |
| | | DetSampling | 600 | 0.131 |
| | | RanProj | 400 | 0.112 |
| | | RanSampling | 800 | 0.114 |
| | | Ridge | 200 | 0.126 |
| | CV1 | Clustering | 6,000 | 0.101 |
| | | DetSampling | 800 | 0.123 |
| | | RanProj | 1,600 | 0.098 |
| | | RanSampling | 5,000 | 0.103 |
| | | Ridge | 1,600 | 0.124 |
| | CV2 | Clustering | 200 | 0.155 |
| | | DetSampling | 1,000 | 0.134 |
| | | RanProj | 2,000 | 0.109 |
| | | RanSampling | 1,800 | 0.108 |
| | | Ridge | 1,600 | 0.124 |
| E + L | CV0 | — | 14,928 | 0.089 |
| | CV1 | — | 14,928 | -0.098 |
| | CV2 | — | 14,928 | 0.045 |

Second, no single reduction method had the best reduction capability across all prediction model and CV combinations. For instance, for the CV2 scheme of the G + E + G×E model, deterministic sampling required only 1,200 markers in the input data to achieve 95% of the maximum correlation, compared to the 4,500 required by clustering. On the other hand, for the CV2 scheme of the G + E model, clustering required only 200 input markers to attain 95% of the maximum correlation, compared to the 1,000 required by deterministic sampling. The random projection and random sampling methods produced very similar prediction accuracy values across all size-by-model-by-cross-validation combinations. All the reduction methods had similar prediction accuracies within each model and CV combination, which reiterates the utility of dimensionality reduction regardless of the DR method used.

Modern plant breeding programs combine genomic information with phenotypic performance data to select favorable lines. Early genomic selection models included the line, environment, phenotypic, and genomic information to predict the performance of lines. Genetic information, environmental factors, and their interactions affect complex traits such as yield. Hence, the development of models that allowed for these genotype by environment interactions improved genomic prediction accuracy. Improvements in genotyping technology, combined with falling costs, have led to the generation of genomic data sets of enormous size that are often high-dimensional. While current genomic selection models can handle these high-dimensional data, there are questions about their efficiency. Further, including information on hundreds of thousands of potentially unrelated markers in the genomic prediction models could negatively impact the prediction accuracy of the trait of interest. Lastly, there is a computational resource cost that must be taken into account: prediction models with larger input sizes require much greater computational resources, both in terms of hardware and time. We proposed dimensionality reduction as a mechanism to address all of these concerns.

In this study, we used a chickpea data set. Chickpea is the second most widely produced food legume crop in the world.

The key contribution of this work was to propose using dimensionality reduction in genomic prediction analyses and show its utility using a hand-picked subset of all available methods. For example, we explored the possibility of using randomized algorithms for dimensionality reduction with the help of primitive implementations. Several more sophisticated randomized algorithms could improve dimensionality reduction, which can be explored in future works. Our results act as a proof of concept that future researchers can use to explore various dimensionality reduction methods and identify the best method for their breeding data. Our results clearly indicate the need to integrate dimensionality reduction methods into genomic selection to reduce computational resource requirements, improve the prediction process, and select the best performing lines in any breeding program.

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

VM, RV, DJ, and RH contributed to the conception and design of the study. RV designed and executed the field and genotyping experiments. VM wrote the code for the methods and prediction models. VM, DJ, and RH were responsible for interpretation of the results. VM wrote the first draft of the manuscript. DJ and RH contributed to revisions. All authors edited, reviewed, and approved the submitted version.

VM was partially supported by the Development of Waxy Sorghum Lines Grant from the Department of Agriculture - NIFA (UNR 19-58). DJ was supported by the Agriculture and Food Research Initiative Grant (NEB-21-176) from the USDA National Institute of Food and Agriculture, Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production, A1211, Accession No. 1015252.

This research was done using resources provided by the Open Science Grid.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

The Supplementary Material for this article can be found online at:
