Stock-Index Tracking Optimization Using Auto-Encoders

The powerful capability of deep learning algorithms to extract useful latent information gives them the potential to outperform traditional financial models in solving problems in the stock market, which is a complex system. In this paper, we explore the use of advanced deep learning algorithms for stock-index tracking. We partially replicate the CSI 300 Index by optimizing with respect to the difference between the returns of the tracking portfolio and those of the target index. Using six well-known auto-encoders (single-hidden-layer undercomplete, sparse, contractive, stacked, denoising, and variational auto-encoders), which have been widely used in contexts other than stock-index tracking, we extract the complex non-linear relationships among index constituents and select a subset of constituents to construct a dynamic tracking portfolio. Empirical results show that the auto-encoder-based strategies outperform conventional ones when the tracking portfolio is constructed with a small number of stocks. Furthermore, strategies based on auto-encoders capable of learning high-capacity encodings of the input, such as sparse and denoising auto-encoders, achieve even better tracking performance. Our findings offer evidence that deep learning algorithms with explicitly designed hierarchical architectures are suitable for index tracking problems.


INTRODUCTION
The market index system has evolved with the development of the securities market. Financial products such as index funds, index futures, and index options continue to proliferate, indicating that index investment has won the favor of investors, especially institutional investors. Traditional investment based on market timing and the analysis of stock fundamentals is an actively managed strategy, whereas index investment is passively managed. By constructing a portfolio to track a market index, investors expect to obtain the same return and volatility as the target index, with relatively lower risk and management costs and better liquidity. The choice of how to construct a tracking portfolio (i.e., of an index tracking method) is crucial for managing index funds, for hedging or arbitrage through index derivatives such as index futures, and, more generally, for maximizing the performance of index investment. At present, the tracking methods used by stock index funds are fairly homogeneous, yet their tracking errors differ significantly. There is therefore great value in improving index tracking technology. In particular, the rapid development of computer technology and of quantitative finance in recent years makes it possible to propose more effective index tracking methods.
The many index tracking strategies that have been proposed in theory and practice can be divided into full replication strategies and optimization strategies [1]. In the full replication method, all the constituent securities of the target index are purchased and allocated the same weights that they have in the index. Although full replication is easy to manage and operate and is highly consistent with the target index, it has several unavoidable defects: its large portfolio size brings high transaction costs and large tracking errors [2]; some of the constituent securities may not be tradable because of liquidity problems; and the adverse effects of individual securities cannot be avoided. In the optimization method, the historical data of the constituents are analyzed, and a suitable number of assets is selected for inclusion in the tracking portfolio with the help of advanced algorithms. Thus, fewer securities are required to achieve the goal of index investment [3]. Compared with full replication, the optimization method can significantly reduce management costs and increase tracking efficiency, advantages that have made it the focus of much current academic research.
Among the most widely applied approaches for selecting a subset of constituent stocks are market-value ranking, weight ranking, liquidity ranking [4], correlation-coefficient ranking, random sampling, stratified sampling [5], and genetic algorithms [6]. However, these established stock selection approaches fail to adequately collect and utilize historical information about the constituent stocks, the target indexes, and the correlations between them. It is therefore necessary to develop new techniques.
The goal of index tracking is to make the return of the tracking portfolio as close as possible to the return of the target index. Two main indicators are used to evaluate index tracking performance: the standard deviation of the difference between the return of the tracking portfolio and that of the benchmark index [7], and the square root of the second-order moment of that difference [8]. There are also other, less common measures of tracking error, such as Mean Absolute Deviation (MAD), Maximum Absolute Deviation (Max), Mean Absolute Downside Deviation (MADD), and Downside Maximum Absolute Deviation (DMax) [8]. An objective function can be constructed by minimizing one of the tracking errors defined above, from which the weight allocations of the tracking portfolio are obtained. When the tracking error is defined as the square root of the second-order moment of the return difference, minimizing it leads to a quadratic programming model, whose optimal solution can be found by best linear unbiased estimation (BLUE) [9], a standard econometric method. We use this model to construct the tracking portfolio.
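To make the tracking-error definitions above concrete, here is a minimal NumPy sketch that computes each measure from aligned daily return series. The function name `tracking_errors` is hypothetical, and two conventions in it are assumptions rather than details taken from [7] or [8]: the standard deviation uses the sample (ddof=1) estimator, and the downside measures (MADD, DMax) count only the days on which the portfolio underperforms the index.

```python
import numpy as np

def tracking_errors(r_port, r_index):
    """Tracking-error measures for aligned daily returns of a portfolio and an index."""
    d = np.asarray(r_port, dtype=float) - np.asarray(r_index, dtype=float)
    shortfall = np.maximum(0.0, -d)  # > 0 only on days the portfolio lags the index
    return {
        "std": float(np.std(d, ddof=1)),          # standard deviation of the difference
        "rms": float(np.sqrt(np.mean(d ** 2))),   # square root of the second-order moment
        "mad": float(np.mean(np.abs(d))),         # mean absolute deviation
        "max": float(np.max(np.abs(d))),          # maximum absolute deviation
        "madd": float(np.mean(shortfall)),        # mean absolute downside deviation (assumed form)
        "dmax": float(np.max(shortfall)),         # downside maximum absolute deviation (assumed form)
    }

te = tracking_errors([0.010, -0.004, 0.006], [0.012, -0.006, 0.006])
print(te)
```

In this toy series the mean difference is zero, so the two main indicators differ only through the n − 1 versus n normalization; with a persistent bias in the difference, the second-moment measure would also absorb that bias while the standard deviation would not.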
Since Markowitz [10] first proposed the mean-variance model, the measurement of index tracking errors and optimal replication methods have generated an extensive literature. For example, Roll [11] studies partial replication of an index by optimizing with respect to the volatility of the tracking error based on Markowitz's mean-variance model. Ammann and Tobler [12] present four suitable decompositions of tracking error variance. Dunis and Ho [13] introduce the concept of co-integration into index tracking optimization and obtain good tracking performance. Chiam et al. [14] build a multi-objective evolutionary system that simultaneously optimizes tracking performance and transaction costs when tracking an index. Filippi et al. [15] address index tracking with consideration of the expected excess return, using a bi-objective approach.
Machine learning algorithms have made dramatic progress over the past four decades and have found applications in many disciplines, including financial asset management, where their tools offer notable advantages. Asset managers can use machine learning techniques to identify underlying assets by discovering new patterns in a complex system and quickly make investment decisions based on these insights. Machine learning algorithms also allow new forms of data, such as images and audio, to be used as model inputs, helping investment managers better analyze market trends. In addition, they may reduce the negative impact of human subjective biases on investment decisions. Consequently, a growing body of research uses machine learning algorithms to study asset management or index tracking. Focardi and Fabozzi [16] propose using clustering to construct index tracking portfolios: they cluster co-integrated stocks based on Euclidean distances between stock price series and select one stock from each cluster for the tracking portfolio. Yang et al. [17] study the index-tracking problem with a support-vector machine model, and their empirical results show that the model tracks the Hang Seng Index (HSI) robustly. Jeurissen and Berg [18] use a hybrid genetic algorithm, in which each chromosome represents a subset of the stocks, to address stock index tracking by partial replication. Zorin and Borisov [19] build a backpropagation-based neural network to fully replicate a stock index, although its tracking performance is not as good as expected. Fernández and Gómez [20] propose a heuristic solution to the portfolio selection problem based on the Hopfield network, but their results show no superiority over other heuristic models.
By analyzing data from the Brazilian stock market, Freitas et al. [21] find a neural network model that outperforms Markowitz's mean-variance model in portfolio optimization. Chen et al. [22] propose a flexible neural tree ensemble model to predict the NASDAQ-100 and S&P CNX NIFTY stock indexes and achieve reliable forecasting performance. Wu et al. [23] use the non-negative lasso to fit and predict the CSI 300 Index under short-selling constraints; their results indicate that the non-negative lasso can achieve a small tracking error.
Recently, with the rapid development of deep learning technology, methods based on artificial intelligence have enjoyed unprecedented popularity [24]. Because the stock market is a complex system, one approach is to apply deep learning algorithms to index replication. A portfolio construction approach based on deep learning was first proposed by Heaton et al. [25]. Ouyang et al. [26] subsequently expanded this framework with a dynamic asset-weight calculation method and applied it to track the HSI. However, their optimized asset weights may become negative, contrary to traditional asset allocation practice. To accomplish partial replication, both Heaton et al. [25] and Ouyang et al. [26] select stocks by measuring the Euclidean distance between the original returns and the reconstructed returns of the index constituents, using auto-encoders as the core elements of their frameworks.
Kim and Kim [27] argue that such an asset selection criterion is artificial. They modify it by constructing an auto-encoder whose deepest hidden layer has only one node (a proxy for the market index) and measuring the similarity of this latent representation to individual asset returns. We disagree with this approach. If an auto-encoder uses non-linear activation functions, then the deepest latent representations are non-linear combinations of the original input asset returns and capture complex abstract features of the market index. Although these features can represent the market index, it is generally difficult to find their corresponding economic meanings. The similarity of candidate asset returns to these abstract features is not equal, or even necessarily related, to their similarity to the target index returns, so a selection criterion based on this measure seems meaningless. Moreover, the extremely contractive structure of an auto-encoder with a single-node deepest latent layer may cause excessive loss of input information. None of the three papers above [25][26][27] shows that index tracking approaches based on deep learning algorithms can outperform traditional index tracking techniques. Evidence is therefore needed that deep learning is sufficiently advanced to handle index tracking problems. Furthermore, various auto-encoders with more complex structures and better properties have been developed, so it is reasonable to ask whether they can improve stock selection.
Based on the framework proposed by Heaton et al. [25], this paper investigates the application of various auto-encoder deep-learning architectures to selecting representative stocks from the index constituents. As in [25], the stocks are selected by measuring the Euclidean distance between the original returns and the reconstructed ones. We then build dynamic tracking portfolios with the selected stocks to partially replicate the return of the index and evaluate their tracking performance. This article differs from Heaton et al. [25] and other related papers in several respects. First, we examine the effectiveness not only of the single-hidden-layer undercomplete auto-encoder but also of five other auto-encoders widely used in academia and industry, including the stacked auto-encoder and the denoising auto-encoder. Second, we propose a method for constructing dynamic tracking portfolios in which the weights of the stocks are calculated and adjusted periodically. This is more feasible and more appropriate for practical index investment than the approaches taken in other deep-learning methods. Third, we introduce two conventional stock selection strategies (weight ranking and market-value ranking) in addition to the strategies implemented by auto-encoders. The tracking performance of all these strategies for various numbers of selected stocks is compared to confirm the advantages of applying auto-encoders.
The rest of the paper is organized as follows. Section Methodology outlines the related algorithms and how they are implemented. Section Empirical Analysis details our experimental setup for index tracking and presents the empirical results and discussion. Section Conclusions concludes the paper.

METHODOLOGY
Stock Selection Using Auto-Encoders
Auto-encoders are a special case of feedforward neural networks [28]. They are generally used for dimensionality reduction and feature extraction and, more recently, have also been employed as generative models, for example to produce images. Unlike other feedforward neural networks, auto-encoders use unsupervised learning: their task is to copy the input to the output. An auto-encoder is composed of an encoder and a decoder. In Figure 1, x represents the input data; f(x) represents the encoder, which forms a hidden layer h that captures a latent representation of the input; and g(h) = g(f(x)) represents the decoder, which produces a reconstruction x′. In general, training an auto-encoder means minimizing a reconstruction error L(x, g(f(x))) that measures the difference between x and x′. An auto-encoder whose output is simply a copy of its input is worthless. Auto-encoders are therefore prevented from copying the input perfectly by constraints on the hidden layers, such as limiting the number of hidden units or adding regularizers, so that latent attributes of the input data can be learned and described.
A common way to obtain useful features from an auto-encoder is to require the dimension of h to be smaller than that of x. An auto-encoder with this bottleneck structure is called an undercomplete auto-encoder. Consider first a single-hidden-layer undercomplete auto-encoder with one hidden layer of five neurons, consistent with Heaton et al. [25]; its architecture is shown in Figure 2. Given a training batch D = {x^(1), x^(2), ..., x^(m)} containing m samples, the input of the single-hidden-layer undercomplete auto-encoder is x = [x_1, x_2, ..., x_n]^⊤ ∈ R^n, a vector of the returns of the n index constituent stocks on a given trading day. Similarly, the output is x′ = [x′_1, x′_2, ..., x′_n]^⊤ ∈ R^n. The encoder maps the input x to h, a vector of hidden units, and the decoder then maps h to the output vector x′ to reconstruct x. The two steps can be written as

$$h = f(W_1 x + b_1), \tag{1}$$
$$x' = W_2 h + b_2, \tag{2}$$

where W_1 and W_2 are the weights of linear transformations, b_1 and b_2 are the biases, and f(·) is an activation function. Frequently used activation functions are the sigmoid 1/(1 + e^{−x}) [29,30], the hyperbolic tangent tanh(x) [31], and rectified linear units (ReLU) max{0, x} [32][33][34]. In this paper, f(·) is a ReLU function, because ReLU mitigates the vanishing gradient problem (on the positive interval) and converges and computes faster than the other activation functions. When the activation functions are linear and the loss function is the mean squared error, the single-hidden-layer undercomplete auto-encoder is equivalent to Principal Component Analysis (PCA) [35]. In addition, the output layer applies only a linear transformation rather than a non-linear activation function, so that the output can be zero-centered and its characteristics remain consistent with those of the input data.
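As an illustration of the ReLU encoder and linear decoder described above, the following sketch trains a single-hidden-layer undercomplete auto-encoder in plain NumPy with a squared reconstruction error and the Adam optimizer. It is a sketch under stated assumptions, not the authors' implementation: the function name `train_undercomplete_ae`, the weight initialization, the learning rate, the number of iterations, and the use of full-batch updates are all hypothetical choices. Weights follow the column-vector convention h = f(W_1 x + b_1), so a batch X of shape (m, n) is multiplied by W_1 transposed.

```python
import numpy as np

def train_undercomplete_ae(X, n_hidden=5, lr=1e-2, epochs=500, seed=0):
    """Single-hidden-layer undercomplete auto-encoder trained with Adam (full batch).

    X: (m, n) array of m trading days by n standardized stock returns.
    Encoder h = ReLU(W1 x + b1); linear decoder x' = W2 h + b2.
    Loss: batch mean of the squared two-norm ||x - x'||_2^2.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    p = {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n), (n_hidden, n)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n, n_hidden)),
        "b2": np.zeros(n),
    }
    m1 = {k: np.zeros_like(v) for k, v in p.items()}  # Adam first-moment estimates
    m2 = {k: np.zeros_like(v) for k, v in p.items()}  # Adam second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        A = X @ p["W1"].T + p["b1"]          # pre-activations, (m, n_hidden)
        H = np.maximum(A, 0.0)               # ReLU hidden layer
        X_rec = H @ p["W2"].T + p["b2"]      # linear output layer, (m, n)
        E = X_rec - X
        losses.append(float(np.mean(np.sum(E ** 2, axis=1))))
        # Back-propagation of the batch-mean squared error
        G = 2.0 * E / m                      # dLoss / dX_rec
        grads = {"W2": G.T @ H, "b2": G.sum(axis=0)}
        GA = (G @ p["W2"]) * (A > 0)         # gradient through the ReLU
        grads["W1"] = GA.T @ X
        grads["b1"] = GA.sum(axis=0)
        for k in p:                          # Adam update with bias correction
            m1[k] = beta1 * m1[k] + (1 - beta1) * grads[k]
            m2[k] = beta2 * m2[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m1[k] / (1 - beta1 ** t)
            v_hat = m2[k] / (1 - beta2 ** t)
            p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, losses

X = np.random.default_rng(1).standard_normal((250, 20))   # toy stand-in for returns
params, losses = train_undercomplete_ae(X)
```

The gradients are written out by hand only to keep the example dependency-free; in practice an autodiff framework would compute them.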
The single-hidden-layer undercomplete auto-encoder is trained by minimizing the reconstruction error L(x, x′), i.e., the squared two-norm of the difference between the input vector and the output vector:

$$L(x, x') = \lVert x - x' \rVert_2^2. \tag{3}$$

Equation (3) is minimized by back-propagation with the popular gradient-based optimization algorithm Adaptive Moment Estimation (Adam) [36]. (Unless otherwise stated, the other auto-encoder models in this paper use the same input and output vectors, hidden-layer activation functions, loss functions, and parameter-optimization algorithm as the single-hidden-layer undercomplete auto-encoder.) Undercomplete auto-encoders can learn the most salient features of the data distribution. However, if an auto-encoder is given too much capacity, it can learn to copy its input without extracting any useful information. Regularized auto-encoders address this problem not by limiting model capacity but by imposing particular forms of regularization that encourage better generalization. Sparse auto-encoders [37,38] are a common kind of regularized auto-encoder. A sparse auto-encoder suppresses the activation of most hidden neurons by adding a sparsity penalty to the loss function, which provides another means of compressing knowledge without reducing the number of hidden nodes. The architecture of the sparse auto-encoder applied in this paper is shown in Figure 3. The hidden layer has the same dimension as the input and output layers. The light-colored circles in the hidden layer represent suppressed neurons, and the dark-colored circles represent activated neurons. Because which neurons are activated depends on the data, the sparse auto-encoder can obtain specific feature representations for different input data.
In this way, the network's capacity to memorize the input data is limited, while its capacity to extract data features is not. There are two common ways to construct the sparsity penalty: L1 regularization [39] and Kullback-Leibler (KL) divergence [40]. In this paper, we use L1 regularization. The loss function for training our sparse auto-encoder is

$$L(x, x') = \lVert x - x' \rVert_2^2 + \lambda \lVert h \rVert_1,$$

where the second term penalizes the output values (activations) of the hidden layer, scaled by a tuning parameter λ.
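A minimal sketch of the L1-penalized sparse auto-encoder loss, assuming a ReLU hidden layer, a linear output layer, and a squared two-norm reconstruction term; `sparse_ae_loss` and its arguments are hypothetical names. It shows that sparsity comes from the penalty on the hidden activations, not from a bottleneck, so W1 may be square.

```python
import numpy as np

def sparse_ae_loss(X, W1, b1, W2, b2, lam):
    """Batch mean of ||x - x'||_2^2 + lam * ||h||_1 for a ReLU sparse auto-encoder.

    X: (m, n) batch; W1: (k, n) and W2: (n, k) follow the column-vector
    convention h = ReLU(W1 x + b1), x' = W2 h + b2. k may equal n.
    """
    H = np.maximum(X @ W1.T + b1, 0.0)          # hidden activations, (m, k)
    X_rec = H @ W2.T + b2                       # linear output layer, (m, n)
    recon = np.sum((X - X_rec) ** 2, axis=1)    # squared two-norm per sample
    penalty = np.sum(np.abs(H), axis=1)         # L1 norm of the activations per sample
    return float(np.mean(recon + lam * penalty))

X = np.array([[1.0, 2.0], [0.0, 3.0]])
eye, zero = np.eye(2), np.zeros(2)
loss = sparse_ae_loss(X, eye, zero, eye, zero, lam=0.5)   # perfect reconstruction, so only the penalty remains
```

With identity weights the reconstruction is exact, so the loss is λ times the mean L1 norm of the activations, here 0.5 × 3 = 1.5; a trained sparse auto-encoder trades some reconstruction accuracy for fewer active units.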
We also consider another regularized auto-encoder, the contractive auto-encoder [41], which is designed to make the learned feature representation insensitive to small changes around the training examples. This is accomplished by penalizing cases in which a small change in the input produces a large change in the encoding space. The loss function is therefore

$$L(x, x') = \lVert x - x' \rVert_2^2 + \lambda \left\lVert \frac{\partial h}{\partial x} \right\rVert_F^2,$$

where the penalty term is the squared Frobenius norm (sum of squared elements) of the Jacobian matrix of the hidden-layer outputs with respect to the input observations. Although this regularization criterion is easy to calculate for a single-hidden-layer auto-encoder, it becomes much more difficult to compute for deeper auto-encoders. Therefore, the contractive auto-encoder used in this paper adopts the same structure as the single-hidden-layer undercomplete auto-encoder described above. Since ReLU is the activation function of the hidden layer, the regularization criterion has the following analytical form:

$$\left\lVert \frac{\partial h}{\partial x} \right\rVert_F^2 = \sum_{j} \mathbb{1}\left[(W_1 x + b_1)_j > 0\right] \sum_{i=1}^{n} (W_1)_{ji}^2,$$

where $\mathbb{1}[\cdot]$ is the indicator function. Auto-encoders need not consist of a single-layer encoder and a single-layer decoder. In fact, deep auto-encoders yield much better compression than corresponding shallow auto-encoders [42]. The general method for training a deep auto-encoder is to pretrain the deep architecture as a stack of shallow auto-encoders; for this reason, deep auto-encoders are also called stacked auto-encoders. The stacked auto-encoder employed in this paper has the structure shown in Figure 4, where the number of hidden layers and the number of neurons in each layer are set by trial and error. So far, the input and output of the auto-encoders we have introduced are identical. Such models may not perform well on a testing set whose distribution differs from that of the training data. The denoising auto-encoder [43] remedies this deficiency.
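The analytical contractive penalty for a ReLU hidden layer follows from ∂h_j/∂x_i = 1[a_j > 0] (W_1)_{ji} with a = W_1 x + b_1: the squared Frobenius norm of the Jacobian is the sum of the squared rows of W_1 that belong to active hidden units. The NumPy sketch below, with hypothetical function names, computes this quantity and checks it against a central finite-difference Jacobian on random weights.

```python
import numpy as np

def contractive_penalty(x, W1, b1):
    """Squared Frobenius norm of dh/dx for h = ReLU(W1 x + b1); W1 has shape (k, n)."""
    active = (W1 @ x + b1) > 0          # hidden units on the linear part of ReLU
    return float(np.sum(W1[active] ** 2))

def numerical_jacobian_norm(x, W1, b1, eps=1e-6):
    """Same quantity from a central finite-difference Jacobian, as a sanity check."""
    f = lambda z: np.maximum(W1 @ z + b1, 0.0)
    J = np.empty((W1.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return float(np.sum(J ** 2))

rng = np.random.default_rng(0)
W1, b1, x = rng.normal(size=(5, 8)), rng.normal(size=5), rng.normal(size=8)
analytic = contractive_penalty(x, W1, b1)
numeric = numerical_jacobian_norm(x, W1, b1)
```

The closed form avoids building the k × n Jacobian explicitly, which is why this penalty is cheap for a single hidden layer; for deeper encoders the Jacobian becomes a product of layer Jacobians and no longer reduces this simply.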
Denoising auto-encoders receive as input data that have been corrupted by some form of noise and are trained to reconstruct the uncorrupted data as their output. Denoising training forces the network to learn more robust, invariant features and to obtain more effective representations of the input. This is similar to a contractive auto-encoder in that the noise can be regarded as a series of small perturbations of the input. The difference is that contractive auto-encoders make the feature extraction function resist small perturbations of the input, whereas denoising auto-encoders make the reconstruction function resist them [44]. The initial input can be corrupted by adding Gaussian noise or by stochastically discarding certain features. The denoising auto-encoder employed in this paper has the same architecture as the stacked auto-encoder; the only difference is that the input is the corrupted data x̃, as shown in Figure 5, given by

$$\tilde{x} = x + \eta\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

where N(0, I) represents a multivariate standard normal distribution with a diagonal covariance structure and η denotes the noise intensity. The loss function of the denoising auto-encoder still computes the two-norm difference between the output vector x′ and the original data x. The encoders of the auto-encoders introduced above output a single value to describe each latent attribute. Sometimes, however, we wish to learn a probability distribution for each latent attribute, to obtain better generalization and to give the latent space properties that enable a generative process. This goal can be achieved with a well-known generative model, the variational auto-encoder [45,46]. The structure of the variational auto-encoder designed for this paper is shown in Figure 6. Its encoder outputs parameters describing a distribution for each dimension of the latent space.
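The additive Gaussian corruption used by the denoising auto-encoder can be sketched in a few lines of NumPy; `corrupt` and `denoising_loss` are hypothetical helper names, and the squared two-norm loss is an assumption consistent with the other sketches here. The key point is that the loss compares the reconstruction with the clean input, not with the corrupted one.

```python
import numpy as np

def corrupt(X, eta, rng):
    """Gaussian corruption x_tilde = x + eta * eps with eps ~ N(0, I), applied row-wise."""
    return X + eta * rng.standard_normal(X.shape)

def denoising_loss(X_clean, X_rec):
    """Batch-mean squared two-norm between the reconstruction and the clean input."""
    return float(np.mean(np.sum((X_clean - X_rec) ** 2, axis=1)))

rng = np.random.default_rng(0)
X = np.zeros((10_000, 3))              # toy clean inputs
X_tilde = corrupt(X, eta=0.1, rng=rng)
# A "model" that merely copies its corrupted input pays roughly eta**2 * n = 0.03 per sample.
copy_loss = denoising_loss(X, X_tilde)
```

Because copying the corrupted input is penalized, the network must learn structure that lets it undo the noise, which is the source of the robustness described above.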
For the variational auto-encoder, we assume that the prior distribution p(h) of the latent representation is a standard normal distribution, so the encoder outputs two vectors describing the mean µ and the variance σ² of the latent-state distribution. A latent vector h is then sampled from a multivariate Gaussian with a diagonal covariance matrix, and the decoder uses it to reconstruct the original input. Note that a simple trick, reparametrization, is used when sampling:

$$h = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

This allows us to sample from a unit Gaussian N(0, I) rather than from N(µ, σ²), so that the sampling result is differentiable with respect to µ and σ and the error can be backpropagated through the network. The loss function of the variational auto-encoder is defined as

$$L(x, x') = \lVert x - x' \rVert_2^2 + \lambda\, D_{KL}\big(q(h \mid x) \,\Vert\, p(h)\big), \tag{10}$$

where

$$D_{KL}\big(q(h \mid x) \,\Vert\, p(h)\big) = \frac{1}{2} \sum_{j} \left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right).$$

The first term in Equation (10) penalizes reconstruction errors, as in the other auto-encoders. The second term encourages the learned latent-state distribution q(h|x) to be similar to the prior distribution p(h) by minimizing the KL divergence between the two distributions. The relative weights of the two terms are controlled by a hyperparameter λ.
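The reparametrization trick and the closed-form KL term for a standard normal prior can be sketched in NumPy as follows. The function names are hypothetical, and having the encoder output a log-variance rather than σ² directly is an assumption (a common convention that keeps the variance positive).

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample h = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so in an autodiff framework h would be
    differentiable with respect to mu and log_var.
    """
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(np.shape(mu))

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)

def vae_loss(x, x_rec, mu, log_var, lam):
    """Squared two-norm reconstruction error plus lam times the KL term."""
    return np.sum((x - x_rec) ** 2, axis=-1) + lam * kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = np.tile([2.0, -1.0], (20_000, 1))
log_var = np.tile(np.log([0.25, 4.0]), (20_000, 1))
h = reparameterize(mu, log_var, rng)   # empirical mean ≈ mu, empirical std ≈ [0.5, 2.0]
```

The KL term is zero exactly when µ = 0 and σ² = 1, i.e., when the learned latent distribution matches the prior, which is what λ trades off against reconstruction accuracy.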
After the auto-encoders have been trained, their encoders output an n-dimensional vector that contains n different latent factors. These latent factors are obtained through dimensionality reduction or compression and can be used to represent n independent, implied abstract features of the stock index market. This technique is of great significance in finance. Traditional financial pricing models with shallow architectures (at most two layers) typically describe market information with linear portfolios. For example, the capital asset pricing model (CAPM) proposed by Sharpe [47] assumes that the market return is a linear combination of asset returns. In the arbitrage pricing theory (APT) proposed by Rosenberg and McKibbon [48] and Ross [49], a single layer of linear factors is used for pricing. These traditional financial theories also apply the idea of dimensionality reduction, as they reduce a dataset of n observations (returns or factors) to one parameter. However, while the implied market prices capture linear features of the input asset returns or factors, they ignore a large amount of latent information and the non-linear relationships between assets in a complex system with fractal properties. For this reason, we use auto-encoder models, whose hierarchical structure applies univariate activation functions to portfolios (linear combinations) of their inputs, to make up for the shortcomings of traditional financial models.
The decoders then reconstruct the input individual stock returns from the latent representations of the stock index market. Because this process involves compression encoding, it inevitably causes information loss. Following Heaton et al. [25], we measure the similarity of the j-th stock to the stock index market by its information loss during the encoding-decoding process, i.e., the two-norm of the difference between the original and reconstructed returns of that stock over the training batch:

$$L_j = \sqrt{\sum_{i=1}^{m} \left(x_j^{(i)} - x_j'^{(i)}\right)^2}. \tag{12}$$

The smaller L_j is, the less information the j-th stock loses and the more similar it is to the stock index market. We rank the stocks by their communal information content, i.e., the amount of information they share with the stock index market. Because including too many stocks that contribute the same information does not improve index tracking performance, we select a fixed number of the most-communal stocks plus a variable number of the least-communal stocks to construct a tracking portfolio. In addition, to investigate the advantages of the auto-encoder-based stock selection strategies, we adopt two conventional index-tracking stock-selection strategies for comparison: weight ranking and market-value ranking. We evaluate the tracking performance of all these strategies under the same conditions.
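The per-stock information loss and the "most-communal plus least-communal" selection rule can be sketched in NumPy as below. The function names and the default of 10 most-communal stocks (taken from the experimental setup used later in the paper) are illustrative assumptions; X_rec would come from any trained auto-encoder.

```python
import numpy as np

def communal_ranking(X, X_rec):
    """Per-stock loss L_j = ||x_j - x'_j||_2 over the batch; returns (order, losses).

    X and X_rec have shape (m, n): m trading days by n stocks. `order` lists stock
    indices from most communal (smallest loss) to least communal (largest loss).
    """
    losses = np.sqrt(np.sum((X - X_rec) ** 2, axis=0))
    return np.argsort(losses, kind="stable"), losses

def select_stocks(order, n_total, n_most=10):
    """The n_most most-communal stocks plus the n_total - n_most least-communal ones."""
    if not n_most <= n_total <= len(order):
        raise ValueError("need n_most <= n_total <= number of stocks")
    n_least = n_total - n_most
    return np.concatenate([order[:n_most], order[len(order) - n_least:]])

# Toy check: stock j is reconstructed with error j on every day
m, n = 4, 20
X = np.zeros((m, n))
X_rec = np.tile(np.arange(n, dtype=float), (m, 1))
order, losses = communal_ranking(X, X_rec)
chosen = select_stocks(order, n_total=15)   # stocks 0-9 (most communal) and 15-19 (least)
```

Since the square root is monotone, ranking by L_j or by its square gives the same order, so the choice between the two norms does not affect which stocks are selected.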

Index Tracking Model
After selecting the representative stocks with the strategies above, we use an index tracking model to determine the investment weight allocated to each stock in the tracking portfolio, with the objective of minimizing tracking error subject to constraints. The index tracking model established in this paper can be expressed as the following quadratic programming problem:

$$\min_{w} \; \lVert R_I - R_x w \rVert_2^2 + \lambda \lVert w \rVert_2^2 \quad \text{s.t.} \quad w_i \ge 0, \; i = 1, \ldots, n,$$

where R_I ∈ R^m is the vector of the index return time series; R_x = [R_1, R_2, ..., R_n] ∈ R^{m×n} denotes the return matrix of the selected stocks; and w = [w_1, w_2, ..., w_n]^⊤ ∈ R^n is the vector of stock weights. The objective function includes a regularization term, λ‖w‖₂², to avoid overfitting. In addition, the stock weights are kept non-negative because of the short-selling restrictions in China's stock market.
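The paper does not say which solver it uses for this quadratic program. As one possible approach, the sketch below solves the non-negative, ridge-regularized problem by projected gradient descent in NumPy; `track_index`, the default λ, and the iteration count are hypothetical, and a dedicated QP or non-negative least-squares solver would serve equally well.

```python
import numpy as np

def track_index(R_x, R_I, lam=1e-3, iters=5000):
    """Minimize ||R_I - R_x w||_2^2 + lam * ||w||_2^2 subject to w >= 0.

    Projected gradient descent with step 1/L, where L is the Lipschitz constant
    of the gradient 2 (A w - b); for this convex problem the iterates converge.
    """
    n = R_x.shape[1]
    A = R_x.T @ R_x + lam * np.eye(n)
    b = R_x.T @ R_I
    L = 2.0 * np.linalg.eigvalsh(A).max()
    w = np.zeros(n)
    for _ in range(iters):
        grad = 2.0 * (A @ w - b)
        w = np.maximum(w - grad / L, 0.0)   # projection onto the non-negative orthant
    return w

# Synthetic example: the "index" is a known non-negative mix of 6 stocks plus noise
rng = np.random.default_rng(0)
R_x = rng.standard_normal((200, 6))
w_true = np.array([0.3, 0.2, 0.0, 0.4, 0.0, 0.1])
R_I = R_x @ w_true + 0.001 * rng.standard_normal(200)
w = track_index(R_x, R_I)
```

The projection step is what enforces the no-short-selling constraint: any weight pushed below zero by the gradient step is clipped back to zero.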

Data Description and Processing
We investigate partial replication of the CSI 300 Index with the proposed index tracking strategies. The CSI 300 Index is a barometer of China's stock market: the main business income of its constituents accounts for more than seventy percent of that of the Chinese market, and it is well representative of emerging markets throughout the world. We use the daily closing prices of the CSI 300 Index and its constituent stocks over the sample period from January 1, 2010 through December 31, 2018 (2,187 trading days). Because the constituents of the CSI 300 Index are adjusted semi-annually, generally in early January and early July, we obtain the daily closing prices of all stocks that were included in the constituents at any time during the sample period. We also record the mid-year and year-end market values and index weights of the constituents from 2010 to 2018 for use in weight ranking and market-value ranking.
To ensure that the analysis results are accurate and reliable, we first clean the original price data as follows: (i) exclude a stock if more than 20% of its price data are missing in the training set (defined in the next subsection); (ii) exclude a stock if all of its price data for the first 5 days and the last 5 days of the training set are missing; (iii) exclude a stock if it is removed from the constituents of the CSI 300 Index during the training period or the following testing period (defined in the next subsection); and (iv) fill the missing prices of the retained stocks by linear interpolation.
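Cleaning rules (i), (ii), and (iv) can be sketched on a days × stocks price array in NumPy, with NaN marking a missing price. This is an illustration under assumptions: `clean_prices` is a hypothetical name, rule (ii) is implemented literally (a stock is dropped only if both its first five and last five prices are all missing), and rule (iii) is omitted because it needs index-membership data.

```python
import numpy as np

def clean_prices(P, max_missing=0.2, edge=5):
    """Apply cleaning rules (i), (ii) and (iv) to a (days x stocks) array; NaN = missing.

    Rule (ii) is read literally: drop a stock only if its first `edge` AND last
    `edge` prices are all missing. Rule (iii) (index membership) is not covered.
    """
    missing = np.isnan(P)
    keep = missing.mean(axis=0) <= max_missing                              # rule (i)
    keep &= ~(missing[:edge].all(axis=0) & missing[-edge:].all(axis=0))     # rule (ii)
    Q = P[:, keep].copy()
    t = np.arange(Q.shape[0])
    for j in range(Q.shape[1]):                                             # rule (iv)
        ok = ~np.isnan(Q[:, j])
        # np.interp fills interior gaps linearly and holds the end values constant
        Q[:, j] = np.interp(t, t[ok], Q[ok, j])
    return Q, keep

nan = np.nan
P = np.array([[1, 10, 5], [2, nan, nan], [3, nan, 7], [4, nan, 8], [5, 14, 9],
              [6, 15, 10], [7, 16, 11], [8, 17, 12], [9, 18, 13], [10, 19, 14]], dtype=float)
Q, keep = clean_prices(P)   # column 1 (30% missing) is dropped; column 2's gap becomes 6.0
```

Note that `np.interp` extrapolates by repeating the nearest observed price, so leading or trailing gaps that survive rule (ii) are filled with a constant rather than a trend.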
We obtain the daily return time series r_{i,t} of each stock or the index as r_{i,t} = (P_{i,t} − P_{i,t−1})/P_{i,t−1}, where P_{i,t} denotes the daily closing price of stock (index) i on day t. All daily returns are then standardized by z-score normalization:

$$r^{*}_{i,t} = \frac{r_{i,t} - \bar{r}_i}{\sigma_i},$$

where $\bar{r}_i$ and σ_i denote the mean and standard deviation of r_{i,t}, respectively.
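A minimal NumPy sketch of the return calculation and z-score standardization described above; the function name is hypothetical, and using the population standard deviation (ddof=0) is an assumption, since the paper does not specify the estimator.

```python
import numpy as np

def standardized_returns(P):
    """Daily simple returns r_t = (P_t - P_{t-1}) / P_{t-1}, z-scored column by column."""
    P = np.asarray(P, dtype=float)
    r = (P[1:] - P[:-1]) / P[:-1]
    return (r - r.mean(axis=0)) / r.std(axis=0)   # population std (ddof=0), an assumption

z = standardized_returns([[100.0], [110.0], [99.0]])   # returns +10% and -10%
```

Here the two returns are +0.1 and −0.1, so their mean is 0 and their standard deviation is 0.1, and the standardized values are +1 and −1. In the rolling-window setting, the mean and standard deviation would naturally be estimated on each training window.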

Design of Tracking Strategy
To construct a dynamically adjusted out-of-sample portfolio to track the index, the data sample is divided into training and testing sets using the rolling-window approach [50]. The rolling-window approach preserves the temporal order of the time series, which matches the investment decision-making process in practice. The training set is used to train the stock selection model to select a subset of constituents. The index tracking model, which takes the returns of the selected stocks as input, is then also trained on the training set to obtain the stock weights. Afterwards, we construct a tracking portfolio with the selected stocks and the corresponding weights obtained from the training set, and compute its portfolio return and the index tracking error on the testing set. We use the past four years of data as the training set and the following six months as the testing set, in line with the adjustment frequency of the index constituents. The window rolls forward every half-year from January 2014 to December 2018, so for each stock selection model there are 10 testing periods in all, covering 5 years of index tracking results. The tracking procedure is illustrated in Figure 7.
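The half-yearly rolling scheme can be enumerated with the Python standard library. This sketch works with calendar dates (non-trading days would be filtered out afterwards), and `rolling_windows` and its parameters are hypothetical names.

```python
from datetime import date, timedelta

def rolling_windows(first_test_year=2014, last_test_year=2018, train_years=4):
    """Half-yearly rolling windows: `train_years` of training data, then 6 months of testing.

    Returns (train_start, train_end, test_start, test_end) tuples of calendar
    dates with inclusive end dates.
    """
    windows = []
    for year in range(first_test_year, last_test_year + 1):
        for month in (1, 7):
            test_start = date(year, month, 1)
            test_end = date(year, 6, 30) if month == 1 else date(year, 12, 31)
            train_start = date(year - train_years, month, 1)
            train_end = test_start - timedelta(days=1)
            windows.append((train_start, train_end, test_start, test_end))
    return windows

windows = rolling_windows()
for w in windows[:2]:
    print(w)
```

The first window trains on January 2010 to December 2013 and tests on the first half of 2014; each subsequent window shifts both ranges forward by six months, giving 10 windows in total.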

Performance Measurement
We select stocks for each training set using eight selection approaches: six auto-encoder-based models, weight ranking, and market-value ranking. The auto-encoders are used to measure the degree of communal information between the stock index market and the constituent stocks; we then sort the constituents accordingly and select a subset of constituents that satisfies our requirements. As an example, Figures 8 and 9 illustrate stock 601618.SH, which shares the most communal information with the stock index market in the first training period (using the single-hidden-layer undercomplete auto-encoder), and stock 600015.SH, which shares the least. Stock 600015.SH clearly loses much more information than stock 601618.SH during the encoding-decoding process. As noted above, it is not beneficial to add too much communal information to a portfolio. Following Heaton et al. [25], we therefore select the 10 most-communal stocks plus the n − 10 least-communal stocks to construct a tracking portfolio, where n increases from 15 to 80 in steps of five. The weight (market-value) ranking method selects the n stocks with the largest half-yearly average weights (market values) for inclusion in the tracking portfolio. After determining the stocks to include in the tracking portfolio, we apply the index tracking model introduced in section Index Tracking Model to determine the stock weights and construct a tracking portfolio that partially replicates the CSI 300 Index. We compare the tracking errors of portfolios with the same number of stocks selected by different strategies; a smaller tracking error indicates better tracking performance of the stock selection strategy.
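The conventional baselines can be sketched as a simple ranking by average weight or market value, together with the grid of portfolio sizes used in the comparison. Here `rank_by_average` is a hypothetical name, and reading "half-yearly average" as the mean over the half-yearly snapshots available in a training window is an assumption.

```python
import numpy as np

def rank_by_average(values, n):
    """Indices of the n stocks with the largest average over half-yearly snapshots.

    `values` has shape (snapshots, stocks): index weights for weight ranking or
    market values for market-value ranking. Ties keep the original stock order.
    """
    avg = np.asarray(values, dtype=float).mean(axis=0)
    return np.argsort(-avg, kind="stable")[:n]

portfolio_sizes = list(range(15, 81, 5))   # n = 15, 20, ..., 80
top2 = rank_by_average([[1.0, 5.0, 3.0], [3.0, 1.0, 3.0]], n=2)   # averages [2, 3, 3]
```

Unlike the auto-encoder strategies, this baseline ignores the co-movement of returns and simply accumulates the largest index weights, which helps explain why it catches up once many stocks are included.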
The average tracking error (ATE) is computed from the differences between the daily returns of the tracking portfolio and those of the index, where T represents the total number of out-of-sample trading days (spanning January 1, 2014 to December 31, 2018 and covering 10 adjustment periods, as the tracking portfolio is adjusted every half-year), and R_It and R_pt are the returns of the index and of the tracking portfolio at time t. Table 1 shows the out-of-sample tracking errors for the CSI 300 Index (in Table 1, "AE" is short for "auto-encoder"). Figure 10 plots the tracking error as a function of the number of stocks for all stock selection strategies. The tracking errors of all strategies decrease as the number of stocks in the tracking portfolio increases. In particular, the tracking error falls quickly while fewer than 40 stocks are included in the portfolio. Furthermore, when fewer than 30 stocks are included, the tracking errors of the six auto-encoder-based strategies are significantly smaller than those of the weight ranking and market-value ranking strategies. However, the tracking error declines more slowly once more than 40 stocks are required. Moreover, the tracking errors of the six auto-encoder-based strategies exceed those of the weight ranking and market-value ranking strategies when more than 40 and 55 stocks, respectively, are required for inclusion.
We suggest the following explanations for these results. When the tracking portfolio is constructed with many stocks selected by the weight ranking and market-value ranking strategies, the cumulative original index weight of the selected stocks is larger, which brings the performance of the tracking portfolio closer to that of the index. By contrast, as the number of stocks in the tracking portfolio increases, the auto-encoder-based strategies append more stocks with medium communal information to the portfolio. The portfolios containing the most- and least-communal stocks already reflect the market information well.
Thus, there is no benefit in having more medium-communal stocks.
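As a computational illustration of the tracking-error measure, the following NumPy sketch assumes the common root-mean-square form ATE = sqrt((1/T) Σ_t (R_pt − R_It)²). This form is an assumption rather than a reproduction of the exact ATE equation, and the function name is hypothetical; a different tracking-error definition would only change the final line.

```python
import numpy as np


def average_tracking_error(index_returns, portfolio_returns) -> float:
    """Root-mean-square daily return difference (assumed form of the ATE).

    index_returns, portfolio_returns: sequences of length T with the daily
    returns R_It of the index and R_pt of the tracking portfolio.
    """
    r_i = np.asarray(index_returns, dtype=float)
    r_p = np.asarray(portfolio_returns, dtype=float)
    if r_i.shape != r_p.shape:
        raise ValueError("return series must have the same length T")
    return float(np.sqrt(np.mean((r_p - r_i) ** 2)))


# A portfolio that always deviates from the index by 0.5% has an ATE of about 0.005.
print(average_tracking_error([0.010, -0.020, 0.000], [0.015, -0.015, 0.005]))
```

In the setting above, the two series would be the concatenated out-of-sample returns of all 10 testing windows, so that T counts every trading day from 2014 to 2018.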
Comparing the auto-encoder-based strategies with one another, the tracking errors of the strategies based on sparse, contractive, stacked, and denoising auto-encoders are almost always smaller than that of the strategy based on the single-hidden-layer undercomplete auto-encoder, regardless of the number of stocks, although the difference is not large. The explanation is that some of these four types of auto-encoders (the stacked auto-encoder) have a deeper structure that can learn more complex codings and deeper market information, while others (the sparse, contractive, and denoising auto-encoders) are regularized to encourage the model to learn useful properties other than copying its input to its output, instead of limiting the model capacity by keeping the encoder and decoder shallow and the code size small. In either case, these auto-encoders can create more information-efficient representations of the market than the single-hidden-layer undercomplete auto-encoder, so the stocks selected by the strategies based on them better represent the entire market.
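To make the distinction between capacity limits and regularization concrete, the sketch below writes down the training objectives of a single-hidden-layer auto-encoder with and without the three regularizers mentioned above. It is illustrative only: the sigmoid encoder, linear decoder, L1 sparsity penalty (a KL-divergence penalty is also common), penalty weight `lam`, and noise level are assumptions rather than the settings used in this study, and the deeper stacked auto-encoder is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(x, W, b):
    """Sigmoid encoder: x (batch, n_stocks) -> code h (batch, n_hidden)."""
    return sigmoid(x @ W.T + b)


def decode(h, V, c):
    """Linear decoder: h (batch, n_hidden) -> reconstruction (batch, n_stocks)."""
    return h @ V.T + c


def sq_error(x, x_hat):
    """Mean over samples of the squared reconstruction error."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))


def ae_objectives(x, W, b, V, c, lam=1e-3, noise_std=0.1, seed=0):
    """Training objectives of plain, sparse, contractive and denoising AEs."""
    h = encode(x, W, b)
    plain = sq_error(x, decode(h, V, c))
    # Sparse AE: penalize the L1 norm of the code.
    sparse = plain + lam * float(np.mean(np.sum(np.abs(h), axis=1)))
    # Contractive AE: penalize ||dh/dx||_F^2; for a sigmoid encoder,
    # dh_j/dx_i = h_j (1 - h_j) W_ji.
    jac_sq = ((h * (1.0 - h)) ** 2) @ np.sum(W ** 2, axis=1)
    contractive = plain + lam * float(np.mean(jac_sq))
    # Denoising AE: encode a corrupted input but reconstruct the clean one.
    rng = np.random.default_rng(seed)
    x_noisy = x + noise_std * rng.standard_normal(x.shape)
    denoising = sq_error(x, decode(encode(x_noisy, W, b), V, c))
    return {"plain": plain, "sparse": sparse,
            "contractive": contractive, "denoising": denoising}


rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, size=(64, 8))            # 64 days, 8 stocks (toy data)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
V, c = rng.normal(size=(8, 3)) * 0.01, np.zeros(8)
print(ae_objectives(x, W, b, V, c))
```

None of the three penalties shrinks the code itself; each adds a term (or a corruption step) that the optimizer must satisfy in addition to reconstruction, which is what allows a wider code without trivial copying.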
However, the strategy based on the variational auto-encoder does not perform better than that based on the single-hidden-layer undercomplete auto-encoder. There is also an explanation for this.
The purpose of an auto-encoder in the present work is to reconstruct the original input stock information from a latent space representing the compressed market information. However, a variational auto-encoder (described in section Stock Selection Using Auto-Encoders, and normally used as a generative model) is designed to generate variations on an input vector from a continuous latent space: its encoder outputs a distribution over possible representations of the market rather than a single code, and a representation sampled from that distribution does not necessarily describe the market's current state. Therefore, the output reconstructed by the decoder is far from being a copy of the original input stock information. From this perspective, it is not surprising that the strategy based on the variational auto-encoder does not yield the desired result.
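The stochastic encoding step can be illustrated with a minimal NumPy sketch. It assumes a Gaussian latent distribution with a linear encoder and decoder, which is a simplification rather than the architecture used in this study; the weight names are hypothetical and the weights are random, untrained values.

```python
import numpy as np


def vae_encode(x, W_mu, W_logvar):
    """Encoder outputs the mean and log-variance of q(z | x), not a single code."""
    return x @ W_mu.T, x @ W_logvar.T


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, logvar):
    """KL(q(z | x) || N(0, I)) per sample; it pulls codes toward the prior."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=(5, 8))             # 5 days, 8 stocks (toy data)
W_mu, W_logvar = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))
V = rng.normal(size=(8, 2))                        # linear decoder weights

mu, logvar = vae_encode(x, W_mu, W_logvar)
recon_a = sample_latent(mu, logvar, rng) @ V.T
recon_b = sample_latent(mu, logvar, rng) @ V.T
# Two passes over the same input give different reconstructions.
print(np.abs(recon_a - recon_b).max())
```

A deterministic auto-encoder would return identical reconstructions on both passes; here the sampled noise, together with the KL term that pulls codes toward the prior, is what makes the reconstruction depart from the input.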
Although increasing the number of stocks in the tracking portfolio reduces the tracking error, beyond a certain portfolio size the improvement in tracking performance becomes marginal while additional transaction costs accrue. According to the previous analysis, if a tracking portfolio is constructed with fewer than 40 stocks, the tracking error decreases rapidly as the number of stocks increases and the corresponding transaction cost is acceptable. Therefore, the number of stocks in the tracking portfolio should be kept below 40 to balance the tracking error against the transaction cost.
Considering both the absolute tracking error values and the slopes of the curves for the auto-encoder-based strategies in Figure 10, we find that the tracking performance of the auto-encoder-based strategies greatly surpasses that of the conventional strategies for a 25-stock tracking portfolio. Figure 11 shows the out-of-sample cumulative return curves of the CSI 300 Index and of the 25-stock tracking portfolios constructed by our proposed strategies, in which the relative advantages of the auto-encoder-based stock selection strategies can be seen clearly. Specifically, the tracking error of the market-value ranking strategy is 5.940 × 10⁻³, and that of the weight ranking strategy is 5.224 × 10⁻³. Among the six auto-encoder-based strategies, the denoising auto-encoder-based strategy has the smallest tracking error, 3.940 × 10⁻³, which is 33.67% lower than that of market-value ranking and 24.58% lower than that of weight ranking. The other five auto-encoder-based strategies also track better than the conventional strategies to varying degrees. Even the worst-performing auto-encoder-based strategy (the variational auto-encoder) achieves reductions of 25.88% and 15.72% relative to the market-value ranking and weight ranking strategies, respectively. In conclusion, auto-encoder-based strategies outperform conventional strategies when only a small number of stocks are to be included in a tracking portfolio.
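The percentage reductions quoted for the 25-stock portfolios follow directly from the reported tracking errors; a short Python check (the function name is hypothetical) reproduces them.

```python
def relative_reduction(te_strategy: float, te_baseline: float) -> float:
    """Percentage by which a strategy's tracking error undercuts a baseline's."""
    return 100.0 * (te_baseline - te_strategy) / te_baseline


te_market_value = 5.940e-3   # market-value ranking, 25 stocks
te_weight = 5.224e-3         # weight ranking, 25 stocks
te_denoising = 3.940e-3      # denoising auto-encoder, 25 stocks

print(round(relative_reduction(te_denoising, te_market_value), 2))  # 33.67
print(round(relative_reduction(te_denoising, te_weight), 2))        # 24.58
```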

Robustness Tests
To evaluate the sensitivity of these empirical results to changes in the data sample, we perform various robustness checks.
First, the length of the training set may affect the results. As a robustness check, we analyze the tracking performance when each training set length is changed to 3 or 5 years. Keeping each testing set length at 6 months, the out-of-sample period accordingly changes to 6 and 4 years, respectively. Figures 12 and 13 show how the tracking error varies with the number of stocks when each training set has a length of 3 and 5 years, respectively. The results reveal that the auto-encoder-based strategies track the index better than the conventional strategies when fewer than 30 stocks are included in a tracking portfolio. In particular, the sparse auto-encoder-based strategy achieves the lowest tracking error among all the auto-encoder-based strategies with 3-year training sets, whereas the denoising auto-encoder-based strategy performs best with 5-year training sets. In addition, the tracking error values change little in response to variations in the length of the training set.
Thus, our base case results hold for these alternative training-set lengths.
Second, the rebalancing frequency, which is the reciprocal of the length of each testing set, affects the performance of dynamic portfolio management. To investigate the sensitivity of our results to alternative rebalancing frequencies, we compute quarterly and yearly rebalanced portfolios while keeping the training-set length unchanged. The results in Figures 14 and 15 verify that our base case results are robust to these changes. In particular, with quarterly rebalancing, the sparse auto-encoder-based strategy tracks the index best among all auto-encoder-based strategies and improves on the conventional strategies' tracking performance by a larger margin than in the base case. For a 25-stock portfolio, the tracking error of the sparse auto-encoder-based strategy is 3.861 × 10⁻³, which is 34.91% lower than that of market-value ranking and 25.86% lower than that of weight ranking. With yearly rebalancing, by contrast, the denoising auto-encoder-based strategy achieves the best tracking performance. These results suggest that the sparse and denoising auto-encoder-based strategies are better at index tracking than the other four auto-encoder-based strategies.

CONCLUSIONS
We investigate the index tracking performance of deep-learning-based tracking approaches. In particular, we use a variety of advanced auto-encoders (single-hidden-layer undercomplete, sparse, contractive, stacked, denoising, and variational auto-encoders) to extract the complex non-linear relationships between stocks in a complex stock market system and to construct dynamic tracking portfolios with subsets of stocks. Only one or two of these auto-encoders have previously been examined in the context of stock selection. Moreover, we evaluate for the first time whether auto-encoder-based strategies improve tracking performance over the conventional strategies of weight ranking and market-value ranking.
In general, we find that whether auto-encoder-based strategies outperform conventional ones depends on the number of stocks included in the tracking portfolio. When only a small number of stocks (roughly fewer than 30) are needed to construct a tracking portfolio, the auto-encoder-based strategies are generally superior to conventional strategies in terms of tracking performance. Furthermore, auto-encoders with architectures that can learn high-capacity, overcomplete encodings of the input, such as sparse and denoising auto-encoders, are better than the other auto-encoders at capturing complex latent representations of the market, and portfolios built from the stocks they select replicate the index more closely. However, if more than 40 stocks are required for inclusion, the conventional strategies still have the advantage.
Our findings suggest that deep learning algorithms are suitable for index tracking problems if their hierarchical architectures are explicitly designed. We expect these findings to be helpful in making asset-allocation decisions, especially for index investing. Nonetheless, the study has some limitations: our analysis concerns a specific dataset; the impact of transaction costs on index tracking performance is not quantified; and the model hyperparameters are not systematically optimized. Therefore, additional work with a more extensive dataset, optimized model settings, and greater practical realism would help to confirm our findings. This research can also readily be extended to test other deep learning frameworks for index tracking.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found in the WIND database (http://www.wind.com.cn) provided by Shanghai Wind Information Co., Ltd.

AUTHOR CONTRIBUTIONS
CZ: conceptualization, methodology, and formal analysis. CZ and FL: writing-original draft and data curation. LF: writing-review, editing, supervision, and funding acquisition. SL: formal analysis, writing-review, editing, and data curation. All authors contributed to the article and approved the submitted version.