Deep learning-based pairs trading: real-time forecasting of co-integrated cryptocurrency pairs

Tsoku, Johannes Tshepiso; Makatjane, Katleho

doi:10.3389/fams.2026.1749337

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 30 January 2026

Sec. Statistics and Probability

Volume 12 - 2026 | https://doi.org/10.3389/fams.2026.1749337

This article is part of the Research Topic: Advances in Time Series Forecasting and Applications in Finance


Johannes Tshepiso Tsoku¹*

Katleho Makatjane²

¹Department of Business Statistics and Operations Research, North West University, Mafikeng, South Africa
²Department of Statistics and Population Studies, University of the Western Cape, Cape Town, South Africa

Statistical arbitrage strategies, including pairs trading, rely on identifying co-movements and long-term equilibrium relationships between assets. Conventional methods treat these relationships as static and fail to capture non-stationary dynamics, which reduces trading effectiveness. This study addresses this challenge by combining a dynamic co-integration approach with deep learning techniques to select suitable cryptocurrency pairs and forecast spread dynamics. The study examines multiple cryptocurrencies, namely BNB, Ethereum, Litecoin, Ripple, and USDT, using dynamic Johansen co-integration tests to identify pairs with time-varying equilibrium relationships, and models the spread through a Dynamic Weighted Ensemble of a Deep Neural Network and a Long Short-Term Memory network. Forecasting accuracy, trading performance, and predictive uncertainty are evaluated using error metrics, trading outcomes, and 99% prediction intervals. The results indicate that only those cryptocurrencies with dynamically coherent relationships are suitable for mean-reversion strategies. Furthermore, the Dynamic Weighted Ensemble achieves the best predictive accuracy, while the LSTM captures proportional temporal dynamics effectively, and the ensemble-driven trading signals generate timely buy and sell decisions with low-lag execution and robust management of market volatility. These findings highlight the advantages of combining dynamic co-integration and adaptive deep learning for statistical arbitrage.

1 Introduction

Two major bear markets have affected global financial markets over the past two decades: the subprime mortgage Global Financial Crisis and the collapse of the high-technology bubble. These events exposed substantial limitations in conventional financial theories, as many investors experienced significant losses and were forced to delay retirement plans. In particular, heightened volatility and market unpredictability revealed the shortcomings of portfolio management techniques, such as Markowitz's mean–variance optimisation, in safeguarding investments against systemic risks. Consequently, market participants have increasingly questioned the reliability of these widely adopted methods and have shown growing interest in market-neutral long–short equity strategies that aim to minimize exposure to common risk factors while delivering relatively stable returns, even during periods of market stress [1].

Among such strategies, pairs trading has emerged as one of the most prominent and extensively studied market-neutral approaches. The strategy exploits temporary deviations in the price relationship between two assets by establishing offsetting long and short positions and profiting from subsequent convergence or divergence relative to an assumed equilibrium relationship [2]. Pairs trading gained early prominence in the 1980s through the work of Wall Street quantitative researcher Nunzio Tartaglia, who assembled a multidisciplinary team of mathematicians and computer scientists to develop computer-based trading systems that reduced human subjectivity in investment decisions [3, 4]. Although the strategy initially demonstrated strong performance, it subsequently experienced sustained periods of underperformance, highlighting its sensitivity to evolving market conditions [5]. The strategy achieved widespread academic recognition following the influential study of Gatev et al. [3], which formalized pairs trading through the introduction of the distance method (DM). Since then, pairs trading has become a foundational tool in quantitative finance and has been widely applied across different asset classes and market environments. Early empirical evidence suggests that the DM was historically profitable [6, 7]; however, subsequent studies document a deterioration in performance in more recent periods, particularly after accounting for transaction costs. Moreover, Grosu et al. [8] emphasized that profitability has been shown to vary significantly across market regimes, with greater resilience observed during periods of heightened volatility and financial stress. These mixed findings have motivated the exploration of alternative modeling approaches capable of capturing more complex dependence structures between asset prices.

In this context, copula and mixed-copula models have been proposed as flexible tools for modeling nonlinear and asymmetric dependence structures that correlation-based measures, such as Pearson correlation, fail to capture adequately [1]. While prior research has examined copula-based pairs trading strategies in specific markets and limited timeframes [9], comprehensive assessments remain scarce, particularly in large and rapidly evolving markets such as the Chinese equities market. To address this gap, Sun [1] conducted a systematic investigation using copula and mixed-copula methodologies on a diverse sample of Chinese stocks. Despite their flexibility, copula-based approaches have several limitations. First, according to Flos et al. [10], their implementation requires specialized knowledge of copula theory and parameter estimation, making them more complex to specify, validate, and interpret than more conventional methods such as co-integration. Second, accurate estimation of copula parameters is often data-intensive and typically relies on large volumes of high-frequency observations, which may be unavailable or costly to obtain [11]. Third, the effectiveness of the resulting trading strategies is highly sensitive to correct copula family selection and parameter calibration, with misspecification potentially leading to distorted tail-dependence characterization and unreliable trading signals. Fourth, as [12] emphasize, most standard copula models assume a static dependence structure, an assumption that is frequently violated in financial markets where correlations evolve, thereby motivating more complex and computationally demanding time-varying extensions. Finally, copula models are largely statistical in nature and do not inherently provide an economic rationale for long-run co-movement between assets, in contrast to co-integration frameworks that are grounded in equilibrium relationships.
Correlation analysis, for its part, remains unable to identify long-term equilibrium relationships or adapt to changing market dynamics [13]. Econometric techniques such as co-integration tests, error correction models, and Kalman filter approaches, although capable of modeling time-varying spreads, are constrained by assumptions of linearity and stationarity, as indicated by Faizullin [14] and Eroǧlu et al. [15]. In practice, estimation error, sensitivity to rolling-window selection, and parameter instability often undermine out-of-sample performance, and both correlation and copula-based methods face challenges when modeling high-dimensional interactions in rapidly evolving markets.

Recent studies published between 2023 and 2025, including those by Rayaprolu [16], Rotondi and Russo [17], and Liou et al. [18], have sought to address these methodological limitations through enhanced statistical modeling, deep learning, and ensemble-based approaches. While these contributions have improved forecasting accuracy or volatility modeling, they typically focus on individual components of the pairs trading process, such as dependence modeling, spread construction, or prediction, in isolation. In particular, recent ensemble learning applications, such as that of Ferrouhi and Bouabdallaoui [19], often rely on static or fixed-weight combinations that lack adaptability to regime changes, whereas econometric and copula-based methods remain limited in their ability to jointly capture nonlinear, high-dimensional, and time-varying dependencies. Among others, Keshavarz Haddad and Talebi [20] construct a hypothetical portfolio composed of pairs of stocks by exploring significant associations between their prices on the Toronto Stock Exchange (TSX). The authors compare the profitability of distance, co-integration, and copula functions as pair-selection and trading-strategy devices on the TSX from January 2017 to June 2020. Building directly on these recent contributions, the present study advances the literature by integrating dynamic Johansen co-integration with deep neural networks (DNNs), long short-term memory (LSTM) networks, and a dynamic weighted ensemble (DWE) within a unified pairs trading setting. This integration enables the joint modeling of time-varying long-run equilibrium relationships, nonlinear and high-dimensional dependence structures, and evolving predictive relevance, capabilities that recent standalone statistical or machine learning approaches do not simultaneously provide.

A dynamic Johansen co-integration approach is employed in this study to identify cointegrated relationships among six major cryptocurrency pairs: Bitcoin–Ethereum (BTC–ETH), Bitcoin–Litecoin (BTC–LTC), Bitcoin–Ripple (BTC–XRP), Ethereum–Litecoin (ETH–LTC), Ethereum–Ripple (ETH–XRP), and Litecoin–Ripple (LTC–XRP). Unlike static co-integration approaches, which assume a fixed long-run equilibrium over the entire sample period, the dynamic formulation allows both the existence and strength of co-integrating relationships to evolve. This flexibility is particularly important in cryptocurrency markets, where rapid technological innovation, regulatory interventions, liquidity shifts, and speculative behavior frequently induce structural breaks and regime changes. By accommodating such temporal variation, the dynamic co-integration approach provides empirical support for the Adaptive Market Hypothesis (AMH) of Lo [21] as cited by Kellner [22], which posits that market efficiency and asset interactions are inherently time-varying rather than static. Compared to static co-integration methods, dynamic co-integration reduces the risk of model mis-specification arising from unobserved structural shifts and avoids falsely rejecting long-run relationships that may hold intermittently rather than continuously. While correlation-based measures capture only short-term co-movements and are highly sensitive to volatility clustering, dynamic co-integration uncovers time-varying long-run equilibrium relationships even when individual price series are non-stationary [23, 24]. This enables the construction of adaptive mean-reverting spreads that remain economically meaningful across different market regimes, thereby providing a more robust foundation for market-neutral trading strategies and improved control of systemic risk [7].

These dynamically estimated spreads are subsequently used as inputs to Deep Neural Network (DNN) and Long Short-Term Memory (LSTM) architectures, allowing the models to learn from evolving equilibrium deviations rather than from static residual processes. Deep neural networks are well-suited to extracting complex nonlinear and high-dimensional patterns from these spreads, whereas LSTMs explicitly capture temporal dependencies and regime persistence, which are critical in highly volatile and non-stationary cryptocurrency markets [25, 26]. By integrating dynamic co-integration with deep learning models, the proposed approach aligns econometric theory with data-driven adaptability, thereby enhancing both predictive performance and economic interpretability.

Finally, this study introduces a dynamic weighted ensemble (DWE) that adaptively combines the complementary strengths of DNN and LSTM models. Unlike recent ensemble approaches that employ fixed (see, for instance, [27, 28]) or complexity-driven weighting schemes, the proposed DWE continuously reallocates model weights based on recent predictive performance, allowing the model combination to evolve as market conditions change. This adaptive mechanism enables the ensemble to respond effectively to regime transitions, structural breaks, and volatility spikes, thereby reducing out-of-sample forecasting errors during periods of rapid changes in model relevance.

1.1 Research highlights and key findings

This study assesses the predictive efficacy of deep learning models within a dynamic co-integration-based pairs trading framework, comparing them with econometric models and introducing a dynamic weighted ensemble that adapts to variable market conditions. Table 1 summarizes the principal findings of the research.

• Limitations of previous models: Prior studies on pairs trading predominantly employed Engle-Granger or Johansen co-integration methods in conjunction with linear mean-reversion signals [29–31], assuming stable statistical relationships that rarely persist in real markets marked by structural breaks and regime shifts.

• Constraints of current machine learning ensembles: Recent applications of machine learning [32, 33] often employ fixed-weight or non-adaptive ensembles, which cannot adjust to evolving market dynamics, thus limiting their effectiveness in time-varying conditions.

• Dynamic co-integration method: This research utilizes a dynamic Johansen co-integration approach to ascertain time-varying equilibrium relationships among asset pairs, thus providing a more precise representation of market behavior.

• Adaptive weighting: The dynamic weighted ensemble adjusts model contributions in real time based on current predictive performance, allowing for adaptation to changing market conditions and outperforming static ensemble methods.

• Complementary strengths of constituent models: The dynamic ensemble utilizes deep neural networks to manage nonlinearities, long short-term memory networks to capture temporal dependencies, and co-integration to maintain equilibrium-correcting behavior.

• Empirical advantage: The results demonstrate that the dynamic ensemble achieves higher predictive accuracy and greater stability in trading returns relative to static or fixed-weight ensembles.

• Alignment with the adaptive market hypothesis: This method aligns with Lo [21], emphasizing that model performance depends on variable market conditions and changing dependencies.

• Methodological and practical insights: Hybrid models utilizing adaptive weighting provide a robust framework for forecasting and exploiting co-integrated relationships within financial markets.

Table 1

Table 1. Research Key Findings.

2 Methodology

This study establishes a pairs trading strategy specifically for cryptocurrencies, utilizing historical price data spanning from 02 January 2018 to 31 October 2025, a total of 2842 observations. The data were obtained from https://finance.yahoo.com/ using the yfinance package in Python and were accessed on 1 November 2025. Co-integrated pairs are identified using a dynamic Johansen co-integration method, which accounts for time-varying long-term equilibrium relationships, with six major cryptocurrency pairs considered: Bitcoin–Ethereum (BTC–ETH), Bitcoin–Litecoin (BTC–LTC), Bitcoin–Ripple (BTC–XRP), Ethereum–Litecoin (ETH–LTC), Ethereum–Ripple (ETH–XRP), and Litecoin–Ripple (LTC–XRP). The most suitable pairs for trading are selected based on their mean-reversion properties. The spread between each pair is modeled using DNN and LSTM networks to generate forecasts and predictive intervals, which support the formulation of dynamic trading signals. Model training and evaluation are conducted on high-performance computing (HPC) resources to handle the computational demands of deep learning, and the system is subsequently deployed for real-time prediction on the cloud platforms AWS EC2 and Lambda, enabling automated, on-demand forecasting and trading signal generation.

2.1 Modeling assumptions and constraints

The pairs trading framework proposed in this study is developed under a set of explicit modeling assumptions designed to reflect the empirical properties of cryptocurrency markets while ensuring methodological tractability. First, individual cryptocurrency price series are assumed to be non-stationary, while linear combinations identified through dynamic Johansen co-integration are expected to exhibit mean-reverting behavior. This assumption underpins the construction of tradable spreads and is consistent with the established statistical arbitrage literature.

Second, the long-run equilibrium relationships between asset pairs are assumed to be time-varying rather than static, motivating the use of a dynamic co-integration approach. This allows the model to accommodate regime shifts, structural changes, and evolving dependence structures that are characteristic of high-volatility cryptocurrency markets. Third, it is assumed that the selected cryptocurrency pairs exhibit sufficient liquidity and market depth such that trading signals derived from the model can be executed without significant delays or excessive transaction cost distortions. While transaction costs are not explicitly modeled, the methodology proposed here is intended for highly liquid digital assets where such frictions are relatively limited. Fourth, deep learning architectures used in this study are assumed to be capable of capturing nonlinear, high-dimensional, and temporal dependencies in the co-integration-based spreads without imposing restrictive parametric distributional assumptions. The training procedure is designed to avoid look-ahead bias by strictly separating training, validation, and forecasting periods in a rolling and sequential manner. Finally, the dynamic weighted ensemble assumes that the relative predictive performance of individual models varies over time. Accordingly, model weights are updated adaptively based on recent forecasting accuracy, with the aim of improving robustness and reducing out-of-sample error during periods of rapid market change. These assumptions collectively define the scope and limitations of the proposed methodology and provide a transparent basis for interpreting the empirical results.
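The adaptive weighting assumption can be illustrated with a minimal sketch. The text does not give the exact update rule, so the inverse-MSE rule below, the function name `dynamic_weights`, and the 20-step window are assumptions chosen for illustration only; they show one way in which weights could follow recent forecasting accuracy while using strictly past errors.

```python
import numpy as np

def dynamic_weights(errors, window=20, eps=1e-12):
    """Illustrative inverse-MSE weighting (an assumed rule, not the authors').

    errors : array of shape (T, M) with past forecast errors of M models.
    Returns an array of shape (T, M); row t is computed only from errors
    observed before time t, so no future information is used.
    """
    errors = np.asarray(errors, dtype=float)
    T, M = errors.shape
    weights = np.full((T, M), 1.0 / M)        # equal weights until history exists
    for t in range(1, T):
        recent = errors[max(0, t - window):t]  # strictly past errors
        mse = np.mean(recent ** 2, axis=0) + eps
        inv = 1.0 / mse                        # lower recent error -> higher weight
        weights[t] = inv / inv.sum()           # normalise so weights sum to 1
    return weights

# Hypothetical errors: model 0 (say, DNN) is twice as accurate as model 1 (LSTM).
errs = np.column_stack([np.full(50, 0.1), np.full(50, 0.2)])
w = dynamic_weights(errs)
```

In this toy case the more accurate model receives four times the weight of the other, because the weights are proportional to the inverse of the squared errors.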

2.2 Deep neural network

A deep neural network is a type of artificial neural network with multiple hidden layers between the input and output layers. These hidden layers, according to Osigbemeh et al. [34], allow the network to learn complex patterns and representations from data, making them suitable for tasks such as image recognition, natural language processing, and time-series forecasting. Mathematically, the operation of a single neuron is expressed as follows

\begin{array}{l} y_{j} = f (\sum_{i = 1}^{n} w_{j i} x_{i} + b_{j}), & (1) \end{array}

where x_i are the inputs (i = 1, …, n), w_ji are the weights connecting input i to neuron j, b_j is the bias for neuron j, f(·) is a nonlinear ReLU activation function, and y_j is the output of neuron j. While a single neuron captures only simple relationships, stacking multiple layers enables the network to model highly complex and nonlinear functions. Following Awad and Khanna [35], the computations in a DNN with L layers proceed as follows

\begin{array}{l} y^{l} = f^{l} (W^{l} y^{l - 1} + b^{l}), l = 1, \dots, L, & (2) \end{array}

where y⁰ = x is the input vector. Here, W^l and b^l are the weights and biases for layer l, f^l is the activation function of layer l (ReLU in the hidden layers and, as shown in Figure 1, linear in the output layer), and y^l is the output of the l-th layer.

Training a DNN involves adjusting the weights and biases across all layers to minimize a loss function, commonly using stochastic gradient descent (SGD). At each iteration, the procedure performs a forward pass to make predictions, calculates the loss to assess the error, and then executes a backward pass using backpropagation to compute gradients of the loss with respect to the model parameters. These gradients are then used to update the weights. The network's depth allows it to learn hierarchical representations from input data, with early layers capturing basic features and deeper layers learning more complex patterns [36]. In addition, Fan et al. [37] note that DNNs are universal function approximators capable of automatically learning important features from raw input, making them widely used in areas such as image recognition, natural language processing, and time-series forecasting. Figure 1 illustrates the proposed DNN architecture used in this study.

Figure 1

Neural network diagram with an input layer of three nodes (X1, X2, X3). Two hidden layers follow: Hidden Layer 1 with three nodes using ReLU activation and dropout of 0.3, and Hidden Layer 2 with two nodes using ReLU activation and dropout of 0.2. The output layer consists of one node with a linear activation function, producing output $ \hat{y} $.

Figure 1. Deep neural network architecture. Source: authors.
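As a rough illustration of Equations 1 and 2 applied to the layer sizes described for Figure 1 (3 inputs, hidden layers of 3 and 2 ReLU units with dropout 0.3 and 0.2, and one linear output), the following NumPy sketch implements the forward pass only. The function names, the random initial weights, and the use of inverted dropout are assumptions made for the example; the trained network in the study is not reproduced here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dnn_forward(x, params, drop_rates=(0.3, 0.2), training=False, rng=None):
    """Forward pass: ReLU hidden layers (Eq. 2) followed by a linear output node.

    params is a list of (W, b) pairs, one per layer; the last pair is the output layer.
    Dropout is applied only when training=True (inverted dropout, an assumed variant).
    """
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(x, dtype=float)
    for (W, b), rate in zip(params[:-1], drop_rates):
        y = relu(W @ y + b)                      # y^l = f(W^l y^{l-1} + b^l)
        if training:
            mask = rng.random(y.shape) >= rate   # keep each unit with prob. 1 - rate
            y = y * mask / (1.0 - rate)
    W_out, b_out = params[-1]
    return W_out @ y + b_out                     # linear activation at the output

# Hypothetical random weights for the 3 -> 3 -> 2 -> 1 layout of Figure 1.
rng = np.random.default_rng(0)
sizes = [3, 3, 2, 1]
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y_hat = dnn_forward(np.array([0.5, -1.0, 2.0]), params)
```

At inference time dropout is switched off, so the output is a deterministic function of the input and the weights.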

2.3 Long short-term memory (LSTM) network

Long short-term memory networks are powerful for sequence modeling tasks such as time-series forecasting. The hidden layer's memory blocks constitute the LSTM's core, comprising cells and three gates: input, output, and forget gates. The forget gate f_t mitigates the limitations of standard recurrent neural networks by selectively forgetting irrelevant past information [38], while the input and output gates regulate information flow from the feature vector x_t to the hidden state h_t. The computations within an LSTM cell are given by

\begin{array}{l} i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i}), & (3) \end{array}

\begin{array}{l} {\tilde{C}}_{t} = tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c}), & (4) \end{array}

\begin{array}{l} f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f}), & (5) \end{array}

\begin{array}{l} C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ {\tilde{C}}_{t}, & (6) \end{array}

\begin{array}{l} o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o}), & (7) \end{array}

\begin{array}{l} h_{t} = o_{t} ⊙ tanh (C_{t}), & (8) \end{array}

where i_t, f_t, and o_t are the input, forget, and output gate vectors; $C_{t}$ is the cell state; ${\tilde{C}}_{t}$ is the candidate cell state; W_* and U_* are distinct weight matrices for each gate; b_* are bias vectors; σ and tanh are the sigmoid and hyperbolic tangent activation functions; and ⊙ denotes the Hadamard (element-wise) product, defined as (a⊙b)_i = a_ib_i. The sigmoid function σ(·) maps inputs to (0, 1), while tanh(·) maps inputs to (−1, 1), which keeps the gate activations, the candidate cell state, and the hidden state bounded. The derivative of tanh lies in (0, 1], so in standard RNNs the repeated multiplication of such factors during backpropagation through time leads to vanishing gradients. The LSTM architecture mitigates this problem through the additive update of the cell state $C_{t}$ and its gating mechanisms, which allow error signals to propagate over longer sequences without vanishing, preserving long-term dependencies. By using separate weight matrices for each gate, the LSTM can effectively model complex temporal dependencies, even in volatile financial time series. Figure 2 illustrates the standard LSTM architecture and gating mechanisms.

Figure 2

Diagram of an LSTM cell showing inputs and outputs. The input $x_t$ enters the cell, passing through input, forget, and output gates, affecting the cell state and generating the output $\hat{y}_t$. The previous hidden state $h_{t-1}$ and cell state $c_{t-1}$ are also shown. Connections illustrate the flow of information through the candidate, gates, and cell state.

Figure 2. Long short-term memory network architecture. Source: authors.
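Equations 3 to 8 can be traced step by step in a short NumPy sketch of a single LSTM cell update. The parameter dictionary layout, the helper `init_params`, the hidden size of 4, and the toy input sequence are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM cell update following Eqs. (3)-(8)."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # Eq. (3) input gate
    C_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # Eq. (4) candidate state
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # Eq. (5) forget gate
    C_t = f_t * C_prev + i_t * C_tilde                                # Eq. (6) cell state
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # Eq. (7) output gate
    h_t = o_t * np.tanh(C_t)                                          # Eq. (8) hidden state
    return h_t, C_t

def init_params(n_in, n_hidden, rng):
    """Hypothetical small random initialisation with separate matrices per gate."""
    p = {}
    for g in "icfo":
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p

# Run the cell over a short toy sequence of scalar spread values.
rng = np.random.default_rng(0)
params = init_params(n_in=1, n_hidden=4, rng=rng)
h, C = np.zeros(4), np.zeros(4)
for x in [0.2, -0.1, 0.4]:
    h, C = lstm_step(np.array([x]), h, C, params)
```

Because Equation 6 updates the cell state additively, the previous state is carried forward scaled only by the forget gate, which is the mechanism that helps gradients survive over long sequences.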

2.4 Parameter estimation

Parameter estimation in this study is conducted in a structured and transparent manner to ensure statistical validity and reproducibility. Long-run equilibrium relationships among cryptocurrency prices are estimated using the Johansen co-integration procedure applied to log-transformed series. A parsimonious deterministic specification and fixed lag structure are adopted to balance model flexibility and stability, and the leading co-integrating vector is used to construct the equilibrium spread. This approach allows the estimation of co-integration parameters without imposing restrictive distributional assumptions while accommodating non-stationary price dynamics. For the deep learning models, parameters are estimated through supervised learning using rolling-window representations of the standardized co-integration spread. Both the deep neural network and long short-term memory models are trained via backpropagation using adaptive optimisation algorithms. Network weights and biases are iteratively updated by minimizing the mean squared error loss function, with convergence monitored through validation loss. Regularization mechanisms, including dropout, batch normalization, early stopping, and adaptive learning rate adjustment, are incorporated to stabilize estimation and mitigate overfitting. Temporal ordering is preserved throughout training to ensure that parameter estimates reflect genuine predictive structure rather than artifacts of look-ahead bias.

The deep learning models, including the DNN and LSTM, are trained to minimize the mean squared error (MSE) between the predicted outputs and the observed values. Formally, the loss function is defined as follows

\begin{array}{l} L = \frac{1}{N} \sum_{t = 1}^{N} {(y_{t} - ŷ_{t})}^{2}, & (9) \end{array}

where y_t is the true value at time t, ŷ_t is the corresponding model prediction, and N is the number of observations in the training set. Training proceeds via backpropagation with adaptive optimisation algorithms, such as Adam, where weights and biases are iteratively updated to minimize $L$. Regularization techniques, including dropout, batch normalization, early stopping, and learning rate scheduling, are incorporated to mitigate overfitting and stabilize convergence. For sequence models, temporal ordering is preserved to prevent look-ahead bias, ensuring that the loss minimization reflects genuine predictive performance.
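The rolling-window representation, the chronological split, and the loss in Equation 9 can be sketched as follows. The lookback length, the 70/15/15 split fractions, and all function names are assumptions for illustration; the text does not report the values actually used. The input is assumed to be a spread that has already been standardized with training-period statistics.

```python
import numpy as np

def make_windows(spread, lookback):
    """Turn a 1-D spread into (X, y): each row of X holds `lookback` past values,
    and y is the next value, so features never include the target or later data."""
    spread = np.asarray(spread, dtype=float)
    X = np.lib.stride_tricks.sliding_window_view(spread[:-1], lookback)
    y = spread[lookback:]
    return X.copy(), y

def chronological_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split in time order (no shuffling) into training, validation, and test blocks."""
    n = len(y)
    i_tr = int(n * train_frac)
    i_va = int(n * (train_frac + val_frac))
    return (X[:i_tr], y[:i_tr]), (X[i_tr:i_va], y[i_tr:i_va]), (X[i_va:], y[i_va:])

def mse(y_true, y_pred):
    """Mean squared error, Eq. (9)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy series standing in for a standardized spread.
spread = np.arange(10.0)
X, y = make_windows(spread, lookback=3)
train, val, test = chronological_split(X, y)
```

Keeping the three blocks contiguous and ordered means that every validation and test target lies strictly after all training targets.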

2.5 Prediction interval widths and application to cryptocurrency spread forecasting

In predictive modeling for financial time series, it is crucial not only to produce accurate point forecasts but also to quantify the uncertainty associated with those forecasts. Prediction intervals (PIs) provide a probabilistic range around a point forecast, indicating where future observations are expected to fall at a specified confidence level [39]. The width of these intervals, known as the Prediction Interval Width (PIW), serves as a key measure of forecast uncertainty, with narrower intervals indicating greater confidence and wider intervals indicating higher uncertainty [40]. Parametric methods for constructing PIs often assume normality of forecast errors and static distributional properties; however, such assumptions are frequently violated in financial settings, where non-stationarity, heavy tails, and regime changes are common. To address this, this study employs dynamic, distribution-free prediction intervals that adaptively adjust to changes in forecast uncertainty and do not rely on specific error distribution assumptions. These dynamic intervals are consistent with conformal prediction frameworks developed for time series, such as Temporal Conformal Prediction [41] and Bellman Conformal Inference [42], which construct well-calibrated intervals under distributional shifts and temporal dependence without assuming error normality. This contrasts with the Bayesian approach used by Makatjane [43], which, while theoretically sound, is computationally intensive and sensitive to prior specification. For a forecast horizon t+h, let ŷ_{t+h} denote the point forecast. The dynamic prediction interval is defined as

\begin{array}{l} {PI}_{t + h} = ŷ_{t + h} \pm q_{α / 2}^{dyn} \cdot {\hat{σ}}_{t + h}, & (10) \end{array}

where ${\hat{σ}}_{t + h}$ is the estimated forecast standard deviation, computed from the variance of an ensemble of model forecasts, and $q_{α / 2}^{dyn}$ is the empirical quantile of forecast errors corresponding to the desired confidence level. The Prediction Interval Width (PIW) is then computed by

\begin{array}{l} {PIW}_{t + h} = 2 \cdot q_{α / 2}^{dyn} \cdot {\hat{σ}}_{t + h}, & (11) \end{array}

where the factor 2 reflects the symmetry of the interval around the point forecast. For the ensemble model, let f_i(t+h) denote the forecast from the i-th base learner in an ensemble of M models. The ensemble mean forecast and variance are computed as follows

\begin{array}{l} ŷ_{t + h}^{ensemble} = \sum_{i = 1}^{M} w_{i} (t + h) f_{i} (t + h), & (12) \end{array}

\begin{array}{l} {\hat{σ}}_{t + h}^{2} = \frac{1}{M} \sum_{i = 1}^{M} {(f_{i} (t + h) - ŷ_{t + h}^{ensemble})}^{2}, & (13) \end{array}

where w_i(t+h) are normalized weights summing to 1; equal weights can be used as a simplification. The ensemble PIW is then given by

\begin{array}{l} {PIW}_{t + h}^{ensemble} = 2 \cdot q_{α / 2}^{dyn} \cdot {\hat{σ}}_{t + h} . & (14) \end{array}
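Equations 10 to 14 can be combined into a small numerical sketch. The text describes the dynamic quantile only as an empirical quantile of forecast errors, so taking the (1 − α) quantile of absolute past errors scaled by their forecast standard deviation is an assumption here, as are the function name and the toy numbers.

```python
import numpy as np

def ensemble_interval(forecasts, weights, past_scaled_errors, alpha=0.01):
    """Weighted ensemble forecast with a symmetric, empirically calibrated interval.

    forecasts          : the M base-learner forecasts f_i(t+h)
    weights            : the M model weights w_i(t+h)
    past_scaled_errors : past forecast errors divided by their sigma-hat (assumed input)
    """
    f = np.asarray(forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                      # normalise so weights sum to 1
    y_hat = float(np.sum(w * f))                         # Eq. (12)
    sigma = float(np.sqrt(np.mean((f - y_hat) ** 2)))    # Eq. (13)
    q = float(np.quantile(np.abs(past_scaled_errors), 1.0 - alpha))  # assumed q^dyn
    lower, upper = y_hat - q * sigma, y_hat + q * sigma  # Eq. (10)
    piw = 2.0 * q * sigma                                # Eqs. (11) and (14)
    return y_hat, lower, upper, piw

# Toy example with two base learners and equal weights.
y_hat, lower, upper, piw = ensemble_interval(
    forecasts=[1.0, 3.0],
    weights=[0.5, 0.5],
    past_scaled_errors=[0.5, -1.0, 1.5, -2.0],
    alpha=0.5,
)
```

With only two base learners the spread of forecasts is a coarse proxy for uncertainty, which is one reason the empirical quantile is needed to rescale it to the target coverage.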

The reliability and calibration of prediction intervals are evaluated using the following metrics:

Mean Prediction Interval Width (MPIW): the average width of the prediction intervals over the evaluation period, computed as follows

\begin{array}{l} MPIW = \frac{1}{N} \sum_{t = 1}^{N} {PIW}_{t}, & (15) \end{array}

Coverage Probability (CP): the proportion of actual observations lying within the prediction intervals, where 1(·) denotes the indicator function, equal to 1 when its argument is true and 0 otherwise:

\begin{array}{l} CP = \frac{1}{N} \sum_{t = 1}^{N} 1 (y_{t} \in [ŷ_{t} \pm q_{α / 2}^{dyn} {\hat{σ}}_{t}]) . & (16) \end{array}

Minimum and Maximum PIW: reports the range of interval widths, highlighting periods of low or high predictive uncertainty, particularly during market stress. It is noted that CP calculations assume independence of forecast intervals over time. Dynamic, distribution-free methods, such as conformal prediction, explicitly account for temporal dependence and regime shifts, providing robust coverage even in non-stationary, volatile markets.
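The evaluation metrics in Equations 15 and 16, together with the minimum and maximum widths, reduce to a few array operations. The sketch below uses a hypothetical helper name and made-up numbers purely to show the calculation.

```python
import numpy as np

def interval_metrics(y_true, lower, upper):
    """MPIW (Eq. 15), coverage probability (Eq. 16), and the min/max interval width."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    width = upper - lower                              # PIW_t for each time step
    inside = (y_true >= lower) & (y_true <= upper)     # indicator 1(y_t in interval)
    return {
        "MPIW": float(width.mean()),
        "CP": float(inside.mean()),
        "min_PIW": float(width.min()),
        "max_PIW": float(width.max()),
    }

# Made-up observations and interval bounds for four time steps.
metrics = interval_metrics(
    y_true=[1.0, 2.0, 3.0, 4.0],
    lower=[0.0, 2.5, 2.0, 3.0],
    upper=[2.0, 3.5, 4.0, 6.0],
)
```

For 99% intervals, a CP close to 0.99 combined with a small MPIW would indicate intervals that are both well calibrated and informative.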

3 Results

Figure 3 illustrates the closing prices of five cryptocurrencies: Ethereum (black), BNB (blue), Litecoin (green), Ripple (red), and USDT (purple) from 02 January 2018 to 31 October 2025. Ethereum exhibits high valuation and volatility, with prominent price peaks nearing 5,000 during the major bull cycles of 2021–2022 and 2024–2025. BNB has become the second-most valuable asset, demonstrating significant growth since 2021, with anticipated values exceeding 1,000 by 2024. In contrast, Litecoin, Ripple, and USDT exhibit relatively stable price movements, with USDT largely adhering to its peg, consistent with its function as a stablecoin. In addition to individual price dynamics, Figure 3 illustrates clear co-movement patterns among the assets. Ethereum and BNB exhibit significant synchronous fluctuations during market-wide rallies and corrections, suggesting that these assets are responsive to broader cryptocurrency market cycles. Litecoin and Ripple show a weaker but still noticeable correlation with the major assets, especially when the market is more volatile. These patterns demonstrate significant variability in price behavior and inter-asset relationships. Ethereum and BNB significantly influence market fluctuations, while Litecoin, Ripple, and USDT have a relatively minor impact. The observed dynamics warrant a focused analysis of Ethereum and BNB in the context of dynamic co-integration and pairs trading, as their significant price movements are expected to yield more statistically reliable signals for mean-reversion strategies.

Figure 3

Line graph comparing the closing prices of BNB, Ethereum, Litecoin, Ripple, and USDT from 2018 to 2026. BNB shows significant fluctuations with high peaks in 2021 and 2025, while other cryptocurrencies remain relatively stable with minor increases.

Figure 3. Time series plot for the five cryptocurrencies.

3.1 Dynamic co-integration analysis for cryptocurrency pairs

Identifying cointegrated cryptocurrencies is essential for developing effective pairs trading strategies, as co-integration signifies a stable long-term equilibrium relationship conducive to mean-reversion trading. This study uses a dynamic co-integration methodology, as outlined by Franses [44], to address evolving market dynamics, differing from the static approach employed by Chen and Alexiou [45]. This approach facilitates the modeling of time-varying interactions among cryptocurrencies, thereby enabling more flexible and adaptive trading strategies in response to fluctuating market conditions. Panel (a) of Table 2 displays the initial dynamic cointegrating vector. USDT is identified as the dominant element of the non-stationary combination, exhibiting a coefficient of −270.904, whereas Ethereum, BNB, Ripple, and Litecoin display relatively minor contributions. The cointegrating vector is used to create a dynamically non-stationary spread, which serves as the foundation for statistical arbitrage and mean-reversion trading. The eigenvalues of the co-integration matrix reinforce this conclusion: the first eigenvalue (0.0773) significantly exceeds the following values, indicating the existence of a predominant non-stationary combination. Panel (b) of Table 2 presents the time-varying lag Johansen trace statistics. The null hypothesis of no co-integration (r ≤ 0) is rejected at the 95% confidence level, as evidenced by a Trace statistic of 276.319, which surpasses the critical value of 69.819. The rejection of this hypothesis indicates the presence of a statistically significant co-integration relationship between USDT and the other four cryptocurrencies. The results indicate that the dynamically computed spread effectively captures a significant long-term equilibrium relationship, thereby validating its application in adaptive mean-reversion trading strategies. 
This approach explicitly models time-varying co-integration, ensuring that the trading strategy adapts to structural changes in cryptocurrency market behavior rather than depending on static measures.
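For reference, the decision rule behind the trace test can be written out. The expression below is the standard (static) Johansen trace statistic; the time-varying lag variant used in this study may differ in detail. The symbols T (effective sample size), n (number of series, here five), the ordered eigenvalues, and the 95% critical value c are notation assumed here for illustration.

```latex
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\bigl(1 - \hat{\lambda}_i\bigr),
\qquad
\text{reject } H_0 : \operatorname{rank} \le r
\ \text{ if } \ \lambda_{\text{trace}}(r) > c_{0.95}(r).

% With the values reported in Table 2, panel (b), for r = 0:
\lambda_{\text{trace}}(0) = 276.319 \;>\; c_{0.95}(0) = 69.819
\quad\Longrightarrow\quad \text{reject } H_0 : r \le 0 .
```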

Table 2

Table 2. Dynamic co-integration analysis.

3.2 Pairs trading strategy: a deep learning approach

The subsequent phase in executing the pairs trading strategy involves calculating the spread between the co-integrated cryptocurrencies. The hedge ratio is derived from the first co-integrating vector identified through the dynamic lagged Johansen test, reflecting the dynamic co-integration relationship among the selected assets, rather than through a static linear regression approach. The spread is defined as the difference in the logarithmic prices of the assets, adjusted by the co-integration coefficients and the dynamic score. This formulation identifies deviations from long-term equilibrium, serving as the basis for predictive modeling and the creation of trading signals. In contrast to Ko et al. [24], who evaluated six statistical methods for spread analysis, the current study utilizes DNN and LSTM models to analyze the computed spread. These models are well-suited for identifying potential mean-reversion opportunities, which are essential for developing effective trade signals. This modeling strategy is reinforced by Sun et al. [46], who conduct a systematic survey of machine learning, deep learning, reinforcement learning, and deep reinforcement learning methods for pairs trading, thereby establishing a methodological foundation for replication and further advancement. Figure 4 presents the calculated spread from January 2018 to October 2025. The series starts at a value close to zero and progressively decreases until the end of 2021, indicating a substantial negative divergence from equilibrium. The spread demonstrates significant peaks during major market rallies, accompanied by abrupt declines and periods of heightened volatility. From 2022 to late 2025, the spread exhibits fluctuations characterized by both upward and downward trends, along with periodic deviations exceeding the ±1 standard deviation bands. 
The observed dynamics indicate periods when the market might be overvalued or undervalued relative to its long-term equilibrium, highlighting potential opportunities for mean-reversion. According to Monge et al. [47], these patterns can be used in a data-driven manner to produce actionable trading signals for cryptocurrency pairings.
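The spread construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the spread is taken as a weighted sum of log prices with weights given by a co-integrating vector, and the "dynamic score" is assumed to be a rolling z-score of that spread. The function names, the window length, and the example weights are hypothetical and are not the study's implementation.

```python
import math
from statistics import mean, stdev


def log_spread(prices, weights):
    """Weighted sum of log prices at one time step (weights = co-integrating vector)."""
    return sum(w * math.log(p) for p, w in zip(prices, weights))


def rolling_zscore(series, window):
    """Z-score of each point against the last `window` observations, itself included."""
    if window < 2:
        raise ValueError("window must be at least 2")
    scores = []
    for t in range(len(series)):
        if t + 1 < window:
            scores.append(None)  # not enough history yet
            continue
        chunk = series[t + 1 - window : t + 1]
        sd = stdev(chunk)
        # A flat window has zero dispersion; report a neutral score of 0.0.
        scores.append((series[t] - mean(chunk)) / sd if sd > 0 else 0.0)
    return scores


# Hypothetical two-asset example: weights (1.0, -0.5) are illustrative only.
weights = [1.0, -0.5]
price_path = [[100.0, 50.0], [104.0, 51.0], [101.0, 52.0], [110.0, 51.5]]
spread = [log_spread(p, weights) for p in price_path]
print(rolling_zscore(spread, window=3))
```

A rolling window keeps the mean and standard deviation local, so the score adapts when the level of the spread drifts, which is the motivation for a dynamic rather than a full-sample z-score.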

Figure 4

Line graph showing the dynamic score of price differences from 2018 to 2026. The blue line fluctuates around a mean of zero, with significant deviations evident, especially post-2020. Upper and lower limits are marked at positive and negative one, respectively.

Figure 4. Time-series plot illustrating the spread from 2018 to 2025.

The comparative evaluation of the DNN, LSTM, and proposed Dynamic Ensemble models across various forecasting error metrics is presented in Table 3. The Dynamic Ensemble demonstrates superior performance compared to individual models in magnitude-based forecasting, attaining the lowest MSE of 0.012124, RMSE of 0.110108, and MAE of 0.083607. The results demonstrate that the ensemble achieves the lowest absolute deviations between predicted and actual spread values, resulting in more precise point forecasts compared to the DNN or LSTM individually. The ensemble further shows the lowest mean forecast error (MFE) of −0.043546, indicating minimal systematic bias and reinforcing its predictive reliability. The findings align with [48, 49], which indicate that ensemble methods outperform individual base learners. The LSTM demonstrates the lowest Mean Absolute Percentage Error (MAPE) of 1.490429%, indicating its proficiency in accurately capturing relative, percentage-based deviations and the temporal dependencies present in time series data. Although the magnitude-based errors of the LSTM exceed those of the DNN and ensemble methods, they demonstrate notable effectiveness in modeling proportional changes, which is crucial for dynamic trading strategies. The DNN exhibits strong performance in magnitude-based metrics, achieving the lowest Theil's U statistic (0.371179). Such behavior indicates improved efficiency compared to a naïve benchmark and demonstrates reliable scaled forecasting performance. The results highlight the complementary strengths of the three models: the DNN excels at benchmark-relative forecasting, the LSTM offers precise proportional predictions, and the dynamic ensemble combines these advantages to attain optimal accuracy and minimize bias.
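The error metrics compared in Table 3 can be computed as in the sketch below. Two conventions are assumptions, since this section does not state them: the forecast error is taken as actual minus predicted, and Theil's U is taken in its U2 form (model error relative to a naive no-change forecast). The function name `forecast_errors` is hypothetical.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE, MFE, MAPE (%) and Theil's U2 for two equal-length series."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    # MAPE is undefined where the actual value is zero; those points are skipped.
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    # Theil's U2: squared model errors against squared errors of a no-change forecast.
    model_sq = sum((actual[t] - predicted[t]) ** 2 for t in range(1, n))
    naive_sq = sum((actual[t] - actual[t - 1]) ** 2 for t in range(1, n))
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MFE": sum(errors) / n,  # assumed sign convention: actual minus predicted
        "MAPE": 100 * sum(pct) / len(pct) if pct else float("nan"),
        "TheilU2": math.sqrt(model_sq / naive_sq) if naive_sq > 0 else float("nan"),
    }


# Toy series for illustration only (not the study's data).
metrics = forecast_errors([1.0, 2.0, 4.0], [1.0, 1.0, 5.0])
print(metrics)
```

Because RMSE is the square root of MSE, the two ensemble figures reported in Table 3 can be cross-checked against each other: the square root of 0.012124 is approximately 0.1101, which agrees with the reported RMSE of 0.110108 up to rounding.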

Table 3

Table 3. Performance comparison of DNN, LSTM, and dynamic ensemble.

Figure 5 illustrates the forecasting performance of the DNN, LSTM, and Dynamic Ensemble models during the test period. The ensemble model generates consistent predictions that align closely with the actual spread, while also preserving sensitivity to notable market fluctuations. The LSTM forecasts demonstrate increased variability, responding significantly to temporary market disruptions and swift structural alterations. The LSTM demonstrates a low MAPE; however, the heightened volatility may compromise the reliability of its predictions for trading decisions when used independently. The observed differences highlight the complementary characteristics of the models. The Dynamic Ensemble integrates the strengths of DNN and LSTM, resulting in enhanced accuracy and stability in predictions. This combination improves the reliability of trading signals obtained from the spread, thus facilitating more effective risk management and enhancing risk-adjusted performance in pairs trading strategies. The results underscore the significance of employing ensemble-based methods in cryptocurrency markets, characterized by nonlinear interactions and rapid temporal volatility, to ensure a balance between predictive precision and resilience to extreme fluctuations.
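To make the idea of dynamic weighting concrete, the sketch below combines two forecasts with weights that are inversely proportional to each model's recent mean squared error. This weighting rule is an assumption chosen for illustration; this section does not specify the exact scheme used by the Dynamic Weighted Ensemble, and the function names are hypothetical.

```python
def dynamic_weights(recent_errors, eps=1e-12):
    """Weights proportional to the inverse mean squared error of each model's recent errors."""
    inverse = []
    for errs in recent_errors:
        mse = sum(e * e for e in errs) / len(errs)
        inverse.append(1.0 / (mse + eps))  # eps guards against division by zero
    total = sum(inverse)
    return [v / total for v in inverse]  # normalised so the weights sum to one


def ensemble_forecast(forecasts, weights):
    """Convex combination of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical recent errors: the first model (say, DNN) has been more accurate lately.
dnn_errors = [0.1, -0.1]
lstm_errors = [0.2, -0.2]
weights = dynamic_weights([dnn_errors, lstm_errors])
combined = ensemble_forecast([1.0, 2.0], weights)
print(weights, combined)
```

Recomputing the weights at every step from a sliding window of errors lets the combination lean on whichever model has tracked the spread better recently, which is one way an ensemble can stay stable while remaining sensitive to market shifts.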

Figure 5

Line graph depicting the dynamic z-score of spread over a period from March 2024 to November 2025. It includes four lines: actual z-score (black), DNN prediction (blue), LSTM prediction (red), and dynamic ensemble (green). Fluctuations are visible, with peaks occurring in late 2024 and early 2025.

Figure 5. Model performance visualization.

Following the demonstration that the Dynamic Ensemble surpasses the base learners (DNN and LSTM), the subsequent phase involves the creation of pairs trading signals. Figure 6 presents the dynamic ensemble forecast score alongside the associated trading signals. The signals originate from the predicted dynamic score, which quantifies deviations from the long-term equilibrium of co-integrated cryptocurrency pairs. The figure illustrates the upper and lower percentile thresholds, accompanied by a zero reference line denoting the mean. Trading decisions rely on scores in relation to specific thresholds: a Buy Signal is activated when the score drops below the lower threshold, suggesting a statistically undervalued state likely to increase, while a Sell Signal is generated when the score surpasses the upper threshold, indicating overvaluation and a likely downward adjustment. In contrast to moving average-based trading methods [50], which presuppose relatively stable price behavior and struggle to account for time-varying dynamics, the proposed approach employs statistical deviations from the dynamically calculated spread. This facilitates adaptive trading that adjusts to fluctuating market conditions, especially in volatile cryptocurrency markets. Throughout the analyzed period, the algorithm produced 113 signals, comprising 81 winning trades (71.68%) and 32 losing trades (28.32%), thereby illustrating the strategy's efficacy in recognizing profitable opportunities. In comparison, Carta et al. [51] utilize explainable artificial intelligence (XAI) techniques, such as support vector regression, LightGBM, and random forest models, to demonstrate that trading strategies enhanced by feature selection yield predictive signals with increased informational content and diminished noise. The risk-return analysis indicates that XAI-powered strategies consistently surpass baseline techniques. 
The present ensemble-based method attains a comparable objective by combining various neural architectures to produce dependable, adaptive trading signals, while explicitly considering time-varying dynamics in the spread.
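The threshold rule described above can be sketched as follows. The 10th and 90th percentile levels follow the description of Figure 6; the helper names, the reference sample, and the linear-interpolation percentile are assumptions for illustration rather than the study's implementation.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def make_signals(scores, reference, lower_q=10, upper_q=90):
    """Label each score BUY (below lower threshold), SELL (above upper) or HOLD."""
    lower = percentile(reference, lower_q)
    upper = percentile(reference, upper_q)
    signals = []
    for s in scores:
        if s < lower:
            signals.append("BUY")   # statistically undervalued, expected to rise
        elif s > upper:
            signals.append("SELL")  # statistically overvalued, expected to fall
        else:
            signals.append("HOLD")
    return signals


# Hypothetical reference scores 0..10, so the thresholds are 1.0 and 9.0.
reference = [float(v) for v in range(11)]
signals = make_signals([0.5, 5.0, 9.5], reference)
print(signals)

# Win rate implied by the tallies reported in the text (81 winners out of 113 signals).
win_rate = round(100 * 81 / 113, 2)
print(win_rate)
```

The last two lines confirm that the reported counts are internally consistent: 81 of 113 signals corresponds to 71.68%, leaving 28.32% for the 32 losing trades.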

Figure 6

Line chart showing the Ensemble Dynamic Score from March 2024 to November 2025. Blue line represents the Z-score, with red dashed line indicating the upper 90th percentile and green dashed line indicating the lower 10th percentile. Green triangles mark buy signals, and red triangles mark sell signals. Peaks around early 2025 indicate frequent sell signals, while dips in late 2024 and mid-2025 show buy signals.

Figure 6. Trading signals on the test data.

Evaluating the spread-ensemble trading mechanism, which produces buy and sell signals when the expected spread crosses designated upper and lower thresholds, is crucial for determining the reliability and robustness of these trading outcomes. Table 4 presents essential performance metrics that assess the effectiveness, risk profile, and overall impact of the ensemble system on the pairs trading strategy. The hit rate of 0.5821 signifies that the ensemble accurately identifies profitable opportunities in approximately 58% of instances, demonstrating a relatively reliable predictive ability. The average profit per trade of 0.0111 indicates that transactions generally produce positive returns, aligning with the anticipated outcomes for high-frequency or mean-reversion strategies. The maximum drawdown (MDD) of −0.2875 indicates the most substantial peak-to-trough decline in cumulative returns. Although significant, this figure remains within acceptable thresholds for statistically driven strategies. This conclusion is corroborated by Choi [52], who asserts that drawdown-aware methodologies can decrease turnover compared to benchmark strategies. The risk-adjusted performance is robust, evidenced by a Sharpe Ratio of 1.3662 and a Sortino Ratio of 1.1411. The Sharpe Ratio demonstrates that returns significantly surpass the risk-free benchmark in relation to overall volatility, whereas the Sortino Ratio indicates effective management of downside risk. The average position size of 0.5113 indicates a moderate level of exposure, showcasing prudent risk management and the avoidance of excessive leverage. The estimated signal lag of 0.0000 days indicates the ensemble's responsiveness to real-time market developments. The observed negative correlation with the market (−0.6517) indicates that the ensemble trading strategy may function as a diversifying asset, with the potential to outperform broader market trends during periods of underperformance, as noted in Sigauke et al. [27]. Table 4 demonstrates that the ensemble approach produces statistically significant trading signals while achieving a favorable balance among profitability, risk management, and market responsiveness, thereby confirming its practical applicability in volatile cryptocurrency markets.
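For readers who wish to reproduce metrics of the kind reported in Table 4, the sketch below gives common textbook definitions of the hit rate, maximum drawdown, Sharpe ratio, and Sortino ratio. Several choices are assumptions because this section does not state them: returns are compounded for the drawdown, the risk-free rate and Sortino target default to zero, and the annualisation factor of 365 periods reflects daily data in a market that trades every day. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def hit_rate(trade_pnl):
    """Share of trades with a strictly positive profit."""
    return sum(1 for p in trade_pnl if p > 0) / len(trade_pnl)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def sharpe_ratio(returns, risk_free=0.0, periods=365):
    """Mean excess return over its sample standard deviation, annualised."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    return mean(excess) / sd * math.sqrt(periods) if sd > 0 else float("nan")


def sortino_ratio(returns, target=0.0, periods=365):
    """Mean excess return over the downside deviation below `target`, annualised."""
    downside = math.sqrt(sum(min(0.0, r - target) ** 2 for r in returns) / len(returns))
    if downside == 0:
        return float("nan")  # no observations below the target
    return (mean(returns) - target) / downside * math.sqrt(periods)


# Toy return series for illustration only (not the study's results).
toy_returns = [0.10, -0.50, 0.20]
print(max_drawdown(toy_returns), hit_rate([1.0, -1.0, 2.0, 0.0]))
```

The Sortino ratio penalises only returns below the target, so it can differ noticeably from the Sharpe ratio when gains and losses are asymmetric.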

Table 4

Table 4. Risk performance for trading signals.

3.3 Real-time signal generation for statistical arbitrage and stability test

The generation of trading signals for the statistical arbitrage strategy is executed in real-time through the deployment of the DNN-LSTM-ensemble on the AWS cloud platform, ensuring a scalable and operationally viable environment. The algorithm continuously acquires cryptocurrency price data from https://www.binance.com/, as illustrated in the Python script in Listing 1, and updates the dynamically calculated spread in real time. The dynamic score predicted by the ensemble is recalculated with each new observation, activating trade alerts upon threshold breaches. This configuration replicates a live market feed, facilitating the evaluation of the system's responsiveness, accuracy, and robustness in streaming data scenarios. Table 5 and Figure 7 demonstrate the behavior of the ensemble in real time. Each entry documents the date index, the anticipated dynamic score, and the associated trading action—BUY, SELL, or HOLD. At Date 223, a notable negative deviation of −4.7369 is recorded; however, no trade is executed. This emphasizes the model's dependence on empirically determined percentile thresholds rather than solely on absolute magnitude. The SELL signals are activated on Dates 225–227, with dynamic scores of 2.2797, 2.6762, and 1.8533, indicating statistically overbought conditions and a projected reversion to the mean. In instances of intermediate or extreme negative deviations (e.g., Dates 223, 228–230), the model generates HOLD signals, reflecting a conservative strategy that reduces the likelihood of false positives. The results demonstrate that the ensemble system effectively differentiates actionable trading opportunities from transient volatility, facilitating precise and timely execution of statistical arbitrage strategies. The integration of real-time dynamic scoring with percentile-based thresholding improves the reliability of trading signals, thereby reinforcing the model's practical applicability in active cryptocurrency markets.
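The streaming logic described above can be illustrated with a small monitor that re-evaluates percentile thresholds as each new score arrives. This is a sketch under stated assumptions and not the deployed script: the class name `StreamingSignal`, the window length, and the warm-up length are hypothetical, the 10th and 90th percentile levels follow the description of Figure 6, and a simulated list stands in for the live feed.

```python
from collections import deque
from statistics import quantiles


class StreamingSignal:
    """Labels each incoming score BUY, SELL or HOLD using rolling percentile thresholds."""

    def __init__(self, window=200, min_history=20):
        if min_history < 2:
            raise ValueError("min_history must be at least 2")
        self.history = deque(maxlen=window)  # oldest scores drop out automatically
        self.min_history = min_history

    def update(self, score):
        """Classify `score` against thresholds from earlier scores, then store it."""
        action = "HOLD"  # default while warming up or when inside the band
        if len(self.history) >= self.min_history:
            # Nine decile cut points: the first is the 10th percentile, the last the 90th.
            cuts = quantiles(self.history, n=10, method="inclusive")
            if score < cuts[0]:
                action = "BUY"
            elif score > cuts[-1]:
                action = "SELL"
        self.history.append(score)
        return action


# Simulated feed standing in for live ensemble scores (illustrative values only).
monitor = StreamingSignal(window=200, min_history=11)
feed = [float(v) for v in range(11)] + [9.5, 0.5, 5.0]
actions = [monitor.update(s) for s in feed]
print(actions[-3:])
```

Classifying each score before appending it keeps the thresholds strictly out-of-sample with respect to the observation being labelled, which mirrors how a live system would have to operate.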

Table 5

Table 5. Ensemble dynamic-scores and trading actions.

Figure 7

Line graph titled “Real-Time Ensemble Signals at 230” showing a green line for dynamic ensemble data fluctuating over time. Red dashed line indicates the upper percentile, and blue dashed line indicates the lower percentile. Red and blue triangles mark points above and below these percentiles, respectively.

Figure 7. Assessment of dynamic weighted ensemble real-time prediction.

Finally, the study assesses the predictive uncertainty of the DNN-LSTM-ensemble model, which is essential for evaluating its capacity to accurately forecast the spread between the chosen cointegrated cryptocurrency pairs and to ascertain its effectiveness in capturing potential deviations. The model's 99% Prediction Interval Width (PIW) offers insights into the reliability and uncertainty of forecasts. During the test period, the average PIW is 0.0772, with a minimum value of 0.0232 and a maximum of 0.3094. This suggests that the model typically generates narrow intervals, demonstrating a high level of confidence in its predictions. The final PIW value of 0.0337 at the end of the period indicates that the ensemble sustains a relatively narrow prediction range, despite market fluctuations. The results indicate that the model effectively captures anticipated variations in the spread while adjusting to periods of heightened volatility. Figure 8 illustrates the predicted spread with the 99% prediction intervals represented in light blue. The observed spread values fall primarily within these bands, validating the model's capacity to represent both central tendencies and tail behavior. This alignment demonstrates the ensemble's capacity to accurately characterize the distributional properties of the spread, including extreme deviations, which is crucial for risk-sensitive trading strategies. The stable and consistent out-of-sample forecasts generated by the DNN-LSTM-ensemble demonstrate its appropriateness for financial applications, where precise predictions of extreme movements and predictive reliability are essential. The findings align with Amnuaypongsa et al. [53], who showed that penalizing larger prediction interval widths in multi-horizon forecasts enhances predictive confidence and decreases the probability of excessively broad intervals, although their focus was on renewable energy forecasting. 
The findings underscore the utility of the ensemble model in statistical arbitrage and dynamic pairs trading within volatile cryptocurrency markets.

Figure 8

Line chart showing the ensemble of real-time prediction score over time from 2025-10-15 to 2026-01-01. A blue shaded area represents the ninety-nine percent prediction interval. Red markers indicate sell signals, green markers indicate buy signals, and yellow markers indicate hold signals. Red dashed lines denote the upper and lower control limits.

Figure 8. Assessment of dynamic weighted ensemble stability.

4 Conclusion and recommendations

This study establishes a deep learning-driven statistical arbitrage application for dynamically co-integrated cryptocurrency pairs, demonstrating that predictive modeling significantly improves spread forecasting and signal reliability. Because markets, and cryptocurrency markets in particular, are not stable, the dynamic lagged Johansen procedure is employed to identify assets exhibiting non-stable long-run equilibrium behavior, which facilitates the construction of a spread that underpins forecasting and trade signal generation. The comparative analysis of the DNN, LSTM, and the proposed Dynamic Ensemble reveals distinct performance differences. The Dynamic Ensemble delivers superior magnitude-based predictive accuracy, achieving the lowest MSE (0.012124), RMSE (0.110108), and MAE (0.083607), in addition to exhibiting the least forecast bias. These results support the benefits of combining complementary architectures for modeling nonlinear and temporally complex spread behavior. The LSTM, on the other hand, exhibits the lowest MAPE of 1.490429%, indicating its effectiveness in capturing proportional temporal dynamics, while the DNN shows benchmark-efficient forecasting with the lowest Theil's U value of 0.371179. These outcomes demonstrate that the models have distinct strengths: the LSTM is superior in relative percentage forecasting, the DNN is effective in benchmark-scaled scenarios, and the ensemble successfully integrates both attributes to achieve enhanced point-forecast accuracy and minimized bias. This agrees with [54], who also found that combined forecasting using a stacking ensemble improves on the accuracy of the base learners. 
The improved predictive capacity results in enhanced trading behavior, evidenced by the ensemble's real-time signal accuracy, low-lag execution, and stable 99% prediction intervals. Most realized spread values are contained within the prediction interval bounds, reflecting effective uncertainty management, which is crucial in high-volatility cryptocurrency markets. Comparable results are reported by Makatjane and Shoko [55] in their study of explainable deep learning for financial risk. These authors examined the underestimation of tail risks in volatile cryptocurrency markets, particularly for the Bitcoin/USD exchange rate, where models such as generalized autoregressive conditional heteroscedasticity (GARCH) frequently fail to predict extreme losses accurately. Their exponentially smoothed recurrent neural network (ESRNN) effectively predicts these losses at the 99% confidence level, indicating that the ESRNN yields dependable and precise risk estimates in high-volatility markets. In the current study, the findings indicate that deep learning, especially when integrated with dynamic weighting, significantly enhances the reliability and operational effectiveness of statistical arbitrage strategies.

Despite these encouraging results, significant limitations remain. The empirical analysis covers five cryptocurrencies, namely BNB, Ethereum, Litecoin, Ripple, and USDT, so the conclusions are largely limited to this group, which restricts generalisability until they are validated across more assets and marketplaces. Transaction costs, liquidity limits, and execution frictions are not explicitly represented; therefore, real profitability may differ from the simulation results. The uncertainty analysis is limited to prediction interval widths and excludes higher-order tail-risk measures such as Conditional Value-at-Risk (CVaR). Furthermore, the modeling structure does not include feature selection or explainable AI approaches, which could have increased interpretability and decreased structural noise. Future studies should therefore incorporate a broader variety of cryptocurrencies and conventional asset classes to assess resilience across different market environments. Including explainable AI techniques, as proposed by Carta et al. [51], would improve transparency and could enhance the model's predictive capacity. Such techniques would also indicate which features matter most for prediction and how they influence the trading system, which is valuable because markets are not stable in real time. Furthermore, including transaction-cost modeling, liquidity filters, and stress testing would strengthen the link between theoretical performance and real execution. Investigating hybrid or transformer-based architectures, attention-augmented recurrent models, and reinforcement learning-based trading algorithms may also help to increase predictive accuracy and trading performance. Finally, including extreme-risk indicators in uncertainty quantification would allow for a more thorough evaluation of risk-return characteristics in statistical arbitrage applications. These extensions are left for future work.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

JT: Conceptualization, Validation, Writing – review & editing, Methodology, Project administration. KM: Data curation, Formal analysis, Investigation, Visualization, Writing – original draft, Conceptualization.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Acknowledgments

The authors are grateful to the numerous people who have provided helpful comments on this paper.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2026.1749337/full#supplementary-material

References

1. Sun Y. Performance of pairs trading strategies based on various copula methods. J Risk Financ Manag. (2025) 18:506. doi: 10.3390/jrfm18090506

2. Naicker S. Evaluation of the performance of a pairs trading strategy of JSE-listed firms. University of the Witwatersrand, Johannesburg (South Africa). (2015). Available online at: https://www.proquest.com/dissertations-theses/evaluation-performance-pairs-trading-strategy-jse/docview/3159354432/se-2?accountid=15490 (Accessed November 13, 2025).

3. Gatev E, Goetzmann WN, Rouwenhorst KG. Pairs trading: performance of a relative-value arbitrage rule. Rev Financ Stud. (2006) 19:797–827. doi: 10.1093/rfs/hhj020

4. Sivasamy R, Sharma DK, Sediakgotla, Mokgweetsi B. Machine learning algorithmic model for pairs trading. In: Sharma DK, Hota HS, Rasheed Rababaah A, editors. Machine Learning for Real World Applications. Singapore: Springer Nature Singapore. (2024). p. 79–95. doi: 10.1007/978-981-97-1900-6_5

5. Tsoku JT, Moroke ND. Pairs trading in the JSE financial sector. J Stat Manag Syst. (2018) 21:877–99. doi: 10.1080/09720510.2018.1467647

6. Wong WK, Chow SC, Hon TY, Woo KY. Empirical study on conservative and representative heuristics of Hong Kong small investors adopting momentum and contrarian trading strategies. Int J Revenue Manag. (2018) 10:146–67. doi: 10.1504/IJRM.2018.091836

7. Ti YW, Dai TS, Wang KL, Chang HH, Sun YJ. Improving co-integration-based pairs trading strategy with asymptotic analyses and convergence rate filters. Comput Econ. (2024) 64:2717–45. doi: 10.1007/s10614-023-10539-4

8. Grosu M, Mihalciuc CC, Maha LG, Apostol C. Assessing the resilience of the financial market multistage approach in the context of the COVID-19 pandemic. Eastern Eur Econ. (2025) 63:428–65. doi: 10.1080/00128775.2024.2312109

9. He F, Yarahmadi A, Soleymani F. Investigation of multivariate pairs trading under a copula approach with a mixture distribution. Appl Math Comput. (2024) 472:128635. doi: 10.1016/j.amc.2024.128635

10. Flos M, François B, Schicker I, Whan K, Perrone E. COBASE: a new copula-based shuffling method for ensemble weather forecast postprocessing. arXiv preprint arXiv:2510.25610. (2025).

11. Qian L, Zhao Y, Yang J, Li H, Wang H, Bai C. A new estimation method for copula parameters for multivariate hydrological frequency analysis with small sample sizes. Water Resour Manag. (2022) 36:1141–57. doi: 10.1007/s11269-021-03016-w

12. de Castro Quadros AV. Statistical Methods with Applications to Pairs Trading and Equipment Lifetime Modelling. Doctoral dissertation, Kansas State University. (2025). Available online at: https://www.proquest.com/dissertations-theses/statistical-methods-with-applications-pairs/docview/3214110755/se-2 (Accessed December 28, 2025).

13. Mohammed AE, Mwambi H, Omolo B. Time-varying correlations between JSE.JO stock market and its partners using symmetric and asymmetric dynamic conditional correlation models. Stats. (2024) 7:761–776. doi: 10.3390/stats7030046

14. Faizullin R. Problems of applying cointegrated pairs for pairs trading. In: AIP Conference Proceedings. (2024). p. 050007. doi: 10.1063/5.0193214

15. Eroǧlu BA, Miller JI, Yiǧit T. Time-varying co-integration and the Kalman filter. Econ Rev. (2022) 41:1–21. doi: 10.1080/07474938.2020.1861776

16. Rayaprolu A. An empirical assessment of pairs trading using ensemble Q-learning. Int Res J Modernis Eng Technol Sci. (2025) 7:2152–64.

17. Rotondi F, Russo F. Machine learning for pairs trading: a clustering-based approach. Available at SSRN 5080998. (2024). doi: 10.2139/ssrn.5080998

18. Liou JH, Liu YT, Cheng LC. Price spread prediction in high-frequency pairs trading using deep learning architectures. Int Rev Finan Anal. (2024) 96:103793. doi: 10.1016/j.irfa.2024.103793

19. Ferrouhi EM, Bouabdallaoui I. A comparative study of ensemble learning algorithms for high-frequency trading. Sci African. (2024) 24:e02161. doi: 10.1016/j.sciaf.2024.e02161

20. Keshavarz Haddad G, Talebi H. The profitability of pair trading strategy in stock markets: evidence from the Toronto Stock Exchange. Int J Finan Econ. (2023) 28:193–207. doi: 10.1002/ijfe.2415

21. Lo AW. The adaptive markets hypothesis: market efficiency from an evolutionary perspective. J Portfolio Manag. (2004) 30:15–29. doi: 10.3905/jpm.2004.442611

22. Kellner CA. Statistical arbitrage of stock index options: Testing the adaptive market hypothesis. Doctoral dissertation, Anderson University, Indiana. (2023). Available online at: https://www.proquest.com/dissertations-theses/statistical-arbitrage-stock-index-options-testing/docview/2861048759/se-2 (Accessed December 28, 2025).

23. Tadi M, Witzany J. Copula-based trading of cointegrated cryptocurrency pairs. Financ Innov. (2025) 11:40. doi: 10.1186/s40854-024-00702-7

24. Ko P-C, Lin P-C, Do H-T, Kuo Y-H, Mai LM, Huang Y-F. Pairs trading in cryptocurrency markets: a comparative study of statistical methods. Invest Anal J. (2024) 53:102–19. doi: 10.1080/10293523.2023.2268386

25. Xiang Y, Zhang L, Chen H. Cryptocurrency assets valuation prediction based on LSTM, neural network, and a deep learning hybrid model. In: EWADirect Proceedings. (2024). doi: 10.54254/2755-2721/49/20241346

26. Seabe PL, Moutsinga CRB, Pindza E. Forecasting cryptocurrency prices using LSTM, GRU, and Bi-directional LSTM: a deep learning approach. Fractal Fract. (2023) 7:203. doi: 10.3390/fractalfract7020203

27. Sigauke C, Moroke N, Makatjane K, Shoko C. A deep learning forecasting of downside risk: application of a combined ESRNN-VAE. Front Appl Mathem Statist. (2025) 11:1662252. doi: 10.3389/fams.2025.1662252

28. Hao Z, Zhang H, Zhang Y. Stock portfolio management by using fuzzy ensemble deep reinforcement learning algorithm. J Risk Finan Manag. (2023) 16:201. doi: 10.3390/jrfm16030201

29. Krauss C. Statistical arbitrage pairs trading strategies: review and outlook. J Econ Surv. (2017) 31:513–45. doi: 10.1111/joes.12153

30. Do B, Faff R. Does simple pairs trading still work? Finan Anal J. (2010) 66:83–95. doi: 10.2469/faj.v66.n4.1

31. Vidyamurthy G. Pairs Trading: Quantitative Methods and Analysis. New York: John Wiley and Sons. (2004).

32. Huck N. Pairs trading with machine learning. Quant Finan. (2019) 19:467–82.

33. Sirignano J, Cont R. Universal features of price formation in financial markets: perspectives from deep learning. Quantit Finan. (2019) 19:1449–59. doi: 10.1080/14697688.2019.1622295

34. Osigbemeh M, Azubogu A, Ayomoh M, Okahu A. Efficacy of two hidden layers artificial neural network synapticity for deep learning: a case of pattern recognition. J Appl Artif Intell. (2025) 6:24–38. doi: 10.48185/jaai.v6i1.1408

35. Awad M, Khanna R. Deep neural networks. In: Efficient Learning Machines, Apress. (2015). p. 123–142. doi: 10.1007/978-1-4302-5990-9

36. Jawad E. The deep neural network - A review. IJRDO J Mathem. (2023) 9:1–5. doi: 10.53555/m.v9i9.5842

37. Fan C, Zhang N, Jiang B, Liu WV. Using deep neural networks coupled with principal component analysis for ore production forecasting at open-pit mines. J Rock Mechan Geotechn Eng. (2024) 16:727–40. doi: 10.1016/j.jrmge.2023.06.005

38. Shoko C, Moroke N, Makatjane K. A deep learning framework for modelling temporal dependencies and hierarchies in hourly electricity demand load. In: Khan MNA, Elhassan AM, editors. Machine learning and computer vision for renewable energy, IGI Global. (2024). doi: 10.4018/979-8-3693-2355-7.ch003

39. Amnuaypongsa W, Songsiri J. Large-width penalisation for neural network-based prediction interval estimation. arXiv preprint arXiv:2411.19181. (2024).

40. Nikulchev E, Chervyakov A. Prediction intervals: a geometric view. Symmetry. (2023) 15:781. doi: 10.3390/sym15040781

41. Aich A, Aich AB, Jain DC. Temporal conformal prediction (TCP): a distribution-free statistical and machine learning framework for adaptive risk forecasting. arXiv preprint arXiv:2507.05470. (2025).

42. Yang Z, Candès EJ, Lei L. Bellman conformal inference: calibrating prediction intervals for time series. arXiv preprint arXiv:2402.05203. (2024).

43. Makatjane K. Forecasting uncertainty intervals for the return period of extreme daily electricity consumption. Int J Energy Econ Policy. (2022) 12:217. doi: 10.32479/ijeep.12901

44. Franses PH. Time-varying lag co-integration. J Comput Appl Math. (2021) 390:113272. doi: 10.1016/j.cam.2020.113272

45. Chen K, Alexiou C. Co-integration-based pairs trading: identifying and exploiting similar exchange-traded funds. J Asset Manag. (2025) 26:464–88. doi: 10.1057/s41260-025-00416-0

46. Sun Y. A survey of statistical arbitrage pair trading with machine learning, deep learning, and reinforcement learning methods. Working paper, Faculty of Economic Science, University of Warsaw, 2025-22 (485). (2025). Available online at: https://www.wne.uw.edu.pl/application/files/5617/5819/7786/WNE_WP485.pdf (Accessed October 30, 2025).

47. Monge M, Hurtado R, Infante J. Time trends and persistence of the return difference between growth and value investment strategies. PLoS ONE. (2025) 20:e0332690. doi: 10.1371/journal.pone.0332690

48. Almasri AR, Yahaya NA, Abu-Naser SS. Predicting instructor performance in higher education using stacking and voting ensemble techniques. J Theor Appl Inf Technol. (2025) 103:1–10.

49. Khoshkroodi A, Parvini Sani H, Aajami M. Stacking ensemble-based machine learning model for predicting deterioration components of steel w-section beams. Buildings. (2024) 14:240. doi: 10.3390/buildings14010240

50. ACY Securities - Market Analysis and Education Team. Using Moving Averages for Trading: A Comprehensive Guide. (2023). Available online at: https://acy.com/en/market-news/education/using-moving-averages-for-trading-a-comprehensive-guide-162923/ (Accessed November 20, 2025).

51. Carta S, Consoli S, Podda AS, Recupero DR, Stanciu MM. Statistical arbitrage powered by Explainable Artificial Intelligence. Expert Syst Applic. (2022) 206:117763. doi: 10.1016/j.eswa.2022.117763

52. Choi J. Maximum drawdown, recovery, and momentum. J Risk Finan Manag. (2021) 14:542. doi: 10.3390/jrfm14110542

53. Amnuaypongsa W, Wangdee W, Songsiri J. Neural network-based prediction interval estimation with large-width penalisation for renewable energy forecasting and system applications. Energy Conv Manag. (2025) 27:101119. doi: 10.1016/j.ecmx.2025.101119

54. Shoko C, Sigauke C, Makatjane K. An application of ensemble stacking in machine learning to predict short-term electricity demand in South Africa. Stat Optim Inf Comput. (2025) 13:2412–33. doi: 10.19139/soic-2310-5070-2170

55. Makatjane K, Shoko C. Explainable deep learning for financial risk: joint VaR and ES forecasting using ESRNN in the Bitcoin market. African Finan J. (2025) 27:53–69.

Appendix

Listing 1. Python script for fetching real-time cryptocurrency prices from Binance API
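
The following is a minimal sketch of how such a script might look. It is not the authors' original listing. It assumes the public Binance spot REST endpoint `/api/v3/ticker/price`, which returns a JSON object with `symbol` and `price` fields, and it uses only the Python standard library. The symbol list and the helper names `build_url`, `parse_ticker`, and `fetch_price` are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Binance's public spot endpoint for the latest traded price.
BASE_URL = "https://api.binance.com/api/v3/ticker/price"

# Assumption: illustrative USDT-quoted symbols for the coins named in the study.
SYMBOLS = ["BNBUSDT", "ETHUSDT", "LTCUSDT", "XRPUSDT"]


def build_url(symbol):
    """Return the request URL for one trading symbol."""
    return BASE_URL + "?" + urllib.parse.urlencode({"symbol": symbol})


def parse_ticker(payload):
    """Turn a JSON payload like {"symbol": "ETHUSDT", "price": "2500.10"}
    into a (symbol, price) tuple with the price as a float."""
    data = json.loads(payload)
    return data["symbol"], float(data["price"])


def fetch_price(symbol, timeout=10.0):
    """Fetch the latest price for one symbol (performs a network request)."""
    with urllib.request.urlopen(build_url(symbol), timeout=timeout) as resp:
        return parse_ticker(resp.read().decode("utf-8"))
```

Calling `fetch_price(s)` for each `s` in `SYMBOLS` at a fixed polling interval would give a simple price stream from which a spread could be built. Keeping the parsing step separate from the network call allows the parsing logic to be checked offline.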

Keywords: crypto prices, hybrid forecasting, neural networks, risk metrics, volatility

Citation: Tsoku JT and Makatjane K (2026) Deep learning-based pairs trading: real-time forecasting of co-integrated cryptocurrency pairs. Front. Appl. Math. Stat. 12:1749337. doi: 10.3389/fams.2026.1749337

Received: 18 November 2025; Revised: 29 December 2025;
Accepted: 12 January 2026; Published: 30 January 2026.

Edited by:

Yu Mu, Libero Capital, United States

Reviewed by:

Youssri Hassan Youssri, Egypt University of Informatics, Egypt
Shafeeq Ur Rahaman, Monks, United States

Copyright © 2026 Tsoku and Makatjane. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Johannes Tshepiso Tsoku, Johannes.Tsoku@nwu.ac.za

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.