Quantum Prisoner’s Dilemma and High Frequency Trading on the Quantum Cloud

Khan, Faisal Shah; Bao, Ning

doi:10.3389/frai.2021.769392

BRIEF RESEARCH REPORT article

Front. Artif. Intell., 03 November 2021

Sec. AI in Finance

Volume 4 - 2021 | https://doi.org/10.3389/frai.2021.769392

This article is part of the Research Topic Quantum Finance and Quantum Trading Systems.

Faisal Shah Khan¹*

Ning Bao²

¹Dark Star Quantum Lab, Raleigh, NC, United States
²Computational Science Initiative, Brookhaven National Lab, Upton, NY, United States

High-frequency trading (HFT) offers an excellent use case and a potential killer application of the commercially available, first generation quasi-quantum computers. To this end, we offer here a simple game-theoretic model of HFT as the famous two player game, Prisoner’s Dilemma. We explore the implementation of HFT as an instance of Prisoner’s Dilemma on the (quasi) quantum cloud using the Eisert, Wilkens, and Lewenstein quantum mediated communication protocol, and how this implementation can not only increase transaction speed but also improve the lot of the players in HFT. Using cooperative game-theoretic reasoning, we also note that in the near future when the internet is properly quantum, players will be able to achieve Pareto-optimality in HFT as an instance of reinforced machine learning.

1 Introduction

Non-cooperative game theory is the art of strategic interaction between individuals competing for joint stakes over which they entertain differing preferences. Game-theoretic reasoning can formally be traced back to the ancient Chinese general Sun Tzu (circa 500 BCE) and the ancient Indian minister Chanakya (circa 300 BCE).

Mathematical formalization of non-cooperative game theory in the 20th century goes back to the work of von Neumann and Nash. The publication of the seminal work of von Neumann and Morgenstern titled Theory of Games and Economic Behavior (von Neumann et al., 1944) brought focus upon game theory as the right mathematical language to analyze economic behavior and strategic decision making. The practical usefulness of the subject was made apparent by the awarding of several Nobel prizes in Economics to developers of game-theoretic reasoning, including Nash (Nash, 1950), Harsanyi (Harsanyi, 1968), Selten (Selten, 1994), Aumann (Robert, 2005), and Smith (Maynard Smith and Price, 1973) for work in applications of game-theoretic reasoning to economics, political stratagem, and evolutionary biology. With the ongoing Covid-19 pandemic, game-theoretic reasoning has also been used to shed light on best practices in developing optimal public health policy (Elgazzar, 2021).

With the recent advent of commercially viable quantum computation and communication technologies, the confluence of ideas from game theory and quantum information processing has gained strong interest. This interest has given birth to the subject known as quantum game theory (Section 3), where the impact of quantum information technology on game-theoretic reasoning is studied. An area where quantum game theory may be of particular interest is the area of high-frequency trading. Here, many players participate in iterated buy/sell interactions at a very high rate, capitalizing on small market fluctuations in either duration, intensity, or both to gain revenue. Because the timing of such interactions is critical to the success of high-frequency firms compared to firms that trade at slower rates, new tools that can improve the degree of synchronicity between the firms and which can provide provably-secure communication, are of great interest.

2 Prisoner’s Dilemma - A Game Theory Primer

Consider Prisoner’s Dilemma, a two-player non-cooperative game in which each of two players (prisoners) who committed a crime together is given the opportunity to reduce their time served in prison by helping authorities implicate the other player for the crime. This game is presented in tabular form in Figure 1, where the outcomes of the game are given as ordered pairs of numbers. The first number in each outcome is the payoff to Player I in the form of the number of years commuted from his sentence, and the second number is the payoff to Player II.

FIGURE 1. Prisoner’s Dilemma.

The players have disparate preferences over the outcomes of the game, which are captured below using the symbol ≻ to denote the notion of “preferred over”:

\begin{aligned} \text{Player I}: (5,0) ≻ (3,3) ≻ (1,1) ≻ (0,5) \\ \text{Player II}: (0,5) ≻ (3,3) ≻ (1,1) ≻ (5,0). \end{aligned} (1)

It is assumed that the players are rational, that is, each player will play the game in a way that is consistent with his preferences. The game is played by employing strategies to optimize the payoffs. The two strategies available to both players are to either cooperate with the authorities to implicate the other player (C), or to defect from the offer to help the authorities (D). The question is: what is the outcome of the game (or the play of the game)?

The answer is provided in the form of Nash equilibrium, a profile of strategies, one per player, in which no player has motivation to deviate from his strategic choice. In other words, Nash equilibrium is a strategy profile in which each player’s strategy is a best reply (with respect to the players’ preferences) to all others. Not all games have a Nash equilibrium in pure strategies.

For Prisoner’s Dilemma, Figure 2 shows that the Nash equilibrium is the strategy profile (D, D). This is the dilemma: both players would clearly be better off at (C, C), but C is not a best reply to the strategic choice of C by the other player. The strategy profile (C, C) is Pareto-optimal, that is, its corresponding outcome is such that moving away from it to make one player better off will necessarily make another player worse off. Note that the strategy profiles (C, D) and (D, C) are also Pareto-optimal; however, no player wishes to complete her full sentence while her partner in crime walks free [as evidenced by the preference relations in expression (1)].

FIGURE 2. Nash equilibrium versus Pareto-optimal outcomes in Prisoner’s Dilemma.
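The equilibrium and optimality claims above can be checked by brute force. The following is a minimal sketch, assuming the payoff table (C, C) = (3, 3), (C, D) = (0, 5), (D, C) = (5, 0), (D, D) = (1, 1) implied by expression (1) and the surrounding text; the names `PAYOFF`, `is_nash`, and `is_pareto_optimal` are illustrative and do not come from the article.

```python
from itertools import product

# ASSUMPTION: payoff table reconstructed from expression (1) and the text,
# as (years commuted for Player I, years commuted for Player II).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
STRATEGIES = ("C", "D")


def is_nash(profile):
    """True if neither player gains by unilaterally switching strategy."""
    s1, s2 = profile
    u1, u2 = PAYOFF[profile]
    player_1_ok = all(PAYOFF[(alt, s2)][0] <= u1 for alt in STRATEGIES)
    player_2_ok = all(PAYOFF[(s1, alt)][1] <= u2 for alt in STRATEGIES)
    return player_1_ok and player_2_ok


def is_pareto_optimal(profile):
    """True if no other outcome is at least as good for both players."""
    u = PAYOFF[profile]
    for other in PAYOFF.values():
        if other != u and other[0] >= u[0] and other[1] >= u[1]:
            return False
    return True


profiles = list(product(STRATEGIES, repeat=2))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
print("Nash equilibria:", nash)
print("Pareto-optimal profiles:", pareto)
```

Under these assumed payoffs the only Nash equilibrium found is (D, D), while (C, C), (C, D), and (D, C) are the Pareto-optimal profiles, in line with the discussion above.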

2.1 Mixed Strategies and Mediated Communication

When Nash equilibrium is not present in a game, or if it is sub-optimal, game theorists suggest that players employ randomization over the outcomes as a mechanism for introducing or improving Nash equilibrium. To this end, players are allowed to independently randomize over their respective strategies, a notion referred to as mixed strategies, to produce probability distributions over the outcomes. The resulting mixed game will have at least one Nash equilibrium outcome (John Nash’s Nobel Prize-winning result; Nash, 1950). However, this mixed strategy Nash equilibrium need not be better paying than the one available in the original game, and it need not be Pareto-optimal. Indeed, this holds true for Prisoner’s Dilemma.
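Why mixing does not help in Prisoner’s Dilemma can be seen from a short worked calculation. It assumes the payoffs (C, C) = (3, 3), (C, D) = (0, 5), (D, C) = (5, 0), (D, D) = (1, 1) implied by expression (1); the symbols p and q (the probabilities with which Players I and II play C) and E_I (Player I’s expected payoff) are introduced here only for illustration.

```latex
\begin{aligned}
E_{I}(p, q) &= 3pq + 0 \cdot p(1 - q) + 5(1 - p)q + 1 \cdot (1 - p)(1 - q) \\
            &= 1 + 4q - p(1 + q), \\
\frac{\partial E_{I}}{\partial p} &= -(1 + q) < 0 \quad \text{for all } 0 \leq q \leq 1 .
\end{aligned}
```

Player I’s expected payoff is therefore maximized at p = 0 (always D) whatever q is, and by symmetry Player II chooses q = 0, so under these assumptions the only mixed strategy Nash equilibrium is still (D, D) with payoff (1, 1).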

Further refinement of the Nash equilibrium may be possible if a referee is inducted into the game at negligible cost. This proper extension of a game is known as the game with mediated communication. In such games, the referee creates a probability distribution over the outcomes of the game that the players could not produce using mixed strategies. The referee then tells each player in confidence which strategy he should employ. Each player then checks the viability of the referee’s advice with respect to his preferences and the 50–50 chance of the other player agreeing to the advice given to him by the referee. If the viability checks out, the player agrees with the referee. When both players agree to the referee’s advice, the resulting Nash equilibrium is known as a correlated equilibrium.

Even further refinements of Nash equilibrium are conceivable by simply extending the domain of the game from Euclidean space to more exotic (and non-trivial) mathematical spaces such as Hilbert space and Banach space. The challenge then becomes how to keep the mathematical extensions grounded in physical reality. For the case of games extended to complex projective Hilbert space, the physical context is quantum mechanics. The result of this extension is the theory of “quantum games.”

3 Quantum Games

Foreseeing the rise of quantum technologies like quantum computers and quantum communication devices and protocols, Meyer offered the first game-theoretic model of quantum algorithms. In his seminal work (Meyer, 1999) on the topic, he showed that in a simple penny flipping game, the player with access to quantum physical operations (or “quantum strategies”) acting on the penny always won the game. His work was followed by Eisert et al.’s work (Eisert et al., 1999) where the authors showed how to properly extend a game into the quantum realm with quantum mediated communication. These authors presented a two qubit (two player) quantum circuit that implemented the quantum communication protocol for Prisoner’s Dilemma. This protocol is known as the EWL protocol and appears in Figure 3.

FIGURE 3. The quantum circuit implementation of the EWL quantum game protocol. The referee consists of two quantum logic gates, J, which entangles the two qubits, and its inverse, J^†. In the middle of these two operations are the players’ independent quantum strategic choices that each of them enacts on her qubit as unitary operations. We assume the top qubit is Player I’s and the second one is Player II’s.

The EWL protocol is a quantum circuit that takes in as input the two-qubit state

|00\rangle = (\begin{matrix} 1 \\ 0 \\ 0 \\ 0 \end{matrix}), (2)

with each qubit belonging to one player. This state is acted upon by the referee to produce a higher-order randomization in the form of a quantum superposition followed by measurement. In particular, the referee entangles the two qubits using a general entangling operator

J (γ) = \cos \frac{γ}{2} I \otimes I + i \sin \frac{γ}{2} σ_{x} \otimes σ_{x} (3)

where I is the 2 × 2 identity operator, σ_x is the Pauli-spin flip operator, and $0 \leq γ \leq \frac{π}{2}$. When γ = 0, the protocol reproduces the original “classical” game.

For $γ = \frac{π}{2}$, the game exhibits maximal entanglement between the qubits and the remarkable features discussed below. For this value of γ,

J = (\begin{matrix} \frac{1}{\sqrt{2}} & 0 & 0 & \frac{i}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{i}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} \end{matrix}) . (4)

Therefore,

J |00\rangle = \frac{1}{\sqrt{2}} (|00\rangle + i |11\rangle) = (\begin{matrix} \frac{1}{\sqrt{2}} \\ 0 \\ 0 \\ \frac{i}{\sqrt{2}} \end{matrix}) . (5)

The referee forwards the state in Eq. 5 to the players as her advice upon which the players can act with their quantum strategies. Finally, the referee disentangles the resulting two qubit state and makes a measurement, producing a probability distribution over the outcomes of the game (the observable states) from which expected payoffs to the players can be computed. Since the probability distribution was created using higher-order randomization by quantum superpositioning, the correlations it creates between the outcomes of the game after measurement are stronger than those possible classically (Shimamura et al., 2004).
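The referee’s first step can be reproduced numerically. The sketch below builds J(γ) exactly as written in Eq. 3 and applies it to the input state of Eq. 2; it uses only the Python standard library, and the helper names `kron`, `entangler`, and `apply` are illustrative choices rather than part of the EWL protocol.

```python
import math

I2 = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def entangler(gamma):
    """J(gamma) = cos(gamma/2) I (x) I + i sin(gamma/2) sigma_x (x) sigma_x (Eq. 3)."""
    ii, xx = kron(I2, I2), kron(SIGMA_X, SIGMA_X)
    c, s = math.cos(gamma / 2), math.sin(gamma / 2)
    return [[c * ii[r][k] + 1j * s * xx[r][k] for k in range(4)]
            for r in range(4)]


def apply(matrix, state):
    """Multiply a 4x4 matrix by a length-4 state vector."""
    return [sum(matrix[r][k] * state[k] for k in range(4)) for r in range(4)]


KET_00 = [1, 0, 0, 0]              # the input state of Eq. 2
J = entangler(math.pi / 2)         # maximal entanglement, Eq. 4
advice = apply(J, KET_00)          # the referee's advice, Eq. 5
# Probabilities of observing |00>, |01>, |10>, |11> if measured immediately.
probabilities = [abs(amplitude) ** 2 for amplitude in advice]

print("J|00> =", advice)
print("probabilities =", probabilities)
```

Measured immediately, the advice state yields |00〉 or |11〉 with probability 1/2 each and never |01〉 or |10〉, which is the correlation the players’ strategies then act upon; calling `entangler(0)` returns the identity, matching the statement that γ = 0 reproduces the classical game.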

3.1 (Almost) Solving the Dilemma

The remarkable implication of the EWL protocol for Prisoner’s Dilemma is that under the right subset of quantum strategies, this quantum extension of the game eliminates the dilemma and the resulting Nash equilibrium is Pareto-optimal! The quantum strategies that allow this are the two-parameter subset of the set of one qubit gates:

A ≔ \{(\begin{matrix} e^{i ϕ} \cos θ & \sin θ \\ - \sin θ & e^{- i ϕ} \cos θ \end{matrix}) : 0 \leq θ \leq \frac{π}{2}, 0 \leq ϕ \leq \frac{π}{2}\} . (6)

However, when the full set of quantum strategies is made available to the players (Flitney and Hollenberg, 2007), that is,

B ≔ \{(\begin{matrix} e^{i α} \cos θ & e^{i β} \sin θ \\ - e^{- i β} \sin θ & e^{- i α} \cos θ \end{matrix}) : 0 \leq θ \leq \frac{π}{2}, - π \leq α, β \leq π\}, (7)

the dilemma reappears in the quantum version of the game and the Nash equilibrium solution is the same as that of the classical game. This is because a best reply to a quantum strategy from set A is a quantum strategy from set B. But now, the other player also responds with a quantum strategy from set B, thus nullifying the quantum solution to the dilemma.
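The contrast between sets A and B can be illustrated with a small simulation of the circuit in Figure 3. This is a sketch under stated assumptions rather than the authors’ implementation: the payoffs are those implied by expression (1); Q denotes the θ = 0, ϕ = π/2 element of set A; and the entangler is taken in the form used in the original EWL paper, cos(γ/2) I⊗I + i sin(γ/2) D⊗D with D the θ = π/2 element of set A. That form produces the same state J|00〉 as Eq. 5 but is not the σ_x form of Eq. 3; it is chosen here because it commutes with the classical moves, so that (D, C) still pays (5, 0) at maximal entanglement. All function names are illustrative.

```python
import cmath
import math

# ASSUMPTION: payoffs for outcomes |00>, |01>, |10>, |11>, read as
# (C, C), (C, D), (D, C), (D, D), reconstructed from expression (1).
PAYOFF = [(3, 3), (0, 5), (5, 0), (1, 1)]
D_MOVE = [[0, 1], [-1, 0]]  # theta = pi/2 element of set A


def kron(a, b):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def apply(matrix, state):
    """Multiply a 4x4 matrix by a length-4 state vector."""
    return [sum(matrix[r][k] * state[k] for k in range(4)) for r in range(4)]


def dagger(matrix):
    """Conjugate transpose of a 4x4 matrix."""
    return [[complex(matrix[k][r]).conjugate() for k in range(4)]
            for r in range(4)]


def entangler(gamma):
    """ASSUMPTION: EWL-style entangler cos(g/2) I + i sin(g/2) D (x) D."""
    dd = kron(D_MOVE, D_MOVE)
    c, s = math.cos(gamma / 2), math.sin(gamma / 2)
    return [[c * (r == k) + 1j * s * dd[r][k] for k in range(4)]
            for r in range(4)]


def strategy(theta, alpha, beta=0.0):
    """Element of set B (Eq. 7); beta = 0 with 0 <= alpha <= pi/2 lies in set A (Eq. 6)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[cmath.exp(1j * alpha) * c, cmath.exp(1j * beta) * s],
            [-cmath.exp(-1j * beta) * s, cmath.exp(-1j * alpha) * c]]


def payoffs(u1, u2, gamma=math.pi / 2):
    """Expected payoffs (Player I, Player II) after J, the moves, J^dagger, and measurement."""
    j = entangler(gamma)
    state = apply(j, [1, 0, 0, 0])
    state = apply(kron(u1, u2), state)
    state = apply(dagger(j), state)
    probs = [abs(amplitude) ** 2 for amplitude in state]
    return tuple(sum(p * pay[i] for p, pay in zip(probs, PAYOFF))
                 for i in range(2))


C = strategy(0.0, 0.0)
D = strategy(math.pi / 2, 0.0)
Q = strategy(0.0, math.pi / 2)

# Grid search over set A for Player I's best reply to Q.
grid = [k * math.pi / 40 for k in range(21)]  # 0 .. pi/2
best_in_A = max(payoffs(strategy(t, p), Q)[0] for t in grid for p in grid)

# A strategy in B but not in A: theta = pi/2, alpha = 0, beta = pi/2.
M = strategy(math.pi / 2, 0.0, math.pi / 2)

print("(Q, Q):", payoffs(Q, Q))
print("best reply to Q within A pays Player I:", best_in_A)
print("(M, Q) with M from B:", payoffs(M, Q))
```

Within this sketch, no grid point of set A earns Player I more than the payoff of 3 obtained at (Q, Q), whereas the set-B strategy M earns 5 against Q and leaves Player II with 0, which is the mechanism by which the dilemma reappears.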

Note that while the EWL “quantum” Prisoner’s Dilemma is a game with quantum mediated communication, the equilibrium in the game is referred to as Nash equilibrium rather than correlated equilibrium. This is because mediated communication attempts to produce randomization over a game’s outcomes that cannot be produced by the players’ mixed strategies only, and therefore, one can view the game with mediated communication as an enlargement of the mixed game. A Nash equilibrium in this larger non-cooperative game is what is called a correlated equilibrium in the original game.

As such, Nash equilibrium in the EWL game with quantum mediated communication is a type of correlated equilibrium in the original Prisoner’s Dilemma. A discussion on the relationship between Nash and correlated equilibrium for classical games can be found in (Steven, 2009), while a more comprehensive discussion about these two notions in quantum games can be found in (Szopa, 2021).

Emulating mixed strategies, a further natural quantum extension is possible by allowing players to randomize over their quantum strategies, giving rise to the notion of mixed quantum strategies. Eisert et al. showed that while the players cannot solve the dilemma by resorting to mixed quantum strategies in Prisoner’s Dilemma, they can come close to it. By using mixed quantum strategies, the players can effect a Nash equilibrium in which the payoff is (2.5, 2.5). This solution is closer to the Pareto-optimal outcome (3, 3) than the sub-optimal outcome (1, 1). Mixed strategies have a realistic physical interpretation as the result of quantum strategies being transmitted over a noisy communication channel.

Motivated by the seminal works of Meyer and Eisert et al., quantum game theory has become a major area of research. A relatively recent and comprehensive review of the subject can be found in (Khan et al., 2018).

4 The Dilemma in High Frequency Trading

High-frequency trading (HFT) is defined by Gomber et al. (2011) as follows.

HFT relates to the implementation of proprietary trading strategies by technologically advanced market participants…. HFT enable market participants to dramatically speed up the reception of market data, internal calculation procedures, order submission and reception of execution confirmations.

Our aim here is to show that quantum computing via the cloud can be used to implement HFT as a quantum game. For this, first note that HFT is an instance of Prisoner’s Dilemma where Player I and Player II represent the trading mindset of a market, buying and selling of commodities using the two strategies Buy or Sell. Assuming that in markets there is a preference toward being part of a mass-buy versus a mass-sell, we set the following preferences for the players over the four possible strategy profiles as reasonably reflecting the mood of any market,

\begin{aligned} \text{Player I}: (\text{Sell}, \text{Buy}) ≻ (\text{Buy}, \text{Buy}) ≻ (\text{Sell}, \text{Sell}) ≻ (\text{Buy}, \text{Sell}) \\ \text{Player II}: (\text{Buy}, \text{Sell}) ≻ (\text{Buy}, \text{Buy}) ≻ (\text{Sell}, \text{Sell}) ≻ (\text{Sell}, \text{Buy}), \end{aligned} (8)

with a player most preferring to sell on his terms versus buying on the other player’s terms.

These preferences are identical to those in Prisoner’s Dilemma when the numerical payoff values from expression (1) are faithfully substituted into expression (8). Figure 4 shows HFT as an instance of Prisoner’s Dilemma. Note that the dilemma in HFT is that the game will reach the sub-optimal Nash equilibrium (Sell, Sell) = (1, 1), which is a highly detrimental outcome for markets.

FIGURE 4. High-frequency trading as an instance of Prisoner’s Dilemma, as per the preferences described in expression (8).
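The substitution of payoffs into expression (8) can be made concrete as follows. This is a minimal sketch assuming that Buy plays the role of C and Sell the role of D, with the numerical payoffs implied by expression (1); the names `PAYOFF`, `ranking`, and `sell_dominates` are illustrative.

```python
# ASSUMPTION: Buy is identified with C and Sell with D, using the payoffs
# implied by expression (1), as (payoff to Player I, payoff to Player II).
MOVES = ("Buy", "Sell")
PAYOFF = {
    ("Buy", "Buy"): (3, 3),
    ("Buy", "Sell"): (0, 5),
    ("Sell", "Buy"): (5, 0),
    ("Sell", "Sell"): (1, 1),
}


def ranking(player):
    """Strategy profiles ordered from most to least preferred by the given player (0 or 1)."""
    return sorted(PAYOFF, key=lambda profile: PAYOFF[profile][player], reverse=True)


# Selling is a dominant strategy if it pays Player I more than buying
# whatever the other player does (the game is symmetric).
sell_dominates = all(
    PAYOFF[("Sell", other)][0] > PAYOFF[("Buy", other)][0] for other in MOVES
)

print("Player I: ", " > ".join(str(p) for p in ranking(0)))
print("Player II:", " > ".join(str(p) for p in ranking(1)))
print("Sell is dominant:", sell_dominates)
```

With this identification the two rankings coincide with the orderings in expression (8), and Sell is dominant for each player, which is why the play of the game settles on (Sell, Sell).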

4.1 HFT on the Quantum Cloud

Today, the internet is quasi-quantum, meaning that users can access third-party, first-generation quantum processors via the cloud (the quantum cloud), which can offer transactional speed-up. More importantly, the quasi-quantum internet can offer enhanced payoffs in the transaction when implemented using the EWL protocol for Prisoner’s Dilemma.

Due to the quasi-quantum nature of the internet, only noisy quantum communications are possible to date. Therefore, the referee will likely only be able to create limited entanglement between the qubits. This means that HFT on the quantum cloud will improve the lot of the players to only a near Pareto-optimal Nash equilibrium, the upper limit of which for the moment is the appropriate equivalent of the notional (2.5, 2.5) payoff. Nonetheless, even these small improvements in the payoffs will be worthwhile given the large amounts of money being traded.
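The effect of limited entanglement on the quantum solution can be illustrated with a short calculation. It is a sketch under assumptions not stated in the text: the entangler is taken in the form of the original EWL paper, J(γ) = cos(γ/2) I⊗I + i sin(γ/2) D⊗D, where D is the θ = π/2 element of set A; Q denotes the θ = 0, ϕ = π/2 element of set A; the payoffs are those implied by expression (1); and $_I and $_II denote the players’ expected payoffs. If Player II plays Q and Player I deviates to D, then

```latex
\begin{aligned}
J(γ)^{†} (D \otimes Q) J(γ) |00\rangle &= - i \cos γ \, |10\rangle + \sin γ \, |01\rangle , \\
\$_{I}(D, Q) &= 5 \cos^{2} γ , \qquad \$_{II}(D, Q) = 5 \sin^{2} γ .
\end{aligned}
```

Under the same assumptions the profile (Q, Q) pays (3, 3) for every γ, so this particular deviation stops being profitable only once 5 cos²γ ≤ 3, that is, sin²γ ≥ 2/5. For weaker entanglement the quantum profile is not stable against D, which is one way to see why the degree of entanglement the referee can deliver matters for the payoffs attainable in practice.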

In the near future, the internet will be fully quantum, and improved fidelity of the transmission of the quantum information will mean that quantum entanglement between the players’ qubits will be maintained for longer duration. This will allow the realization of the upper limit of the mixed quantum strategy Nash equilibrium, (2.5, 2.5).

4.2 Optimality and Cooperation in HFT on the Quantum Cloud

From a non-cooperative game theory perspective, the pure quantum strategy Nash equilibrium that resolves the dilemma and produces the Pareto-optimal Nash equilibrium (3, 3) is fundamentally irrational. This is due to the fact that the best reply to any strategy from the set A in Eq. 6 is a strategy from the set B in Eq. 7. This would then seem to invalidate the whole idea of implementing HFT on the quantum internet of the near future for optimal benefits. However, there is an appropriate game-theoretic solution for this issue found in the cooperative theory of games. As Aumann points out in (Robert, 2005):

We use the term cooperative to describe any possible outcome of a game, as long as no player can guarantee a better outcome for himself. It is important to emphasize that in general, a cooperative outcome is not in equilibrium; it’s the result of an agreement. For example, in the well-known “prisoner’s dilemma” game, the outcome in which neither prisoner confesses is a cooperative outcome; it is in neither player’s best interests, though it is better for both than the unique equilibrium.

Hence, the solution lies in the notion of agreement contracts and the ability to enforce them. For this, the game has to be played repeatedly and the behavioral history of the players collected and used to develop the contracts and the enforcement methods (incentives and disincentives). It is noteworthy then that quantum games such as the quantum prisoner’s dilemma can be thought of as the available policy space for an agent undergoing reinforcement learning. Here, however, it is known that the quantum policy options, in for example the quantum prisoner’s dilemma, are Pareto-optimal over the classical policy options. Therefore, if the task undertaken in quantum reinforcement learning can be thought of as having instances of the prisoner’s dilemma as subtasks, an agent with quantum strategies available to them will perform strictly better than one with only classical policy options, as observed by Meyer in his seminal work.

5 Conclusion

We established a game-theoretic interpretation of high-frequency trading as the game Prisoner’s Dilemma, and showed how it can be implemented as a quantum game using quantum computing processors available over the cloud. We argue that even today’s nascent quantum technology infrastructure allows substantial improvement in the payoffs of the players of this game, and that in the near future, a fully quantum internet and better performing quantum processors will allow players to completely avoid the dilemma via reinforced learning of contracts, as predicted by cooperative game theory.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author Contributions

FSK crafted the majority of the narrative of the manuscript, produced the tables and figures, and compiled the references. NB offered insights into how to map HFT into Prisoner’s Dilemma.

Conflict of Interest

FSK is employed by Dark Star Quantum Lab Inc.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We would like to thank Nathan Benjamin, James Sully, and Nathan Urban for useful discussions. NB is supported by the Computational Science Initiative at Brookhaven National Laboratory.

References

Eisert, J., Wilkens, M., and Lewenstein, M. (1999). Quantum Games and Quantum Strategies. Phys. Rev. Lett. 83 (15), 3077–3080. doi:10.1103/physrevlett.83.3077

Elgazzar, A. S. (2021). Coopetition in Quantum Prisoner's Dilemma and COVID-19. Quan. Inf Process 20, 102. doi:10.1007/s11128-021-03054-8

Flitney, A. P., and Hollenberg, L. C. L. (2007). Nash Equilibria in Quantum Games with Generalized Two-Parameter Strategies. Phys. Lett. A 363 (5), 381–388. doi:10.1016/j.physleta.2006.11.044

Gomber, P., Arndt, B., Lutat, M., and Uhle, T. (2011). High-frequency Trading. Preprint at SSRN.

Harsanyi, J. C. (1968). Games with Incomplete Information Played by ”bayesian” Players, I-III. Part II. Bayesian Equilibrium Points. Manag. Sci. 14 (5), 320–334.

Khan, F. S., Solmeyer, N., Balu, R., and Humble, T. (2018). Quantum Games: a Review of the History, Current State, and Interpretation. Quan. Inf. Process. 17 (309). doi:10.1007/s11128-018-2082-8

Maynard Smith, J., and Price, G. R. (1973). The Logic of Animal Conflict. Nature 246, 15–18. doi:10.1038/246015a0

Meyer, D. A. (1999). Quantum Strategies. Phys. Rev. Lett. 82, 1052–1055. doi:10.1103/physrevlett.82.1052

Nash, J. F. (1950). Equilibrium Points in N-Person Games. Proc. Natl. Acad. Sci. U S A. 36 (1), 48–49. doi:10.1073/pnas.36.1.48

Robert, J. (2005). Aumann. War and Peace. Stockholm: Nobel Prize Lecture.

Selten, R. (1994). Multistage Game Models and Delay Supergames. Theor. Decis. 44, 1–36. doi:10.1023/A:1005099909043

Shimamura, J., Özdemir, A. K., Morikoshi, F., and Imoto, N. (2004). Quantum and Classical Correlations between Players in Game Theory. Int. J. Quan. Inf. 02 (01), 1052. doi:10.1142/s0219749904000092

Steven, A. B. (2009). Quantized Poker. Ithaca, New York: arXiv:0902.2196.

Szopa, M. (2021). Efficiency of Classical and Quantum Games Equilibria. Entropy (Basel) 23 (5), 2021. doi:10.3390/e23050506

von Neumann, J., Morgenstern, O., and Rubinstein, A. (1944). Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition). Princeton University Press.

Keywords: quantum games, high-frequency trading (HFT), Pareto optimal, Nash equilibrium, quantum computing (QC)

Citation: Khan FS and Bao N (2021) Quantum Prisoner’s Dilemma and High Frequency Trading on the Quantum Cloud. Front. Artif. Intell. 4:769392. doi: 10.3389/frai.2021.769392

Received: 02 September 2021; Accepted: 18 October 2021;
Published: 03 November 2021.

Edited by:

David Orrell, Systems Forecasting, Canada

Reviewed by:

Sudip Patra, O. P. Jindal Global University, India
Marek Szopa, University of Economics of Katowice, Poland

Copyright © 2021 Khan and Bao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Faisal Shah Khan, faisal@darkstarquantumlab.com
