General Commentary ARTICLE
Commentary: Heads-up limit hold‘em poker is solved
- 1Technische Universität München, Germany
The game of poker, with its tactics of bluffing and deception, has frequently captured the imagination. In one example from popular culture, James Bond defeats a terrorist financier at the poker table in the film Casino Royale. Bond’s poker skill reflects his abilities as a spy: Spotting lies and deception, and thinking one move ahead of his opponent. But like other domains of human skill, poker has been affected by the rise of the machines. In 2015, a supercomputer with 48 CPUs running for 68 days “solved” heads-up limit hold’em poker, the simplest poker game played for money in casinos and online (Bowling, Burch, Johanson, & Tammelin, 2015). This computer cannot be beaten, even in a human lifetime of play. This commentary analyzes the perfect strategy from Bowling et al.’s target article to ask: Does the computer’s strategy in the game’s key initial decision reflect poker expert wisdom, or does the computer play entirely differently?
Games are a common domain for testing the relative skills of experts and computers. In 1997 Garry Kasparov famously lost to Deep Blue in chess, and in 2016 Lee Sedol lost to AlphaGo. In 2008 a similar expert-computer match occurred for heads-up limit hold’em poker (CPRG, 2008). A team of seven professionals played “Polaris,” a computer designed by researchers from the University of Alberta (who later built the 2015 supercomputer). Polaris was the overall winner, although the professional Matt Hawrilenko, who was viewed by many as the most-skilled in this poker game (Arnett, 2009; Brodie, 2008; Nalbone, 2011), emerged a net winner.
Even a relatively simple card game involving two players and a 52 pack of cards can create significant complexity. More precisely, there are 3.16 x 1017 potential game states in this poker game (Bowling et al., 2015). Humans must initially simplify complex problems to learn and improve their performance (Dreyfus & Dreyfus, 1986). We use simple “heuristics” even for problems much simpler than this poker game (Gigerenzer, Todd, & the ABC Research Group, 1999; Hertwig, Hoffrage, & the ABC Research Group, 2013). Poker theorists suggest two relevant simplifying principles: aggression and information hiding (Chen & Ankenman, 2006). It is generally better to be aggressive by raising the stakes, rather than equalling the stakes by calling. It is generally better to hide information by playing many hands the same way, rather than having a unique strategy for specific hands.
Here is a simple strategy as the first player on the first round reflecting these principles (this situation is both important and relatively simple to analyze). This player can play any individual hand by folding (putting no more money in, and immediately forfeiting the hand), calling (equalling the bet), or by raising (doubling the bet). The strategy involves finding a single threshold point: All hands weaker than this are folded, and those stronger are played by raising (raise-or-fold). Calling, as a potential strategy, is never considered (on the initial first round decision; calling might be done later on). Now, the first player’s first round strategy must specify play in one additional scenario. If the second player re-raises, then the first player is revisited with the fold, call, raise trilemma. (If the second player folds the hand immediately ends; if the second player calls play moves onto the second round.) In this case folding is inadvisable from a risk-reward perspective (Sklansky, 1999). A similar argument means raising accomplishes little (because a skilled second player will never fold). Therefore, always-calling is the recommended simple strategy (information hiding trumps aggression in this instance where the principles conflict). So the first player should raise-or-fold (based on initial hand strength), and then always-call. The entire first round strategy boils down to a single hand: The worst hand worth raising. An effective strategy could not be simpler.
Matt Hawrilenko followed this strategy in the 2008 match (Newall, 2011). Over 1,000 hands, he raised 86.8%, otherwise folding. When facing a re-raise he called every time. Computers do not face the same computational constraints as humans. So it is perhaps not surprising that Polaris, the 2008 computer, used a similar yet more-complex strategy. Polaris raised 85.0% (1.8% lower than Hawrilenko), but called 2.4% (rather than never). When facing a re-raise, Polaris called 83.6% of the time, otherwise raising (compared to Hawrilenko calling 100%).
Polaris from 2008 is significantly weaker at poker and used fewer computational resources than “Cepheus,” the unbeatable 2015 agent (Bowling et al., 2015). So how does Cepheus compare? Surprisingly, the more complex computer agent actually uses a simpler strategy. Table 1 compares the three strategies’ observable behavior (combining data from Bowling et al., 2015, and Newall, 2011). Cepheus initially raises 82.54%, only calling a miniscule 0.06%. Cepheus’s initial calling frequency is closer to Hawrilenko’s than Polaris’s. Cepheus calls 99.1% when facing a re-raise, again much closer to Hawrilenko than Polaris. But you would not be recommended to copy these rare plays. According to one of the study’s co-authors, Cepheus’s deviations from Hawrilenko’s simpler strategy are, “most likely part of the noise that makes it ‘essentially’ solved and not just solved” (Burch, 2015).
In conclusion, the expert’s strategy in 2008 in this key situation closely matched the unbeatable 2015 computer’s strategic frequencies. This is another example of expert performance approaching perfect game theoretic strategy (Chiappori, Levitt, & Groseclose, 2002; Palacios-Huerta, 2003; Walker & Wooders, 2001). Although the 2015 computer is unbeatable, the expert’s knowledge is more robust to other poker games (Lake, Ullmann, Tenenbaum, & Gershman, 2016). And the expert can adjust strategy to take greater advantage of opponents’ mistakes. (Newall, 2013, explores less crucial situations in this poker game where computers play unlike most experts.) Simple heuristics have been recommended for changing environments (Bookstaber & Langsam, 1985), for when computational resources are limited (Simon, 1955), and when effort must be economized (Shah & Oppenheimer, 2008). Yet it appears that the supercomputer’s optimization led it to a simple strategy like the expert’s, even when none of these arguments applied (Parpart, Jones, & Love, 2017).
Keywords: games, Game theory, Expertise, artificial intelligence, Decision Making
Received: 13 Sep 2017;
Accepted: 08 Feb 2018.
Edited by:Ulrich Hoffrage, University of Lausanne, Switzerland
Reviewed by:Adam Goodie, University of Georgia, United States
Copyright: © 2018 Newall. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Philip Newall, Technische Universität München, Munich, Germany, email@example.com