Poker AI and Neural Networks
Artificial Neural Networks (or ANNs) are at the very heart of the AI revolution that is shaping every aspect of society and technology. A new poker machine has artificial intelligence so smart that players are hooked even though the house always wins: about 200 of the machines, called “Texas Hold ‘Em Heads Up Poker,” are installed across the country.
Neural Designer is a desktop application for data mining which uses neural networks, a main paradigm of machine learning. The software is developed by the startup company called Artelnics, based in Spain and founded by Roberto Lopez and Ismael Santana.
Poker pro and software developer Nikolai Yakovenko concludes his three-part series examining how far researchers have gotten in their efforts to build a hold’em playing AI system.
In the first two parts of our consideration of the role of counter-factual regret minimization (or CFR) in the advancement of poker-related artificial intelligence, we explained how CFR works and how its implementation has helped researchers come close to “solving” heads-up limit hold’em.
To conclude the discussion, let’s delve a little more deeply into recent efforts to discover solutions for no-limit hold’em and talk about CFR’s important role in that endeavor as well.
NLHE: Five Minutes to Learn, How Long to Create an AI?
It goes without saying that while limit hold’em provides plenty of challenges to researchers, no-limit hold’em is a whole different ball game.
It’s still hold’em, so the boards are the same, and it’s still possible to visit every canonical board, many of them multiple times. However, these similarities ignore the betting. And as anyone who started out with limit hold’em and then moved over to no-limit well knows, the betting is hardly something that you can ignore in NLHE.
Of course, we never really ignored the betting in the limit hold’em implementation of CFR. It’s just that for each state, let’s say on the flop, there were only two or three possible betting actions and no more than 20 possible pot sizes into which we could slot the specific game situation. Furthermore, if we use enough buckets for previous states, group those buckets in a logical way, and have decent approximate solutions for the other buckets, it is possible to solve each limit hold’em state largely ignoring how we got there, just looking at the cards and a small number of possible betting contexts.
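To make that bucketed-lookup idea a bit more concrete, here is a minimal Python sketch. The bucketing functions, the strategy table, and every number in it are made-up placeholders rather than any actual solver's abstraction; the point is only that a limit hold'em decision collapses to a small key we can precompute an answer for.

```python
def card_bucket(hand_strength: float, n_buckets: int = 50) -> int:
    """Map an equity estimate in [0, 1] to one of n_buckets strength buckets."""
    return min(int(hand_strength * n_buckets), n_buckets - 1)

def pot_bucket(pot_in_bets: int, max_bets: int = 20) -> int:
    """With fixed-limit betting there are only a handful of distinct pot sizes."""
    return min(pot_in_bets, max_bets)

def abstract_state(street: str, hand_strength: float, pot_in_bets: int) -> tuple:
    """The key we actually solve for, largely ignoring the exact action sequence."""
    return (street, card_bucket(hand_strength), pot_bucket(pot_in_bets))

# Hypothetical precomputed table: abstract state -> action probabilities.
strategy_table = {
    ("flop", 37, 6): {"fold": 0.0, "call": 0.55, "raise": 0.45},
}

def act(street: str, hand_strength: float, pot_in_bets: int) -> dict:
    # Fall back to a default mix if this bucket was never visited in training.
    key = abstract_state(street, hand_strength, pot_in_bets)
    return strategy_table.get(key, {"fold": 0.2, "call": 0.6, "raise": 0.2})

print(act("flop", 0.75, 6))  # -> {'fold': 0.0, 'call': 0.55, 'raise': 0.45}
```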
Once we try applying CFR to no-limit hold’em, it’s less clear that this approach will work. Can we really ignore the order of previous bets and just look at the cards we know and the pot size? And what happens when both we and our opponent can make many differently sized bets, not just the two or three actions of limit poker?
We started with a game that was too big to solve, but figured out that at least you can visit every possible state with a few tricks. You can solve the simplified game, where some of the simplifications are exact and others are very close to exact. Now, if you’re playing NLHE with stacks 100 big blinds deep, you can’t even consider every possible opponent response for a single one of your actions.
In an interview during the man-vs.-machine no-limit hold’em match from earlier this year, Doug Polk talked about the Claudico AI taking 20 seconds to move on the river, even in small-pot, no-action situations. It was not so simple for Claudico to look up the position’s bucket, and to apply an instant strategy.
Even so, counter-factual regret minimization is a great way to start building a no-limit hold’em AI. Suppose you are playing heads-up no-limit, but only 10 big blinds deep. Can the CFR solve for that? Sure it can. What if you are playing a bit deeper, but limit your bets to a min-raise, 2x a min-raise, half-pot, and all-in? That’s still a much more complicated game than limit hold’em. But while the variance will be higher and you’ll be folding a lot more often, you will get a solution.
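As a rough sketch of what that restriction looks like in code, the snippet below limits our own bets to a short menu of sizes and maps any bet the opponent actually makes to the nearest size on that menu. The particular menu and the nearest-size rule are illustrative assumptions, not a specific published abstraction.

```python
def legal_abstract_bets(pot: float, to_call: float, stack: float, min_raise: float):
    """The only bet sizes the abstracted game lets us make ourselves."""
    candidates = [min_raise, 2 * min_raise, to_call + 0.5 * pot, stack]
    # Keep sizes that are at least a min-raise and no larger than our stack.
    return sorted({min(b, stack) for b in candidates if b >= min_raise})

def translate_opponent_bet(bet: float, pot: float, to_call: float,
                           stack: float, min_raise: float) -> float:
    """Map a real opponent bet to the closest size the abstraction knows about."""
    menu = legal_abstract_bets(pot, to_call, stack, min_raise)
    return min(menu, key=lambda b: abs(b - bet))

# 100 BB stacks, 6 BB pot, facing a 15 BB overbet we never modelled directly.
print(translate_opponent_bet(bet=15, pot=6, to_call=0, stack=100, min_raise=2))
```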
Would this CFR play within 1% of a perfect no-limit player (as it can do with LHE)? Not even close. It’s easy to see cases where, if you’re not careful, an opponent could just overbet on any weak board and pick up the pot unless the AI learns to call off sometimes with no hand. That’s tricky to do. That said, the CFR will quickly produce a player that plays every board with at least some semblance of balanced logic.
It’s not hard to imagine such a heads-up no limit player being hard to beat even if it only sticks with a few specific bet sizes, as long as it can handle all different bets made by its opponent. Even if it does fold too much when you overbet on weak boards, sometimes the AI will have a strong hand, which limits your ability to take away every pot, to a point. You’re also less likely to beat the no-limit CFR with small ball, as this is the type of game that you’d expect a balanced Nash equilibrium player to be good at.
Heads-up no-limit hold’em is like a passing down in the NFL. There are too many possible plays for the defense to be able to solve for every possible route that can be taken by the eligible receivers. The CFR approach to this situation would be like playing a zone defense. A pass catcher will always be open, but it won’t be easy for the offense to locate the holes. You can try, but you won’t be able to find the same throw every time, even if the zone defense is not actively adjusting to your play and is just mixing up looks with a balanced approximate Nash equilibrium strategy.
There will be systematic weaknesses, then, but it doesn’t mean you could exploit them on every play. At the very least, you’d have a tough opponent, even if that opponent doesn’t look like a real football team, with its bucketed game situations and Nash equilibrium blitzes.
Then again, if you removed the forward pass, and simplified the game to 5-on-5 football with a single set of downs on a narrow field, it might be possible to solve the game outright.
Imagining a Strong NLHE AI: The CFR Hybrid
We’ve spent a lot of space considering how Texas hold’em can be solved by an equilibrium-finding algorithm. More specifically, we looked at simplifying the game to something close to Texas hold’em but with orders of magnitude fewer game states, and then applying counter-factual regret minimization to the smaller problem. This yields a near-equilibrium for the faux-hold’em problem, and in practice, often a very good solution for the real Texas hold’em game.
The better we model the game in the simplified problem, the closer we get to an unbeatable strategy for real poker.
However, this isn’t the only way to come up with a strong hold’em AI. Rather than taking another 5,000 words to examine the weaknesses of CFR and the strengths of other methods, let’s think about what we might want to see from a strong hold’em AI, once we have that bucketed situation, zone-defense approach described above.
You’d want the AI to play in three-handed and six-handed games, but before we get to that, I think you’d also want it to be able to adjust to opponents. I don’t mean adjusting on the fly, but at least seeing what it can learn, say, over the course of playing 10,000 hands against a particular opponent. CFR has no methodology for doing that. Because it considers every possible response, albeit in a simplified way, it has no scope for adjusting to the moves that are actually being made against it, and thereby giving those moves more weight going forward than game states that never develop.
Perhaps an even bigger problem is that the no-limit hold’em CFR is pre-trained to play every hand 200 big blinds deep. It could also be trained to play 100 BB deep, or just 10 BB deep, since CFR is a general algorithm, but each stack size would involve a separate week-long training process. Of course, if you have enough computers, you could run a dozen such processes in parallel, then apply the closest one, given the effective stacks. In practice, this should be good enough to play a wide range of stacks, and not really possible for a human to exploit by buying in short.
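In code, that "closest pre-trained solution" step is nothing more than a nearest-neighbour lookup. A trivial sketch, assuming a handful of hypothetical training runs at fixed stack depths:

```python
PRETRAINED_DEPTHS_BB = [10, 25, 50, 100, 200]  # assumed separate CFR training runs

def pick_blueprint(effective_stack_bb: float) -> int:
    """Choose the blueprint whose training depth is closest to the live stacks."""
    return min(PRETRAINED_DEPTHS_BB, key=lambda d: abs(d - effective_stack_bb))

print(pick_blueprint(37))   # -> 25
print(pick_blueprint(140))  # -> 100
```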
As with the case of Claudico thinking for 20 seconds on the river, likely because it was running a simulation when it could not simply look up a strategy, the future of strong no-limit hold’em bots appears to be some sort of CFR hybrid. The unexploitable solution to an approximation of hold’em serves as a good baseline. With online search or other methods, it should be possible to fix many of CFR’s weaknesses, one by one, by tweaking that baseline.
One thing it’s hard for CFR to do is to play like a human. A simple tweak to CFR can’t get away from the fact that it’s based on an equilibrium strategy, which plays each hand in a vacuum and restricts itself to a fixed number of bet sizes. Otherwise the problem is too big for equilibrium solutions.
Trying Neural Networks
Another avenue to consider is how applying a neural network on top of CFR might help create what could be regarded as a more adaptable, more human-like player.
It wouldn’t actually have to be a neural network — it could be any machine-learning algorithm that learns to map a game state to a betting strategy. But a neural network is often used in such a context, so let’s ignore other function-learning algorithms and assume we’ll use a neural network to learn our betting function.
We need our betting function to give us one of two things: either a chip-value for each possible bet or a recommended betting policy. What we will ultimately need to use is a betting policy, but as I discussed in “Teaching an Artificial Intelligence System to Play 2-7 Triple Draw,” if you have a value estimate for each action, that also gives you a betting policy.
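Here is a small sketch of that conversion, assuming the model hands us a chip-value estimate (in big blinds) for each legal action. A softmax with a made-up temperature turns those values into a mixed betting policy rather than always taking the single best-valued action.

```python
import numpy as np

def values_to_policy(action_values: dict, temperature: float = 5.0) -> dict:
    """Convert chip-value estimates (in big blinds) into action probabilities."""
    actions = list(action_values)
    v = np.array([action_values[a] for a in actions], dtype=float)
    z = np.exp((v - v.max()) / temperature)   # subtract max for numerical stability
    p = z / z.sum()
    return dict(zip(actions, p.round(3)))

# Made-up value estimates for one decision point.
print(values_to_policy({"fold": 0.0, "call": 1.4, "raise_half_pot": 2.1, "all_in": -3.0}))
```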
A machine-learning algorithm needs training data, and in this case, we can get as much data as we want by playing against the CFR’s pretty good (and very fast) algorithm. Better yet, CFR can play itself, and we can train a neural network on the hand histories. The resulting neural network will learn to play much like the CFR.
Why not just use the CFR? The nice thing about a neural network that imitates the CFR strategy is that now you have something that can adapt to human play. For example, you can train the neural network for a week until it plays very close to the CFR. Then you can swap out that training data and keep training — say, just for an hour — on a sample of human hands, or even a single opponent’s hand histories. There’s an NFL comparison here, too. You’re taking a player with years of football experience, and adding a walk-through against this week’s opponent’s offense and defense.
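A hedged sketch of that two-stage recipe, using PyTorch: the feature encoding, layer sizes, learning rates, and data loaders are all placeholder assumptions; only the workflow matters here, a long imitation run on CFR self-play followed by a short, gentler run on one opponent's hands.

```python
import torch
import torch.nn as nn

N_FEATURES, N_ACTIONS = 128, 5   # assumed encoding of (cards, pot, betting so far)

policy_net = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),   # logits over the abstract action menu
)
loss_fn = nn.CrossEntropyLoss()

def train(batches, lr, steps):
    """batches yields (state_features, action_taken) pairs from hand histories."""
    opt = torch.optim.Adam(policy_net.parameters(), lr=lr)
    for step, (states, actions) in zip(range(steps), batches):
        opt.zero_grad()
        loss = loss_fn(policy_net(states), actions)
        loss.backward()
        opt.step()

# Stage 1: a long run imitating CFR-vs-CFR hand histories (hypothetical loader).
# train(cfr_selfplay_batches, lr=1e-3, steps=1_000_000)
# Stage 2: a short, gentler run on one opponent's hands to shift the "personality".
# train(opponent_hand_batches, lr=1e-4, steps=2_000)
```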
They don’t explain how it’s done, but I assume this is how the “No-Limit Texas Hold’em” slot machine in Las Vegas that many played against at this year’s WSOP creates a “Phil Hellmuth” mode and a “Johnny Chan” mode that you can play against for real money. I know that they use a neural network for their player, and I’d bet they took the original amorphous neural net and trained it against Hellmuth and Chan hands to create slightly different versions of the network. The long-run ability is about the same, but the “personality” of each network appears different.
You might ask — do you really need to train a neural network to copy the CFR player, just to modify it? In a sense, you don’t. CFR is a specific method for solving for an equilibrium, and it’s very good at it, so you could just use the neural network to adjust those CFR outputs rather than needing the neural network to produce both the baseline and the final answers.
Suppose you have a strong CFR player, but you absolutely need to avoid having it bet in only the 2x min-bet, half-pot, pot, and all-in sizes. You could just get an answer from CFR and add random noise to the bet, so that it’s effectively playing CFR but splashing the pot a little bit. Instead of random noise, a neural network could learn better noise outputs in various cases. In the simplest version, you could play (CFR + noise) against (CFR + noise), and the neural network could use that data to learn what noise sizes worked in different cases over a moderate sample.
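A tiny sketch of the noise idea, with the noise scale standing in for whatever a network would eventually learn to output per situation; the CFR recommendation itself is faked here with a fixed half-pot bet.

```python
import random

def cfr_bet_size(pot: float) -> float:
    """Stand-in for the abstract size CFR recommends (say, half pot)."""
    return 0.5 * pot

def splashed_bet(pot: float, stack: float, noise_scale: float = 0.15) -> float:
    """CFR's size plus multiplicative noise, clipped to a legal amount."""
    base = cfr_bet_size(pot)
    bet = base * (1.0 + random.uniform(-noise_scale, noise_scale))
    return round(min(max(bet, 1.0), stack), 2)

# In the self-play setup described above, (CFR + noise) plays (CFR + noise) and a
# model is trained to predict which noise_scale performed best in each situation.
print(splashed_bet(pot=12, stack=100))
```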
There’s even a name for this — it’s called an “actor-critic” model. The “actor” network learns the action values and recommends an action policy, while the “critic” evaluates that policy and suggests tweaks to it.
Separating these two functionalities is especially useful when learning control over a continuous action space, where it might be possible to count all of the possible actions, but a bit silly to treat them as disconnected buttons. Scientists at Google have recently demonstrated an actor-critic neural network that learns to play a car-racing game just by observing the screen pixels and pressing random buttons. In this case the continuous actions are steering left/right and gas/brake.
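For a flavor of what that separation looks like, here is a minimal actor-critic skeleton in PyTorch over a continuous action (bet size as a fraction of the pot). The feature size, layer widths, and placeholder state are assumptions; a real training loop would use the critic's value estimates to push the actor toward better-valued bet sizes.

```python
import torch
import torch.nn as nn

N_FEATURES = 128

actor = nn.Sequential(               # state -> bet size as a fraction of the pot
    nn.Linear(N_FEATURES, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)
critic = nn.Sequential(              # state -> estimated value of the state
    nn.Linear(N_FEATURES, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

state = torch.randn(1, N_FEATURES)    # placeholder encoded game state
bet_fraction = actor(state)           # e.g. 0.62 -> bet 62% of the pot
state_value = critic(state)           # how good this spot looks, in chips
print(bet_fraction.item(), state_value.item())
```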
On the down side, neural networks are slow and not as accurate at solving for an equilibrium as CFR. However with a neural network we can be more flexible with the inputs for training, and with how we respond — we haven’t even looked at how a recurrent neural network can remember information from previous hands against an opponent. But it will not be possible to traverse every game board with a neural network, and compute a balanced strategy as we do with CFR.
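As one small illustration of that recurrent idea, the sketch below feeds (hypothetical) per-hand summary vectors for an opponent into an LSTM whose hidden state can then be bolted onto the current game state before it reaches the betting network. The sizes and features are made up.

```python
import torch
import torch.nn as nn

HAND_SUMMARY_FEATURES, MEMORY_SIZE = 32, 64

opponent_memory = nn.LSTM(HAND_SUMMARY_FEATURES, MEMORY_SIZE, batch_first=True)

# Ten previous hands against this opponent, each encoded as a feature vector.
history = torch.randn(1, 10, HAND_SUMMARY_FEATURES)
_, (hidden, _) = opponent_memory(history)

# `hidden` can now be concatenated onto the current game state before it goes
# into the betting network, so the policy can shade toward this opponent.
print(hidden.shape)   # torch.Size([1, 1, 64])
```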
Conclusion
Perhaps a third method will emerge, but it looks like some combination of an equilibrium-solving strategy like CFR, and on top of it a neural network-based critic, might create unbeatable poker players that can also play a bit like humans, minus the trash talk.
Some players will be concerned about these advances in AI and what they might mean to the future of poker. There is no reason to be. Putting all of these pieces together takes a lot of work, and as an AI problem, poker is not very lucrative. Most of the cutting-edge poker AI work is done by academics (and by amateurs) for science and for the love of the game. Meanwhile beating the stock market with AI might be worth billions, and efficiently routing Ubers might be worth a lot of money as well.
Heads-up poker is not that kind of problem, although it’s a unique crucible in which to test the strength and adaptability of artificial intelligence, especially as poker AIs learn how to play full ring games, adapt to multiplayer dynamics, and deal with variable stack sizes. Perhaps later, they will also tackle games like Omaha, which consists of something like 100 times more game states than Texas hold’em.
I’m especially intrigued by the idea of an AI system that produces a baseline — be that CFR or a neural network that holds its own against a CFR — then uses another network as a critic to adjust that baseline for specific cases, against specific opponents, or to reflect something local over recent hands (be that tilt, the mood of the competitors at your table, or something else). There’s something nice about seeing a baseline, balanced over all possible hands, and then seeing how we might want to deviate from this “standard” play.
I think that best summarizes how we humans think about poker decisions, or at least how we talk about them with other players. Everyone’s a critic.
Nikolai Yakovenko is a professional poker player and software developer residing in Brooklyn, New York who helped create the ABC Open-Face Chinese Poker iPhone App.
It is no mystery why poker is such a popular pastime: the dynamic card game produces drama in spades as players are locked in a complicated tango of acting and reacting that becomes increasingly tense with each escalating bet. The same elements that make poker so entertaining have also created a complex problem for artificial intelligence (AI). A study published today in Science describes an AI system called DeepStack that recently defeated professional human players in heads-up, no-limit Texas hold’em poker, an achievement that represents a leap forward in the types of problems AI systems can solve.
DeepStack, developed by researchers at the University of Alberta, relies on the use of artificial neural networks that researchers trained ahead of time to develop poker intuition. During play, DeepStack uses its poker smarts to break down a complicated game into smaller, more manageable pieces that it can then work through on the fly. Using this strategy allowed it to defeat its human opponents.
For decades scientists developing artificial intelligence have used games to test the capabilities of their systems and benchmark their progress. Twenty years ago game-playing AI had a breakthrough when IBM’s chess-playing supercomputer Deep Blue defeated World Chess Champion Garry Kasparov. Last year Google DeepMind’s AlphaGo program shocked the world when it beat top human pros in the game of go. Yet there is a fundamental difference between games such as chess and go and those like poker in the amount of information available to players. “Games of chess and go are ‘perfect information’ games, [where] you get to see everything you need right in front of you to make your decision,” says Murray Campbell, a computer scientist at IBM who was on the Deep Blue team but not involved in the new study. “In poker and other imperfect-information games, there’s hidden information—private information that only one player knows, and that makes the games much, much harder.”
Artificial intelligence researchers have been working on poker for a long time—in fact, AI programs from all over the world have squared off against humans in poker tournaments, including the Annual Computer Poker Competition, now in its 10th year. Heads-up, no-limit Texas hold’em presents a particularly daunting AI challenge: As with all imperfect-information games, it requires a system to make decisions without having key information. Yet it is also a two-person version of poker with no limit on bet size, resulting in a massive number of possible game scenarios (roughly 10^160, on par with the 10^170 possible moves in go). Until now poker-playing AIs have attempted to compute how to play in every possible situation before the game begins. For really complex games like heads-up, no-limit, they have relied on a strategy called abstraction in which different scenarios are lumped together and treated the same way. (For example, a system might not differentiate between aces and kings.) Abstraction simplifies the game, but it also leaves holes that opponents can find and exploit.
With DeepStack, study author Michael Bowling, a professor of machine learning, games and robotics, and colleagues took a different approach, adapting the AI strategies used for perfect-information games like go to the unique challenges of heads-up, no-limit. Before ever playing a real game DeepStack went through an intensive training period involving deep learning (a type of machine learning that uses algorithms to model higher-level concepts) in which it played millions of randomly generated poker scenarios against itself and calculated how beneficial each was. The answers allowed DeepStack’s neural networks (complex networks of computations that can “learn” over time) to develop general poker intuition that it could apply even in situations it had never encountered before. Then, DeepStack, which runs on a gaming laptop, played actual online poker games against 11 human players. (Each player completed 3,000 matches over a four-week period.)
DeepStack used its neural network to break up each game into smaller pieces—at a given time, it was only thinking between two and 10 steps ahead. The AI solved each mini game on the fly, working through millions of possible scenarios in about three seconds and using the outcomes to choose the best move. “In some sense this is probably a lot closer to what humans do,” Bowling says. “Humans certainly don’t, before they sit down and play, precompute how they’re going to play in every situation. And at the same time, humans can’t reason through all the ways the poker game would play out all the way to the end.” DeepStack beat all 11 professional players, 10 of them by statistically significant margins.
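The article doesn't give implementation details, but the shape of the idea is a depth-limited lookahead that falls back on the learned value estimate wherever it stops looking. The toy sketch below shows only that shape, over a hypothetical game-state interface; DeepStack's actual "continual re-solving" reasons over ranges of hands with CFR inside each lookahead, which this simplification ignores.

```python
def value_estimate(state) -> float:
    """Stand-in for the neural network's learned 'intuition' about a state."""
    return 0.0

def lookahead_value(state, depth: int) -> float:
    """Reason only a few steps ahead, then ask the value network."""
    if state.is_terminal():
        return state.payoff()
    if depth == 0:
        return value_estimate(state)   # cut off the search and use intuition
    values = [lookahead_value(state.play(a), depth - 1) for a in state.actions()]
    return max(values) if state.our_turn() else min(values)

def choose_action(state, depth: int = 4):
    """Pick the action whose short lookahead looks best right now."""
    return max(state.actions(),
               key=lambda a: lookahead_value(state.play(a), depth - 1))
```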
Campbell was impressed by DeepStack’s results. “They're showing what appears to be quite a general approach [for] dealing with these imperfect-information games,” he says, “and demonstrating them in a pretty spectacular way.” In his view DeepStack is an important step in AI toward tackling messy, real-world problems such as designing security systems or performing negotiations. He adds, however, that even an imperfect-info game like poker is still much simpler than the real world, where conditions are continuously changing and our goals are not always clear.
DeepStack is not the only AI system that has enjoyed recent poker success. In January a system called Libratus, developed by a team at Carnegie Mellon University, beat four professional poker players (the results have not been published in a scientific journal). Unlike DeepStack, Libratus does not employ neural networks. Instead, the program, which runs off a supercomputer, relies on a sophisticated abstraction technique early in the game and shifts to an on-the-fly reasoning strategy similar to that used by DeepStack in the game’s later stages. Campbell, who is familiar with both technologies, says it is not clear which is superior, pointing out that whereas Libratus played more elite professionals, DeepStack won by larger margins. Michael Wellman, a computer scientist at the University of Michigan who was also not involved in the work, considers both successes “significant milestone[s] in game computation.”
Bowling sees many possible directions for future AI research, some related to poker (such as systems that can compete in six-player tournaments) and others that extend beyond it. “I think the interesting problems start to move into what happens if we’re playing a game where we don’t even know the rules,” he says. “We often have to make decisions where we’re not exactly sure how things actually work,” he adds, which will involve “building agents that can cope with that and learn to play those games, getting better as they interact with the world.”