Wednesday, 18 October 2017

AI and the Last Human Game

Poker has long represented a special kind of challenge to AI programmers, and not just because of the huge financial reward to be had by anyone who cracks it. Unlike games such as chess which can be 'solved' by little more than brute computational force, the face down nature of some of the cards in most forms of poker means that each player has only partial information about the state of the game/hand at any given moment and so has to make inferences about what those cards might be from clues they pick up in their opponents' behaviour. Additionally, everyone plays the game differently - some players bluff more frequently than others, or bet with marginal hands when others wouldn’t, or call an opponent’s bets more or less often in the hope of catching a bluff, etc. - which reflects the fact that there are many ways to play winning poker, and that the best approach varies from one opponent to the next. (Contrary to the picture painted by Hollywood, body language 'tells' that expose whether or not an opponent is bluffing are rare even in 'live' games and in reality clues are gathered from an understanding of an opponent's tendencies built up over time and then combined with their behaviour in the current hand.)

By its very nature then, poker is a game which is impossible to play 'perfectly' (where 'perfect' is defined as the way one would play if one were able to see an opponent's face down 'hole cards' and then adopted a style that optimised one's winnings given that information). The aim is always to play in a way that is as close to perfect as one's temperament and analytical skills permit, with a great deal of emphasis placed on adapting one's strategy to best exploit the nature/tendencies of a given opponent.

The phrase ‘risk intelligence’ has been used to describe the kind of thinking crucial to playing good poker. It implies an ability to assess probabilities in a context of limited information, a deficit which is compounded by the fact that much of the information one does have comes via the medium of idiosyncratic and inconsistent human behaviour, and then to formulate an exploitative strategy that anticipates future changes in that behaviour in the face of dynamically unfolding events. Put another way, it requires a mix of cognitive empathy and adaptive Game Theorising that engages fully with the intricate messiness that is each unique human mind. No disrespect to Garry Kasparov, but chess is child’s play in comparison.

The kind of flexible, imperfectly informed, real time risk/reward thinking that high level poker demands was seen by many, myself included, as quintessentially human, and, as games like checkers, backgammon and even Go succumbed one by one to challenges from AI, I started to ruefully call poker The Last Human Game.

Even a decade or more ago, talk was always rife in the poker scene about the day when the online game would be destroyed by 'bots' that would play with the optimum combination of aggression and deception, and run rigorously randomised rings around even the best human players. But as is often the case in AI research, that seemingly impending day never quite came. True, Limit Holdem, a far simpler game with fewer variables to consider due to the prescribed bet sizing rules, fell to the machines some years ago. But No Limit Holdem (in which a player can bet any amount they like, henceforth referred to as NLH), especially in its ‘heads up’ (that is, one-on-one) form, seemed an insurmountable challenge. (Somewhat counter-intuitively, it's generally felt that the complexity of the game increases as the number of players at the table falls, with the scope for creative and adaptive play far greater in a ‘heads up’ encounter than when facing a ‘full ring’ of opponents, where a more formulaic approach can be adopted. Thus, heads up NLH has always been seen as the ultimate testing ground for Man vs Machine.)

As recently as 2015, a purpose built AI designed by a team from Carnegie Mellon University lost to a team of poker pros playing heads up NLH, in conditions that controlled for variance (or 'luck') by, amongst other things, having each hand dealt twice, the second an inversion of the first, so that both sides got to benefit equally from the 'run of the cards'. This ‘variance control’ also allowed the teams to examine hands from both sides after each session was played, giving them an opportunity to learn how the other was playing and adapt their strategy accordingly.
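For anyone curious why dealing each hand twice with the seats swapped takes so much luck out of the result, here is a deliberately crude toy model in Python. It is a sketch under loud assumptions, not a description of how the match was actually scored: the function name `average_result`, the fixed per-hand `edge`, and the idea that card luck is a single number that simply flips sign when the cards are swapped are all my own simplifications. Real poker is messier, since the two sides play the same cards differently.

```python
import random


def average_result(num_deals, edge, luck_sd, mirrored, rng):
    """Average per-hand result for one side in a toy heads-up match.

    ASSUMPTION (toy model): each hand's result is a fixed skill `edge`
    plus a random 'card luck' term. In a mirrored deal the opponent gets
    our cards, so the same luck term is applied with the opposite sign.
    """
    total = 0.0
    for _ in range(num_deals):
        luck = rng.gauss(0.0, luck_sd)
        total += edge + luck  # first deal
        if mirrored:
            total += edge - luck  # same cards, seats swapped
        else:
            total += edge + rng.gauss(0.0, luck_sd)  # fresh, unrelated deal
    return total / (2 * num_deals)


if __name__ == "__main__":
    rng = random.Random(2017)
    print("independent deals:", average_result(1000, 0.5, 10.0, False, rng))
    print("mirrored deals:   ", average_result(1000, 0.5, 10.0, True, rng))
```

In this toy version the mirrored figure comes out at the true edge, because every lucky deal is cancelled by its twin, while the independent figure wanders around it. In a real match the cancellation is only partial, which is why mirroring was just one of the controls used.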

Although the humans won that battle (an epic three week slog in which the four pros each played in isolation for 10hrs a day), if you spoke to any of them after the match you could sense they felt the writing was on the wall. And so it has proved. Last month the same Carnegie Mellon team came back with Libratus, their improved poker AI, and faced four of the best NLH heads up pros in the game.

It wasn't even close.

Crucial to Libratus's success was its ability to play unpredictably and to learn quickly as it went along. Initially, the human team felt they had identified some significant weaknesses or 'leaks' in the AI's game, and hammered on them relentlessly. But as night fell and the humans slept, the AI was learning - crunching the data from the day’s play, running simulations, refining its strategy. Each day it came back with yesterday's leaks plugged. Soon enough, the pros stopped being able to find any significant holes in Libratus's game at all.

Libratus, however, had found plenty of weak spots in the humans' game. On top of the obvious advantages of never fatiguing or losing its cool, it also played in a style that was as creative and unusual as it was brutally efficient. Rather than adopting a conservative or 'tight' strategy as previous generations of poker AIs have done (and which would work well against weaker, more predictable, less adaptive human opposition), this AI liked to bluff, and bluff big. Often betting far more than the size of the pot - that is, risking far more than it could possibly win in order to make its opponent fold the best hand - it stole hundreds of pots that a human player, faced with the same situation, would usually have given up on. Crucially, it did so with such a well-balanced range of hands - sometimes as a bluff, sometimes with a very strong holding - that the humans couldn't counter it effectively.
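To put a number on what 'risking far more than it could possibly win' means, the standard break-even sum for a pure bluff is worth spelling out: a bluff of size B into a pot of size P loses B when called and wins P when the opponent folds, so it breaks even when the opponent folds B / (P + B) of the time. The short Python sketch below just evaluates that formula for a few bet sizes. The function name and the example pot of 100 are mine, chosen purely for illustration, and the sum assumes the bluff never wins when called.

```python
def breakeven_fold_frequency(pot, bet):
    """Fraction of the time a pure bluff must get a fold to break even.

    ASSUMPTION: the bluffing hand always loses when called, so the bluff
    risks `bet` to win `pot`. Setting f * pot - (1 - f) * bet = 0 and
    solving for f gives bet / (pot + bet).
    """
    if pot <= 0 or bet <= 0:
        raise ValueError("pot and bet must be positive")
    return bet / (pot + bet)


if __name__ == "__main__":
    pot = 100  # illustrative pot size
    for multiple in (0.5, 1.0, 2.0):
        bet = pot * multiple
        needed = breakeven_fold_frequency(pot, bet)
        print(f"bet {multiple:.1f}x pot -> needs folds {needed:.1%} of the time")
```

A half-pot bluff only needs to work a third of the time, a pot-sized one half the time, and a bluff of twice the pot needs to work two times in three. That is why big overbets are so hard to pull off unless, as here, they are balanced with enough genuinely strong hands that calling becomes an expensive guess.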

Two thirds of the way into the match the AI had built an insurmountable lead and was only getting stronger. Over 80,000 hands it was up $750,000 on the humans. Luckily for Team Humanity, this was only a notional loss with no real money being wagered. But as one of the human players, Jason Les, quipped, "It’s not about the money, it’s about preserving human dignity. And it’s not going well."

Looking ahead at a world of dwindling human exceptionalism, I found myself wondering what areas might yet be preserved for people alone. Les's gallows humour in the face of certain defeat seems like as safe a bet as any. After all, the need for comedic consolation in the face of impending annihilation probably won't be within our omnipotent robot overlords' direct range of experience. But then, who knows? After simulating a few billion iterations of possible existential threats on their way to total domination, they might have stumbled on a choice gag or two. And if we're really lucky, they could even use that humour to temper the brutal efficiency of their rule.

GDP and Well-Being: A fraught relationship.

More or less since 2008 it's been popular to criticise the focus on GDP per capita as a public policy mistake, and a resurgence of scep...