What the AI behind AlphaGo can teach us about being human
By Cade Metz for WIRED
AJA HUANG DIPS his hand into a wooden bowl of polished black stones and, without looking, thumbs one between his middle and index finger. Peering through wire-rim glasses, he places the black stone on the board, in a mostly empty zone, just below and to the left of a single white stone. In Go parlance it is a “shoulder hit,” in from the side, far away from most of the game’s other action.
Across the table, Lee Sedol, the best Go player of the past decade, freezes. He looks at the 37 stones fanned out across the board, then stands up and leaves.
In the commentary room, about 50 feet away, Michael Redmond is watching the game via closed-circuit. Redmond, the only Western Go player to reach the rank of nine dan, the game’s uppermost designation, literally does a double take. He is just as shocked as Lee. “I don’t really know if it’s a good move or a bad move,” Redmond says to the nearly 2 million people following the game online.
“I thought it was a mistake,” says the other English-language commentator, Chris Garlock, vice president of communications for the American Go Association.
A few minutes later, Lee walks back into the match room. He sits down but doesn’t touch his bowl of white stones. A minute goes by, then another—15 in all, a significant chunk of the initial two hours the players are allowed each game in the tournament. Finally, Lee plucks out a stone and places it on the board, just above the black one Huang played.
Huang’s move was just the 37th in the game, but Lee never recovers from the blow. Four hours and 20 minutes later, he resigns, defeated.
But Huang was not the true winner of this game of Go. He was only following orders—conveyed on a flatscreen monitor to his left, which was connected to a nearby control room here at the Four Seasons Hotel in Seoul and itself networked into hundreds of computers inside Google data centers scattered throughout the world. Huang was just the hands; the mind behind the game was an artificial intelligence named AlphaGo, and it was beating one of the best players of perhaps the most complex game ever devised by humans.
In the same room, another Go expert watches—three-time European champion Fan Hui. At first, Move 37 confuses him too. But he has a history with AlphaGo. He is, more than any other human being, its sparring partner. Over five months, Fan played hundreds of games with the machine, allowing its creators to see where it faltered. Fan lost time and again, but he’s come to understand AlphaGo—as much as anyone ever could. That shoulder hit, Fan thinks, it wasn’t a human move. But after 10 seconds of pondering it, he understands. “So beautiful,” he says. “So beautiful.”
In this best-of-five series, AlphaGo now led Lee, and by proxy humanity, two games to none. Move 37 showed that AlphaGo wasn’t just regurgitating years of programming or cranking through a brute-force predictive algorithm. It was the moment AlphaGo proved it understands, or at least appears to mimic understanding in a way that is indistinguishable from the real thing. From where Lee sat, AlphaGo displayed what Go players might describe as intuition, the ability to play a beautiful game not just like a person but in a way no person could.
But don’t weep for Lee Sedol in his defeat, or for humanity. Lee isn’t a martyr, and Move 37 wasn’t the moment where the machines began their inexorable rise to power over our lesser minds. Quite the opposite: Move 37 was the moment machines and humanity finally began to evolve together.
Demis Hassabis and David Silver met properly as undergraduates at Cambridge studying computational neuroscience, an effort to understand the human mind and how machines might, one day, become a little bit intelligent themselves. But what they really bonded over was gaming, on boards and on computers.
This was 1998, so naturally, after they graduated Hassabis and Silver started a videogame company. Hassabis often played Go with a coworker, and, piqued by his colleague’s interest, Silver began learning on his own. “It became almost like a badge of honor if you could beat Demis at anything,” Silver says. “And I knew that Demis was just starting to get interested in the game.”
They joined a local Go club and played against two- and three-dan players, the equivalent of karate black belts. And there was something more: They couldn’t stop thinking about how this was the one game of intellect that machines had never cracked. In 1995 a computer program called Chinook beat one of the world’s best players at checkers. Two years later, IBM’s Deep Blue supercomputer toppled world chess champion Garry Kasparov. In the years that followed, machines triumphed at Scrabble, Othello, even TV’s Jeopardy! In game-theory terms, Go is a perfect information game like chess and checkers—no elements of chance, no information hidden. Typically those are easy for computers to master. But Go wouldn’t fall.
The thing is, Go looks pretty simple. Created in China more than 3,000 years ago, it pits two players against each other across a 19-by-19 grid. The players take turns putting stones at intersections—black versus white—trying to enclose territory or wall off swaths of their opponent’s color. People say chess is a metaphor for war, but it’s really more a metaphor for a single battle. Go is like a global battlespace, or geopolitics. A move in one corner of the grid can ripple everywhere else. Advantage ebbs and flows. In a game of chess, a player typically has about 35 possible moves to choose from in a given turn. In Go, the number is closer to 200. Over an entire game, that’s a whole other level of complexity. As Hassabis and Silver like to say, the number of possible positions on a Go board exceeds the number of atoms in the universe.
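To get a feel for those numbers, here is a rough back-of-the-envelope sketch. It takes the article's figures of about 35 moves per turn in chess and about 200 in Go, and counts move sequences a few turns deep. It then computes a crude upper bound on Go board configurations by letting each of the 361 intersections be empty, black, or white. That bound includes illegal positions; the count of legal ones is smaller but still on the order of 10^170. The figure of 10^80 atoms is a commonly cited rough estimate, assumed here for comparison, and the function names are purely illustrative.

```python
# Branching factors quoted in the article (approximate averages).
CHESS_BRANCHING = 35
GO_BRANCHING = 200

# Assumption: a commonly cited rough estimate of atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80


def sequences(branching: int, turns: int) -> int:
    """Number of distinct move sequences after `turns` moves,
    assuming the same number of choices at every turn."""
    return branching ** turns


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, e.g. 40000 -> 4."""
    return len(str(n)) - 1


# How fast the two games diverge as you look further ahead.
for turns in (2, 4, 10):
    chess = sequences(CHESS_BRANCHING, turns)
    go = sequences(GO_BRANCHING, turns)
    print(f"{turns} turns: chess ~10^{order_of_magnitude(chess)}, "
          f"Go ~10^{order_of_magnitude(go)}")

# Crude upper bound on board configurations: every intersection of the
# 19-by-19 grid is empty, black, or white. Many of these are illegal.
BOARD_POINTS = 19 * 19
upper_bound = 3 ** BOARD_POINTS
print(f"Upper bound on configurations: ~10^{order_of_magnitude(upper_bound)}")
print("Exceeds the atom estimate:", upper_bound > ATOMS_ESTIMATE)
```

Even ten turns ahead, the Go count is billions of times larger than the chess count, which is why looking ahead exhaustively stops being an option.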
Lee Sedol, seated at right, lost three games in a row to AlphaGo. (Photo: Geordie Wood)
Reporters packed into the press center at the Seoul Four Seasons. (Photo: Geordie Wood)
The upshot is that, unlike in chess, players—whether human or machine—can’t look ahead to the ultimate outcome of each potential move. The top players play by intuition, not raw calculation. “Good positions look good,” Hassabis says. “It seems to follow some kind of aesthetic. That’s why it has been such a fascinating game for thousands of years.”
In 2005, Hassabis and Silver’s game company folded and they went their separate ways. At the University of Alberta, Silver studied a nascent form of AI called reinforcement learning, a way for machines to learn on their own by performing tasks over and over again and tracking which decisions bring the most reward. Hassabis enrolled at University College London and got his PhD in cognitive neuroscience.
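The idea of trying things over and over and tracking which decisions bring the most reward can be shown with a toy example. The sketch below is not anything DeepMind or Silver built. It is one of the simplest textbook forms of reinforcement learning, an epsilon-greedy "bandit" that chooses among three slot-machine levers with hidden payout rates. The payout rates, the number of pulls, and the exploration rate are all made-up values for illustration.

```python
import random


def run_bandit(reward_probs, pulls=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which lever pays out most often.

    reward_probs: hidden chance of a reward for each lever (hypothetical).
    epsilon: fraction of pulls spent exploring a random lever.
    Returns the learned reward estimates and how often each lever was pulled.
    """
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: try a lever at random.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pull the lever that has looked best so far.
            arm = max(range(len(reward_probs)), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Track the running average reward for the chosen lever.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("Learned estimates:", [round(e, 2) for e in estimates])
print("Pulls per lever:", counts)
print("Lever the agent prefers:", best_arm)
```

Nobody tells the program which lever is best. It works that out from the rewards alone, which is the core of the technique, scaled down from a Go board to three levers.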
In 2010 they found each other again. Hassabis cofounded an AI company in London called DeepMind; Silver joined him. Their ambitions were grandiose: create general artificial intelligence, AI that really thinks. But they had to start somewhere.
That starting point was, of course, games. They’re actually a good test for artificial intelligence. By definition, games are constrained. They’re little bottled universes where, unlike in real life, you can objectively judge success and failure, victory and defeat. DeepMind set out to combine reinforcement learning with deep learning, a newish approach to finding patterns in enormous data sets. To figure out if it was working, the researchers taught their fledgling AI to play Space Invaders and Breakout.
Breakout turned out to be the big one. It’s basically Pong, except instead of bouncing a pixelated ball back and forth with an opponent, you’re bouncing it against a wall of colored bricks. Hit a brick and it disappears; miss the returning ball, or bounce it offscreen, and you lose. After playing just 500 games, DeepMind’s system taught itself to send the ball behind the wall at an angle that would guarantee it would stay up there, bouncing around, knocking out brick after brick without ever returning to the paddle. That’s a classic Breakout move, but DeepMind’s computer did it exactly right every time, at a speed well beyond anything human reflexes could handle.
Trawling for investors, Hassabis buttonholed Peter Thiel, the famed PayPal cofounder and Facebook investor, at a dinner party. He had only a few minutes to hook him. Knowing Thiel was an avid chess player, Hassabis pressed his offense by suggesting that the game had survived for so long because of the creative tension between the skills and weaknesses of knight and bishop. Thiel suggested Hassabis come back the next day to make a proper pitch.
Once one Silicon Valley billionaire hears about you, others do too. Through Thiel, Hassabis met Elon Musk, who told Google CEO Larry Page about DeepMind. Google soon bought the company for a reported $650 million.
After joining the search giant, Hassabis showed off the Atari demo at a meeting that included Google cofounder Sergey Brin. And the two discovered they had a common passion. In grad school at Stanford, Brin played so much Go that Page worried Google might never happen.
So when Brin met Hassabis, they chatted about the game. “You know, DeepMind could probably beat the world Go champion in a couple years,” Hassabis told him. “If we really put our minds to it.”
“I thought that was impossible,” Brin replied.
That was all Hassabis needed to hear. Game, as they say, on.
With a few keystrokes, Silver calls up the record of AlphaGo’s decisions during the game. He zooms in on what happened right before Move 37.
Before DeepMind and AlphaGo, AI researchers attacked Go with machines that aimed to predict the results of each move in a systematic way, while a match was happening—to tackle the problem with brute computer force. This is pretty much how IBM’s Deep Blue beat Kasparov at chess in 1997. I covered that match as a cub reporter at PC Magazine, and as with Lee versus AlphaGo, people thought it was a signal moment for AI. Weirdly, just as in game two of the Lee match, Deep Blue made a move in its game two against Kasparov that no human would ever make. Kasparov was just as flummoxed as Lee, but Kasparov didn’t have the same fight in him; he resigned almost immediately—folded under the pressure.
But brute force had never been enough to beat Go. The game simply presents too many options to consider every outcome, even for a computer. Silver’s team went with a different approach, building a machine that could learn to play a reasonably good game before ever playing a match.
Inside the DeepMind offices near King’s Cross station in London, the team fed 30 million human Go moves into a deep neural network, a network of hardware and software that loosely mimics the web of neurons in the human brain. Neural networks are actually pretty common; Facebook uses them to tag faces in photos. Google uses them to identify commands spoken into Android smartphones. If you feed a neural net enough photos of your mom, it can learn to recognize her. Feed it enough speech, it can learn to recognize what you say. Feed it 30 million Go moves, it can learn to play Go.
But knowing the rules isn’t the same as being an ace. Move 37 wasn’t in that set of 30 million. So how did AlphaGo learn to play it?
AlphaGo knew—to the extent that it could “know” anything—that the move was a long shot. “It knew that this was a move that professionals would not choose, and yet, as it started to search deeper and deeper, it was able to override that initial guide,” Silver says. AlphaGo had, in a sense, started to think on its own. It was making decisions based not on a set of rules its creators had encoded in its digital DNA but on algorithms it had taught itself. “It really discovered this for itself, through its own process of introspection and analysis.”
In fact, the machine had calculated the odds that an expert human player would have made the same move at 1 in 10,000. AlphaGo did it anyway.
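How can a search override a 1-in-10,000 guess? A toy sketch helps. It is not AlphaGo's code. It uses a PUCT-style scoring rule, common in this family of tree search methods, in which each candidate move is scored by its average evaluation so far plus an exploration bonus weighted by how likely a human expert would be to play it. The move names and the two evaluation numbers (0.45 and 0.60) are invented for illustration, and they are held fixed here, whereas a real system would re-estimate them with every simulation. Only the 1-in-10,000 prior comes from the article.

```python
import math


def search(priors, values, simulations, c_puct=1.0):
    """Toy one-level search. Returns how often each move was examined.

    priors: estimated chance a human expert would pick each move.
    values: hypothetical win-rate estimate returned when a move is examined.
    """
    moves = list(priors)

    # Assumption: every candidate is evaluated once up front.
    visits = {m: 1 for m in moves}
    total_value = {m: values[m] for m in moves}

    for _ in range(simulations):
        total_visits = sum(visits.values())

        def score(m):
            average = total_value[m] / visits[m]
            # The bonus favors moves humans like, but it shrinks with each visit.
            bonus = c_puct * priors[m] * math.sqrt(total_visits) / (1 + visits[m])
            return average + bonus

        move = max(moves, key=score)
        visits[move] += 1
        total_value[move] += values[move]

    return visits


priors = {"conventional": 0.9999, "shoulder_hit": 0.0001}
values = {"conventional": 0.45, "shoulder_hit": 0.60}  # made-up evaluations

shallow = search(priors, values, simulations=20)
deep = search(priors, values, simulations=1000)
print("Shallow search visits:", shallow)
print("Deep search visits:", deep)
print("Deep search prefers:", max(deep, key=deep.get))
```

With only a few simulations, the search spends nearly all its effort on the move a professional would choose. Given more time, the bonus for the conventional move fades and the better evaluation of the unlikely move wins out, which mirrors Silver's description of the search overriding its initial guide.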
After it learned to play the game from those human moves, Silver pitted the machine against itself. It played game after game after game versus a (slightly) different version of its own neural network. As it played, it tracked which moves generated the greatest reward in the form of the most territory on the board—the reinforcement learning technique Silver had studied in grad school. AlphaGo began to develop its own inhuman repertoire.
But that was only part of the trick. Silver’s team then fed millions of these inhuman moves into a second neural network, teaching it to look ahead to results the way Kasparov (or Deep Blue) looks into the future of a chess game. It couldn’t calculate all the possible moves like in chess—that was still impossible. But after tapping all the knowledge it had gathered playing so many games on its own, AlphaGo could start to predict how a game of Go would probably play out.
Being able to guess at an outcome from starting conditions you’ve never seen before? That’s called intuition. And what AlphaGo intuited in game two was Move 37, an insight beyond what even the best human players could see. Even its creators didn’t see that one coming. “When I watch these games, I can’t tell you how tense it is,” Silver tells me after his trip to the control room. “I really don’t know what is going to happen.”
YOU DON’T PAY $650 million for a company just to have it build a computer that can play board games. Deep learning and neural networks underpin about a dozen Google services, including its almighty search engine. Reinforcement learning, AlphaGo’s other not-so-secret weapon, is already teaching the company’s lab robots to pick up and move all sorts of objects. And you can see how important the tournament is to Googlers. Eric Schmidt, chair and former CEO, flies in before game one. Jeff Dean, the company’s most famous engineer, is there for the first game. Sergey Brin flies in for games three and four, and follows along on his own wooden board.
But more is at stake than a business. During the tournament, I took a walk with Hassabis through Jongno-gu, the 600-year-old cultural and political heart of Seoul. As we chatted, a young woman, eyes wide, recognized Hassabis, whose face was all over Korean TV and newspapers. And then she mimed having a fainting spell, as if he were Taylor Swift or Justin Bieber.
“Did you see that?” I said.
“Yes,” Hassabis answered, deadpan. “It happens all the time.”
He might not be kidding. Computer engineers don’t usually have fans, but 8 million people play Go in Korea, and Lee is a national hero. In China, more than 280 million viewers watched the tournament live.
So perhaps it makes sense that when Lee loses the first game and then the second, the giddy excitement those fans feel is cut with something darker. As game two ends, a Chinese reporter named Fred Zhou stops me in the commentary room, happy to speak with someone who appreciates AlphaGo as a feat of technology, not just a Go killer.
But then I ask him how he feels about Lee’s defeat. Zhou points to his chest, to his heart. “It made me sad,” he says.
I felt that sadness too. Something that belonged uniquely to humans didn’t anymore. What many of us watching the contest unfold came to realize is that machines have crossed a threshold. They’ve transcended what humans can do. Certainly machines can’t yet carry on a real conversation. They can’t think up a good joke. They can’t play charades. They can’t duplicate good old common sense. But AlphaGo’s relentless superiority shows us that machines can now mimic—and indeed exceed—the kind of human intuition that drives the world’s best Go players.
Lee goes on to lose game three, and AlphaGo secures victory in the best-of-five series. At the press conference afterward, with Hassabis sitting next to him, Lee apologizes for letting humanity down. “I should have shown a better result, a better outcome,” he says.
As Lee speaks, an unexpected feeling begins gnawing at Hassabis. As one of AlphaGo’s creators, he is proud, even elated, that the machine has achieved what so many thought it couldn’t. But even he feels his humanness rise. He starts to hope that Lee will win one.
AlphaGo has already won the tournament. Lee isn’t playing for the win anymore. He’s playing for humanity. Seventy-seven moves in, he seems to stall. He rests his chin in his right hand. He sways forward and back. He swivels in his chair and rubs the back of his neck. Two minutes pass, then four, then six.
Then, still gripping the back of his neck with his left hand, he strikes. With the first two fingers of his right hand, Lee puts a white stone near the very center of the board, directly between two black stones. It’s the 78th stone on the board, a “wedge move” between two vast and crowded swaths of territory. It effectively cuts AlphaGo’s defenses in half. And the machine blinks. Not literally, of course. But its next move is horrendous. Lee shoots a pointed stare at Huang, as if Huang is the opponent rather than a billion circuits.
In AlphaGo’s control room, the people running the machine stop what they’re doing and stare at their monitors. Before Lee’s brilliant Move 78, AlphaGo was putting its chances of winning at 70 percent. Eight moves later, the odds drop off the table. Suddenly AlphaGo isn’t Deep Blue’s successor—it’s Kasparov’s. It simply can’t believe a human being would make that move—the odds are a familiar 1 in 10,000.
Just like a human, AlphaGo can be taken by surprise. Four hours and 45 minutes into the game, AlphaGo resigns. Just like us, it can lose.
“All the thinking that AlphaGo had done up to that point was sort of rendered useless,” Hassabis says. “It had to restart.”
THE FINAL GAME has begun, and I’m supposed to watch with Hassabis and his team. But just before I head to meet them, a Googler finds me in the press room. “We’re so sorry,” she says, “but the team has changed their mind. They don’t want a reporter in the room for the final match.”
After she walks away, I turn to Geordie Wood, WIRED’s photographer. “You know what that means?” I say. “AlphaGo thinks it’s losing.”
It is. Early in the game AlphaGo makes a rookie mistake. In a crowded area on the lower half of the board, the machine places its white stone too close to Lee’s line of black and loses the entire territory. AlphaGo’s intuition failed it; like a human, the machine still has blind spots.
But as the game stretches into a third hour, AlphaGo claws its way back into the contest. At the three-and-a-half-hour mark, Lee’s play clock runs out. Under the match rules, he now has to make each move in less than a minute or else forfeit, but a wide swath of space on the top right-hand side of the board remains unclaimed. Time and again, he waits until the last second to place his next stone.
Then AlphaGo’s clock runs out too. Both players start moving at what looks like an impossible pace. The board fills with stones. For the first time in the series, the game looks as though it will play out to the very end—that neither side will resign before the final points are tallied. But five hours in, the gulf between Lee and AlphaGo grows too wide. Lee resigns. AlphaGo is fallible but still dominant.
IN ALL THE world, only one other person could credibly claim to know how Lee felt: Fan Hui, the three-time European champ and AlphaGo’s de facto trainer. He had lost to the machine five games to nil in a closed-door match back in October, the training montage for the bigger contest in Seoul. Afterward, Fan joined DeepMind as a kind of player for hire, playing game after game against the machine, games he kept losing, one after the other.
But as Fan’s losses piled up against AlphaGo, a funny thing happened. Fan came to see Go in an entirely new way. Against other humans, he started winning more—including four straight games against top players. His ranking shot up. AlphaGo was training him.
So, I ask Fan during the tournament, what should we think of Lee’s fight against the machine?
“Be gentle with Lee Sedol,” Fan says. “Be gentle.”
These days, the world’s biggest, richest tech companies are using the same kinds of technologies on which AlphaGo was built to seek competitive advantage. Which app can recognize a photo better? Which can respond to a voice command? Soon these same kinds of systems may help robots interact with their real-world environments more like people do.
But these practical uses all seem banal next to AlphaGo’s inhuman humanity. A subculture has sprung up around AlphaGo in a way that hasn’t happened around, say, the Google Photo app. In Düsseldorf, Germany, J. Martin—a professor of game design, media, and communications—now runs a Twitter account dedicated to Move 37. After reading my online coverage of the tournament in Seoul, a 45-year-old computer programmer from Florida named Jordi Ensign emailed me to let me know she had AlphaGo’s Move 37 tattooed on the inside of her right arm. On the inside of her left arm, Lee’s Move 78—a move the Go world has dubbed God’s Touch.
Lee replied that playing against the machine had rekindled his passion for Go. As with Fan Hui, AlphaGo had opened his eyes to a new side of the game. “I have improved already,” Lee said. “It has given me new ideas.” He has not lost a match since.
Before the tournament, Hassabis told the world that AlphaGo’s AI tech could drive a new kind of scientific research, where machines point humans toward the next big breakthrough. At the time, without evidence, those claims rang a bit hollow—typical tech hype. But not anymore. The machine did a very human thing even better than a human. But in the process it made those humans better at what they do. Yes, you could see Move 37 as an early sign of machines asserting their superiority to their human creators. Or you could see it as a seed: Without Move 37, we wouldn’t have Move 78.