In February of this year, “Jeopardy” pitted the two winningest human players against a supercomputer. “Jeopardy” is considered a “brainy” game, with an emphasis on puns and indirect references. Players vie to “buzz in” the moment the host (the incomparable Alex Trebek) stops speaking. It is an arena far beyond the prior state of the art in artificial intelligence.
The last popularly known triumph in artificial intelligence was “Deep Blue’s” 1997 victory over Garry Kasparov, the world champion of chess. That was the culmination of 40 years of research in the domain (chess was sometimes referred to as “the fruit fly of AI”).
In that match, more significant than the final checkmates and stalemates was Kasparov’s insistence that the machine’s play during Game 2 had the hallmarks of human adaptation. This was a chess-specific “Turing test” moment: a human unable to tell the difference between the behavior of a human and that of a machine.
Fourteen years later, most seemed to take it for granted that a computer could parse questions (or, in “Jeopardy’s” case, answers) such as “Even a broken one of these on your wall is right twice a day.” It was amusing (although, I imagine, somewhat painful for Watson’s developers) that the most common questions about Watson were whether its button-pushing mechanics gave it an advantage and how it determined its wagers, which had seemingly quirky values such as $2,137 and $367.
(Those wagers, to me, were reassuringly mechanical, clearly the result of maximizing some function of confidence in the question area, the amount of money left on the board, and the amounts in the players’ hands.)
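For the curious, here is a toy sketch of what maximizing such a function might look like. To be clear, this is my speculation, not anything from IBM: the sigmoid win-probability model and every number in it are invented for illustration.

```python
import math

# Toy sketch of a confidence-weighted wager chooser. Speculation only:
# the sigmoid win-probability model and all numbers are invented.

def win_probability(lead, scale=3000.0):
    """Hypothetical model: chance of winning as a smooth function of
    the dollar lead over the best-placed opponent."""
    return 1.0 / (1.0 + math.exp(-lead / scale))

def choose_wager(confidence, my_score, best_opponent, max_wager):
    """Return the dollar amount that maximizes expected win probability."""
    def expected(wager):
        right = confidence * win_probability(my_score + wager - best_opponent)
        wrong = (1 - confidence) * win_probability(my_score - wager - best_opponent)
        return right + wrong
    return max(range(max_wager + 1), key=expected)

# With middling confidence, the optimum lands at an interior,
# non-round figure -- the kind of "quirky" bet viewers noticed.
print(choose_wager(0.65, 7200, 5700, 7200))
```

A linear payoff would always push the bet to zero or the maximum; it is the curvature of a win-probability model like this that produces interior, oddly specific optima.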
It was the parsing that astonished me. Reviewing the questions (available at j-archive.com), one can see that when Watson was wrong, it tended to miss the proper response category (guessing, for instance, “Dorothy Parker” when the answer should have been “The Elements of Style”). But, in general, Watson was somehow able to quickly establish odds on the form of the answer (say, “a person’s name”: 80%; “a book title”: 20%). Then, in a matter of microseconds, it looked for additional hints in the clue (“New Yorker,” “40s”). With these in hand, rather than attempting to place the fragments as stepping stones on a logical path to the answer, Watson more or less grepped for the fragments in a large (but not Internet-connected) database that included data from previous “Jeopardy” programs, the Internet Movie Database, texts from Project Gutenberg, and so forth.
The results of the grep would reinforce or dampen the probability that a fragment was important; Watson would then add more fragments (“Magazine,” “Manual,” “Empire State,” etc.) and grep again. As it did this, one answer would come to appear more often than the others, and that one would be the highest-ranked at the moment an answer was called for, perhaps two or three seconds after the text was downloaded.
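A crude reconstruction of that loop might look like the following. Everything here (the two-document “corpus,” the fragments, the one-vote-per-hit scoring) is invented to show the shape of the idea, not Watson’s actual pipeline:

```python
from collections import Counter

# Invented two-passage "corpus"; the real system used vastly more data.
CORPUS = [
    ("The Elements of Style", "White, a New Yorker writer in the 40s, "
     "revised Strunk's manual of style for readers far beyond the "
     "Empire State."),
    ("Dorothy Parker", "Parker, a New Yorker wit of the 20s and 30s, "
     "was famed for verse and barbed reviews."),
]

def score_candidates(fragments):
    """Every passage containing a fragment casts one vote for its
    candidate answer; repeated passes amplify the front-runner."""
    votes = Counter()
    for candidate, passage in CORPUS:
        for fragment in fragments:
            if fragment.lower() in passage.lower():
                votes[candidate] += 1
    return votes

# First pass: fragments lifted straight from the clue.
votes = score_candidates(["New Yorker", "40s"])
# Second pass: hits suggest new fragments, so grep again.
votes += score_candidates(["Manual", "Empire State"])
print(votes.most_common(1))  # the highest-ranked answer when time is up
```

The point is that no step “understands” the clue; the right answer simply accumulates votes faster than the wrong ones.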
At first blush, this statistical approach to question-answering seems profoundly unlike human problem-solving, but on further reflection, we have to admit that we don’t really have a lot of insight into why we “don’t have to think” about the answers to many questions. Perhaps it is a Watson-like series of vague associations, rapidly feeding back, amplifying and reinforcing themselves until they just trip off our tongues. Perhaps we “back-construct” the logical steps that we intuit we follow, and perhaps we’re so talented at these post facto justifications that we convince ourselves that our constructions are more logical than they really are.
The story of Watson is very enjoyably told by Stephen Baker in the book “Final Jeopardy: Man vs. Machine and the Quest to Know Everything.” Baker makes it clear what a monumental effort went into the “Jeopardy” challenge and, while he doesn’t include any source code, there is plenty of technical grist for the mill of any entrepreneurial AI researcher.
Part of Watson’s infrastructure was based on the Apache UIMA project and, while Watson is certainly a specialized machine with a single application, within a few years companies will surely be able to field Watson-like question-answering systems in a variety of domains.
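UIMA, for what it’s worth, is a framework for chaining independent “annotators” over unstructured text. The following sketch shows the general shape of such a pipeline; it is a generic illustration in Python, not UIMA’s actual (Java) API, and every name in it is invented:

```python
# Generic annotator-pipeline sketch in the spirit of UIMA's model of
# chained analysis engines. Not UIMA's actual API; names are invented.

class Document:
    def __init__(self, text):
        self.text = text
        self.annotations = []  # (label, value) pairs added by annotators

def detect_era(doc):
    """One tiny annotator: tag a decade reference."""
    if "40s" in doc.text:
        doc.annotations.append(("era", "1940s"))

def detect_work_type(doc):
    """Another annotator: tag the kind of work the clue points at."""
    if "manual" in doc.text.lower():
        doc.annotations.append(("work-type", "reference book"))

PIPELINE = [detect_era, detect_work_type]

doc = Document("A New Yorker writer of the 40s revised this manual of style.")
for annotator in PIPELINE:
    annotator(doc)
print(doc.annotations)  # [('era', '1940s'), ('work-type', 'reference book')]
```

Each annotator is ignorant of the others; the framework’s job is plumbing, which is why the same infrastructure could plausibly be redeployed in domains far from trivia.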
What kind of problems might be solved by a Watson-like approach? Early thoughts include medical diagnosis and regulatory compliance. Less glamorous, but potentially more broadly applicable, would be help-desk systems and log analysis. It occurred to me that Stack Overflow might have enough data to be mined usefully. And, without a doubt, one could make a properly inane Twitter bot.
Larry O’Brien is a technology consultant, analyst and writer. Read his blog at www.knowing.net.