sfgate.com: How can you tell if you’re chatting with a bot?

I was interviewed for this article:

How can you tell if you’re chatting with a bot?

Here’s the full text of the email interview I submitted for this article:

1) What do you see as the significance of passing the Turing test? What will it mean for artificial intelligence or computer science?

This is a kind of loaded question that I need to answer with a bit of context. The quick answer is “it depends”.

Here’s the long answer: most of the current chatbot entries in the Loebner Prize Competition attempt to trick their interrogator by using keyword-spotting tricks and canned responses. This can cause the machine to appear more intelligent than it really is. For example, if I say “I’m worried about my mother.”, a chatbot can simply spot the keyword “mother” in the sentence and spit out the canned response “Tell me more about your family.” Someone who doesn’t know how this is implemented could be fooled into thinking they’ve found a new, understanding soulmate, but if they dug deeper, they’d be sorely disappointed.
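The keyword-spotting trick described above can be sketched in a few lines. This is an illustrative toy, not Chip Vivant's actual code; the keywords, responses, and function names are my own invention for the example:

```python
# A toy keyword-spotting chatbot: scan the input for known keywords
# and emit the matching canned response, else a generic fallback.

RULES = [
    ("mother", "Tell me more about your family."),
    ("sport", "I don't follow sports much. What's your favorite team?"),
]

FALLBACK = "That's interesting. Please go on."

def canned_reply(utterance: str) -> str:
    """Return the canned response for the first keyword found, else a fallback."""
    lowered = utterance.lower()
    for keyword, response in RULES:
        if keyword in lowered:
            return response
    return FALLBACK

print(canned_reply("I'm worried about my mother."))       # Tell me more about your family.
print(canned_reply("Where on your face is your nose?"))   # That's interesting. Please go on.
```

Note how the second question, which any human answers instantly, falls straight through to the fallback: exactly the bot-busting weakness discussed below.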

If someone tells me that with an arsenal of a gazillion such tricks, a computer succeeds in passing the Turing Test, then that revelation will be about as interesting to me as telling me that I was able to fool a roomful of three-year-olds into believing that I could really pull quarters out of their ears. Why? Because in five minutes, I can teach anyone to bust the bot. It’s really simple: ask any chatbot a question that they are unlikely to have a canned response for, yet which most humans would answer the same way. Here’s an example: “Where on your face is your nose?” Most humans, regardless of race, language, culture, gender, age, etc. will answer this question in exactly the same way, yet no chatbot will answer this unless they have a canned response for this particular question in their database. (They could maybe try looking this up in Wikipedia, but the answer is unlikely to be the three-word answer that you and I both have in our heads right now.)

On the other hand, if a computer manages to convince a savvy interrogator like me, who knows the tricks of the trade, then that’s another story. The ramifications of this would obviously be huge, because we’d be looking at a new class of intelligent entities that coexist with us. I’m not holding my breath for this, though, partly because I believe the deck is unfairly stacked against the machines. Why? Because we all have sensory inputs which are much different than those of a computer, so if I ask a computer to describe a hunger pang, or how something slimy feels when you swallow it, then the programmers would have to spend a ton of time modeling these things that you and I get “for free”. That’s why I don’t think the Turing Test is the best measure of machine intelligence: you can ascribe intelligence to things that don’t involve the huge burden of understanding or mimicking humanness.

2) Have you adjusted your strategy or advanced your technology in ways that you think will improve your chances of winning this year – and if so, how so?

I have, in ways that I’m not entirely happy or thrilled with. Again, some context is needed to answer this.

Like I’ve mentioned, most chatbots which enter this competition have a massive arsenal of keyword-spotting tricks and canned responses. For me personally, the approach of amassing these canned responses and inventing fake backstories was about as interesting as scraping gum off the bottom of a desk: I have far better things to do with my time. Given that anyone with five minutes of training can bust the bot within seconds, I was more interested in using the contest as a platform to showcase incremental progress in developing intelligent machines.

You might ask: Given that anyone with five minutes of training can spot the bot, what’s the point of the Loebner Prize Competition? Indeed, there are a number of prominent AI experts who pooh pooh this contest precisely for this reason, with Marvin Minsky calling the contest “an obnoxious and unproductive annual publicity campaign” and Stuart Shieber calling it “at best a diversion of effort and attention and at worst a disparagement of the scientific community”. Hugh Loebner wrote an eloquent rebuttal to these assertions and I agree with Hugh. (This is all available on Hugh’s Loebner Prize home page at http://www.loebner.net/Prizef/loebner-prize.html).

The crux of Hugh’s rebuttal is (emphasis mine): “At the current state of the art I suggest that the appropriate orientation for the contest is to determine which of obviously artificial computer entries is the best entry, i.e. most human like, and nominate the authors as “winners.” It should not be to determine if a particular terminal is controlled by a human or a computer. If we maintain this orientation, there should be no problems holding unrestricted tests.”

If the judges were to adhere to the spirit of Hugh’s rebuttal, it would open the door to a wide variety of chatbots which don’t attempt to fool or deceive humans. One famous example of such a chatbot which, in my eyes, has real intelligence is SHRDLU, a program that can manipulate and answer questions about blocks (http://hci.stanford.edu/winograd/shrdlu/). Ironically, it was written in the early 1970s, and, like going to the moon, we haven’t seen anything that good since. Here’s a snippet from that conversation:

Person: WHAT IS THE PYRAMID SUPPORTED BY?
Computer: THE BOX.
Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
Computer: FOUR OF THEM.
Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
Computer: YES, THE RED CUBE.
Person: IS IT SUPPORTED?
Computer: YES, BY THE TABLE.
Person: CAN THE TABLE PICK UP BLOCKS?
Computer: NO.
Person: CAN A PYRAMID BE SUPPORTED BY A BLOCK?
Computer: YES.

For me, such a discourse is way more human than a dumb bot which has inane canned responses for “What’s your favorite sport?” or “Do you color your hair?” and represents an endeavor that I’d actually want to waste my precious time coding for.

Chip Vivant, my chatbot, has made it to the Final Four entries in the 2009, 2011 and 2012 competitions. This is partly because Hugh’s prescreening round, which he manages himself, asks questions intended to foil bots that rely solely on canned responses. I started competing in 2008. In previous years, I’ve naively believed that a SHRDLU-like entry would win the actual competition because it would represent breathtaking technological prowess and not just a bunch of canned responses. My strategy had been to more or less announce from the get-go that Chip is a chatbot (“Hello and greetings. Please use full sentences and correct punctuation and grammar. I’m still very young.”) to attempt to bypass the ridiculousness of a fake backstory, etc. I hyperfocused on Chip’s winning fourth conversation in Brighton (in 2009) where despite outing himself, he was ranked higher than his competitors because the conversation was of higher quality.

My LPC (Loebner Prize Competition) 2011 experiences, however, confirmed my naivety and showed me that most contest judges don’t seem to “get it”. (Editor’s Note: I have changed my opinion after the 2012 competition.) Many judges came out guns-ablazin’, and despite Chip being like a dog who submitted immediately to avoid being attacked by the alpha dog, the judges attacked anyway, peppering Chip with questions he couldn’t possibly answer because he had no fake backstory, and refusing to answer most of Chip’s questions. The few judges who did answer Chip’s questions and had higher-quality conversations with him weren’t sufficiently moved.

How do I reconcile this with my belief that Hugh was correct about the theoretical validity of the LPC to reward new technology by rewarding the best bot? After a lot of soul-searching, I realized that just as the content of a religion’s scriptures can have absolutely no bearing on that religion’s societal manifestation at any given moment in time, the societal manifestation of the LPC at this moment in time is nothing more than a glorified creative writing contest once the initial “hazing” (to use the term of my co-finalist, Bruce Wilcox) of the preselection round has been completed.

So to answer your question about how I adjusted my strategy this year: I had to decide whether I wanted to focus my efforts on real AI or simply shovel more canned responses and keyword-spotting tricks into Chip. I chose the latter because I am not philosophically opposed to creative writing competitions, and submitting Chip again seemed like an easy way to pick up some extra cash and get free publicity from people like you. I tried to author the canned responses myself but couldn’t bring myself to do so, so I outsourced the task to a contractor with the $250 initial winnings from this year’s contest.

3) Do you think we’re getting closer to thinking machines, or are machines just becoming better imitators of human behavior?

It depends on what you mean by “thinking”. Artificial Intelligence is a vast field, and despite the abysmal progress of Turing-Test-capable chatbots, the progress made in more focused problem domains since SHRDLU’s time has been jaw-droppingly astonishing. We’ve got Jeopardy-playing supercomputers, computers that can beat the best human alive at chess, computers that can steer cars autonomously on real-world roads. All of this would have been inconceivable thirty years ago.

I use a simple, utilitarian definition of Artificial Intelligence: a program can be said to be artificially intelligent if it can solve problems that would require intelligence if solved by a human. This definition is pure and simple, and it tells me that there are tons of artificially intelligent entities in operation today doing a variety of amazing tasks. Fooling people into thinking they’re human is not one of them, however. Again, there may come a program in the next few years which can fool the uneducated masses, but given that I can educate those masses in a few minutes, that scenario isn’t really of interest to me.

4) What’s the name of your chatbot – and what city are you based in?

My chatbot is named Chip Vivant and I am based in Milwaukee, Wisconsin. You can read more about Chip at http://www.chipvivant.com/.

Wish Chip luck this year!

