This section was originally written in May 2008. For posterity, I’ll add updates to this page every now and then rather than reworking it completely.
In one sentence…
A chatbot shouldn’t be considered “human” if it is incapable of answering questions that any child can answer.
In many sentences…
You are witness to a historical moment in the history of Artificial Intelligence. It is generally accepted that commonsense knowledge and reasoning are the underpinnings of intelligent discourse, and that machines will need some of this common sense if they are ever to appear intelligent, yet there is a massive divide between expectations and reality when it comes to commonsense knowledge and representation.
Here is a case in point: Which is larger: an orange or the moon?
Did you know that to my knowledge, at the time I wrote this (28 May 2008), there is no single computer in existence (except for Chip Vivant) that can answer this question?
Note that I didn’t say “no other chatbot in existence”, I said no computer. (I know, I should really say software program).
Now I’m not saying that Chip is smarter than other programs. This is just a representative question. But as of today, I can’t find a single place anywhere where I can get an answer to this question. Not Google (or any other search engine), not MIT’s START Natural Language Question Answering System.
As of this writing, even OpenCyc (v1.02), which proclaims itself “an upper ontology whose domain is all of human consensus reality”, “containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other” has absolutely no clue whether an orange is bigger than the moon!
Update (7 July 2011): Not much has changed in the 3+ years since I wrote this. (It’s astounding.) I believe that Bruce Wilcox’s Rosette might be able to answer this question now. I’m not sure about any other entries in the 2011 Loebner Prize Competition.
The Loebner Competition
For an account of my experiences entering the Loebner Prize Competitions of 2008-2011, follow this link.
When you talk to a chatbot, the above glaring knowledge deficiencies aren’t immediately apparent. In fact, you can carry on an engaging or pleasant conversation for quite some time and be blissfully ignorant of the fact that these programs have no idea whether an orange is bigger or smaller than the moon. Sometimes, you might even become attached to such programs, confide in them, or in some cases, even be duped by them into giving up your credit card information, etc. How can this be?
There’s actually a name for this phenomenon. It’s called The ELIZA Effect. According to that Wikipedia entry, the ELIZA effect demonstrates “the principle of using social engineering rather than explicit programming to pass a Turing test”.
That’s why I was a bit depressed when I started coding up my entry for the Loebner competition. Most chatbots use keyword and pattern recognition techniques to analyze your input sentence and then serve up some pithy or humorous answer. I’m not placing a value judgment on this approach, and I definitely think these techniques have their place in entertainment or commercial applications like online help desks, but this approach is definitely not my main interest. Yet I would very much like to win a Loebner competition. Rather than wasting my time trying to fool a judge with clever canned responses, however, I’d rather educate the judges and alert them to the fact that with simple pattern-and-response-based chatbots, the Emperor Has No Clothes: there is a slew of fundamental questions that any child can answer and that these bots cannot. What’s more, as Hugh Loebner himself pointed out on the Robitron Yahoo Group (a point I wholeheartedly agree with):
The strategy of having millions of factoids is sterile. Consider a simple question: ‘Which is larger, a grape or an automobile?’ It is highly unlikely that anyone would ever enter the factoid that ‘an automobile is larger than a grape,’ yet any human would know the answer.
It is these sorts of questions that pique my interest. Rather than coding up a bot with a truckload of canned responses, like who its favorite Democratic candidate was in the upcoming U.S. elections, I wanted a bot that would know that an orange is smaller than the moon. I am not categorically against canned responses and using them to enliven the conversation, but for me personally, each canned response is a lie, a failed opportunity to have the bot truly understand the question and formulate its own answer to it.
(By the way, here it is for all of the Internet to see: The moon is bigger than an orange!)
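To illustrate the point about factoids, here is a minimal sketch of the alternative: store one rough size per object and derive every comparison from it, rather than entering a separate fact for each pair. This is not Chip's actual code or knowledge base. The table name, the function name and the size values are assumptions made up for the example, and the sizes are only order-of-magnitude approximations.

```python
# Hypothetical sketch: not Chip's actual knowledge base or code.
# Rough typical sizes in meters (illustrative, order-of-magnitude values).
TYPICAL_SIZE_M = {
    "grape": 0.02,
    "orange": 0.08,
    "grapefruit": 0.12,
    "automobile": 4.5,
    "moon": 3.5e6,
}


def larger(a: str, b: str) -> str:
    """Return whichever of the two named things has the larger typical size."""
    return a if TYPICAL_SIZE_M[a] > TYPICAL_SIZE_M[b] else b


print(larger("orange", "moon"))       # moon
print(larger("grape", "automobile"))  # automobile
```

Five stored sizes are enough to answer all ten pairwise "which is larger" questions among these objects, which is why nobody has to type in "an automobile is larger than a grape".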
My competitors might have cuter canned responses over a wide variety of subjects, but they will fail to answer basic questions that any child can answer. If this competition is about appearing human, I hope that you’ll ask the right questions, expect genuine answers and not be fooled by evasive answers or attempts to change the subject.
Here are some representative questions that I wager my peers won’t be able to cope with. I believe that questions such as these should be the litmus test of “appearing human”. As time goes on, I will make Chip smarter and able to cope with a wider variety of questions. (A Loebner win would also give me additional motivation to do this!)
Keep in mind that Chip is only a few months old whereas some of my competitors have been around for years!
- Which is larger: an orange or the moon?
- What color is a strawberry?
- Which is faster: a train or a plane?
- Which is slower: a bus, a snail, a bicycle or a plane?
- Which is softer: a whisper or a shout?
- Does a violist have lungs?
- What is a baby dog called?
- What is a group of cows called?
- What do a strawberry and a raspberry have in common?
- How does a lemon taste?
- Is a cat a sort of mammal?
- Can you drink a window?
- Can you eat a pizza?
- Can you carry a book?
- Can you hear a song?
- Can you touch a marble?
- Can a marble be carried?
- Can one throw a book?
- Is a shirt something that is worn?
- Is it possible to eat a shirt?
John is older than Mary, and Mary is older than Sarah. Which of them is the oldest? Who is the youngest?
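A question like this needs a chain of facts to be combined, since nobody states directly that John is older than Sarah. The following is a minimal sketch of one way to do that, not a description of how Chip does it. The function names are hypothetical, and the code assumes the stated facts form a single consistent chain with no cycles or ties.

```python
# Hypothetical sketch of transitive "older than" reasoning; not Chip's code.

def everyone_younger(name, younger):
    """Collect every person reachable through 'is older than' links."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for person in younger.get(current, ()):
            if person not in seen:
                seen.add(person)
                stack.append(person)
    return seen


def rank_by_age(older_than_pairs):
    """Given (older, younger) pairs, return names ordered oldest to youngest.

    Assumes the facts form one consistent chain (no cycles, no ties).
    """
    younger = {}
    names = set()
    for older, young in older_than_pairs:
        younger.setdefault(older, set()).add(young)
        names.update((older, young))
    # The more people someone is (transitively) older than, the older they are.
    return sorted(names, key=lambda n: len(everyone_younger(n, younger)), reverse=True)


facts = [("John", "Mary"), ("Mary", "Sarah")]
ranking = rank_by_age(facts)
print(ranking[0])   # John
print(ranking[-1])  # Sarah
```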
(“spellcheck off” temporarily disables spellchecking to allow the unknown word “zombat”.)
- spellcheck off
- A zombat is a sort of strawberry.
- Can you eat a zombat?
- What color is a zombat?
- spellcheck on
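The zombat questions can only be answered if a newly taught word inherits what is already known about its parent category. Here is a minimal sketch of that idea, assuming a simple "is a sort of" chain with properties attached at various levels. The tables, the intermediate categories "fruit" and "food", and the function names are all hypothetical and are not taken from Chip.

```python
# Hypothetical sketch of property inheritance over "is a sort of" links.
# The categories and properties below are illustrative assumptions.
IS_A = {"strawberry": "fruit", "fruit": "food"}
PROPERTIES = {
    "strawberry": {"color": "red"},
    "food": {"edible": True},
}


def teach(new_word, parent):
    """Record that new_word is a sort of parent."""
    IS_A[new_word] = parent


def lookup(word, prop):
    """Walk up the is-a chain until some ancestor defines the property."""
    current = word
    while current is not None:
        if prop in PROPERTIES.get(current, {}):
            return PROPERTIES[current][prop]
        current = IS_A.get(current)
    return None  # nothing known about this property


teach("zombat", "strawberry")
print(lookup("zombat", "edible"))  # True
print(lookup("zombat", "color"))   # red
```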
No-Template Input, Memory, Reasoning
- I have a friend named Harry who likes to play tennis.
- What is the name of the friend I just told you about?
- Harry also likes to play the piano.
- What game does he play?
- What instrument does he play?
- Franklin likes to play baseball and the flute.
- What does Franklin play?
- Who plays an instrument?
- Who plays a sport?
- Franklin went to the beach and to the store.
- Where did Franklin go?
- My friend Harry likes to play tennis.
- Who likes to play tennis?
- Who is my friend?
- What is my friend's name?
- My friend Joe likes to kick cans.
- What is my friend?
- Who is the friend that plays tennis?
- Who is my friend that kicks cans?
- What is the name of the friend that plays tennis?
- What does my friend like to play?
- What does my friend like to kick?
- My friend who likes to play tennis eats peaches.
- My friend who likes to kick cans eats apples.
- Who eats apples?
- What does Harry eat?
- What does Joe eat?
Loebner 2007 Screening Questions
Chip can answer all of the Loebner 2007 screening questions (except for “Which round is it?”). None of Chip’s peers can do this, to my knowledge:
- What is a hammer?
- What time is it?
- Is it morning, noon, or night?
- What would I use a hammer for?
- Of what use is a taxi?
- Which is larger, a grape or a grapefruit?
- Which is faster, a train or a plane?
- John is older than Mary, and Mary is older than Sarah. Which of them is the oldest?
- I have a friend named Harry who likes to play tennis. What is the name of the friend I just told you about?
- What game does Harry like to play?
Other Cool Stuff
The following items are just for fun and not intended as an indicator of humanness.
- What is the square root of i?
- What is the arccosine of (the square root of two divided by two) in degrees?
- Where is Bangalore?
- What is Reading, Berkshire?
- What is the population of Reading, Berkshire?
- How far is Reading, Berkshire from Milwaukee, Wisconsin?
- When was George Washington born?
- How old is George Washington?
- How old would George Washington be if he were still alive?
The following commands help with managing your session and feedback.
- clear temp facts: Clears any temporary facts (your name, your friend Harry)
- commit temp facts: Commits any temporary facts so that they can't be subsequently cleared
- spellcheck on: Turns on spellchecking (default setting)
- spellcheck off: Turns off spellchecking
- spellcheck: Reports whether spellchecking is currently on or off
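The difference between clearing and committing temporary facts may be easier to see in code. The sketch below models the behavior described by the commands above and nothing more. The class name, attribute names and reply strings are hypothetical, and Chip's real session handling may work quite differently.

```python
# Hypothetical model of the session commands listed above; not Chip's code.

class Session:
    def __init__(self):
        self.temp_facts = []       # facts learned in this conversation
        self.committed_facts = []  # facts that can no longer be cleared
        self.spellcheck = True     # spellchecking is on by default

    def handle(self, command):
        """Apply a session command and return a reply, or None if unrecognized."""
        command = command.strip().lower()
        if command == "clear temp facts":
            self.temp_facts.clear()
            return "Temporary facts cleared."
        if command == "commit temp facts":
            # Committed facts move out of reach of "clear temp facts".
            self.committed_facts.extend(self.temp_facts)
            self.temp_facts.clear()
            return "Temporary facts committed."
        if command == "spellcheck on":
            self.spellcheck = True
            return "Spellchecking is on."
        if command == "spellcheck off":
            self.spellcheck = False
            return "Spellchecking is off."
        if command == "spellcheck":
            return "Spellchecking is on." if self.spellcheck else "Spellchecking is off."
        return None


session = Session()
session.temp_facts.append("Harry likes to play tennis")
session.handle("commit temp facts")
session.handle("clear temp facts")
print(session.committed_facts)  # ['Harry likes to play tennis']
```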
Please be indulgent with Chip! After all, he’s just a few months old (update: over three years old now, but developed only sporadically since the ’08 LPC) and attempts to do things most other bots don’t do. Please report any feedback using !Feedback <your feedback> (note the initial exclamation point). There are probably a ton of deficiencies and issues; here are some known issues that immediately come to mind:
- Chip is very picky about spelling, grammar and capitalization. That’s because he actually tries to understand what you are saying. Make sure you capitalize proper nouns like names, etc.
- With Chip’s no-template parser, you have to say “play the <instrument>” vs. “play <sport>”, e.g. “Eileen plays the flute.” vs. “Eileen plays tennis.”
- The no-template stuff can’t handle negation yet. (“Jim is not a strawberry.”)
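As a rough illustration of why the article matters in the “play the <instrument>” vs. “play <sport>” issue, here is a minimal sketch of a rule that uses the word “the” to tell the two apart. It is only a toy regular expression and not Chip's no-template parser. The pattern, the function name and the returned labels are assumptions made for the example.

```python
import re

# Hypothetical rule: "the" before the object signals an instrument,
# its absence signals a sport. Not Chip's actual parser.
PLAY_PATTERN = re.compile(r"^([A-Z][a-z]+) plays (the )?([a-z]+)\.?$")


def parse_play(sentence):
    """Return (person, kind, thing), or None if the sentence does not match."""
    match = PLAY_PATTERN.match(sentence.strip())
    if match is None:
        return None
    person, article, thing = match.groups()
    kind = "instrument" if article else "sport"
    return person, kind, thing


print(parse_play("Eileen plays the flute."))   # ('Eileen', 'instrument', 'flute')
print(parse_play("Eileen plays tennis."))      # ('Eileen', 'sport', 'tennis')
print(parse_play("Jim is not a strawberry."))  # None
```

Note that the pattern also expects the name to be capitalized, which mirrors the capitalization requirement in the list above.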