Spotlight on Research is the research blog I author for Hokkaido University, highlighting different topics being studied at the University each month. These posts are published on the Hokkaido University website.

The vacuum cleaner that can tweet it can’t fly

Nestled under the desk in the Graduate School of Information Science and Technology is a Roomba robotic vacuum cleaner with a Twitter account. When there is a spillage in the laboratory, one tweet to ‘Twimba’ will cause her to leap into life to clean up the mess.

Twimba’s abilities may not initially seem surprising. While the Twitter account is a novel way of communicating (the vacuum cleaner is too loud when operating to respond to voice commands), robots that we can interact with have been around for a while. Apple iPhone users will be familiar with ‘Siri’, the smartphone’s voice-operated assistant, and we have likely all dealt with telephone banking, where an automatic answering system filters your enquiry.

Yet, Twimba is not like either of the above two systems. Rather than having a database of words she understands, such as ‘clean’, Twimba looks for her answers on the internet.

The idea that the internet holds the solution for successful artificial intelligence was proposed by Assistant Professor Rafal Rzepka for his doctoral project at Hokkaido University. The response was not encouraging.

“Everybody laughed!” Rafal recalls. “They said the internet was too noisy and used too much slang; it wouldn’t be possible to extract anything useful from such a garbage can!”

Rafal was given the go-ahead for his project, but was warned that he was unlikely to have anything worth publishing in a journal during the degree’s three-year period, and publications are a key component of a successful academic career. Rafal proved them wrong when he graduated with the highest number of papers in his year. Now faculty at Hokkaido, Rafal heads his own research group in automatic knowledge acquisition.

“Of course, I’m still working on that problem,” he admits. “It is very difficult!”

When Twimba receives a tweet, she searches Twitter for the appropriate response. While not everyone has the same reaction to a situation (a few may love nothing better than a completely filthy room), Twitter’s huge number of users ensures that Twimba’s most common find is the expected human response.

For instance, if Twimba receives a tweet saying the lab floor is dirty, her search through other people’s tweets would uncover that dirty floors are undesirable and the best action is to clean them. She can then go and perform this task.

This way of searching for a response gives Twimba a great deal of flexibility in the language she can handle. In this example, it was not necessary to specifically say the word ‘clean’. Twimba deduced that this was the correct action from people’s responses to the word ‘dirty’. A similar result would have appeared if the tweet contained words like ‘filthy’, ‘grubby’ or even more colloquial slang. Twimba can handle any term, provided it is widespread enough to be used by a significant number of people.
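The majority-response idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not Rafal's actual system: the tweets are a small hard-coded sample standing in for a live Twitter search, and the names `SAMPLE_TWEETS`, `CANDIDATE_ACTIONS` and `most_common_response` are hypothetical. The sketch finds every tweet containing the trigger word and counts which candidate action appears alongside it most often.

```python
import re
from collections import Counter

# Hypothetical stand-in for a Twitter search; a real system would query live tweets.
SAMPLE_TWEETS = [
    "the floor is dirty so I should clean it",
    "dirty kitchen again, time to clean",
    "my room is filthy, need to clean today",
    "dirty floor? I love it, just ignore it",
    "grubby windows, must clean them",
]

# Assumed list of actions to look for; the article does not say how actions are recognised.
CANDIDATE_ACTIONS = {"clean", "ignore", "fly"}


def most_common_response(term, tweets, actions=CANDIDATE_ACTIONS):
    """Return the action mentioned most often in tweets containing `term`, or None."""
    counts = Counter()
    for tweet in tweets:
        # Split the tweet into lowercase words, dropping punctuation.
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        if term in words:
            # Each tweet votes once for every candidate action it mentions.
            counts.update(words & actions)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_common_response("dirty", SAMPLE_TWEETS))
print(most_common_response("filthy", SAMPLE_TWEETS))
```

In this toy sample one person is happy to ignore a dirty floor, but the majority vote still settles on cleaning, and the word ‘clean’ never has to appear in the incoming request.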

Of course, there are only so many problems that one poor vacuum cleaner is equipped to handle. If Twimba receives a tweet that the bathtub is dirty, she will discover that it ought to be cleaned but she will also search and find no examples of robotic Roombas like herself performing this task.

“A Roomba doesn’t actually understand anything,” Rafal points out. “But she’s very difficult to lie to. If you said ‘fly’, she’ll reply ‘Roombas don’t fly’ because she would not have found any examples of Roombas flying on Twitter.”

The difficulty of lying to one of Rafal’s machines, due to the quantity of information at its disposal, is the key to its potential. By being able to sift rapidly through a vast number of different sources, the machine can produce an informed and balanced response to a query more easily than a person.

A recent example of where this would have been useful came during the airing of a Japanese travel program on Poland, Rafal’s home country. The television show implied that Polish people typically believed in spirits and regularly performed exorcisms to remove them from their homes, a gross exaggeration. A smartphone or computer using Rafal’s technique could swiftly estimate the correct figures from Polish websites to provide a real-time correction for the viewer.

Then there are more complicated situations where tracking down the majority viewpoint will not necessarily yield the best answer. A UFO conspiracy theory is likely to attract many more excitable people agreeing that it happened than people explaining logically why it is implausible. To produce a balanced response, the machine must be able to assess the worth of the information it receives.

A simple way to assess or ‘weight’ sources is to examine their location. Websites with academic addresses ending ‘.edu’ are more trustworthy than those ending ‘.com’. Likewise, .pdf files or those associated with peer reviewed journals have greater standing. Although these rules will have plenty of exceptions, rogue points should be overwhelmed by their correctly weighted counterparts.
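A rough Python sketch shows how such weighting could outvote a noisy majority. The numbers here are invented for illustration: the article only says that ‘.edu’ addresses and .pdf files count for more, not by how much, and the names `SUFFIX_WEIGHTS`, `PDF_BONUS`, `source_weight` and `weighted_answer` are hypothetical, as are the example URLs.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative weights only; the article gives the ordering (.edu above .com), not numbers.
SUFFIX_WEIGHTS = {".edu": 3.0, ".com": 1.0}
DEFAULT_WEIGHT = 1.0
PDF_BONUS = 1.0  # assumed extra weight for .pdf files


def source_weight(url):
    """Score a source from its host suffix and file type."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    weight = DEFAULT_WEIGHT
    for suffix, value in SUFFIX_WEIGHTS.items():
        if host.endswith(suffix):
            weight = value
            break
    if parsed.path.lower().endswith(".pdf"):
        weight += PDF_BONUS
    return weight


def weighted_answer(claims):
    """Given (url, answer) pairs, return the answer with the highest total weight."""
    totals = defaultdict(float)
    for url, answer in claims:
        totals[answer] += source_weight(url)
    return max(totals, key=totals.get) if totals else None


claims = [
    ("http://ufo-fans.example.com/sighting", "real"),
    ("http://blog.example.com/aliens", "real"),
    ("http://forum.example.com/thread", "real"),
    ("http://physics.example.edu/analysis.pdf", "implausible"),
]
print(weighted_answer(claims))
```

Three ‘.com’ pages outnumber the single academic PDF, yet the weighted total favours the academic source, which is the effect the weighting is meant to achieve.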

A computer can also check the source of the source, tracing the origin of an article to discover if it is associated with a particular group or company. A piece discussing pollution, for example, might be less trustworthy if the author is employed by a major oil company. These are all tasks that can be performed by a human, but limited time means this is not often practical, and skipping them can unintentionally result in a biased viewpoint.

With the number of examples available, one question is whether the machine can go a step further than sorting information and actually predict human behaviour. Rafal cites the example of ‘lying with tears’, where a person might cry not through sorrow, but from an ulterior motive. Humans are aware of such deceptions and frequently respond suspiciously to politicians or other people in power when they show public displays of emotion. Yet can a machine tell the difference?

Rafal’s research suggests it is possible when the computer draws parallels between similar, but not identical, situations that occur around the world. While the machine cannot understand the reasons behind the act, it can pick out the predicaments in which a person is likely to cry intentionally.

This ability to apply knowledge across related areas allows a more human-like flexibility in the artificial intelligence, but it can come at a cost. If the machine is able to correlate events too freely, it could come to the conclusion that it would be perfectly acceptable to make a burger out of a dolphin.

While eating dolphins is abhorrent in most cultures, a computer would discover that eating pigs was generally popular. In Japanese, the Kanji characters for dolphin are ‘海’ meaning ‘sea’ and ‘豚’ meaning ‘pig’. A machine examining the Japanese word might therefore decide that dolphins and pigs were similar and thus that dolphins were a good food source. This means it is important to control how far the machine reaches out of context to find analogies.

Such problems highlight the issue of ‘machine ethics’ where a computer must be able to identify a choice based more on morals than pure logic.

One famous ethical quandary is laid out by the ‘Trolley problem’, first conceived by Philippa Foot in 1967. The situation is of a runaway train trolley barrelling along a railway line to where five people are tied to the tracks. You are standing next to a lever that will divert the trolley, but such a diversion will send it onto a side line where one person is also tied. Do you pull that lever?

Such a predicament is one that Google’s driverless car may soon face. The issue of how the computer makes that decision, and why, is therefore central to the success of future technology.

“Knowing why a machine makes a choice is paramount to trusting it,” Rafal states. “If I tell the Roomba to clean and it refuses, I want to know why. If it is because there is a child sleeping and it has discovered that noise from the Roomba wakes a baby, then it needs to tell you. Then you can build up trust in its decisions.”

If we are able to get the ethics right, a machine’s extensive knowledge base could be applied to a huge range of situations.

“Your doctor may have seen a hundred cases similar to yours,” Rafal suggests. “A computer might have seen a million. Who are you going to trust more?”