In the movies, you never hear robots say “Huh?”
For all his anxiety, C-3PO of "Star Wars" was never befuddled. Sonny, the pivotal non-human in "I, Robot," may have been confused about what he was, but didn’t seem to have any trouble understanding Will Smith.
In real life, though, machines still struggle mightily with human language. Sure, Siri can answer questions if it recognizes enough of the words in a given query. But asking a robot to do something that it hasn’t been programmed, step-by-step, to do? Well, good luck with that.
Part of the problem is that we as humans aren’t very precise in how we speak; when we’re talking to one another, we usually don’t need to be. But ask a robot to “heat up some water” and the appropriate response would be “What?”—unless it had learned how to process the long string of questions related to that seemingly simple act. Among them: What is water? Where do you get it? What can you put it in? What does ‘heat up’ mean? What other object do you need to do this? Is the source in this room?
Now, however, researchers at Cornell University have taken on the challenge of training a robot to interpret what’s not said—or, the ambiguity of what is said. They call the project Tell Me Dave, a nod to HAL, the computer with the soothing voice and paranoid tendencies in the movie "2001: A Space Odyssey."
Their robot, equipped with a 3D camera, has been programmed to associate objects with their capabilities. For instance, it knows that a cup is something you can use to hold water, for drinking, or as a way to pour water into something else; a stove is something that can heat things, but also something upon which you can place things. Computer scientists call the training technique grounding—helping robots connect words to objects and actions in the real world.
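To make the idea concrete, here is a minimal sketch of how objects might be linked to what they can be used for. It is only an illustration built from the cup and stove examples above; the table, the capability labels and the function name are assumptions, not the Cornell team's actual code.

```python
# Hypothetical affordance table: object name -> what it can be used for.
# The entries mirror the cup and stove examples; all labels are illustrative.
AFFORDANCES = {
    "cup": {"hold_water", "drink_from", "pour_from"},
    "stove": {"heat", "place_on"},
}


def objects_that_can(capability, visible_objects):
    """Return the visible objects whose affordances include the capability."""
    return sorted(
        name
        for name in visible_objects
        if capability in AFFORDANCES.get(name, set())
    )


# Which of the objects the camera sees could be used to heat something?
print(objects_that_can("heat", ["cup", "stove"]))  # prints ['stove']
```

A lookup like this is what lets a request such as “heat up some water” be tied to real things in the room rather than to bare words.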
“Words do not mean anything to a robot unless they are grounded into actions,” explains Ashutosh Saxena, head of the Tell Me Dave team. The project's robot, he says, has learned to map different phrases, such as “pick it up” or “lift it,” to the same action.
That’s a big step forward in human-robot communication, given how many different ways we can describe a simple task.
“All robots, such as those in industrial manufacturing, self-driving cars, or assistive robots, need to interact with humans and interpret their imprecise language,” Saxena says. “Being able to figure out the meaning of words from their environmental context would be useful for all of these robots immediately.”
A Group Effort
Saxena, along with graduate students Dipendra Misra and Jaeyong Sung, has also turned to crowdsourcing to collect as many different variants of the English language as possible.
Visitors to the Tell Me Dave website are asked to direct a virtual robot to complete a certain task, such as “Make ramen.” Because most people tend to give different commands as they lead the robot through the process, the team has been able to collect a large vocabulary related to the same step in the process.
Those commands, recorded in different accents, are associated with stored video simulations of different tasks. So even if the phrases are different—“take the pot to the stove” as opposed to “put the pot on the stove”—the Tell Me Dave machine can calculate the probability of a match with something it has heard before.
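The article does not spell out how that probability is calculated, so the sketch below swaps in a deliberately simple stand-in: it scores a new command against stored phrases by word overlap (Jaccard similarity) and normalizes the scores so they sum to one. The stored phrases, the action labels and the function names are all hypothetical.

```python
def tokens(phrase):
    """Split a phrase into a set of lowercase words."""
    return set(phrase.lower().split())


def jaccard(a, b):
    """Word overlap between two phrases, from 0 (none) to 1 (identical)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical memory of previously heard commands and the actions they led to.
STORED = {
    "put the pot on the stove": "move(pot, stove)",
    "fill the pot with water": "fill(pot, tap)",
    "turn on the stove": "switch_on(stove)",
}


def match_probabilities(command):
    """Turn overlap scores into a rough probability for each stored action."""
    scores = {phrase: jaccard(command, phrase) for phrase in STORED}
    total = sum(scores.values())
    if total == 0:
        return {}  # nothing in memory resembles this command
    return {STORED[phrase]: score / total for phrase, score in scores.items()}


def best_action(command):
    """Pick the most probable action, or None if nothing matches."""
    probs = match_probabilities(command)
    return max(probs, key=probs.get) if probs else None


print(best_action("take the pot to the stove"))  # prints move(pot, stove)
```

Word overlap is far cruder than anything a research system would rely on, but it shows the principle: a phrase the robot has never heard can still be matched to the closest thing it has.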
At this point, the Tell Me Dave robot completes requested tasks almost two-thirds of the time. That includes cases in which objects are moved to different places in the room or the robot is working in a different room altogether. Sometimes, however, the robot is still clueless: When it was told to wait until ice cream became soft, “it couldn’t figure out what to do,” Saxena says.
Still, it has become much better at filling in unspecified steps. For instance, when told to “heat the water in the pot,” the robot realized that it first needed to carry the pot over to the tap and fill it with water. It also knows that when instructed to heat something, it can use either the stove or the microwave, depending on which is available.
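One common way to fill in steps nobody mentioned is to work backward from the goal, checking what has to be true before each action can happen. The sketch below does that for the pot of water. It is a toy under stated assumptions: the action names, the facts and the `plan` function are hypothetical, facts never become false once true, and the stove-or-microwave choice is left out.

```python
# Hypothetical action model: each action lists facts that must already hold
# ("pre") and the facts it makes true ("add").
ACTIONS = {
    "carry pot to tap": {"pre": set(), "add": {"pot_at_tap"}},
    "fill pot with water": {"pre": {"pot_at_tap"}, "add": {"water_in_pot"}},
    "carry pot to stove": {"pre": {"water_in_pot"}, "add": {"pot_on_stove"}},
    "turn on stove": {"pre": {"pot_on_stove"}, "add": {"water_hot"}},
}


def plan(goal, state, steps=None):
    """Work backward from the goal, adding every step that is still missing."""
    steps = [] if steps is None else steps
    if goal in state:
        return steps  # already true, nothing to do
    for name, action in ACTIONS.items():
        if goal in action["add"]:
            # First make sure everything this action needs is in place.
            for pre in sorted(action["pre"]):
                plan(pre, state, steps)
            steps.append(name)
            state |= action["add"]  # record the action's effects
            return steps
    raise ValueError(f"no known action achieves {goal!r}")


# Starting from an empty pot, "heat the water" expands into four steps.
print(plan("water_hot", set()))
```

If the pot already holds water, calling `plan("water_hot", {"water_in_pot"})` skips the trip to the tap and returns only the last two steps.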
Saxena says the Tell Me Dave robot's training must improve before it can be used in real-life settings; being able to follow directions 64 percent of the time isn’t good enough, particularly since humans understand what they’re told 90 percent of the time.
Saxena and his team will present their algorithms for training robots, and show how they’ve expanded the process through crowdsourcing, next week at the Robotics: Science and Systems conference at the University of California, Berkeley. Similar research is being done at the University of Washington.
Here’s more recent news about research into communicating with and through robots:
- What’s the gesture for “make sure my seat is warm”?: Mercedes-Benz wants to be the first major car company to start selling driverless cars, perhaps as soon as 2020, and its engineers have started working with robotics experts to develop ways for people to communicate with their vehicles. One method getting a lot of attention is the use of hand signals that a car’s sensors could comprehend. Experts say that with the right gesture, you could hail your parked car to come pick you up.
- Finally, helper robots for mechanics: At Audi, robot helpers will soon be shipped to the company's mechanics around the world. The robots will be equipped with 3D cameras controlled by an off-site specialist, who can guide the people actually working on the cars through tricky repairs.
- Making Siri smarter: According to a report in Wired, Apple has started hiring top speech recognition experts as it begins focusing on neural networks, in which machines learn words by building connections that mimic the way neurons function in the human brain.
- Robot needs ride to art show: Later this month, a robot will begin hitchhiking across Canada. Called HitchBOT, it’s been described as a combination art project-social experiment. The goal is to see if HitchBOT can make it from Halifax to a gallery across the country in British Columbia. It won’t be able to move on its own, but it will be equipped with a microphone and camera that will allow it to detect motion and speech. It also will be able to answer questions using a Wikipedia-sourced database.