Voice recognition software, most of us would probably agree, is a pretty cool thing. But the talking to machines part–be it smartphone, TV screen or dashboard–well, not so much. Asking advice of a device? Reeks of geek. Enunciating each word so you can be understood? How cool can you really be?
But Apple, true to form, has taken this head on by hiring three icons of cool to star in their latest ad campaign for Siri, the voice of the iPhone 4S. There’s Zooey Deschanel (Adorable Cool) and John Malkovich (Cerebral Cool) and Samuel L. Jackson (Ultimate Cool), and all make engaging in wordplay with a phone seem the sport of gods.
Critics, nonetheless, point out that in real life, Siri is neither as responsive nor all-knowing as she’s portrayed in commercials. You, too, I’m sure, are shocked to hear this. Others see the whole thing as ripe for parody–see Zooey’s brother Jooey do a Funny or Die version of Zooey’s and Siri’s rainy day together.
No matter. Siri has become a lead singer in the robot chorus, the “You’ve Got Mail” voice of a new generation.
It is fashionable in some circles to suggest that Siri isn’t Steve Jobs-worthy, that if he were still alive, Jobs would have pulled it off the market or, at the very least, never would have approved such a high-profile ad campaign for so flawed a product.
But as Jobs’ successor, Tim Cook, said earlier this week, iPhone 4S owners like Siri. According to a survey released in March, almost 90 percent say they use it at least once a month. And keep in mind that Siri, one of the very few Apple products said to be in beta when it was released, won’t celebrate her first birthday until October. She’s still learning language and, even more importantly, just beginning to tap the potential of artificial intelligence.
Siri will likely be a centerpiece of Apple TV, expected to make its debut in December. But chances are, the place where talking to machines will go mainstream is in our cars.
Drive, she said
Sure, that’s already happening, but you still have to switch to robot speak if you want to be understood. And even then there’s no guarantee. That will start to change this summer when some new models will come equipped with something called Dragon Drive!
It’s the invention of Nuance Communications, a Massachusetts-based company that’s become a powerhouse in the voice recognition business. (It’s widely believed to be the brains behind Siri.) Nuance and voice recognition in cars took a big leap forward last week when the firm announced that Dragon Drive! will be able to tap into the cloud.
What this means is that the system will dramatically ramp up its computing power and memory capability. And that means that the voice in your dashboard will become more Siri-like and allow you to actually converse with it. No more monosyllabic shouting. The day is coming when you’ll be able to casually mention that you feel like some Allman Brothers and seconds later “Whipping Post” will come pumping through the speakers.
The key is how well we’re able to teach machines context and pragmatics–how language is used in social situations. And that’s tricky business. For starters, even the most sophisticated voice recognition device needs to wait for a human to finish speaking so it’s able to parse and interpret the whole sentence. Then there’s the “theory of mind,” the ability to understand that other people can have different beliefs and intentions than our own. As far as we know, only humans can do this.
A recent study by two Stanford psychologists can give you a sense of what’s involved in helping machines intuit. Researchers Michael Frank and Noah Goodman set up an online experiment in which participants were asked to look at a set of objects and then select which one was being referred to by a particular word. For instance, one group of participants saw a blue square, a blue circle and a red square. The question for that group was: Imagine you are talking to someone and you want to refer to the middle object. Which word would you use, “blue” or “circle”?
The other group was asked: Imagine someone is talking to you and uses the word “blue” to refer to one of these objects. Which object are they talking about?
The responses helped the researchers get a clearer picture of how a listener understands a speaker and how a speaker decides what to say. From that, they developed the kind of mathematical model that can expand and refine a computer’s thought process.
Said Frank: “It will take years of work but the dream is of a computer that really is thinking about what you want and what you mean rather than just what you said.”
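For the curious, here is a rough idea of what a model like that can look like in code. This is a minimal sketch, not the researchers' actual model: it assumes a speaker who favors words that pick out fewer objects, and a listener who starts with no preference among the objects and reasons backward from that speaker. The function names and the feature sets are made up for illustration.

```python
from fractions import Fraction

# The three objects from the experiment, described by the words that fit them.
OBJECTS = {
    "blue square": {"blue", "square"},
    "blue circle": {"blue", "circle"},
    "red square": {"red", "square"},
}


def extension_size(word):
    """Count how many objects a word could refer to."""
    return sum(word in features for features in OBJECTS.values())


def speaker(obj):
    """Assumed speaker: picks a true word, favoring words that fit fewer objects."""
    weights = {w: Fraction(1, extension_size(w)) for w in OBJECTS[obj]}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


def listener(word):
    """Assumed listener: uniform prior over objects, updated by the speaker model."""
    scores = {obj: speaker(obj).get(word, Fraction(0)) for obj in OBJECTS}
    total = sum(scores.values())
    if total == 0:
        raise ValueError(f"no object matches the word {word!r}")
    return {obj: score / total for obj, score in scores.items()}


if __name__ == "__main__":
    for obj, prob in listener("blue").items():
        print(f"{obj}: {float(prob):.2f}")
```

Under these assumptions, a listener who hears “blue” leans toward the blue square (0.60) over the blue circle (0.40), on the logic that a speaker who meant the circle would more likely have just said “circle.”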
A manner of speech
Here are some more recent developments in voice recognition:
- Siri goes silent: IBM tends to be real nervous about corporate secrets getting out, so it now forbids its employees from using public file transfer sites, such as Dropbox. But it also has a ban on the use of Siri in the office because security execs worry that someone, while talking to their phone, could reveal sensitive info that ends up on Apple’s servers.
- Take that, Apple!: Samsung launched its new Galaxy S III smartphone in London this week, and while its big touchscreen is getting a lot of attention, it also features new voice and face recognition software.
- Do what I say, not what I do: And Samsung’s not stopping there. It recently filed a patent application for a robot that understands human speech. The robot would be able to adjust its “listening” capabilities to take into account ambient noise that might interrupt or disrupt commands it’s been given. It would also be able to recognize who’s speaking to it, even if the background noise is very loud.
Infographic bonus: You think your car is computerized now. Wait until it’s completely plugged into the Internet. Get the lowdown on what a connected car can do.