Karen Jacobsen, an Australian singer and voice actress, got the gig in 2000, soon after arriving in New York. The producers—corporate types—sent her to a recording studio for three weeks, where she spent four hours a day saying things like “at the next intersection, turn left” and “recalculating.” In the end, it wasn’t her voice that was strained. “I said ‘approximately’ approximately 186 times,” Jacobsen recalls. “That kind of thing can make you go loopy.”
Two years later, she got a phone call from a friend. “Karen,” her pal blurted. “I bought my husband one of those new GPS things, and we put it on the Australian voice. It’s you!” That’s how Jacobsen found out her voice was giving directions to 400 million people around the world.
Her work highlights the hybrid of blood and tech that goes into the now ubiquitous voices telling us where to turn: More than a billion people rely on Google Maps each month, and 80 percent activate the voice option.
In the early days of voice synthesis (think of the robotic sounds of a late-1970s Speak & Spell), an algorithm converted text into a monotone stream. Then, as databases grew, you could record a voice actor like Jacobsen pronouncing a corpus of syllables and words, which algorithms would combine and alter according to basic rules. More recently, coders at firms such as Nuance, which designs navigation interfaces for cars, have developed a third approach: applying deep learning to speech synthesis. This approach mixes recorded words and synthesized snippets, relying on artificial intelligence to make the resulting voices' pronunciation even more human. “They sound uncannily natural,” says Nuance’s chief technology officer, Vlad Sejnoha.
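For the curious, the second approach (stitching together recorded units by rule) can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the `RECORDED_UNITS` table, the sample values, and the `crossfade` and `synthesize` helpers are all hypothetical, and nothing here reflects how Nuance, Google, or any GPS maker actually builds its voices.

```python
# Hypothetical unit inventory: each recorded word maps to a short list of
# audio samples. Real systems store far longer waveforms and many more units.
RECORDED_UNITS = {
    "turn": [0.0, 0.2, 0.4, 0.2],
    "left": [0.1, 0.3, 0.1, 0.0],
    "right": [0.2, 0.5, 0.2, 0.0],
}


def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight shifts from a toward b
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:len(a) - overlap] + blended + b[overlap:]


def synthesize(text, overlap=1):
    """Look up each word's recording and splice the recordings together."""
    out = []
    for word in text.lower().split():
        if word not in RECORDED_UNITS:
            raise KeyError(f"no recording for {word!r}")
        out = crossfade(out, RECORDED_UNITS[word], overlap)
    return out


samples = synthesize("turn left")
```

The crossfade stands in for the "basic rules" that smooth the joins between recordings. The failure on an unknown word hints at why a voice actor has to record such a large corpus in the first place.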
Of course, language quirks remain a challenge for voice systems. “Think ‘bough’ versus ‘bought,’ or ‘read’ versus ‘read,’” a Google spokesperson says. “But hopefully the user can always guess what we meant.” Now that AI is teaching car nav systems to speak more intelligently, its next job will be to search the online world and figure out where you want to go even before you do.