INNOVATION

How to Build a Human Voice

Using sounds from “donors,” scientists are constructing personalized voices for those who can’t speak

March 25, 2014

A new process is giving human voices to people with speech disabilities. Flickr user Miikka Skaffari

One of the more recognizable voices in the world belongs to Stephen Hawking—although, of course, it is not actually the famous scientist’s voice at all, but one that’s computer-generated in response to his facial motions. He’s been using a synthesizer to speak for almost 25 years now, his voice and ability to move lost long ago to ALS, or Lou Gehrig’s disease. Today, the British astrophysicist is identified through his robotic monotone, one that actually has an American accent.

But the truth is, Hawking shares that computer-generated voice with thousands of other people, some young girls, some older women, and others of all ages and ethnicities across the world. All of them, unable to speak naturally, think of it as their own, though there’s nothing unique about it.

And that just doesn’t seem right to Rupal Patel.

Patel is a speech scientist and director of the new Center for Speech Science and Technology at Northeastern University. She has long felt that a voice helps define an individual; it clearly shapes how a person is known in the world. Even if people can’t speak, she says, shouldn’t they have an opportunity to communicate through voices that are truer to who they are?

For several years now, she and fellow speech scientist Tim Bunnell have been developing a way of constructing custom-made voices using as their essence whatever sounds a person can make. They focus on the pitch and volume of those sounds and also how the person may pronounce certain letters, such as “ss” or “ch.” The goal is to zero in on a voice’s identity as much as possible.

Then it becomes a matter of building a new voice, one with much more clarity, by harvesting sounds from a donor of a similar gender, age, size and geographical background. To donate a voice, a person is recorded reading a selection of short sentences that cumulatively cover every combination of sounds in a language. Ideally, he or she records as many as 3,000 different phrases. This takes hours. And while recording doesn’t need to happen in a single session, the more sounds a donor can provide, the better a voice can be produced.

From that collection of sounds, specially designed software creates words in a reverse-engineered voice that’s close to what a person might sound like if he or she didn’t have a speech disorder.

Is this scalable?

Isn’t it going to take a not-so-small army of donors reading an enormous number of sentences to build a database of sounds that can be turned into personalized voices?

Yes, it will, Patel says in a recent TED talk, which is why she is pushing ahead with what she calls the Human Voicebank Initiative.

The project's website, VocaliD.org, has both a sign-up page for donors and another for those hoping to get a personal voice. The latter must submit their names and other relevant information such as their speech ability, which can range from “completely non-vocal” to “can make sounds but not words” to “can use some words for communication.”

While only a handful of voices have actually been created during the project's infancy, more than 10,000 people have already volunteered to be voice donors, Patel says. "Several hundred" others, she says, have signed up to get new voices.

Still, the voice bank faces several hurdles, Patel says, among them getting donors to read through all of the material needed to construct a voice. That challenge is even greater when considering that, at the moment, volunteers need to record in a professional studio to ensure scientists have high-quality samples. Patel says tools are being developed that would allow donors to record their sentences at home.

Her vision is to collect a million different voice samples by 2020. But already her work is making an impact. The site features an audio file, only two sentences long, provided by a young woman described as having a “severe speech impairment.” Her words are as clear as day:

“This voice is only for me. I can’t wait to try it with my friends.”

Here’s Rupal Patel explaining the Human Voicebank Initiative in a TED talk:

Hearing voices

Here’s more recent research on the effect of voices:

Listen to your mother: Just the sound of a mother’s voice can get premature babies to eat better, according to a study published in Pediatrics. With the use of pacifiers equipped with sensors, researchers at Monroe Carell Children’s Hospital in Nashville rewarded babies who sucked correctly with recordings of their moms singing lullabies. Babies in the study who used the special pacifiers—and heard their mom’s voice—were able to come off feeding tubes a week earlier than those who didn’t.
Welcome to the echo chamber: Previous research has suggested people prefer voices that sound like they’re coming from small women or large men, but a new study from the University of British Columbia contends that what we really like are voices that sound like our own, specifically ones that have accents with which we’re familiar. The researchers also said that people seemed to prefer the voices of men who used shorter words and women who sounded breathy.
Elephants never forget a voice: African elephants apparently are pretty good listeners. According to a two-year study in Kenya, they’re able to distinguish human voices by gender, age and even ethnic group. Researchers recorded Maasai men, women and children yelling and played the recordings over a speaker hidden from elephant herds. Only when the animals heard the voices of adult Maasai males (the group with which elephants are much more likely to have confrontations) did they react, huddling protectively around calves. They didn’t respond to voices of adult men from another tribe, the Kamba, who, as farmers, rarely come into conflict with the herds.
They hear your pain: After completing a series of brain scans on canines, scientists in Scotland say dogs are like humans in that they have an area of their brains dedicated to recognizing and interpreting voices. And that, say the researchers, is why your dog can seem so tuned in to your feelings.
I knew something was different about you: Plastic surgery doesn’t just change the way you look; it could also change the way you sound. According to a paper published in the journal Plastic and Reconstructive Surgery, researchers in Iran found that patients who had rhinoplasty, or nose jobs, often thought their voices sounded more nasal after the procedure.