The same technology that powers your chatty mobile assistant could one day provide a voice to those who have lost the ability to speak. As Renae Reints reports for Fortune, neuroscientists from Columbia University recently made a major advancement toward this futuristic goal, successfully translating brain waves into intelligible speech for the first time.
The team’s research, published in Scientific Reports, involves a somewhat unconventional approach. Rather than directly tracking thoughts to produce speech, the researchers recorded neurological patterns generated by test subjects listening to others speak. These brain waves were fed into a vocoder, a computer algorithm that synthesizes speech, and then converted into comprehensible, albeit robotic-sounding, speech mirroring the phrases heard by participants.
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” study author Nima Mesgarani, an engineer in Columbia’s neurobiology program, says in a statement. “With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
It’s worth noting, according to Gizmodo’s George Dvorsky, that scientists haven’t yet figured out how to directly translate internal thoughts, also known as imagined speech, into words. In an ideal scenario, individuals using speech technology would simply envision what they wanted to say, then wait for an artificial voice system to verbalize these thoughts.
The late British physicist Stephen Hawking used a rudimentary version of speech synthesis technology to communicate with others. As Nina Godlewski writes for Newsweek, Hawking was diagnosed with amyotrophic lateral sclerosis (ALS) at age 21. The motor neuron disease eventually claimed his speech abilities, forcing him to use a hand-held clicker to trigger speech.
When Hawking lost the use of his hands, he switched to a system based on facial movements; Gizmodo’s Dvorsky further explains that the scientist used a cheek switch connected to his glasses to choose words spoken by a voice synthesizer.
An advanced iteration of this technology would cut out the middleman, enabling users to produce speech without operating a clicker, cheek switch or other movement-sensitive system.
By comparison, Avery Thompson notes for Popular Mechanics, the Columbia team’s study focuses on translating “overheard speech.” Researchers recruited five epilepsy patients set to undergo brain surgery and asked them to listen to an array of spoken words (for example, a recording of someone counting from zero to nine) while hooked up to neural monitoring devices.
The brain waves captured by these tools were put into the vocoder, which synthesized speech with the help of a neural network trained, in the words of Futurism’s Kristin Houser, to “clean up” output and render the sounds intelligible.
Next, the scientists asked 11 other participants to listen to the AI-enabled speech. Significantly, study co-author Mesgarani points out in the Columbia statement, these individuals were able to “understand and repeat” the sounds around 75 percent of the time, “well and beyond” the rates seen in any previous experiments.
In an interview with Gizmodo’s Dvorsky, Mesgarani says he and his colleagues hope to synthesize more complex phrases in the near future. The researchers also want to record brain signals generated by test subjects who are thinking or imagining the act of speaking rather than simply listening to others speak. Finally, Mesgarani adds in the statement, the team aims to one day transform the technology into an implant capable of translating a wearer’s thoughts directly into words.
Potential limitations of the new research include its small sample size. In addition, according to Newcastle University neuroscientist Andrew Jackson, who was not involved in the study, neural networks would need to be trained on a vast number of brain signals from every new participant in order to synthesize speech beyond the numbers zero through nine.
“It will be interesting in future to see how well decoders trained for one person generalize to other individuals,” Jackson tells Gizmodo. “It’s a bit like early speech recognition systems that needed to be individually trained by the user, as opposed to today’s technology, such as Siri and Alexa, that can make sense of anyone’s voice, again using neural networks. Only time will tell whether these technologies could one day do the same for brain signals.”