Can A.I. Help Revitalize Indigenous Languages?

Indigenous researchers and roboticists are crafting innovative tools to help save endangered dialects

Serena Jampel

July 31, 2025

New artificial intelligence tools are being developed with oversight from traditional speakers to help educate people on how to speak Indigenous languages.

The robot is shaped like a soup can with a little domed head, animal ears and a charming, rounded body. Like a parrot in pirate lore, the robot is intended to perch on a wearer’s shoulder, listening and even joining in on the conversation. Unlike most parrots, it can speak fluent Anishinaabemowin, an Indigenous language spoken by the Anishinaabe nation of North America. The bot, which uses an internally developed artificial intelligence model to learn and reproduce languages, is a new foray into wearable tech for language preservation.

Danielle Boyer, a 24-year-old Anishinaabe roboticist, designed the “Skobot” to converse in endangered Indigenous languages, thereby helping to keep them alive. Made of brightly colored plastic, the bots are customizable and often adorned with accessories from pink tutus to top hats. Boyer grew up below the poverty line in Michigan, and that experience inspired her to make technical education more accessible to Indigenous youth like herself. Her charity helps distribute the Skobots and teaches students about robotics and innovation. Her creation joins a new and burgeoning group of initiatives that use A.I. language tools to preserve dying or endangered languages, and some argue that such technology shows particular promise for endangered Indigenous languages.

Around 40 percent of the world’s roughly 6,700 languages are at risk of extinction, and many of them are spoken by Indigenous people. Researchers have shown that keeping these languages alive has positive effects beyond just cultural preservation. One study suggested that a connection with linguistic heritage can correspond to lower teen suicide rates. And contact with an ancestral language has also been positively correlated with physical health benefits, like lower rates of diabetes or excessive alcohol consumption.

New A.I. educational tools are changing the language-learning game, and according to Boyer, Skobot is one of the first toys to speak an Indigenous language. Inspired by talking Elmo toys, Boyer’s bot is motion-activated and uses real children’s voices to answer questions in the target language. To use it, a child or teen need only ask the Skobot how to say a particular word or phrase. The A.I. model inside the toy then interprets the audio input and selects the appropriate pre-recorded audio file, simulating a conversation.
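The article does not describe how Skobot’s model works internally. As a hypothetical sketch of the general pattern it describes (interpret a spoken question, then pick a matching pre-recorded answer), the Python below assumes the child’s speech has already been transcribed to text elsewhere and uses fuzzy string matching from the standard library’s `difflib` as a stand-in for the A.I. model. The phrase table, the file paths and the `pick_recording` function are illustrative placeholders, not part of the real toy.

```python
import difflib
from typing import Optional

# Hypothetical phrase table: each question the toy understands maps to a
# pre-recorded answer spoken by a real person. File names are placeholders.
PHRASE_TO_AUDIO = {
    "how do you say hello": "audio/hello.wav",
    "how do you say thank you": "audio/thank_you.wav",
    "how do you say good night": "audio/good_night.wav",
}


def pick_recording(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the clip whose phrase best matches the transcript, or None.

    `transcript` is assumed to come from a separate speech-to-text step.
    Matches scoring below `cutoff` (0.0 to 1.0) are ignored, so the toy
    stays silent rather than playing an unrelated answer.
    """
    # Normalize case and collapse repeated whitespace before matching.
    query = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(query, PHRASE_TO_AUDIO.keys(), n=1, cutoff=cutoff)
    return PHRASE_TO_AUDIO[matches[0]] if matches else None
```

Because every answer is a recording made by a real speaker, a design like this never generates new target-language speech on its own; the worst a bad match can do is play the wrong clip, or, thanks to the cutoff, play nothing.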

Like the Skobot, most new A.I. technologies developed by Native scientists are designed for a specific language community. Jacqueline Brixey, a computer scientist formerly at the University of Southern California and now joining the University of Wisconsin, created a chatbot called “Masheli” that can communicate in Choctaw. Drawing from a collection of animal stories, the chatbot can listen and respond to users in both English and the target language, helping learners build conversational skills.

In Quebec, the First Languages A.I. Reality (FLAIR) initiative, a project of the A.I. research institute Mila, develops tools to serve Indigenous communities in their language-preservation efforts. FLAIR takes a more global perspective, developing a host of A.I.-based educational tools with the goal of creating language preservation technology that can be adapted to different groups’ needs. “The practical outcome is more speakers. That’s the sole objective here,” says Michael Running Wolf, co-founder and lead architect of the initiative. One of the group’s innovations is a “language in a box.” The boxy hardware contains a portable, voice-based guided curriculum that can be programmed for different languages. These custom language courses are made possible by FLAIR’s foundational research on automatic speech recognition, which lets the models recognize what a learner says, a prerequisite for human-like conversation.
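One common way to make a single device “programmable for different languages” is to keep the lesson engine generic and load the language-specific content as data. The sketch below illustrates that design under stated assumptions; it is not FLAIR’s implementation, which the article does not describe. `Step`, `LanguagePack` and `run_lesson` are hypothetical names, the learner’s replies are assumed to arrive as text from a speech recognizer, and plain placeholder strings stand in for real target-language phrases.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Step:
    prompt_audio: str  # recording the device plays (hypothetical file name)
    expected: str      # what the learner should say back, as text


@dataclass(frozen=True)
class LanguagePack:
    language: str
    steps: Tuple[Step, ...]


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison; a real system would need
    # a language-specific notion of "close enough".
    return " ".join(text.casefold().split())


def run_lesson(pack: LanguagePack, transcripts: Iterable[str]) -> int:
    """Walk through the pack's steps in order.

    Each transcript is the learner's reply as recognized by speech
    recognition (assumed to happen elsewhere). A correct reply advances to
    the next step; a wrong one means the same prompt is replayed.
    Returns the number of steps completed.
    """
    completed = 0
    for heard in transcripts:
        if completed == len(pack.steps):
            break  # lesson finished; ignore any extra input
        if normalize(heard) == normalize(pack.steps[completed].expected):
            completed += 1
    return completed
```

Supporting another language then means writing a new `LanguagePack` with that community’s recordings and phrases; `run_lesson` itself does not change.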

Most A.I. translation systems require vast amounts of training data to produce accurate results. For high-resource languages like English and Spanish, models are typically trained on millions of parallel sentence pairs (the same sentence written in both languages) to learn how to generate accurate translations. From that data, tools like Google Translate learn implicit patterns of grammar and language usage that enable them to predict accurate translations from one language to another.
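To make “learning patterns from parallel sentence pairs” concrete, the sketch below implements IBM Model 1, a classic word-alignment model from early statistical machine translation. It is much simpler than the neural networks behind modern tools such as Google Translate and is used here only for illustration. Given nothing but sentence pairs, expectation-maximization gradually works out which word translates which. The three-pair Spanish-English corpus and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, English sentence)


def train_ibm_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(e, f)] = P(English word e | source word f) with EM."""
    src_vocab = {f for src, _ in pairs for f in src.split()}
    tgt_vocab = {e for _, tgt in pairs for e in tgt.split()}
    # Start with every English word equally likely for every source word.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: split each English word's credit among the source words
        # in the same sentence, in proportion to the current estimates.
        for src, tgt in pairs:
            f_words = src.split()
            for e in tgt.split():
                norm = sum(t[(e, f)] for f in f_words)
                for f in f_words:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn the expected counts back into probabilities.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t


def best_translation(t: Dict[Tuple[str, str], float], f: str) -> str:
    """Most probable English word for source word f."""
    return max((e for (e, src) in t if src == f), key=lambda e: t[(e, f)])


corpus = [("la casa", "the house"), ("la puerta", "the door"), ("una puerta", "a door")]
t = train_ibm_model1(corpus)
```

No pair ever says that “casa” means “house”; the model infers it because “la” is already explained by “the” in other sentences. Three hand-picked pairs suffice for this toy, but real vocabularies and word orders are why production systems need millions of pairs, and why languages without such data are so hard to serve.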

But what happens when a language has little publicly available data? Jared Coleman, a computer scientist at Loyola Marymount University, is working on a novel approach to this problem. Coleman’s tribal language, Owens Valley Paiute, can be classified as a “no-resource” language, meaning that no public data sets are available to adequately train typical large language model translation tools. Instead, Coleman has programmed a tool that instructs a more typical large language model translator in the rules of the language, such as its grammar and vocabulary, and then asks it to provide translations using that knowledge. Because it has so little data to draw on, the model, much like a human learning a language, may use roundabout strategies to achieve the intended meaning. “The approach that we provide guarantees that the sentence, the output sentence, will always be grammatical,” he says.
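The article does not detail how Coleman’s system guarantees grammatical output. One general way to get such a guarantee is to let the model choose *what* to say as a structured request while a rule-based generator decides *how* to say it, so every emitted sentence is built by the grammar rules. The toy sketch below illustrates only that idea. The vocabulary, suffixes and subject-object-verb order belong to an invented language (not Owens Valley Paiute), `Clause`, `render` and `first_grammatical` are hypothetical names, and the language-model step is not implemented: candidate clauses are simply passed in.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Invented toy grammar; these words are placeholders, not real Paiute.
SUBJECTS = {"I": "mip", "you": "dov"}
VERBS = {"see": "vim", "eat": "tor"}
OBJECTS = {"dog": "blik", "fish": "zorp"}
OBJECT_MARKER = "-ku"
TENSE_SUFFIX = {"present": "", "past": "-ta"}


@dataclass(frozen=True)
class Clause:
    subject: str
    verb: str
    obj: Optional[str] = None
    tense: str = "present"


def render(clause: Clause) -> str:
    """Build a subject-object-verb sentence, or refuse if the rules can't express it."""
    if clause.subject not in SUBJECTS:
        raise ValueError(f"no pronoun for {clause.subject!r}")
    if clause.verb not in VERBS:
        raise ValueError(f"no verb for {clause.verb!r}")
    if clause.obj is not None and clause.obj not in OBJECTS:
        raise ValueError(f"no noun for {clause.obj!r}")
    if clause.tense not in TENSE_SUFFIX:
        raise ValueError(f"unsupported tense {clause.tense!r}")
    words = [SUBJECTS[clause.subject]]
    if clause.obj is not None:
        words.append(OBJECTS[clause.obj] + OBJECT_MARKER)  # marked object
    words.append(VERBS[clause.verb] + TENSE_SUFFIX[clause.tense])  # verb last
    return " ".join(words)


def first_grammatical(candidates: Iterable[Clause]) -> Optional[str]:
    """Try candidate clauses (e.g. proposed by a language model) in order
    and return the first one the rules can render, else None."""
    for clause in candidates:
        try:
            return render(clause)
        except ValueError:
            continue  # outside the grammar: fall back to the next candidate
    return None
```

For example, `render(Clause("you", "eat", "fish", "past"))` yields `"dov zorp-ku tor-ta"`. A candidate using a word the lexicon lacks is rejected rather than guessed at, which is what makes the grammaticality guarantee possible, at the cost of sometimes needing a less direct phrasing.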

But when it comes to language learning, many Indigenous innovators are careful to note that A.I. cannot replace human elders and tradition-keepers. The Skobots are meant to be used in conjunction with youth language classes, and Boyer’s team gently declines requests from individuals who want to buy the tech. “Our languages are living things. Our languages need to have community relationships. Our languages need to be learned from people,” says Boyer. “It can’t be pure technology.”

Coleman’s attention to grammatical accuracy and Boyer’s commitment to pairing robotic learning with human teachers stem from the worrisome possibility that A.I. will get things wrong. In December 2024, the Montreal Gazette reported on the proliferation of A.I.-generated how-to books for learning the Abenaki language. The books, which were sold on Amazon, contained incorrect translations and even non-Abenaki words. Members of the Abenaki First Nation found the books demeaning to would-be learners, whose efforts to revitalize an endangered language might be stymied by false materials. In a community struggling to retain linguistic sovereignty after centuries of assimilatory pressure, sources told the Gazette, fake content sold for profit is particularly harmful.

Because endangered languages often have less available data, accuracy is harder to achieve. Yet that scarcity is what makes accuracy all the more vital, explains Coleman. “It’s really easy for misinformation about the language to spread, and people start saying things wrong,” he says.

Major large language models like ChatGPT have a poor track record with Indigenous languages. “We should have a right to say how our languages are used,” says Brixey. “ChatGPT could be good in Choctaw, but it’s currently ungrammatical; it shares misinformation about the tribe. It makes up what it claims are tribal stories,” she says.

Running Wolf, who used to work on Amazon’s Alexa, adds that major tech companies aren’t doing enough to mitigate the potential harm of linguistic misinformation. “There needs to be security controls in place, which are not being created and not being done to the level that the communities need to be served at,” he says.

Running Wolf and others stress the importance of scrutinizing the motives and practices of those who would work with endangered languages. That’s no surprise, given the history of exploitation of Indigenous languages. In 1890, a white anthropologist named Jesse Walter Fewkes produced wax cylinder recordings of Passamaquoddy stories and songs, some of which were sacred and not meant to be heard by outsiders. For nearly a century, the local tribes had no access to the recordings or the knowledge they held. The recordings, which represent some of the oldest surviving ethnographic sound research, have been at the center of a debate over language autonomy and revitalization in the Northeast.

More recently, in late 2024, the Standing Rock Sioux tribe took legal action against an educational materials company it accused of exploitatively recording Lakota language speakers. The tribe contended that the organization wrongly retained and profited from recordings and language materials produced by tribal elders, without proper tribal ownership or consent, and then requested additional payment when the tribe sought access to the materials.

Since different tribes have different cultural traditions, training A.I. models on material in Indigenous languages, particularly ancestral stories and folktales, can lead to unintended consequences. As Coleman explains, in his tribe certain stories are supposed to be told only in the wintertime. “How do you maintain that tradition if it’s available online?” he says. The underlying problem is that A.I. models do not understand cultural nuance: if they are not trained appropriately, they can mishandle sensitive cultural information.

In their work, Brixey and Boyer emphasize that participants can rescind their recordings at any time and ask for their knowledge to be excluded from the development of A.I. tools. These measures are part of ensuring that different tribes have full data sovereignty, meaning control over their cultural knowledge, of which language is an important part. UNESCO, the international body overseeing the preservation of cultural heritage, has released a statement advocating for Indigenous data sovereignty as A.I. language preservation models rise. According to UNESCO, recognizing contextual nuance and cultural sensitivity when dealing with Indigenous knowledge of any form “is essential to respect, protect and promote global diversity and inclusiveness.”
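The article does not say how these projects enforce such commitments in software. As a hypothetical illustration, a tool could check every recording against a community-maintained registry before using it: anything whose speaker has withdrawn, or that was individually rescinded, is always excluded, and a story marked as winter-only (like those Coleman describes) is withheld outside winter. `Recording`, `usable_recordings` and the December-through-February definition of winter are assumptions made for this sketch; real rules would come from each community.

```python
from dataclasses import dataclass
from datetime import date
from typing import AbstractSet, Iterable, List

# Assumption for illustration only: winter = December through February.
# The actual seasonal rules are set by each community.
WINTER_MONTHS = frozenset({12, 1, 2})


@dataclass(frozen=True)
class Recording:
    recording_id: str
    speaker_id: str
    winter_only: bool = False  # story meant to be told only in winter


def usable_recordings(
    recordings: Iterable[Recording],
    withdrawn_speakers: AbstractSet[str],
    withdrawn_recordings: AbstractSet[str],
    today: date,
) -> List[str]:
    """Return IDs of recordings a tool may use on the given date."""
    in_winter = today.month in WINTER_MONTHS
    usable = []
    for rec in recordings:
        if rec.speaker_id in withdrawn_speakers or rec.recording_id in withdrawn_recordings:
            continue  # consent withdrawn: excluded regardless of date
        if rec.winter_only and not in_winter:
            continue  # seasonal restriction: withheld until winter
        usable.append(rec.recording_id)
    return usable
```

A date check like this can only govern when a tool plays or serves a story. Once a story’s content has been used to train a model, the model may reproduce it in any season, which is exactly the difficulty Coleman raises.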

Indigenous researchers are undertaking A.I. language preservation initiatives to push for more accessibility and diversity at the forefront of technological innovation. As the researchers Uma Pradhan and Joyeeta Dey have explained, A.I. language preservation helps redress historical injustice for communities previously discouraged or even prohibited from speaking their native tongues. These initiatives not only support language revitalization by increasing the number of speakers but also assert the cultural significance of these languages within technological spaces long dominated by English, Mandarin Chinese and a handful of other global languages.

Running Wolf sees the development of Indigenous language A.I. tools as part of keeping abreast of global innovation. He hopes that FLAIR’s work will ensure that Indigenous community voices are represented in future virtual reality, augmented reality or “meta” spaces.

“Our communities are not just communities of the past, but also the present and the future,” says Boyer. “We’ve always been scientists. We’ve always been engineers. We’ve always been innovators.”