Can Computers Decipher a 5,000-Year-Old Language?

A computer scientist is helping to uncover the secrets of the inscribed symbols of the Indus

Over the decades, archaeologists have turned up a great many artifacts from the Indus civilization, including stamp sealings, amulets and small tablets. (Robert Harding / Photo Library)

Rao and his collaborators, an international group including computer scientists, astrophysicists and a mathematician, used a computer program to measure the conditional entropy of the Indus script. Then they measured the conditional entropy of other types of systems: natural languages (Sumerian, Tamil, Sanskrit, and English), an artificial language (the computer programming language Fortran) and non-linguistic systems (human DNA sequences, bacterial protein sequences, and two artificial datasets representing high and low extremes of conditional entropy). When they compared the amount of randomness in the Indus script with that of the other systems, they found that the script's conditional entropy most closely resembled that of the natural languages. They published their findings in May in the journal Science.
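For readers curious about what such a measurement involves, here is a minimal sketch in Python. It is not the researchers' program, and the function name `conditional_entropy` and the toy strings are assumptions made purely for illustration. It uses the textbook definition: how many bits of uncertainty remain about the next symbol once the previous symbol is known, estimated from raw counts of adjacent pairs.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequences):
    """Estimate H(next symbol | previous symbol) in bits.

    Illustrative only: uses raw pair counts with no smoothing,
    which is reasonable for toy data but not for a small corpus.
    """
    pair_counts = Counter()   # how often each (previous, next) pair occurs
    prev_counts = Counter()   # how often each symbol occurs as "previous"
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for (prev, nxt), n in pair_counts.items():
        p_pair = n / total                    # P(previous, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | previous)
        entropy -= p_pair * log2(p_next_given_prev)
    return entropy


# Hypothetical toy "texts": one rigidly ordered, one freely ordered.
rigid = ["ABABABAB"]   # each symbol fully determines the next
free = ["AABBA"]       # each symbol is followed by A or B equally often

print(conditional_entropy(rigid))
print(conditional_entropy(free))
```

A rigidly ordered sequence scores at the low extreme and a freely ordered one at the high extreme; the finding described above is that the Indus script, like the natural languages, falls between the two.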

If it looks like a language, and it acts like a language, then it probably is a language, their paper suggests. The findings don’t decipher the script, of course, but they do sharpen our understanding of it, and have lent reassurance to those archaeologists who had been working under the assumption that the Indus script encodes language.

After publishing the paper, Rao got a surprise. The question of which language family the script belongs to, it turns out, is a sensitive one: because of the Indus civilization’s age and significance, many contemporary groups in India would like to claim it as a direct ancestor. For instance, the Tamil-speaking Indians of the south would prefer to learn that the Indus script was a kind of proto-Dravidian, since Tamil is descended from proto-Dravidian. Hindi speakers in the north would rather it be an old form of Sanskrit, an ancestor of Hindi. Rao’s paper doesn’t conclude which language family the script belongs to, though it does note that the script’s conditional entropy is similar to that of Old Tamil, which led some critics to summarily “accuse us of being Dravidian nationalists,” says Rao. “The ferocity of the accusations and attacks was completely unexpected.”

Rao sometimes takes relief in returning to the less ferociously contested world of neuroscience and robotics. But the call of the Indus script remains alluring, and “what used to be a hobby is now monopolizing more than a third of my time,” he says. Rao and his colleagues are now looking at longer strings of characters than they analyzed in the Science paper. “If there are patterns,” says Rao, “we could come up with grammatical rules. That would in turn give constraints to what kinds of language families” the script might belong to.

He hopes that his future findings will speak for themselves, inciting less rancor from opponents rooting for one region of India versus another. For his part, when Rao talks about what the Indus script means to him, he tends to speak in terms of India as a whole. “The heritage of India would be considerably enriched if we were able to understand the Indus civilization,” he says. Rao and his collaborators are working on it, one line of source code at a time.

