Can Computers Decipher a 5,000-Year-Old Language?
A computer scientist is helping to uncover the secrets of the inscribed symbols of the Indus
- By David Zax
- Smithsonian.com, July 20, 2009
The Indus civilization, which flourished throughout much of the third millennium B.C., was the most extensive society of its time. At its height, it encompassed an area of more than half a million square miles centered on what is today the India-Pakistan border. Remnants of the Indus have been found as far north as the Himalayas and as far south as Mumbai. It was the earliest known urban culture of the subcontinent and it boasted two large cities, one at Harappa and one at Mohenjo-daro. Yet despite its size and longevity, and despite nearly a century of archaeological investigations, much about the Indus remains shrouded in mystery.
What little we do know has come from archaeological digs that began in the 1920s and continue today. Over the decades, archaeologists have turned up a great many artifacts, including stamp sealings, amulets and small tablets. Many of these artifacts bear what appear to be specimens of writing—engraved figures resembling, among other things, winged horseshoes, spoked wheels, and upright fish. What exactly those symbols might mean, though, remains one of the most famous unsolved riddles in the scholarship of ancient civilizations.
There have been other tough codes to crack in history. Stumped Egyptologists caught a lucky break with the discovery of the famed Rosetta stone in 1799, which contained text in both Egyptian and Greek. The study of Mayan hieroglyphics languished until a Russian linguist named Yury Knorozov made clever use of contemporary spoken Mayan in the 1950s. But there is no Rosetta stone of the Indus, and scholars don’t know which, if any, languages may have descended from that spoken by the Indus people.
About 22 years ago, in Hyderabad, India, an eighth-grade student named Rajesh Rao turned the page of a history textbook and first learned about this fascinating civilization and its mysterious script. In the years that followed, Rao’s schooling and profession took him in a different direction—he wound up pursuing computer science, which he teaches today at the University of Washington in Seattle—but he monitored Indus scholarship carefully, keeping tabs on the dozens of failed attempts at making sense of the script. Even as he studied artificial intelligence and robotics, Rao amassed a small library of books and monographs on the Indus script, about 30 of them. On a nearby bookshelf, he also kept the cherished eighth-grade history textbook that introduced him to the Indus.
“It was just amazing to see the number of different ideas people suggested,” he says. Some scholars claimed the writing was a sort of Sumerian script; others situated it in the Dravidian family; still others thought it was related to a language of Easter Island. Rao came to appreciate that this was “probably one of the most challenging problems in terms of ancient history.”
As attempt after attempt failed at deciphering the script, some experts began to lose hope that it could be decoded. In 2004, three scholars argued in a controversial paper that the Indus symbols didn’t have linguistic content at all. Instead, the symbols may have been little more than pictograms representing political or religious figures. The authors went so far as to suggest that the Indus was not a literate civilization at all. For some in the field, the whole quest of trying to find language behind those Indus etchings began to resemble an exercise in futility.
A few years later, Rao entered the fray. Until then, people studying the script were archaeologists, historians, linguists or cryptologists. But Rao decided to coax out the secrets of the Indus script using the tool he knew best—computer science.
On a summer day in Seattle, Rao welcomed me into his office to show me how he and his colleagues approached the problem. He set out a collection of replicas of clay seal impressions that archaeologists have turned up from Indus sites. They are small—like little square chocolates—and most of them feature an image of an animal beneath a series of Indus symbols. Most samples of the Indus script are miniatures like these, bearing only a few characters; no grand monoliths have been discovered. Scholars are uncertain of the function of the small seals, Rao told me, but one theory is that they may have been used to certify the quality of traded goods. Another suggests that the seals might have been a way of ensuring that traders paid taxes upon entering or leaving a city—many seals have been found among the ruins of gate houses, which might have functioned like ancient toll booths.
Comments (38)
Maybe we are making it too complicated. It might just be the first appearance of money. Rather than having to trade or tender the actual animal, for the sake of convenience, these tiles were used as currency. The symbols could indicate how many you are trading, the name of the beast or the owner's name. Maybe all three. The wheel symbol might mean "we deliver" or "cash and carry". Just kidding.
Posted by Jim on September 29,2012 | 12:21 AM
Here's a thought, maybe the animals are a sign of status, so that a warrior might have a lion or some other fierce animal on their tablet. This is like what we do today: a young child may pick up a Barney/Clifford book to read, while a young adult who is more interested in Shakespeare might make a different selection of what to read or watch than a senior. My point is, maybe only warriors read warrior tablets, and peasants read peasant tablets. So there could be a string of different "languages" here, but somehow they are all related.
Posted by Lydia on May 25,2012 | 12:14 PM
I think this is amazing. But maybe you could string together the blocks of symbols in an animal way. Meaning, there are two symbol blocks with an elephant on each one of them. Put together the elephants, and then perhaps there's a lion tile. Don't put the lion tile with the elephants because they would not normally cooperate in a habitat together? Could it be a + b = c? c being the answer to the code. Or maybe there's a pattern: elephant, gazelle, lion, eagle. Maybe? Just an idea. I'm really interested in the Indus Valley code, so I'm writing a research paper about it for school. I'll try and post it here later. Thanks:)
Posted by Lydia on May 25,2012 | 12:05 PM
Hindu culture has been in existence since even before 3000 or 4000 BC; there are many temples in South India which date back to 10,000 BC. Even to build such architecture, how developed should the people be, and how vast should the culture be?
Harappa and Mohenjo-daro have been made up by the British so that Indians do not follow the Ramayana and Mahabharata as true events but just as myths. This would allow the British to exploit them by just saying that they were stone age people and they really did not have a rich culture.
See the link below, which shows a fine monument from the Mahabharata period, which was way before the Harappa and Mohenjo-daro civilizations and way more advanced in architecture.
www.youtube.com/watch?v=2CbTyxy1MWo
Posted by Rohan on May 26,2011 | 01:44 PM
I would be skeptical of comparing what appear to be government seals stamped in clay with known languages using only entropy. It seems like it might be one of the less stable statistics of a language.
The demands on a language and script can cause it to adjust entropy very rapidly.
I remember this anecdote: that legal English has less entropy than common English, closer to that of computer language, and that the difference in entropy roughly amounted to the difference in length between 8.5x11 and legal pad document sizes. Presumably the abbreviated lol-talk used in cell phone text communications has much higher entropy than English.
If you were working on a writing system in a time when writing was costly, and it would be used repeatedly for some official capacity, you would expect it to have different entropy than the spoken language.
The need for fewer characters would push it toward higher entropy, but a need for specificity and accuracy would push it toward lower entropy.
Posted by Michael Rule on February 15,2011 | 06:06 PM
First of all, Mr. Witzel is not the only Sanskritist in the world. It has become a habit for Michael Witzel and Steve Farmer to ridicule people who try to say that the IVC was literate. If it was not literate, how could its people use mathematical weights with some precision, make bricks with some precision, store food in granaries and trade with the literate Sumerian civilization? A little common sense will dictate that the IVC was literate. They traded with the Sumerians and had their own writing system, which is attested by the many round seals found in and around Dilmun.
Posted by Anonymous on August 16,2010 | 08:19 PM
The only way the Indus script controversy can be successfully resolved is by doing extensive archaeological studies in the Indus sites and analyzing more material than what is available now. The Indus script has a lot of similarities with the archaic Sumerian script, which has been deciphered. I don't really understand why this hype and controversy is created by some scholars. Also, Sanskrit did not originate in Central Asia, as some Euro-centric Indologists try to portray to the public. If that were the case, Sanskrit should have flourished in Central Asia or Europe for the last 3000-4000 years, which is not the case. You only see Sanskrit surviving to some extent in India in the hands of traditional Brahmins and in some Vedic schools.
Posted by Anonymous on August 16,2010 | 07:46 PM
No one is saying that his findings with conditional entropy are an undisputed and flawless solution to the Indus Valley script, but it's a different view taken on after 90+ years of excavation and study of the area and its "language". Obviously, traditional methods have not revealed the symbol system's inner workings, so it's time to use unconventional methods to get anywhere.
After saying all that, I have a belief of my own.
I don't consider myself a mathematician, and can't even begin to fathom what kind of math he used to compute conditional entropy, but I believe that the study leaves us back where we started.
Linguists and archaeologists believe that these writings represent one of two things: symbolic or religious meanings. The conditional entropy is figuring out whether some symbols follow one another in a repetitive manner, indicating that the system has the qualities of a "natural language".
Through their studies they have figured out that this written system has a basic grammar indicative of other natural languages, but they fail to see that if these carvings are actually religious symbols (found mostly at gates), their test will give the same answer whether or not the symbols represent a language. Although function words behave and repeat in different ways compared to content words, symbolic meanings may repeat one after another, therefore creating the effect of a language.
I go back to the point that I have no idea how this math system works, but assuming that it looks for repetition in function words, it's no different than starting over again.
Posted by Sunwoo Yang on December 31,2009 | 06:46 PM
Could these little "tablets" be coins and the inscriptions numbers and not letters??
Posted by eloisa munter on August 24,2009 | 06:28 AM
The frenzy with which Witzel and his cohorts are so eager to lambast even the slightest signs that the Indus Valley civilization was a literate civilization indicates the angst that they feel whenever the civilizational aspects of the IVC are brought up. The question is why they feel so threatened by something that happened more than 5 millennia ago.
But therein lies the answer. The occidental has never felt comfortable with the notion that the cradle of civilization lay in the IV. If they were not a literate civilization they certainly did all the things that literacy was supposed to help them do like town planning, sewage systems, a vast area of urbanity covering over 1.5 square miles from central India to the borders of Persia. That they could do all this without the help of a written means of communication is beyond comprehension and does not take away from their immense achievements.
Finally, what happened to them? The answer is simple. The descendants of the IVC are the modern Indians.
Posted by Kosla Vepa on August 13,2009 | 11:22 PM
It is amazing the vitriol that springs forth from professors when their myopic worlds are challenged. I challenged a well-known University of Chicago professor on an issue of business valuation some years ago and drew a heated response that could be felt from coast to coast. Now I do not pretend to know whether Farmer is correct in his criticism, or Rao correct in his computer analysis, but this article has certainly raised the level of the debate. Good hunting to all and here is hoping the Rosetta Stone of the Indus is found soon - before someone has a heart attack.
Posted by Jim Alerding on August 10,2009 | 08:36 AM
Early in history did the Indus trade with any separate entity having its own language and coinage? Could the Indus have drawn some linguistic usages from other trade entities? Did languages there develop along with business and as a "method of doing business"? Is there any long distance between the symbols and even those of current faiths and their histories there? I have read that languages can develop from faith and trade.
Posted by George samuels on August 9,2009 | 05:14 PM
I read the article and was a bit surprised at how sure some letter writers were that Rao's work had been discredited. Conditional entropy doesn't seem to me to be a concept that proves or disproves that a script is a written language. It seems to me to be just a description of the placement of symbols. Since a symbol in a pictogram might represent a whole word, the analogy between letter placement and picture placement is a bit of a stretch. It would seem to me that a better analogy might be that of word placement, since in a language like English some words are used as objects and only follow prepositions or verbs, and different types of sentence structures have different types of words in predictable places. The same is true in other languages such as Denai (sp? Navaho) and Japanese and French. (Or so I've heard...) His research may suggest that the scripts are a language, but the article doesn't seem to say that he says they have to be a language. The fuss to derail the concept seems a little harsh.
Posted by Keith Wellman on August 9,2009 | 02:43 PM
The contention that somehow there are (to paraphrase) "those who have a vested interest in showing that non-Western cultures are less advanced" is astounding. European and American researchers have spent centuries developing and publicizing insights into the sophisticated, highly technical, and highly literate cultures of thousands and thousands of years ago.
No one that I've heard of in academic circles in 'the west' has felt threatened by learning that China, Persia, Egypt, etc. were far in advance of the state their own geographies were in thousands of years ago.
Posted by John Jay on August 8,2009 | 12:24 PM