Can Computers Decipher a 5,000-Year-Old Language?
A computer scientist is helping to uncover the secrets of the inscribed symbols of the Indus
- By David Zax
- Smithsonian.com, July 20, 2009

Comments (39)
It seems obvious to me. The first one says rhinoceros, the second one says camel, and the third one says whatever that horned beast is. See? Problem solved. :-)
Posted by darin on March 24,2013 | 10:00 PM
Maybe we are making it too complicated. It might just be the first appearance of money. Rather than having to trade or tender the actual animal, for the sake of convenience, these tiles were used as currency. The symbols could indicate how many you are trading, the name of the beast, or the owner's name. Maybe all three. The wheel symbol might mean "we deliver" or "cash and carry". Just kidding.
Posted by Jim on September 29,2012 | 12:21 AM
Here's a thought: maybe the animals are a sign of status, so that a warrior might have a lion or some other fierce animal on their tablet. This would be like what we do today: a young child may pick up a Barney or Clifford book to read, while a young adult who is more interested in Shakespeare might choose something different to read or watch than a senior would. My point is, maybe only warriors read warrior tablets, and peasants read peasant tablets. So there could be a string of different "languages" here, but somehow they are all related.
Posted by Lydia on May 25,2012 | 12:14 PM
I think this is amazing. But maybe you could string together the blocks of symbols by animal. For example, there are two symbol blocks with an elephant on each one. Put the elephants together, and then perhaps there's a lion tile. Don't put the lion tile with the elephants, because they would not normally share a habitat? Could it be a + b = c, with c being the answer to the code? Or maybe there's a pattern: elephant, gazelle, lion, eagle. Maybe? Just an idea. I'm really interested in the Indus Valley code, so I'm writing a research paper about it for school. I'll try to post it here later. Thanks :)
Posted by Lydia on May 25,2012 | 12:05 PM
Hindu culture has existed since before 3000 or 4000 BC; there are many temples in South India which date back to 10,000 BC. To build such architecture, how developed must the people have been, and how vast must the culture have been?
The Harappa and Mohenjo-daro story was made up by the British so that Indians would not regard the Ramayana and Mahabharata as true events but only as myths. This would allow the British to exploit them by saying that they were stone-age people who did not really have a rich culture.
See the link below, which shows a fine monument from the Mahabharata period that long predates the Harappa and Mohenjo-daro civilizations and is far more advanced in architecture.
www.youtube.com/watch?v=2CbTyxy1MWo
Posted by Rohan on May 26,2011 | 01:44 PM
I would be skeptical of comparing what appear to be government seals stamped in clay with known languages using only entropy. It seems like it might be one of the less stable statistics of a language.
The demands on a language and script can cause it to adjust entropy very rapidly.
I remember an anecdote that legal English has lower entropy than ordinary English, closer to that of a computer language, and that the difference in entropy roughly matched the difference in length between 8.5x11 and legal-size paper. Presumably the abbreviated lol-talk used in cell phone text messages has much higher entropy than English.
If you were working on a writing system in a time when writing was costly, and it would be used repeatedly for some official capacity, you would expect it to have different entropy than the spoken language.
The need for fewer characters would push it toward higher entropy, but a need for specificity and accuracy would push it toward lower entropy.
Posted by Michael Rule on February 15,2011 | 06:06 PM
First of all, Mr. Witzel is not the only Sanskritist in the world. It has become a habit for Michael Witzel and Steve Farmer to ridicule people who try to say that the IVC was literate. If it was not literate, how could its people use weights with mathematical precision, make bricks with some precision, store food in granaries, and trade with the literate Sumerian civilization? A little common sense dictates that the IVC was literate. They traded with the Sumerians and had their own writing system, as attested by the many round seals found in and around Dilmun.
Posted by Anonymous on August 16,2010 | 08:19 PM
The only way the Indus script controversy can be successfully resolved is by doing extensive archaeological studies at the Indus sites and analyzing more material than is available now. The Indus script has a lot of similarities with the archaic Sumerian script, which has been deciphered. I don't really understand why this hype and controversy has been created by some scholars. Also, Sanskrit did not originate in Central Asia, as some Euro-centric Indologists try to portray it to the public. If that were the case, Sanskrit should have flourished in Central Asia or Europe for the last 3,000-4,000 years, which it has not. You only see Sanskrit surviving to some extent in India, in the hands of traditional Brahmins and in some Vedic schools.
Posted by Anonymous on August 16,2010 | 07:46 PM
No one is saying that his findings with conditional entropy are an undisputed and flawless solution to the Indus Valley script, but they offer a different view after 90+ years of excavation and study of the area and its "language". Obviously, traditional methods have not revealed the symbol system's inner workings, so it's time to use unconventional methods to get anywhere.
Having said all that, I have a belief of my own.
I don't consider myself a mathematician, and can't even begin to fathom the math behind his use of conditional entropy, but I believe that the study leaves us back where we started.
Linguists and archaeologists believe that these writings represent one of two things: symbolic or religious meanings. The conditional entropy measure checks whether some symbols follow others in a consistent pattern, which would indicate that the system has the qualities of a "natural language".
Through their studies they have found that this writing system has a basic grammar like other natural languages, but they fail to see that if these carvings are actually religious symbols (found mostly at gates), their test will give the same answer whether or not the symbols represent a language. Although function words behave and repeat differently from content words, symbols with fixed meanings may also repeat one after another, creating the effect of a language.
I go back to the point that I have no idea how this math works, but assuming that it looks for repetition in function words, it's no different from starting over again.
Posted by Sunwoo Yang on December 31,2009 | 06:46 PM
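Several comments in this thread debate "conditional entropy" without defining it. As a rough illustration only, the sketch below estimates the conditional entropy of the next symbol given the current one, H(next | current), from bigram counts over a set of symbol sequences. It is a simplified, unsmoothed estimate, not a reproduction of the analysis in Rao et al.'s paper, and the function name `conditional_entropy` and the toy symbols "A", "B", "C" are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Estimate H(next symbol | current symbol) in bits from bigram counts.

    Hypothetical helper for illustration: `sequences` is a list of symbol
    sequences (e.g. one list per inscription). Bigrams are counted only
    within each sequence, never across sequence boundaries. No smoothing.
    """
    pair_counts = Counter()   # counts of (current, next) pairs
    first_counts = Counter()  # counts of each symbol in the "current" slot
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no bigrams, so nothing to measure

    h = 0.0
    for (a, b), n in pair_counts.items():
        p_joint = n / total            # estimate of p(current=a, next=b)
        p_cond = n / first_counts[a]   # estimate of p(next=b | current=a)
        h -= p_joint * math.log2(p_cond)
    return h


if __name__ == "__main__":
    # Each symbol always determines the next one: 0 bits of uncertainty.
    print(conditional_entropy([["A", "B", "A", "B", "A"]]))
    # After "A", "B" and "C" are equally likely: 1 bit of uncertainty.
    print(conditional_entropy([["A", "B"], ["A", "C"]]))
```

A sequence in which each symbol fully determines its successor scores 0 bits, and more variable successions score higher. As Sproat and Farmer argue elsewhere in this thread, a single number of this kind does not by itself distinguish linguistic from non-linguistic symbol systems.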
Could these little "tablets" be coins, and the inscriptions numbers rather than letters?
Posted by eloisa munter on August 24,2009 | 06:28 AM
The frenzy with which Witzel and his cohorts lambast even the slightest sign that the Indus Valley civilization was literate indicates the angst they feel whenever the civilizational aspects of the IVC are brought up. The question is why they feel so threatened by something that happened more than 5 millennia ago.
But therein lies the answer. The Occidental has never felt comfortable with the notion that the cradle of civilization lay in the Indus Valley. If they were not a literate civilization, they certainly did all the things that literacy was supposed to help them do, like town planning, sewage systems, and a vast area of urbanity covering over 1.5 square miles from central India to the borders of Persia. That they could have done all this without a written means of communication is beyond comprehension, and in any case it would not take away from their immense achievements.
Finally, what happened to them? The answer is simple: the descendants of the IVC are the modern Indians.
Posted by Kosla Vepa on August 13,2009 | 11:22 PM
It is amazing, the vitriol that springs forth from professors when their myopic worlds are challenged. I challenged a well-known University of Chicago professor on an issue of business valuation some years ago and drew a heated response that could be felt from coast to coast. Now, I do not pretend to know whether Farmer is correct in his criticism or Rao correct in his computer analysis, but this article has certainly raised the level of the debate. Good hunting to all, and here's hoping the Rosetta Stone of the Indus is found soon, before someone has a heart attack.
Posted by Jim Alerding on August 10,2009 | 08:36 AM
Early in its history, did the Indus civilization trade with any separate entity that had its own language and coinage? Could the Indus people have drawn some linguistic usages from other trading partners? Did languages there develop along with business, as a "method of doing business"? Is there any connection, even a distant one, between the symbols and those of current faiths and their histories there? I have read that languages can develop from faith and trade.
Posted by George samuels on August 9,2009 | 05:14 PM
I read the article and was a bit surprised at how sure some letter writers were that Rao's work had been discredited. Conditional entropy doesn't seem to me to be a concept that proves or disproves that a script is a written language. It seems to be just a description of the placement of symbols. Since a symbol in a pictogram might represent a whole word, the analogy between letter placement and picture placement is a bit of a stretch. A better analogy might be word placement, since in a language like English some words are used as objects and only follow prepositions or verbs, and different types of sentence structures have different types of words in predictable places. The same is true in other languages such as Navajo, Japanese, and French (or so I've heard...). His research may suggest that the scripts are a language, but the article doesn't say that he claims they must be a language. The fuss to derail the concept seems a little harsh.
Posted by Keith Wellman on August 9,2009 | 02:43 PM
The contention that somehow there are (to paraphrase) "those who have a vested interest in showing that non-Western cultures are less advanced" is astounding. European and American researchers have spent centuries developing and publicizing insights into the sophisticated, highly technical, and highly literate cultures of thousands and thousands of years ago.
No one that I've heard of in academic circles in 'the West' has felt threatened by learning that China, Persia, Egypt, etc. were far more advanced, thousands of years ago, than their own regions were at the time.
Posted by John Jay on August 8,2009 | 12:24 PM
East Asia, China in particular, had the good fortune of having a scholar like Joseph Needham research and defend the claims of the Chinese scholarly community against its detractors. Not only is South Asia very fragmented, but it lacks a scholar of Needham's caliber to defend its scholarly community's attempts to discover its own past.
The eminent scholars Farmer, Witzel, and Sproat could afford to be a little more sympathetic in the manner in which they present their argument. The very fact that the paper was published, regardless of what its detractors think of it, means that South Asian archaeology is coming of age and such attempts are no longer career suicide for scholars. I don't have the issue of Science, but it should be thought of as normal scholarly discourse for an attempt at deciphering a script to be published in a scholarly science periodical.
That said, South Asia could use a scholar like Joseph Needham, especially when its scientific endeavors themselves come under undue criticism.
Posted by Kong Bai on August 6,2009 | 04:51 PM
Rao and friends have published another paper in the Proceedings of the National Academy of Sciences, another prestigious journal. It looks like it's open access and does not need a subscription to read:
http://www.pnas.org/content/early/2009/08/04/0906237106.abstract
That says something, unless of course there is a deep conspiracy in which all prestigious journals are colluding to undermine standards in computational linguistics.
The archaeological material in Farmer-Sproat-Witzel has been thoroughly discredited by an authority in archaeology, Prof. Massimo Vidale, in a hilarious aping of the polemical style of FSW:
http://www.docstoc.com/docs/document-preview.aspx?doc_id=9163376
The weight of evidence looks definitely in favour of language now.
Posted by Harappan on August 6,2009 | 03:19 PM
Farmer et al.,
Please publish your rebuttal in a well-reviewed, high-impact scientific journal instead of on random blog sites; that will give credence to your griping.
Posted by vprasad on August 6,2009 | 01:50 PM
Today's Hindustan Times is to carry a news item regarding Rao's "discovery". There will be a time to argue about their methodology; for now we are happy that his group is seriously working towards solving this puzzle. I do not know much about computational deciphering, but isn't it progress to see an addition to the long list of attempts? --Gopal Chippalkatti
Posted by gopal chippalkatti on August 5,2009 | 05:19 PM
Actually the refutation that Steve Farmer was referring to did NOT appear in said journal. That was our 2004 paper. The refutation appeared online and has not been published (or, indeed, reviewed). That said, again, surely the issue is whether the arguments are valid, not where they appeared or how rigorously they purported to be reviewed.
But Chris Cornuelle makes an essential point: defenders of the non-script hypothesis would (or should) be ready to throw in the towel if a clear example of a long "bilingual" text in, say, Sumerian and "Harappan" were discovered.
The Phaistos Disk is a good one. Its text is too short to decipher, yet that hasn't stopped dozens of would-be decipherers from trying. And as with the Indus Valley material, the popular science press has from time to time been duped by convincing-looking demonstrations.
Posted by Richard Sproat on July 28,2009 | 10:36 AM
From my exposure to this issue, all I can say is that this is what one expects from insufficient data. Phaistos disk, anyone? Throwing software at a problem like this one seems promising, but e.g. any neural network needs adequate training data, and my guess is that this topic needs more spadework. A bit of clay from Iraq showing a shipping manifest in both "Harappan" and cuneiform would be first on anyone's wish list - assuming "Harappan" is a language, of course ...
Posted by Chris Cornuelle on July 27,2009 | 05:39 PM
While Richard Sproat makes the important point that a paper's deductions must not be assumed correct just because Science published it after review, I wonder why Steve Farmer's first comment, the one strongly attacking Smithsonian for accepting Rao's work uncritically and referring to his own, Richard Sproat's, and Michael Witzel's refutation, did not mention that this refutation appeared in the Witzel-edited Electronic Journal of Vedic Studies. That does weaken Mr Sproat's claim considerably. What makes it worse is Mr Farmer's statement that the refutation "appeared within a few hours" of the publication of Rao's work. Even if Rao's claims are all wrong, Mr Sproat and his colleagues do not exactly cover themselves with glory: Mr Farmer's point that Rao et al had no defenders at the conference stands juxtaposed against the lack of any kind of review of the refutation, and yes, even Galileo had no supporters.
Possibly there are better reasons to disbelieve Rao et al, but for the moment these have been submerged by the vitriolic campaign of his critics.
Posted by Shakti Sinha on July 27,2009 | 12:44 PM
Thinking more about this, I am puzzled by the number of times I have seen the point made about the Rao paper that it was reviewed before appearing in Science. As if somehow that means it necessarily must be correct. In contrast, we never harped on the point about whether our paper was reviewed, not because it wasn't, but because surely the real issue is whether the points made in the paper are valid. In serious academic debate, whether a paper was reviewed or not is irrelevant: one looks at the quality of the arguments.
In any case, we know that reviewed papers in Science can turn out to be wrong. Justeson and Kaufman's 1993 article in Science on Epi-Olmec was surely reviewed, but few people in the field now believe their decipherment. (I happen to think the reviewers of *that* paper did a more credible job than the reviewers of the Rao et al. paper, but no matter.) More famously, papers by Hendrik Schön of Bell Labs were also reviewed, but turned out to be fraudulent and were subsequently withdrawn by Science.
Anyone who believes that a paper that appears in Science must be correct because its very appearance implies rigorous and correct reviews, is deluding themselves.
Posted by Richard Sproat on July 26,2009 | 09:38 AM
Computational linguists have sat on the Indus script problem for decades now and their main contribution is the paper by Farmer, Sproat, and Witzel claiming the script is not a script at all, a claim that has now been independently debunked by respected scholars in the field like Asko Parpola, Mahadevan, Massimo Vidale, and others.
It is refreshing to see new interdisciplinary approaches being applied to the problem and I commend scientists like Rao et al for exploring such approaches.
Posted by Don on July 24,2009 | 12:41 AM
I agree with Jake: discovering what is or isn't so, and how things are or aren't proven, advances the truth, which is what most of us strive for in the end. Thanks for the article.
Posted by Gordy Byrne on July 24,2009 | 08:04 PM
Well, there's "reviewed", and then there's "reviewed by people who know the technical area". I have no doubt the Science article was reviewed. But was it reviewed by a computational linguist who knows about statistical language modeling? Doubtful. See Pereira's blog, where he notes that this is not the first time Science has published something on language that fails to meet the standards of the field. Presumably this is different in other fields, but Science's reviewing standards for work dealing with language do not seem to be particularly high.
Posted by Richard Sproat on July 24,2009 | 02:28 PM
Great piece of journalism!
Going over the posts above, I see one by Farmer and another by "Word Geek" aka Diana Gainer (who seems to be an echo chamber for Farmer, repeating the same allegations).
Farmer and "Word Geek" in their obfuscated posts above speak as if the paper by Rao and colleagues was published without being reviewed. I see that it was published in Science journal, which is widely regarded as one of the most rigorously reviewed journals in science today. Having tried to publish there myself, I can attest that their reviewing standards are very high and all technical details are thoroughly checked before any paper makes it to the publication stage.
Speaking of standards of reviewing, a colleague who works in this field informs me that Farmer and co-authors Sproat and Witzel published their original paper on this topic on a website called the "Electronic Journal of Vedic Studies", whose editor-in-chief is none other than Witzel himself. One wonders if that paper was even reviewed!
Posted by Erik Lutz on July 23,2009 | 12:09 AM
I'd be interested if anyone can find a computational linguist who knows anything about statistical language processing and who thinks that Rao et al's paper actually demonstrates anything.
Posted by Richard Sproat on July 23,2009 | 04:15 PM
Reading over these comments I see the post by Diana Gainer (I had also seen her Word Geek article on this). As far as I can see she pretty much sums up the state of affairs accurately.
The only thing that I would add is the point also made in Liberman and Pereira's blogs, namely that it is not hard to show that conditional entropy is not evidence for "structure" per se, and that it cannot distinguish linguistic from non-linguistic systems. So over and above the broader issues, Rao et al's work simply fails to show what it purports to show even in the narrow way they have been pitching it.
As for accusations of Western chauvinism, or however one wants to put some of the claims that have been flying around, that's just plain silly at a number of levels. As far as anyone has demonstrated, the earliest literate people were the Sumerians. Last time I checked, the Sumerians inhabited Mesopotamia, which is not normally considered to be part of the "West". So, by arguing that the Indus Valley civilization was not literate, one is not committed to a view of Western supremacy.
Posted by Richard Sproat on July 23,2009 | 02:46 PM
John's remarks rely on the notion that India was invaded from the Northwest in pre-Vedic times, an idea that has its strongest basis in the fact that India was indeed successfully invaded from both the near and far Northwest in more recent times, first by land and then by sea. However, extending this idea that India was continually invaded from the Northwest, far into the distant past is likely more mirage than mirror.
The real problem is that John assumes that Europeans cannot possibly have originated in India and that he cannot consider the possibility that the ancestors of present day white people migrated to the Northwest from India in prehistoric times. This organized resistance to the idea that Sanskrit is indigenous to India is also a post colonial refusal by scholars of European origin to admit the possibility of their sharing genetic material with Indians -- and Pakistanis. Rather than admitting that invasion may not be the only means of connection between the racial and cultural groups involved, these scholars apparently plough through data they can only understand dimly, since most have no lifelong knowledge of the descendant cultures in the same regions, and of course must then come up with circuitous logic to bolster their essentially untenable positions.
Posted by Nina on July 22,2009 | 02:56 PM
Farmer has been ranting about this for months now, trolling the web and obsessively planting links to his "rebuttals" where he can.
Enough already! As far as I am concerned, Rao et al have adequately responded to the so-called criticisms with their response on Rao's website.
Great article, by the way!
Posted by Don on July 22,2009 | 01:49 PM
Wow, so much conflict. I don't pretend to know a lot about the topic, but I know that any advance in it is one step closer to the truth, whether it proves what not to do or what is incorrect, or what is a full-on advance. Thanks for the enlightening article.
Posted by Jake on July 22,2009 | 01:45 PM
As I mentioned in an article as the Word Geek on Examiner.com, Rao and colleagues did not actually compare the conditional entropy of Indus symbols, as found on Indus seal stones and other objects, with that of cuneiform symbols, actual Tamil symbols, actual Vincha symbols, or actual Near Eastern symbols, although they claimed to do all these things. In their online supplement, they reveal that they made up their data sets for the Vincha and Near Eastern comparisons, so those results are not valid. The comparisons with other languages are based on transcriptions, which are not valid for other, more complex reasons. Other civilizations were not fully literate, for instance the Aztecs in Central America and the West African kingdoms in Benin at the time of European arrival there. Both used some symbols but did not write their languages fully. The Indus civilization may not have been fully literate either, as Farmer et al note. Rao et al haven't proved anything.
Posted by Diana Gainer on July 22,2009 | 12:26 PM
The Hurro-Urartians should not be overlooked in considering candidates for the Harappan upper class -- which lived separated from the masses on raised, ritual-purity platforms.
A connection to Easter Island script is not entirely ludicrous (as serious academics generally hold). The whole known history of the Malay islands is cultures (Islam, Hinduism) arriving by sea from India. The Harappans had sea trade, and a small Harappan kingdom could have survived in Indonesia for 2000 years (confer Bali) before exiles made their way across the Pacific and built those West Eurasian-featured statues.
The claims of Hindu nationalists that Indus script records Sanskrit are extremely dubious -- as the basis of their reasoning is the religious and not scientific notion that Sanskrit arose indigenously in India and is the (20,000 year old) original human language from which all others descend. Scientifically, it's possible that Indo-Europeans could have filtered in from Central Asia in time for the Indus Civilization, but in that era their language would have to have been at least a somewhat earlier phase of Indo-European than Vedic Sanskrit.
Although a Dravidian identity for the Harappan script is entirely likely, it should also be noted that in the hands of Indian researchers this (consciously or unconsciously) ties Pakistani territory to India -- in contrast to the possibility of a more western origin for the Harappans, which would reinforce Pakistan's separate and Middle Eastern-derived identity.
Whatever the language recorded by the Indus script, this may be only the speech of a ruling, or merchant, or craftsman class, in a society multi-lingual by class and region.
Posted by John on July 22,2009 | 06:37 AM
Very interesting. It shows that, given the interest, any branch of science or knowledge can be applied to understanding the past. The Indus civilisation has definitely played a major role, and contributors like Dr. Rao can throw more light on it in future years.
Posted by Prof.M.C.Muralirangan on July 21,2009 | 05:14 PM
Quite an enlightening article! The script can be deciphered for the sheer fun of it, rather than for bolstering any chauvinist agenda. And as Rajesh rightly says it is India's heritage, rather than any state or language's.
Posted by Yelamanda on July 20,2009 | 01:55 AM
David,
You missed that a group of academics in the West, with a vested interest, moved to dismiss this work because they would rather show us that non-Western cultures were less advanced.
Posted by SB on July 20,2009 | 11:47 PM
Thanks, David, for an inspiring article. This will surely inspire many young scientists to get involved in unraveling the mysteries of our treasured world heritage. Science has to become meaningful in everyone's understanding of his or her identity.
Posted by kalyan on July 19,2009 | 10:19 PM
It is too bad that the author of this article didn't bother to talk to anyone in Indus studies about the work of Rao et al., whose work was thoroughly discredited almost immediately by a series of well-known computational linguists -- who quickly demonstrated that "conditional entropy" cannot distinguish linguistic from nonlinguistic symbols -- and the present author and his colleagues, who presented new evidence that the Indus symbols were not part of a writing system at a major Indus conference in Kyoto, Japan, in May. Of the roughly 40 Indus researchers and linguists attending that conference, Rao and his colleagues did not have a single defender.
For our initial refutation of Rao et al. published within hours of publication of his paper, see:
http://www.safarmer.com/Refutation3.pdf .
For further evidence -- using far more data than Rao used in his paper -- that "conditional entropy" cannot distinguish linguistic from nonlinguistic signs, or even which language family a language belongs to, see:
http://www.safarmer.com/more.on.Rao.pdf
The latter document also has links to scathing discussions by the influential computational linguists Mark Liberman and Fernando Pereira that further undermine the work of Rao et al. You'll also find there a link to the well-known 2004 paper by me and my colleagues (Michael Witzel of Harvard and Richard Sproat of Oregon Health and Science University) that Rao et al. claimed to "refute".
The odd handling of the evidence in Rao's work has been discussed widely in the archaeological and computational communities, and even in public discussions on Wikipedia. The big question is why Smithsonian magazine would publish this article on Rao et al. without doing a little fact checking first and without talking to Rao's critics: that is not responsible scientific reporting, and it badly misleads the public.
Steve Farmer, Ph.D.
Palo Alto, California
http://www.safarmer.com
Posted by Steve Farmer, Ph.D. on July 19,2009 | 05:18 PM