
Can Computers Decipher a 5,000-Year-Old Language?

A computer scientist is helping to uncover the secrets of the inscribed symbols of the Indus

  • By David Zax
  • Smithsonian.com, July 20, 2009
Indus script
Over the decades, archaeologists have turned up a great many artifacts from the Indus civilization, including stamp sealings, amulets and small tablets. (Robert Harding / Photo Library)


The Indus civilization, which flourished throughout much of the third millennium B.C., was the most extensive society of its time. At its height, it encompassed an area of more than half a million square miles centered on what is today the India-Pakistan border. Remnants of the Indus have been found as far north as the Himalayas and as far south as Mumbai. It was the earliest known urban culture of the subcontinent and it boasted two large cities, one at Harappa and one at Mohenjo-daro. Yet despite its size and longevity, and despite nearly a century of archaeological investigations, much about the Indus remains shrouded in mystery.

What little we do know has come from archaeological digs that began in the 1920s and continue today. Over the decades, archaeologists have turned up a great many artifacts, including stamp sealings, amulets and small tablets. Many of these artifacts bear what appear to be specimens of writing—engraved figures resembling, among other things, winged horseshoes, spoked wheels, and upright fish. What exactly those symbols might mean, though, remains one of the most famous unsolved riddles in the scholarship of ancient civilizations.

There have been other tough codes to crack in history. Stumped Egyptologists caught a lucky break with the discovery of the famed Rosetta stone in 1799, which contained text in both Egyptian and Greek. The study of Mayan hieroglyphics languished until a Russian linguist named Yury Knorozov made clever use of contemporary spoken Mayan in the 1950s. But there is no Rosetta stone of the Indus, and scholars don’t know which, if any, languages may have descended from that spoken by the Indus people.

About 22 years ago, in Hyderabad, India, an eighth-grade student named Rajesh Rao turned the page of a history textbook and first learned about this fascinating civilization and its mysterious script. In the years that followed, Rao’s schooling and profession took him in a different direction—he wound up pursuing computer science, which he teaches today at the University of Washington in Seattle—but he monitored Indus scholarship carefully, keeping tabs on the dozens of failed attempts at making sense of the script. Even as he studied artificial intelligence and robotics, Rao amassed a small library of books and monographs on the Indus script, about 30 of them. On a nearby bookshelf, he also kept the cherished eighth-grade history textbook that introduced him to the Indus.

“It was just amazing to see the number of different ideas people suggested,” he says. Some scholars claimed the writing was a sort of Sumerian script; others situated it in the Dravidian family; still others thought it was related to a language of Easter Island. Rao came to appreciate that this was “probably one of the most challenging problems in terms of ancient history.”

As attempt after attempt failed at deciphering the script, some experts began to lose hope that it could be decoded. In 2004, three scholars argued in a controversial paper that the Indus symbols didn’t have linguistic content at all. Instead, the symbols may have been little more than pictograms representing political or religious figures. The authors went so far as to suggest that the Indus was not a literate civilization at all. For some in the field, the whole quest of trying to find language behind those Indus etchings began to resemble an exercise in futility.

A few years later, Rao entered the fray. Until then, people studying the script were archaeologists, historians, linguists or cryptologists. But Rao decided to coax out the secrets of the Indus script using the tool he knew best—computer science.

On a summer day in Seattle, Rao welcomed me into his office to show me how he and his colleagues approached the problem. He set out a collection of replicas of clay seal impressions that archaeologists have turned up from Indus sites. They are small—like little square chocolates—and most of them feature an image of an animal beneath a series of Indus symbols. Most samples of the Indus script are miniatures like these, bearing only a few characters; no grand monoliths have been discovered. Scholars are uncertain of the function of the small seals, Rao told me, but one theory is that they may have been used to certify the quality of traded goods. Another suggests that the seals might have been a way of ensuring that traders paid taxes upon entering or leaving a city—many seals have been found among the ruins of gate houses, which might have functioned like ancient toll booths.


Rao and his colleagues didn’t seek to work miracles—they knew that they didn't have enough information to decipher the ancient script—but they hypothesized that by using computational methods, they could at least begin to establish what sort of writing the Indus script was: did it encode language, or not? They did this using a concept called “conditional entropy.”

Despite the imposing name, conditional entropy is a fairly simple concept: it is a measure of the amount of randomness in a sequence. Consider our alphabet. If you were to take Scrabble tiles and toss them in the air, you might find any old letter turning up after any other. But in actual English words, certain letters are more likely to occur after others. A q in English is almost always followed by a u. A t may be followed by an r or e, but is less likely to be followed by an n or a b.
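To make the Scrabble-tile comparison concrete, here is a minimal sketch in Python of one simple way to estimate conditional entropy from adjacent pairs of symbols. It is an illustration only, not the program Rao's team used; the function name `conditional_entropy` and the two toy sequences are assumptions made up for this example.

```python
import math
from collections import Counter


def conditional_entropy(sequence):
    """Plug-in estimate, in bits, of how unpredictable the next symbol is
    given the current one, computed from adjacent pairs in `sequence`."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (first, second), count in pair_counts.items():
        p_pair = count / total                # P(first, second)
        p_next = count / first_counts[first]  # P(second | first)
        entropy -= p_pair * math.log2(p_next)
    return entropy


# Hypothetical toy sequences, chosen only to show the two extremes.
rigid = "qu" * 20  # every q is followed by u, and every u by q
mixed = "aabba"    # after a or b, either letter is equally likely

print(conditional_entropy(rigid))
print(conditional_entropy(mixed))
```

The rigid sequence scores 0 bits, because the next letter is never in doubt, while the mixed one scores 1 bit, the most that two symbols allow. Real writing would be expected to fall somewhere between such extremes.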

Rao and his collaborators—an international group including computer scientists, astrophysicists and a mathematician—used a computer program to measure the conditional entropy of the Indus script. Then they measured the conditional entropy of other types of systems—natural languages (Sumerian, Tamil, Sanskrit, and English), an artificial language (the computer programming language Fortran) and non-linguistic systems (human DNA sequences, bacterial protein sequences, and two artificial datasets representing high and low extremes of conditional entropy). When they compared the amount of randomness in the Indus script with that of the other systems, they found that it most closely resembled the rates found in the natural languages. They published their findings in May in the journal Science.

If it looks like a language, and it acts like a language, then it probably is a language, their paper suggests. The findings don’t decipher the script, of course, but they do sharpen our understanding of it, and have lent reassurance to those archaeologists who had been working under the assumption that the Indus script encodes language.

After publishing the paper, Rao got a surprise. The question of which language family the script belongs to, it turns out, is a sensitive one: because of the Indus civilization’s age and significance, many contemporary groups in India would like to claim it as a direct ancestor. For instance, the Tamil-speaking Indians of the south would prefer to learn that the Indus script was a kind of proto-Dravidian, since Tamil is descended from proto-Dravidian. Hindi speakers in the north would rather it be an old form of Sanskrit, an ancestor of Hindi. Rao’s paper doesn’t conclude which language family the script belongs to, though it does note that the conditional entropy is similar to that of Old Tamil—causing some critics to summarily “accuse us of being Dravidian nationalists,” says Rao. “The ferocity of the accusations and attacks was completely unexpected.”

Rao sometimes takes relief in returning to the less ferociously contested world of neuroscience and robotics. But the call of the Indus script remains alluring, and “what used to be a hobby is now monopolizing more than a third of my time,” he says. Rao and his colleagues are now looking at longer strings of characters than they analyzed in the Science paper. “If there are patterns,” says Rao, “we could come up with grammatical rules. That would in turn give constraints to what kinds of language families” the script might belong to.

He hopes that his future findings will speak for themselves, inciting less rancor from opponents rooting for one region of India versus another. For his part, when Rao talks about what the Indus script means to him, he tends to speak in terms of India as a whole. “The heritage of India would be considerably enriched if we were able to understand the Indus civilization,” he says. Rao and his collaborators are working on it, one line of source code at a time.



Comments (39)

It seems obvious to me. The first one says rhinoceros and the second one says camel and the third one says whatever that horned beast is. See. problem solved. :-)

Posted by darin on March 24,2013 | 10:00 PM

Maybe we are making it too complicated. It might just be the first appearance of money. Rather than having to trade or tender the actual animal, for the sake of convenience, these tiles were used as currency. The symbols could indicate how many you are trading, the name of the beast or the owner's name. Maybe all three. The wheel symbol might mean "we deliver" or "cash and carry". Just kidding.

Posted by Jim on September 29,2012 | 12:21 AM

Here's a thought, maybe the animals are a sign of status, whereas a warrior might have a lion or some other fierce animal on their tablet. This being like what we do today: If you are a young child, you may pick up a Barney/Clifford book to read, or a young adult who is more interested in Shakespeare might make a different selection of what to read or watch than a senior. My point is, maybe only warriors read warrior tablets, peasants read peasant tablets. So there could be a string of different "languages" here, but somehow they are all related.

Posted by Lydia on May 25,2012 | 12:14 PM

I think this is amazing. But maybe you could string together the blocks of symbols in an animal way. Meaning, there are two symbol blocks with an elephant on each one of them. Put together the elephants, and then perhaps there's a lion tile. Don't put the lion tile with the elephants because they would not normally cooperate in a habitat together? Could it be a + b = c? c being the answer to the code. Or maybe there's a pattern: elephant, gazelle, lion, eagle. Maybe? Just an idea. I'm really interested in the Indus Valley code, so I'm writing a research paper about it for school. I'll try and post it here later. Thanks :)

Posted by Lydia on May 25,2012 | 12:05 PM

Hindu culture has been in existence even before 3000 or 4000 BC; there are many temples in South India which date back to 10,000 BC. Even to build such architecture, how developed should the people be, and how vast should the culture be?

This Harappa and Mohenjadaro has been made up by the British so that Indians do not follow the Ramayana and Mahabharata as true events but just as myths. This would allow the British to exploit them by just saying that they were stone age people and they really did not have a rich culture.

See the link below which shows a Fine monument in the Mahabharata period and which was way before Harappa and Mohenjadaro civilizations and way more advanced in architecture.

www.youtube.com/watch?v=2CbTyxy1MWo

Posted by Rohan on May 26,2011 | 01:44 PM

I would be skeptical of comparing what appear to be government seals stamped in clay with known languages using only entropy. It seems like it might be one of the less stable statistics of a language.

The demands on a language and script can cause it to adjust entropy very rapidly.

I remember this anecdote: that legal English has less entropy than common English, closer to that of computer language, and that the difference in entropy roughly amounted to the difference in length between 8.5x11 and legal pad document sizes. Presumably the abbreviated lol-talk used in cell phone text communications has much higher entropy than English.

If you were working on a writing system in a time when writing was costly, and it would be used repeatedly for some official capacity, you would expect it to have different entropy than the spoken language.

The need for fewer characters would push it toward higher entropy, but a need for specificity and accuracy would push it toward lower entropy.

Posted by Michael Rule on February 15,2011 | 06:06 PM

First of all, Mr. Witzel is not the only Sanskritist in the world. It has become a habit for Michael Witzel and Steve Farmer to ridicule people who try to say that the IVC is literate. If it is not literate, how can they use mathematical weights with some precision and build bricks with some precision and store food in the granaries and trade with the literate Sumerian civilization? A little common sense will dictate that the IVC is literate. They traded with Sumerians and had their own writing system, which can be attested from so many round seals found in and around Dilmun.

Posted by Anonymous on August 16,2010 | 08:19 PM

The only way the Indus script controversy can be successfully resolved is by doing extensive archaeological studies in the Indus sites and analyzing more material than what is available now. The Indus script has a lot of similarities with the archaic Sumerian script, which is deciphered. I don't really understand why this hype and controversy is created by some scholars. Also, Sanskrit has not originated in Central Asia as some Euro-centric Indologists try to portray to the public. If this were the case, Sanskrit should have flourished in Central Asia or Europe for the last 3000-4000 years, which is not the case. You only see Sanskrit surviving to some extent in India in the hands of traditional Brahmins and in some Vedic schools.

Posted by Anonymous on August 16,2010 | 07:46 PM

No one is saying that his findings with conditional entropy are an undisputed and flawless solution to the Indus Valley script, but it's a different view taken on after 90+ years of excavation and study of the area and its "language". Obviously, traditional methods have not worked on the symbol system's inner workings, so it's time to use unconventional methods to get anywhere.

After saying all that, I have a belief of my own.

I don't consider myself a mathematician, and can't even begin to fathom what kind of math he used to apply conditional entropy, but I believe that the study leaves us back where we started.

Linguists and archaeologists believe that these writings represent one of two things: symbolic or religious meanings. The conditional entropy is figuring out whether some symbols repeat after another in a repetitive manner to indicate that it has the qualities of a "natural language".

Through their studies they have figured out that this written system has basic grammar indicative of other natural languages, but they fail to see that if these carvings are actually religious symbols (found mostly at gates), that means that their test will result in the same answer whether or not the symbols represent a language. Although function words behave and repeat in different ways compared to content words, symbolic meanings may repeat one after another, therefore creating the effect of a language.

I go back to the point that I have no idea how this math system works, but assuming that it looks for repetition in function words, it's no different than starting over again.

Posted by Sunwoo Yang on December 31,2009 | 06:46 PM

Could these little "tablets" be coins and the inscriptions numbers and not letters??

Posted by eloisa munter on August 24,2009 | 06:28 AM

The frenzy with which Witzel and his cohorts are so eager to lambast even the slightest signs that the Indus Valley civilization was a literate civilization indicates the angst that they feel whenever the civilizational aspects of the IVC are brought up. The question is why they feel so threatened by something that happened more than 5 millennia ago.

But therein lies the answer. The occidental has never felt comfortable with the notion that the cradle of civilization lay in the IV. If they were not a literate civilization they certainly did all the things that literacy was supposed to help them do like town planning, sewage systems, a vast area of urbanity covering over 1.5 square miles from central India to the borders of Persia. That they could do all this without the help of a written means of communication is beyond comprehension and does not take away from their immense achievements.

Finally, what happened to them? The answer is simple. The descendants of the IVC are the modern Indians.

Posted by Kosla Vepa on August 13,2009 | 11:22 PM

It is amazing the vitriol that springs forth from professors when their myopic worlds are challenged. I challenged a well-known University of Chicago professor on an issue of business valuation some years ago and drew a heated response that could be felt from coast to coast. Now I do not pretend to know whether Farmer is correct in his criticism, or Rao correct in his computer analysis, but this article has certainly raised the level of the debate. Good hunting to all and here is hoping the Rosetta Stone of the Indus is found soon - before someone has a heart attack.

Posted by Jim Alerding on August 10,2009 | 08:36 AM

Early in history did the Indus trade with any separate entity having its own language and coinage? Could the Indus have drawn some linguistic usages from other trade entities? Did languages there develop along with business and as a "method of doing business"? Is there any long distance between the symbols and even those of current faiths and their histories there? I have read that languages can develop from faith and trade.

Posted by George samuels on August 9,2009 | 05:14 PM

I read the article and was a bit surprised at how sure some letter writers were that Rao's work had been discredited. Conditional entropy doesn't seem to me to be a concept that proves or disproves that a script is a written language. It seems to me to be just a description of the placement of symbols. Since a symbol in a pictogram might represent a whole word, the analogy between letter placement and picture placement is a bit of a stretch. It would seem to me that a better analogy might be that of word placement, since in a language like English some words are used as objects and only follow prepositions or verbs and different types of sentence structures have different types of words in predictable places. The same is true in other languages such as Denai (sp? Navaho) and Japanese and French. (Or so I've heard...) His research may suggest that the scripts are a language, but the article doesn't seem to say that he says they have to be a language. The fuss to derail the concept seems a little harsh.

Posted by Keith Wellman on August 9,2009 | 02:43 PM

The contention that somehow there are (to paraphrase) "those who have a vested interest in showing that non-Western cultures are less advanced" is astounding. European and American researchers have spent centuries developing and publicizing insights into the sophisticated, highly technical, and highly literate cultures of thousands and thousands of years ago.

No one that I've heard of in academic circles in 'the west' has felt threatened by learning that China, Persia, Egypt, etc. were far advanced of the state their own geographies were in thousands of years ago.

Posted by John Jay on August 8,2009 | 12:24 PM

East Asia, China in particular, had the good fortune of having a scholar like Joseph Needham research and defend the claims of the Chinese scholarly community against the detractors. Not only is South Asia very fragmented but it lacks a scholar the caliber of Needham to defend the attempts its scholarly community makes to discover its own past.

The eminent scholars Farmer, Witzel, and Sproat can afford to be a little more sympathetic with the manner in which they present their argument. The very fact that it is published, regardless of what its detractors think of it, means that South Asian archaeology is coming of age and such attempts are not career suicide for scholars anymore. I don't have the volume of Science but it should be thought of as normal scholarly discourse to have an attempt made at decipherment of a script and to have it published in a scholarly science periodical.

That said South Asia can use a scholar like Joseph Needham, especially when its scientific endeavors themselves come under undue criticism.

Posted by Kong Bai on August 6,2009 | 04:51 PM

Rao and friends have published another paper in the Proceedings of the National Academy of Sciences, another prestigious journal. Looks like it's open access and does not need a subscription to read:

http://www.pnas.org/content/early/2009/08/04/0906237106.abstract

That says something, unless of course there is a deep conspiracy in which all prestigious journals are colluding to undermine standards in computational linguistics.

The archaeological material in Farmer-Sproat-Witzel has been thoroughly discredited by an authority in archaeology, Prof. Massimo Vidale in a hilarious aping of the polemical style of FSW

http://www.docstoc.com/docs/document-preview.aspx?doc_id=9163376

The weight of evidence looks definitely in favour of language now.

Posted by Harappan on August 6,2009 | 03:19 PM

Farmer et al.,
Please publish your rebuttal in a well reviewed high impact scientific journal instead of random blog sites, that will give credence to your griping.

Posted by vprasad on August 6,2009 | 01:50 PM

today's hindustan times is to carry a news item regarding rao's "discovery". there will be a time to argue about their methodology; presently we are happy that his group is seriously doing something towards solving this puzzle. i do not know much about computational deciphering but isn't it progress to see an addition in the long list of attempts? --gopal chippalkatti

Posted by gopal chippalkatti on August 5,2009 | 05:19 PM

Actually the refutation that Steve Farmer was referring to did NOT appear in said journal. That was our 2004 paper. The refutation appeared online and has not been published (or, indeed, reviewed). That said, again, surely the issue is whether the arguments are valid, not where they appeared or how rigorously they purported to be reviewed.

But Chris Cornuelle makes an essential point: defenders of the non-script hypothesis would (or should) be ready to throw in the towel if a clear example of a long "bilingual" text in, say, Sumerian and "Harappan" were discovered.

The Phaistos Disk is a good one. Text too short to decipher, yet that hasn't stopped dozens of would-be decipherers from trying. And as with the Indus Valley stuff, the popular science press has from time to time been duped by convincing-looking demonstrations.

Posted by Richard Sproat on July 28,2009 | 10:36 AM

From my exposure to this issue, all I can say is that this is what one expects from insufficient data. Phaistos disk, anyone? Throwing software at a problem like this one seems promising, but e.g. any neural network needs adequate training data, and my guess is that this topic needs more spadework. A bit of clay from Iraq showing a shipping manifest in both "Harappan" and cuneiform would be first on anyone's wish list - assuming "Harappan" is a language, of course ...

Posted by Chris Cornuelle on July 27,2009 | 05:39 PM

While Richard Sproat makes an important point that just because Science publishes a paper after review, it must not be assumed that the paper's deductions are correct, I wonder why in Steve Farmer's first comment, the one attacking Smithsonian very strongly for accepting Rao's work uncritically, and referring to his own and Richard Sproat-Michael Witzel's refutation, he did not mention that this refutation appeared in the Witzel-edited Electronic Journal of Vedic Studies. That does weaken Mr Sproat's claim considerably. And what makes it worse is Mr Farmer's statement that the refutation "appeared within a few hours" of the publication of Rao's work. Even if Rao's claims are all wrong, Mr Sproat and his colleagues do not exactly cover themselves with glory - Mr Farmer's point that Rao et al had no defenders at the conference stands juxtaposed against the lack of any kind of review of the refutation, and yes, even Galileo had no supporters.
Possibly there are better reasons to disbelieve Rao et al, but at the moment these have been submerged by this vitriolic campaign of his critics.

Posted by Shakti Sinha on July 27,2009 | 12:44 PM

Thinking more about this, I am puzzled by the number of times I have seen the point made about the Rao paper that it was reviewed before appearing in Science. As if somehow that means it necessarily must be correct. In contrast, we never harped on the point about whether our paper was reviewed, not because it wasn't, but because surely the real issue is whether the points made in the paper are valid. In serious academic debate, whether a paper was reviewed or not is irrelevant: one looks at the quality of the arguments.

In any case we know that reviewed papers in Science can turn out to be wrong. Justeson and Kaufman's 1993 article in Science on Epi-Olmec was surely reviewed, but few people in the field now believe their decipherment. (I happen to think the reviewers of *that* paper did a more credible job than the reviewers of the Rao et al. paper, but no matter.) More famously, papers by Hendrick Schon of Bell Labs were also reviewed -- but turned out to be fraud, and were subsequently withdrawn by Science.

Anyone who believes that a paper that appears in Science must be correct because its very appearance implies rigorous and correct reviews, is deluding themselves.

Posted by Richard Sproat on July 26,2009 | 09:38 AM

Computational linguists have sat on the Indus script problem for decades now and their main contribution is the paper by Farmer, Sproat, and Witzel claiming the script is not a script at all, a claim that has now been independently debunked by respected scholars in the field like Asko Parpola, Mahadevan, Massimo Vidale, and others.

It is refreshing to see new interdisciplinary approaches being applied to the problem and I commend scientists like Rao et al for exploring such approaches.

Posted by Don on July 24,2009 | 12:41 AM

I agree with Jake: discoveries of what is or what isn't, and of how things are proven or how they aren't, advance the truth, which is what most of us strive for in the end. Thanks for the article.

Posted by Gordy Byrne on July 24,2009 | 08:04 PM

Well there's "reviewed", and then "reviewed by people who know the technical area". I have no doubt the Science article was reviewed. But was it reviewed by a computational linguist who knows about statistical language modeling? Doubtful. See Pereira's blog, where he notes that this is not the first time that Science has published something on language that fails to meet the standards of the field. Presumably this is different in other fields, but it does not seem that the reviewing standards for Science in things that deal with language are particularly high.

Posted by Richard Sproat on July 24,2009 | 02:28 PM

Great piece of journalism!

Going over the posts above, I see one by Farmer and another by "Word Geek" aka Diana Gainer (who seems to be an echo chamber for Farmer, repeating the same allegations).

Farmer and "Word Geek" in their obfuscated posts above speak as if the paper by Rao and colleagues was published without being reviewed. I see that it was published in Science journal, which is widely regarded as one of the most rigorously reviewed journals in science today. Having tried to publish there myself, I can attest that their reviewing standards are very high and all technical details are thoroughly checked before any paper makes it to the publication stage.

Speaking of standards of reviewing, a colleague who works in this field informs me that Farmer and co-authors Sproat and Witzel published their original paper on this topic in a website called "Electronic journal of vedic studies" whose editor-in-chief is none other than Witzel himself. One wonders if that paper was even reviewed!

Posted by Erik Lutz on July 23,2009 | 12:09 AM

I'd be interested if anyone can find a computational linguist who knows anything about statistical language processing and who thinks that Rao et al's paper actually demonstrates anything.

Posted by Richard Sproat on July 23,2009 | 04:15 PM

Reading over these comments I see the post by Diana Gainer (I had also seen her Word Geek article on this). As far as I can see she pretty much sums up the state of affairs accurately.

The only thing that I would add is the point also made in Liberman and Pereira's blogs, namely that it is not hard to show that conditional entropy is not evidence for "structure" per se, and that it cannot distinguish linguistic from non-linguistic systems. So over and above the broader issues, Rao et al's work simply fails to show what it purports to show even in the narrow way they have been pitching it.
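For readers unfamiliar with the term, here is a minimal sketch of what a bigram conditional entropy measures. It is an illustration only, not the procedure used by Rao et al. or by their critics: the function name `conditional_entropy` and the toy symbol strings are assumptions made up for this example, and no smoothing or real sign data is involved. It estimates H(next | current) = -Σ p(x, y) log2 p(y | x) from adjacent pairs of symbols.

```python
import math
from collections import Counter


def conditional_entropy(sequence):
    """Estimate H(next | current) in bits from adjacent pairs in `sequence`.

    Hypothetical helper for illustration; uses raw pair counts, no smoothing.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0  # fewer than two symbols: nothing to predict

    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    total = len(pairs)

    entropy = 0.0
    for (first, _second), count in pair_counts.items():
        p_pair = count / total                       # p(x, y)
        p_next_given_first = count / first_counts[first]  # p(y | x)
        entropy -= p_pair * math.log2(p_next_given_first)
    return entropy


# Toy inputs (assumed, not real sign sequences).
rigid = "ABCABCABC"   # each symbol fully determines the next one
loose = "AABBA"       # after A or B, either symbol is equally likely

print(conditional_entropy(rigid))
print(conditional_entropy(loose))
```

The rigidly ordered string scores 0 bits and the string in which either symbol can follow either symbol scores 1 bit, the maximum for two symbols. The number only says how predictable the next symbol is given the current one, which is why, on the argument summarized above, an intermediate value would not by itself show that a symbol system encodes language.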

As for accusations of Western chauvinism, or however one wants to put some of the claims that have been flying around, that's just plain silly at a number of levels. As far as anyone has demonstrated, the earliest literate people were the Sumerians. Last time I checked, the Sumerians inhabited Mesopotamia, which is not normally considered to be part of the "West". So, by arguing that the Indus Valley civilization was not literate, one is not committed to a view of Western supremacy.

Posted by Richard Sproat on July 23,2009 | 02:46 PM

John's remarks rely on the notion that India was invaded from the Northwest in pre-Vedic times, an idea that has its strongest basis in the fact that India was indeed successfully invaded from both the near and far Northwest in more recent times, first by land and then by sea. However, extending this idea that India was continually invaded from the Northwest, far into the distant past is likely more mirage than mirror.

The real problem is that John assumes that Europeans cannot possibly have originated in India and that he cannot consider the possibility that the ancestors of present day white people migrated to the Northwest from India in prehistoric times. This organized resistance to the idea that Sanskrit is indigenous to India is also a post colonial refusal by scholars of European origin to admit the possibility of their sharing genetic material with Indians -- and Pakistanis. Rather than admitting that invasion may not be the only means of connection between the racial and cultural groups involved, these scholars apparently plough through data they can only understand dimly, since most have no lifelong knowledge of the descendant cultures in the same regions, and of course must then come up with circuitous logic to bolster their essentially untenable positions.

Posted by Nina on July 22,2009 | 02:56 PM

Farmer has been ranting about this for months now, trolling the web and obsessively planting links to his "rebuttals" where he can.

Enough already! As far as I am concerned, Rao et al have adequately responded to the so-called criticisms with their response on Rao's website.

Great article, by the way!

Posted by Don on July 22,2009 | 01:49 PM

Wow, so much conflict. I don't pretend to know a lot about the topic, but I know that any advance in it is one step closer to the truth, whether it proves what not to do or what is incorrect, or what is a full-on advance. Thanks for the enlightening article.

Posted by Jake on July 22,2009 | 01:45 PM

As I mentioned in an article as the Word Geek in Examiner.com, Rao and colleagues did not actually compare the conditional entropy of Indus symbols as found on Indus seal stones and other objects to cuneiform symbols, to actual Tamil symbols, to actual Vincha symbols, or to actual Near Eastern symbols, although they claimed to do all these things. In their online supplement, they reveal that they made up their data sets for comparison to Vincha and Near Eastern symbols, so the results of those comparisons are not valid. The comparisons with other languages are based on transcriptions, which are not valid for other, more complex reasons. Other civilizations were not fully literate, for instance the Aztecs in Central America, and the West African kingdoms in Benin at the time of European arrival there. These both used some symbols but did not write their languages fully. The Indus civilization still may not have been fully literate either, as Farmer et al note. Rao et al haven't proved anything.

Posted by Diana Gainer on July 22,2009 | 12:26 PM

The Hurro-Urartians should not be overlooked in considering candidates for the Harappan upper class -- which lived separated from the masses on raised, ritual-purity platforms.

A connection to Easter Island script is not entirely ludicrous (as serious academics generally hold). The whole known history of the Malay islands is cultures (Islam, Hinduism) arriving by sea from India. The Harappans had sea trade, and a small Harappan kingdom could have survived in Indonesia for 2000 years (confer Bali) before exiles made their way across the Pacific and built those West Eurasian-featured statues.

The claims of Hindu nationalists that Indus script records Sanskrit are extremely dubious -- as the basis of their reasoning is the religious and not scientific notion that Sanskrit arose indigenously in India and is the (20,000 year old) original human language from which all others descend. Scientifically, it's possible that Indo-Europeans could have filtered in from Central Asia in time for the Indus Civilization, but in that era their language would have to have been at least a somewhat earlier phase of Indo-European than Vedic Sanskrit.

Although a Dravidian identity for the Harappan script is entirely likely, it should also be noted that in the hands of Indian researchers this (consciously or unconsciously) ties Pakistani territory to India -- in contrast to the possibility of a more western origin for the Harappans, which would reinforce Pakistan's separate and Middle Eastern-derived identity.

Whatever the language recorded by the Indus script, this may be only the speech of a ruling, or merchant, or craftsman class, in a society multi-lingual by class and region.

Posted by John on July 22,2009 | 06:37 AM

Very interesting, and we can understand that if you have an interest, any branch of science or knowledge could be applied to understand it. The Indus civilisation has definitely played a major role, and contributors like Dr. Rao can throw more light on it in future years.

Posted by Prof.M.C.Muralirangan on July 21,2009 | 05:14 PM

Quite an enlightening article! The script can be deciphered for the sheer fun of it, rather than for bolstering any chauvinist agenda. And as Rajesh rightly says it is India's heritage, rather than any state or language's.

Posted by Yelamanda on July 20,2009 | 01:55 AM

David,

You missed that there was a vested move by a group of academics in the West to dismiss this work, because they would rather show us that non-western cultures were less advanced.

Posted by SB on July 20,2009 | 11:47 PM

Thanks, David, for an inspiring article. This will surely inspire many young scientists to get involved in unraveling the mysteries of our treasured world heritage. Science has to become meaningful in everyone's understanding of his or her identity.

Posted by kalyan on July 19,2009 | 10:19 PM

It is too bad that the author of this article didn't bother to talk to anyone in Indus studies about the work of Rao et al., whose work was thoroughly discredited almost immediately by a series of well-known computational linguists -- who quickly demonstrated that "conditional entropy" cannot distinguish linguistic from nonlinguistic symbols -- and the present author and his colleagues, who presented new evidence that the Indus symbols were not part of a writing system at a major Indus conference in Kyoto, Japan, in May. Of the roughly 40 Indus researchers and linguists attending that conference, Rao and his colleagues did not have a single defender.

For our initial refutation of Rao et al. published within hours of publication of his paper, see:

http://www.safarmer.com/Refutation3.pdf .

For further evidence -- using far more data than Rao used in his paper -- showing that "conditional entropy" cannot distinguish linguistic from nonlinguistic signs or even what language family a language belongs to, see:

http://www.safarmer.com/more.on.Rao.pdf

The latter document also has links to scathing discussions by the influential computational linguists Mark Liberman and Fernando Pereira that further undermine the work of Rao et al. You'll also find there a link to the well-known 2004 paper by me and my colleagues (Michael Witzel of Harvard and Richard Sproat of Oregon Health and Science University) that Rao et al. claimed to "refute".

The odd handling of the evidence in Rao's work has been discussed widely in the archaeological and computational community, and even in public discussions on Wikipedia. The big question is why Smithsonian magazine would publish this article on Rao et al. without doing a little fact checking first and without talking to Rao's critics: that is not responsible scientific reporting and badly misleads the public.

Steve Farmer, Ph.D.

Palo Alto, California
http://www.safarmer.com

Posted by Steve Farmer, Ph.D. on July 19,2009 | 05:18 PM



