Can Computers Decipher a 5,000-Year-Old Language?

A computer scientist is helping to uncover the secrets of the inscribed symbols of the Indus

  • By David Zax
  • Smithsonian.com, July 20, 2009
Indus script
Over the decades, archaeologists have turned up a great many artifacts from the Indus civilization, including stamp sealings, amulets and small tablets. (Robert Harding / Photo Library)


The Indus civilization, which flourished throughout much of the third millennium B.C., was the most extensive society of its time. At its height, it encompassed an area of more than half a million square miles centered on what is today the India-Pakistan border. Remnants of the Indus have been found as far north as the Himalayas and as far south as Mumbai. It was the earliest known urban culture of the subcontinent and it boasted two large cities, one at Harappa and one at Mohenjo-daro. Yet despite its size and longevity, and despite nearly a century of archaeological investigations, much about the Indus remains shrouded in mystery.

What little we do know has come from archaeological digs that began in the 1920s and continue today. Over the decades, archaeologists have turned up a great many artifacts, including stamp sealings, amulets and small tablets. Many of these artifacts bear what appear to be specimens of writing—engraved figures resembling, among other things, winged horseshoes, spoked wheels, and upright fish. What exactly those symbols might mean, though, remains one of the most famous unsolved riddles in the scholarship of ancient civilizations.

There have been other tough codes to crack in history. Stumped Egyptologists caught a lucky break with the discovery of the famed Rosetta stone in 1799, which contained text in both Egyptian and Greek. The study of Mayan hieroglyphics languished until a Russian linguist named Yury Knorozov made clever use of contemporary spoken Mayan in the 1950s. But there is no Rosetta stone of the Indus, and scholars don’t know which, if any, languages may have descended from that spoken by the Indus people.

About 22 years ago, in Hyderabad, India, an eighth-grade student named Rajesh Rao turned the page of a history textbook and first learned about this fascinating civilization and its mysterious script. In the years that followed, Rao’s schooling and profession took him in a different direction—he wound up pursuing computer science, which he teaches today at the University of Washington in Seattle—but he monitored Indus scholarship carefully, keeping tabs on the dozens of failed attempts at making sense of the script. Even as he studied artificial intelligence and robotics, Rao amassed a small library of books and monographs on the Indus script, about 30 of them. On a nearby bookshelf, he also kept the cherished eighth-grade history textbook that introduced him to the Indus.

“It was just amazing to see the number of different ideas people suggested,” he says. Some scholars claimed the writing was a sort of Sumerian script; others situated it in the Dravidian family; still others thought it was related to a language of Easter Island. Rao came to appreciate that this was “probably one of the most challenging problems in terms of ancient history.”

As attempt after attempt failed at deciphering the script, some experts began to lose hope that it could be decoded. In 2004, three scholars argued in a controversial paper that the Indus symbols didn’t have linguistic content at all. Instead, the symbols may have been little more than pictograms representing political or religious figures. The authors went so far as to suggest that the Indus was not a literate civilization at all. For some in the field, the whole quest of trying to find language behind those Indus etchings began to resemble an exercise in futility.

A few years later, Rao entered the fray. Until then, people studying the script were archaeologists, historians, linguists or cryptologists. But Rao decided to coax out the secrets of the Indus script using the tool he knew best—computer science.

On a summer day in Seattle, Rao welcomed me into his office to show me how he and his colleagues approached the problem. He set out a collection of replicas of clay seal impressions that archaeologists have turned up from Indus sites. They are small—like little square chocolates—and most of them feature an image of an animal beneath a series of Indus symbols. Most samples of the Indus script are miniatures like these, bearing only a few characters; no grand monoliths have been discovered. Scholars are uncertain of the function of the small seals, Rao told me, but one theory is that they may have been used to certify the quality of traded goods. Another suggests that the seals might have been a way of ensuring that traders paid taxes upon entering or leaving a city—many seals have been found among the ruins of gate houses, which might have functioned like ancient toll booths.



Rao and his colleagues didn’t seek to work miracles—they knew that they didn't have enough information to decipher the ancient script—but they hypothesized that by using computational methods, they could at least begin to establish what sort of writing the Indus script was: did it encode language, or not? They did this using a concept called “conditional entropy.”

Despite the imposing name, conditional entropy is a fairly simple concept: it measures how unpredictable the next symbol in a sequence is, given the symbol that came before it. Consider our alphabet. If you were to take Scrabble tiles and toss them in the air, you might find any old letter turning up after any other. But in actual English words, certain letters are more likely to occur after others. A q in English is almost always followed by a u. A t may be followed by an r or e, but is less likely to be followed by an n or a b.
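
To make the idea concrete, here is a minimal sketch in Python of how conditional entropy could be estimated from adjacent symbol pairs. It is an illustration only: the article does not describe the program Rao's team used, and the function name `conditional_entropy` and the simple pair-counting estimate are assumptions made for this example.

```python
import math
from collections import Counter


def conditional_entropy(sequence):
    """Estimate, in bits, how uncertain the next symbol is given the current one.

    Hypothetical helper for illustration: a plain count-based estimate
    over adjacent pairs, with no smoothing.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (first, _second), count in pair_counts.items():
        p_pair = count / total                      # P(current, next)
        p_next_given_first = count / first_counts[first]  # P(next | current)
        entropy -= p_pair * math.log2(p_next_given_first)
    return entropy


# In "abababab" every symbol fully determines the next one.
print(conditional_entropy("abababab"))  # 0.0
# In "abac" the letter after "a" is a coin flip between "b" and "c".
print(conditional_entropy("abac"))
```

A sequence in which each symbol dictates its successor scores zero, while a sequence in which anything can follow anything scores high. English text, where a q nearly guarantees a u but a t leaves several options open, would land somewhere in between.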

Rao and his collaborators—an international group including computer scientists, astrophysicists and a mathematician—used a computer program to measure the conditional entropy of the Indus script. Then they measured the conditional entropy of other types of systems—natural languages (Sumerian, Tamil, Sanskrit, and English), an artificial language (the computer programming language Fortran) and non-linguistic systems (human DNA sequences, bacterial protein sequences, and two artificial datasets representing high and low extremes of conditional entropy). When they compared the amount of randomness in the Indus script with that of the other systems, they found that it most closely resembled the rates found in the natural languages. They published their findings in May in the journal Science.
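
The comparison can be pictured with two artificial reference points like the ones the team used as high and low extremes. The sketch below is an assumption about how such datasets might be built, not a description of the team's own: a uniformly random sequence stands in for the high extreme and a rigidly repeating cycle for the low one. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def bigram_conditional_entropy(symbols):
    """Count-based estimate of the entropy of the next symbol given the current one (bits)."""
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    total = len(pairs)
    return -sum(
        (count / total) * math.log2(count / first_counts[first])
        for (first, _second), count in pair_counts.items()
    )


def high_extreme(alphabet, length, seed=0):
    """Assumed high extreme: every symbol drawn independently and uniformly."""
    rng = random.Random(seed)
    return [rng.choice(alphabet) for _ in range(length)]


def low_extreme(alphabet, length):
    """Assumed low extreme: symbols repeat in a fixed cycle, so each one fixes the next."""
    return [alphabet[i % len(alphabet)] for i in range(length)]


alphabet = list("abcd")
print(bigram_conditional_entropy(low_extreme(alphabet, 20000)))   # 0.0
print(bigram_conditional_entropy(high_extreme(alphabet, 20000)))  # close to log2(4) = 2 bits
```

A system being tested, such as a corpus of sign sequences, would be scored with the same function and placed between these two bounds. Where it falls relative to known languages is the kind of evidence the study relied on.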

If it looks like a language, and it acts like a language, then it probably is a language, their paper suggests. The findings don’t decipher the script, of course, but they do sharpen our understanding of it, and have lent reassurance to those archaeologists who had been working under the assumption that the Indus script encodes language.

After publishing the paper, Rao got a surprise. The question of which language family the script belongs to, it turns out, is a sensitive one: because of the Indus civilization’s age and significance, many contemporary groups in India would like to claim it as a direct ancestor. For instance, the Tamil-speaking Indians of the south would prefer to learn that the Indus script was a kind of proto-Dravidian, since Tamil is descended from proto-Dravidian. Hindi speakers in the north would rather it be an old form of Sanskrit, an ancestor of Hindi. Rao’s paper doesn’t conclude which language family the script belongs to, though it does note that the script’s conditional entropy is similar to that of Old Tamil, which led some critics to summarily “accuse us of being Dravidian nationalists,” says Rao. “The ferocity of the accusations and attacks was completely unexpected.”

Rao sometimes takes relief in returning to the less ferociously contested world of neuroscience and robotics. But the call of the Indus script remains alluring, and “what used to be a hobby is now monopolizing more than a third of my time,” he says. Rao and his colleagues are now looking at longer strings of characters than they analyzed in the Science paper. “If there are patterns,” says Rao, “we could come up with grammatical rules. That would in turn give constraints to what kinds of language families” the script might belong to.

He hopes that his future findings will speak for themselves, inciting less rancor from opponents rooting for one region of India versus another. For his part, when Rao talks about what the Indus script means to him, he tends to speak in terms of India as a whole. “The heritage of India would be considerably enriched if we were able to understand the Indus civilization,” he says. Rao and his collaborators are working on it, one line of source code at a time.



Comments (39)


It seems obvious to me. The first one says rhinoceros and the second one says camel and the third one says whatever that horned beast is. See. problem solved. :-)

Posted by darin on March 24,2013 | 10:00 PM

Maybe we are making it too complicated. It might just be the first appearance of money. Rather than having to trade or tender the actual animal, for the sake of convenience, these tiles were used as currency. The symbols could indicate how many you are trading, the name of the beast or the owner's name. Maybe all three. The wheel symbol might mean "we deliver" or "cash and carry". Just kidding.

Posted by Jim on September 29,2012 | 12:21 AM

Here's a thought: maybe the animals are a sign of status, so that a warrior might have a lion or some other fierce animal on their tablet. This is like what we do today: if you are a young child, you may pick up a Barney/Clifford book to read, or a young adult who is more interested in Shakespeare might make a different selection of what to read or watch than a senior. My point is, maybe only warriors read warrior tablets, and peasants read peasant tablets. So there could be a string of different "languages" here, but somehow they are all related.

Posted by Lydia on May 25,2012 | 12:14 PM

I think this is amazing. But maybe you could string together the blocks of symbols in an animal way. Meaning, there are two symbol blocks with an elephant on each one of them. Put together the elephants, and then perhaps there's a lion tile. Don't put the lion tile with the elephants because they would not normally cooperate in a habitat together? Could it be a + b = c? c being the answer to the code. Or maybe there's a pattern: elephant, gazelle, lion, eagle. Maybe? Just an idea. I'm really interested in the Indus Valley code, so I'm writing a research paper about it for school. I'll try and post it here later. Thanks :)

Posted by Lydia on May 25,2012 | 12:05 PM

Hindu Culture has been in existence even before 3000 or 4000 BC , there are many temples in south india which date back to 10,000 BC . Even to build such architecture , how developed should the people be , and how vast should the culture be ?

This Harappa and Mohenjadaro has been made up by the British so that Indians do not follow the Ramayana and Mahabharata as true events but just as myths. This would allow the British to exploit them by just saying that they were stone age people and they really did not have a rich culture.

See the link below which shows a Fine monument in the Mahabharata period and which was way before Harappa and Mohenjadaro civilizations and way more advanced in architecture.

www.youtube.com/watch?v=2CbTyxy1MWo

Posted by Rohan on May 26,2011 | 01:44 PM

I would be skeptical of comparing what appear to be government seals stamped in clay with known languages using only entropy. It seems like it might be one of the less stable statistics of a language.

The demands on a language and script can cause it to adjust entropy very rapidly.

I remember this anecdote: that legal English has less entropy than common English, closer to that of computer language, and that the difference in entropy roughly amounted to the difference in length between 8.5x11 and legal pad document sizes. Presumably the abbreviated lol-talk used in cell phone text communications has much higher entropy than English.

If you were working on a writing system in a time when writing was costly, and it would be used repeatedly for some official capacity, you would expect it to have different entropy than the spoken language.

The need for fewer characters would push it toward higher entropy, but a need for specificity and accuracy would push it toward lower entropy.

Posted by Michael Rule on February 15,2011 | 06:06 PM

First of all, Mr. Witzel is not the only Sanskritist in the world. It has become a habit for Michael Witzel and Steve Farmer to ridicule people who try to say that the IVC is literate. If it is not literate, how can they use mathematical weights with some precision and build bricks with some precision and store food in the granaries and trade with the literate Sumerian civilization? A little common sense will dictate that the IVC is literate. They traded with Sumerians and had their own writing system, which can be attested from so many round seals found in and around Dilmun.

Posted by Anonymous on August 16,2010 | 08:19 PM

The only way the Indus script controversy can be successfully resolved is by doing extensive archaeological studies in the Indus sites and analyzing more material than what is available now. The Indus script has a lot of similarities with the archaic Sumerian script, which is deciphered. I don't really understand why this hype and controversy is created by some scholars. Also, Sanskrit did not originate in Central Asia, as some Euro-centric Indologists try to portray to the public. If this were the case, Sanskrit should have flourished in Central Asia or Europe for the last 3000-4000 years, which is not the case. You only see Sanskrit surviving to some extent in India in the hands of traditional Brahmins and in some Vedic schools.

Posted by Anonymous on August 16,2010 | 07:46 PM

No one is saying that his findings with conditional entropy are an undisputed and flawless solution to the Indus Valley script, but it's a different view taken after 90+ years of excavation and study of the area and its "language". Obviously, traditional methods have not worked on the symbol system's inner workings, so it's time to use unconventional methods to get anywhere.

After saying all that, I have a belief of my own.

I don't consider myself a mathematician, and can't even begin to fathom what kind of math he used to apply conditional entropy, but I believe that the study leaves us back where we started.

Linguists and archaeologists believe that these writings represent either one of two things: symbolic or religious meanings. The conditional entropy is figuring out whether some symbols repeat after another in a repetitive manner to indicate that it has the qualities of a "natural language".

Through their studies they have figured out that this written system has basic grammar indicative of other natural languages, but they fail to see that if these carvings are actually religious symbols (found mostly at gates), their test will result in the same answer whether or not the symbols represent a language. Although function words behave and repeat in different ways compared to content words, symbolic meanings may repeat one after another, therefore creating the effect of a language.

I go back to the point that I have no idea how this math system works, but assuming that it looks for repetition in function words, it's no different than starting over again.

Posted by Sunwoo Yang on December 31,2009 | 06:46 PM

Could these little "tablets" be coins and the inscriptions numbers and not letters??

Posted by eloisa munter on August 24,2009 | 06:28 AM

The frenzy with which Witzel and his cohorts are so eager to lambast even the slightest signs that the Indus Valley civilization was a literate civilization indicates the angst that they feel whenever the civilizational aspects of the IVC are brought up. The question is why they feel so threatened by something that happened more than 5 millennia ago.

But therein lies the answer. The occidental has never felt comfortable with the notion that the cradle of civilization lay in the IV. If they were not a literate civilization they certainly did all the things that literacy was supposed to help them do like town planning, sewage systems, a vast area of urbanity covering over 1.5 square miles from central India to the borders of Persia. That they could do all this without the help of a written means of communication is beyond comprehension and does not take away from their immense achievements.

Finally, what happened to them? The answer is simple. The descendants of the IVC are the modern Indians.

Posted by Kosla Vepa on August 13,2009 | 11:22 PM

It is amazing the vitriol that springs forth from professors when their myopic worlds are challenged. I challenged a well-known University of Chicago professor on an issue of business valuation some years ago and drew a heated response that could be felt from coast to coast. Now I do not pretend to know whether Farmer is correct in his criticism, or Rao correct in his computer analysis, but this article has certainly raised the level of the debate. Good hunting to all and here is hoping the Rosetta Stone of the Indus is found soon - before someone has a heart attack.

Posted by Jim Alerding on August 10,2009 | 08:36 AM

Early in history did the Indus trade with any separate entity having its own language and coinage? Could the Indus have drawn some linguistic usages from other trade entities? Did languages there develop along with business and as a "method of doing business"? Is there any long distance between the symbols and even those of current faiths and their histories there? I have read that languages can develop from faith and trade.

Posted by George samuels on August 9,2009 | 05:14 PM

I read the article and was a bit surprised at how sure some letter writers were that Rao's work had been discredited. Conditional entropy doesn't seem to me to be a concept that proves or disproves that a script is a written language. It seems to me to be just a description of the placement of symbols. Since a symbol in a pictogram might represent a whole word, the analogy between letter placement and picture placement is a bit of a stretch. It would seem to me that a better analogy might be that of word placement, since in a language like English some words are used as objects and only follow prepositions or verbs and different types of sentence structures have different types of words in predictable places. The same is true in other languages such as Denai (sp? Navaho) and Japanese and French. (Or so I've heard...) His research may suggest that the scripts are a language, but the article doesn't seem to say that he says they have to be a language. The fuss to derail the concept seems a little harsh.

Posted by Keith Wellman on August 9,2009 | 02:43 PM
