The Library of Congress Will Stop Archiving Twitter

Because tweets have become too long and too numerous, the Library will only archive tweets of “historic value”


Back in 2010, no one expected that the hub of the United States' political discourse would soon shift to Twitter, the social messaging application that gave users the opportunity to “microblog” 140-character messages (though that has recently been raised to a breezy 280 characters). At the time, Twitter began sending the Library of Congress every public tweet ever sent, even going back to its earliest days of existence in 2006. After 12 years of grabbing every single hot take, fast-food feud, racist re-tweet, Russian bot and weird musing of Twitter star dril, the Library has had enough. Harper Neidig at The Hill reports that the LOC announced yesterday that after December 31, it will only collect tweets it deems of historic importance.

In a white paper on the topic, the Library cites several reasons for the change. First, the volume of tweets has grown dramatically since an agreement was first signed with Twitter seven years ago, making management of the collection burdensome. The nature of tweets has also changed. The Library receives only the text of the tweets, not any images, videos or animated GIFs associated with them. As such media have become a bigger part of Twitter culture, the collection has lost a lot of content and context.

The Library also cites the recent expansion of the tweet character limit as a reason for the change, explaining that Twitter is morphing and may change more in the future. “The Library generally does not collect comprehensively. Given the unknown direction of social media when the gift was first planned, the Library made an exception for public tweets,” the Library explains in the paper. “With social media now established, the Library is bringing its collecting practice more in line with its collection policies.”

The 12-year archive of tweets is not publicly accessible, and the LOC has no timetable for when it might be available. The Library now says it will serve as a snapshot of the first 12 years of an emerging form of social communication, as if the Library had every telegram ever sent during the first 12 years of that technology.

The move was not completely out of the blue. Andrew McGill at The Atlantic explains that the LOC did not have the proper resources or experience for the project and had no engineers working full time on the tweets. The Library more or less tossed batches of unprocessed tweets, 500 million produced every day, into a server to be dealt with at a later date. “This is a warning as we start dealing with big data—we have to be careful what we sign up for,” Michael Zimmer of the University of Wisconsin-Milwaukee tells McGill. “When libraries didn’t have the resources to digitize books, only a company the size of Google was able to put the money and the bodies into it. And that might be where the Library of Congress is stuck.”  

Back in 2010, the number of tweets was about one-tenth of current traffic, the “retweet” function was still new and threads weren’t active. Over time, however, tweets embedded in threads, photos and videos and the new character limit have made each tweet bigger and the volume of daily data staggering. By 2013, McGill reports, the Library had already admitted it was struggling and said that conducting a single search of the 2006 to 2010 tweet archive would take 24 hours on the LOC’s then-current system.

In the original agreement, the Library agreed to embargo the tweets for six months and to remove any deleted tweets and private tweets. Researchers were excited to access the data, but have been disappointed by the lack of public access. Still, some hope the Library will eventually find a way to make the tweets accessible, which could be very valuable to sociologists, psychologists, political scientists and other researchers.

“I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data,” former LOC director of communications Matt Raymond wrote when the partnership was announced. “And I’m certain we’ll learn things that none of us now can even possibly conceive.”

Even though we don’t have the archives to look through, Twitter has still taught us lots of things. Regular people can be more hilarious than the best comedians. They can also be abysmally dumb. Nazis still exist and have no problem expressing their horrific thoughts. Trolls will ruin any conversation, no matter how banal. And of course brevity is the soul of wit. And witlessness in equal measure.  

About Jason Daley

Jason Daley is a Madison, Wisconsin-based writer specializing in natural history, science, travel, and the environment. His work has appeared in Discover, Popular Science, Outside, Men’s Journal, and other magazines.
