Explore 300 Terabytes of CERN Data Now Free to Download

CERN’s latest data dump includes raw information from the Large Hadron Collider

A CMS collision event as seen in the built-in event display on the CERN Open Data Portal. (CERN)

The Large Hadron Collider (LHC) is one of the scientific community’s most impressive tools. By firing particles at each other in a 17-mile-long device, scientists have unlocked all sorts of secrets of the physical world, from the existence of the Higgs boson to new forms of exotic matter. Now, anyone can take a look at how the LHC explores the universe thanks to a massive public data dump from the European Organization for Nuclear Research (CERN).

Late last week, CERN published online, for free, more than 300 terabytes of data gathered from the LHC’s operations. The information is a mix of raw and processed data, released with the intention that everyone from high school students to up-and-coming physicists can use it in their own studies, Andrew Liptak reports for Gizmodo.

“As scientists, we should take the release of data from publicly funded research very seriously,” CERN physicist Salvatore Rappoccio says in a statement. “In addition to showing good stewardship of the funding we have received, it also provides a scientific benefit to our field as a whole.”

The CERN data includes 100 terabytes of raw information gathered during 2011 by the LHC’s Compact Muon Solenoid (CMS) detector, which analyzes particle collisions for a variety of experiments, including the search for the Higgs boson and dark matter. While that might seem like a daunting amount of information in and of itself, it is only about half of the raw data gathered by the CMS detector during 2011 alone, James Vincent reports for The Verge. Even so, the release contains raw data from about 250 trillion particle collisions.

“Once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly,” physicist Kati Lassila-Perini, who is in charge of preserving data from the CMS detector, says in a statement. “The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow. And personally, as CMS’s data-preservation co-ordinator, this is a crucial part of ensuring the long-term availability of our research data.”

CERN has released raw data to the public before, but this is by far the largest such release the research institution has made. The last time CERN made raw data from its experiments publicly available was in 2014, when researchers published 27 terabytes of data on the internet.

The data can be either downloaded or analyzed using online tools developed by CERN researchers. It also comes in two forms: the entire dataset formatted in the same way that professional physicists use, or narrowed down to the data that captures the most significant particle behavior that the CMS recorded at the time, Christopher Groskopf reports for Quartz.

While CERN scientists have already analyzed all of the data, that doesn’t mean they have learned everything there is to know about the datasets, and now anyone can download them for free. In the past, outside researchers have both confirmed CERN’s findings through independent analysis of its data and used that data in ways the original researchers didn’t expect. Even so, it will probably help to have a background in advanced physics to make heads or tails of the information.

But even if you don’t have a doctorate in physics, making this data open to the public might serve to break down some of the mystery surrounding one of the world’s most advanced physics laboratories.

