Why Google Flu Trends Can’t Track the Flu (Yet)

The vaunted big data project falls victim to periodic tweaks in Google’s own search algorithms


In 2008, Google announced an intriguing new service called Google Flu Trends. Engineers at the company had observed that certain search queries (such as those including the words "fever" or "cough") seemed to spike every flu season. Their idea was to use the frequency of these searches to calculate nationwide flu rates faster than could be done with conventional data (which generally takes a few weeks to collect and analyze), letting people know when to take extra precautions to avoid getting the virus.

Media outlets (this reporter included) rushed to congratulate Google on such an insightful, innovative and disruptive use of big data. The only problem? Google Flu Trends hasn't performed very well.

Compared with conventional data collected afterward by the CDC, the service has consistently overestimated flu rates: it put the incidence of flu higher than it actually was in 100 of the 108 weeks between August 2011 and September 2013. In January 2013, when national flu rates peaked and Google Flu Trends' estimates were twice as high as the real data, its inaccuracy finally started garnering press coverage.

The most common explanation for the discrepancy has been that Google hasn't taken into account the uptick in flu-related queries that occurs as a result of the media-driven flu hysteria every winter. But this week in Science, a group of social scientists led by David Lazer proposes an alternate explanation: that Google's own tweaks to its search algorithm are to blame.

It's admittedly hard for outsiders to analyze Google Flu Trends, because the company doesn't make public the specific search terms it uses as raw data, or the particular algorithm it uses to convert the frequency of these terms into flu assessments. But the researchers did their best to infer the terms by using Google Correlate, a service that allows you to look at the rates of particular search terms over time.

When the researchers did this for a variety of flu-related queries over the past few years, they found that a couple of key searches (those for flu treatments, and those asking how to differentiate the flu from the cold) tracked more closely with Google Flu Trends' estimates than with actual flu rates, especially when Google overestimated the prevalence of the ailment. These particular searches, it seems, could be a huge part of the inaccuracy problem.

There's another good reason to suspect this might be the case. In 2011, as part of one of its regular search algorithm tweaks, Google began recommending related search terms for many queries (including listing a search for flu treatments after someone Googled many flu-related terms) and in 2012, the company began providing potential diagnoses in response to symptoms in searches (including listing both "flu" and "cold" after a search that included the phrase "sore throat," for instance, perhaps prompting a user to search for how to distinguish between the two). These tweaks, the researchers argue, likely artificially drove up the rates of the searches they identified as responsible for Google's overestimates.

Of course, if this hypothesis were true, it wouldn't mean Google Flu Trends is inevitably doomed to inaccuracy, just that it needs to be updated to take into account the search engine's constant changes. But Lazer and the other researchers argue that tracking the flu from big data is a particularly difficult problem.

A huge proportion of the search terms that correlate with CDC data on flu rates, it turns out, are caused not by people getting the flu, but by a third factor that affects both searching patterns and flu transmission: winter. In fact, the developers of Google Flu Trends reported coming across particular terms—those related to high school basketball, for instance—that were correlated with flu rates over time but clearly had nothing to do with the virus.
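To make the "third factor" point concrete, here is a minimal Python sketch using entirely made-up numbers. It assumes a hypothetical weekly `winter` index that drives both a synthetic flu rate and synthetic basketball search volume, with no link between the two series other than the season. None of this is Google's or the researchers' data or method; it only illustrates how a shared seasonal driver can produce a strong correlation that vanishes once the season is accounted for.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

weeks = range(104)  # two synthetic years

# Hypothetical seasonal index: 1.0 at midwinter, 0.0 at midsummer.
winter = [0.5 + 0.5 * math.cos(2 * math.pi * t / 52) for t in weeks]

# Both series depend on winter plus unrelated wiggles (all values invented).
flu_rate = [1.0 + 3.0 * w + 0.4 * math.sin(0.9 * t)
            for t, w in zip(weeks, winter)]
basketball_searches = [10.0 + 40.0 * w + 5.0 * math.sin(2.3 * t)
                       for t, w in zip(weeks, winter)]

# Strong raw correlation, even though basketball has nothing to do with flu.
raw_r = pearson(flu_rate, basketball_searches)

# Correlation after the seasonal component is removed from both series.
deseasoned_r = pearson(residuals(flu_rate, winter),
                       residuals(basketball_searches, winter))

print(f"raw correlation:         {raw_r:.2f}")
print(f"after removing 'winter': {deseasoned_r:.2f}")
```

In this toy setup the raw correlation is high while the de-seasoned one is close to zero, which is the pattern you would expect from a term that is a "winter detector" rather than a flu detector.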

Over time, Google engineers manually removed many terms that correlated with flu rates but had nothing to do with the flu. Even so, their model clearly remained too dependent on non-flu seasonal search trends, which is part of the reason Google Flu Trends failed to reflect the 2009 epidemic of H1N1, which happened during summer. Especially in its earlier versions, Google Flu Trends was "part flu detector, part winter detector," the authors of the Science paper write.

But all of this can be a lesson for the use of big data in projects like Google Flu Trends, rather than a blanket indictment of it, the researchers say. If properly updated to take into account tweaks to Google's own algorithm, and rigorously analyzed to remove purely seasonal factors, it could be useful in documenting nationwide flu rates—especially when combined with conventional data.

As a test, the researchers created a model that combined Google Flu Trends data (which is essentially real-time, but potentially inaccurate) with two-week-old CDC data (which is dated, because it takes time to collect, but could still be somewhat indicative of current flu rates). Their hybrid matched the actual and current flu data much more closely than Google Flu Trends alone, and presented a way of getting this information much faster than waiting two weeks for the conventional data.
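The general idea of such a hybrid can be sketched in a few lines of Python. This is not the researchers' actual model, whose details aren't described here. It is a simple least-squares fit on synthetic data, assuming a hypothetical `search_estimate` series that systematically overshoots and an `actual` series standing in for CDC figures, which the model can only see with a two-week lag. All numbers are invented, and the fit is evaluated on the same weeks it was trained on, so it flatters the hybrid somewhat.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

LAG = 2        # weeks before conventional data becomes available
N_WEEKS = 104  # two synthetic years

# Synthetic "true" flu rate with a yearly cycle (invented numbers).
actual = [2.0 + 1.5 * math.sin(2 * math.pi * t / 52) for t in range(N_WEEKS)]

# Synthetic search-based estimate: timely, but biased upward and noisy.
search_estimate = [1.4 * a + 0.5 + 0.3 * math.sin(1.7 * t)
                   for t, a in enumerate(actual)]

# Predict this week's rate from an intercept, this week's search-based
# estimate, and the conventional figure from LAG weeks ago.
rows = [[1.0, search_estimate[t], actual[t - LAG]] for t in range(LAG, N_WEEKS)]
targets = actual[LAG:]
coefficients = fit_least_squares(rows, targets)
hybrid = [sum(c * x for c, x in zip(coefficients, row)) for row in rows]

search_rmse = rmse(search_estimate[LAG:], targets)
hybrid_rmse = rmse(hybrid, targets)
print(f"search-only error: {search_rmse:.2f}")
print(f"hybrid error:      {hybrid_rmse:.2f}")
```

On this toy data the hybrid's error comes out well below that of the search-based estimate alone, because the regression can rescale the biased real-time signal and lean on the lagged figures where they help. A fairer test would fit on one period and score on another.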

"Our analysis of Google Flu demonstrates that the best results come from combining information and techniques from both sources," Ryan Kennedy, a University of Houston political science professor and co-author, said in a press statement. "Instead of talking about a 'big data revolution,' we should be discussing an 'all data revolution.'"
