In 2015, a high-profile outbreak of measles at Disneyland shocked parents into a fundamental change in perspective on vaccinations. In the years prior, the perceived need for the MMR vaccine had dropped, and with it, the percentage of kids who were protected from measles. After hundreds of people got sick, parents were prompted to vaccinate, and rates climbed again.
Maybe it should be obvious that skipping vaccinations would lead to more sick kids, but most American parents these days have never had to worry about measles. There’s a dynamic interaction between perceived risk of disease and perceived risk of vaccines, explains Chris Bauch. A professor of applied mathematics at the University of Waterloo, Bauch looked at social media trends before and after the Disneyland outbreak, and noticed that, statistically speaking, he could track the public sentiment toward vaccines and see the heightened disease risk before it happened. He and his collaborators published the work in the Proceedings of the National Academy of Sciences in November.
“Everyone has some intuition for tipping points from see-saws. If you have more weight on one side than the other, it tips down on the heavier side. But as you add more and more weight to the opposing side, eventually it’ll tip over,” he says. “These tipping points exhibit characteristic signals before they occur … the question is, can we look for the presence of a tipping point leading to a large decline in vaccine uptake, like a vaccine scare?”
Vaccine scares are just one example. Epidemiologists, computer scientists and health professionals are now applying machine learning to data from new sources, especially social media, to create predictive models that are similar to the CDC’s but much faster. Tweets about sore throats or doctor’s visits, Google searches for cold remedies, and even your Fitbit or Apple Watch can all hint at health trends in an area if matched to location data. And people are tracking that data and uploading it.
“Suddenly we have access to some of the data,” says Marcel Salathe, head of the digital epidemiology lab at Switzerland’s EPFL institute. “That to me is really the bigger picture of what’s happening here, because to some extent this is a profound change of the data flow of traditional epidemiology.”
For Bauch and Salathe, who collaborated on the study, Twitter was the primary source of data. They built a bot to search for tweets mentioning vaccines and assess the sentiment of those tweets — whether they indicated acceptance or doubt of vaccines. Then, they looked at the results as a complex system with a feedback loop, applying a mathematical model to see if it would retroactively predict the vaccination slow-down that led to the Disneyland outbreak. It did.
In systems like this, certain measurable signals occur as the system approaches a tipping point. In this case, the researchers saw a “critical slowing down,” where sentiment about vaccines was slower to return to normal after a news article or a tweet from a celebrity influenced it. Being able to see this lead-up to the tipping point means that, given location data, public health officials could build campaigns targeting areas that are at increased risk of a vaccine scare, and thus an outbreak.
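One common way to put a number on critical slowing down is the lag-1 autocorrelation of a time series: when a system recovers slowly from each nudge, today's value looks more like yesterday's, and the autocorrelation rises. The sketch below illustrates that idea on a made-up sentiment series. It is not the researchers' model, and the function names, the `recovery_rate` parameter and the simulated data are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a sequence (0.0 if the sequence is constant)."""
    m = mean(values)
    dev = [v - m for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    num = sum(dev[i] * dev[i + 1] for i in range(len(dev) - 1))
    return num / denom


def rolling_lag1(series, window):
    """Lag-1 autocorrelation in each sliding window of the given length."""
    return [
        lag1_autocorrelation(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


def simulate_sentiment(recovery_rate, n, seed=0):
    """Hypothetical toy series, not real Twitter data.

    Each step, random shocks (news, celebrity tweets) push sentiment around,
    and a fraction `recovery_rate` of the deviation from 0 is undone.
    """
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(n):
        x = (1.0 - recovery_rate) * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    resilient = simulate_sentiment(recovery_rate=0.8, n=2000)
    fragile = simulate_sentiment(recovery_rate=0.1, n=2000)
    print("fast recovery:", round(lag1_autocorrelation(resilient), 2))
    print("slow recovery:", round(lag1_autocorrelation(fragile), 2))
```

In practice, an analyst would watch the output of something like `rolling_lag1` over time; a sustained rise would be one possible warning sign that sentiment is taking longer to bounce back.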
There are barriers to using publicly available data from social media sources, of course, including privacy, though the researchers who use Twitter data point out that it’s sort of assumed that if you tweet about your health, someone may read it. It can also be challenging to build computer programs that parse the information those posts contain, points out Graham Dodge, co-founder and CEO of Sickweather, an app-based service that generates health forecasts and live maps of illness reports.
Dodge and his co-founders collaborated with researchers from Johns Hopkins to analyze billions of tweets mentioning illnesses. The process involved separating intentional, qualified reports (“I have the flu”) from more vague comments (“I feel sick”) and even misleading phrasing (“I’ve got Bieber fever”). They’ve also had to compensate for absent or inaccurate location data: all the Twitter users who simply mark “Seattle” as their location, for example, are dropped into a small downtown Seattle zip code, rather than spread throughout the city.
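To give a feel for that sorting problem, here is a deliberately simple, rule-based sketch in Python. It is not Sickweather's or Johns Hopkins' method, which the article does not describe; the phrase lists, the category names and the `classify_report` function are hypothetical, and a real system would need far broader coverage.

```python
import re

# Hypothetical phrase lists, chosen only to cover the article's three examples.
MISLEADING_PHRASES = ("bieber fever", "cabin fever")

QUALIFIED_PATTERN = re.compile(
    r"\bi(?:'ve| have)(?: got)? (?:the |a )?(?:flu|cold|strep throat|bronchitis)\b"
)

VAGUE_PATTERN = re.compile(
    r"\b(?:feel(?:ing)? (?:sick|ill|awful)|under the weather)\b"
)


def classify_report(text):
    """Label a post as 'misleading', 'qualified', 'vague' or 'unrelated'."""
    # Normalize case and curly apostrophes so "I’ve" matches "i've".
    normalized = text.lower().replace("\u2019", "'")

    # Check figurative phrases first, so "Bieber fever" is never counted as illness.
    if any(phrase in normalized for phrase in MISLEADING_PHRASES):
        return "misleading"
    if QUALIFIED_PATTERN.search(normalized):
        return "qualified"
    if VAGUE_PATTERN.search(normalized):
        return "vague"
    return "unrelated"


if __name__ == "__main__":
    for post in ("I have the flu", "I feel sick", "I\u2019ve got Bieber fever"):
        print(post, "->", classify_report(post))
```

Even this toy version shows why ordering matters: the figurative phrases have to be screened out before the illness patterns run, or they would be counted as real reports.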
Sickweather launched in 2013 with a mobile app that allows users to report illnesses directly to Sickweather, as well as view conditions in their location. Clinical researchers and pharmaceutical companies use the app’s predictive model to anticipate disease peaks several weeks ahead of the CDC, with comparable accuracy.
“Once this is in the hands of millions of people, instead of 270,000, how this plays out at scale could really stave the spread of illness in many places,” says Dodge.
Other projects have tried different approaches. Flu Near You collects symptoms through a self-reported survey, and GoViral has been sending out kits for self-analysis of mucus and saliva. Google Flu Trends used that company’s data to track the flu and published its results in Nature, though the project shut down after a misfire in 2013. The experiment, in which Google used flu-related searches to estimate how many people were sick, overestimated the prevalence of the disease, possibly because media coverage of a bad flu season caused people to search for flu-related terms more often.
While Twitter can be used to track the diseases themselves, Salathe says some of the challenges mentioned by Dodge explain why the meta-analysis of vaccine acceptance makes more sense than tracking self-reported illnesses.
“I’m not sure Twitter is the best data source for that, because people give such weird statements about themselves when they have to self diagnose,” says Salathe. “It’s not actually so much about tracking the disease itself, but rather tracking the human response to it.”
GoViral has a further advantage, explains Rumi Chunara, the NYU computer science and engineering professor who runs that project. It relies not on self-reporting but on lab tests that definitively assess the spread of viruses, and it compares those results to symptom reports.
“There’s a lot of opportunity, but there’s challenges as well, and I think that’s where a lot of the science could be focused,” says Chunara. How does it complement clinical data? How do we reduce noise and apply the information? What more specific fields or human behavior can we look at?
Newer technologies — especially fitness trackers and other direct measures of health — will give more, better data that’s less subjective, she says.
“A lot of times, we get this buzz of, this is something awesome, social media health,” she says. “The question of it getting used is something I think the whole community should be looking towards.”