#DiseaseSpread

By Laura Spadanuta

01 March 2013

Print Issue: March 2013

A GROUP OF researchers is looking for a better way to use Twitter to detect the spread of disease by narrowing the field of users they study. It’s the latest effort in an area that in recent years has attracted several researchers who are trying to leverage the social media network’s ability to provide real-time monitoring.

In one example, Rumi Chunara, Jason R. Andrews, and John Brownstein of Harvard Medical School have assessed the effectiveness of using Twitter to gain epidemic information more quickly than it could be obtained through traditional channels. A look at Twitter updates during the 2010 Haiti cholera outbreak found that the outbreak could be identified from Twitter weeks before it would be publicly acknowledged by health authorities. Brownstein says that means Twitter can be an effective tool even in places, like Haiti, with lower levels of technology.

Henry Kautz, professor and chair of computer science at the University of Rochester, has also conducted research on Twitter monitoring and finds that it can yield a huge amount of important health data at almost no cost. Social media could “provide a very powerful tool that could revolutionize public health information and decision making,” he says.

Kautz is hoping the work will help public health organizations to quickly identify epidemics. He says it can also provide more information on correlations. For example, his work has looked at the connection between users reporting that they feel sick and those users living near pollution.

Brownstein and Chunara work on HealthMap, a Web site that pulls from online sources and real-time surveillance to track disease outbreaks and other public health situations. Twitter is one of the sources of information for HealthMap. “We can produce a picture of the world of public health in a way that’s very different [from] what traditional public health is putting together in that we’re providing an early window into disease outbreaks and also finding things that were not being reported officially,” says Brownstein.

Twitter fits right into that, says Brownstein. Twitter is helpful because users who are self-reporting illnesses often provide geographic location information. The types of information that researchers can glean from Twitter include symptoms that might not be reported to doctors or acknowledged publicly.

However, there is a lot of “noise” on Twitter to sift through. Kautz points out that simply looking for words like “sick” will yield plenty of tweets that have nothing to do with illness. The right combination of words therefore has to be developed to make it likely that a search actually finds tweets about emerging illnesses.
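As a rough illustration of what a "combination of words" filter might look like, the sketch below flags a tweet only when it pairs a first-person word with a symptom term and contains no obvious slang use of "sick." The word lists and the function name are hypothetical, chosen for illustration; they are not taken from Kautz's research.

```python
import re

# Hypothetical term lists, chosen only for illustration.
SYMPTOM_TERMS = {"fever", "cough", "flu", "headache", "nausea"}
FIRST_PERSON = {"i", "my", "me"}
SLANG_PHRASES = ("sick of", "sick and tired", "sick beat")


def looks_like_illness_report(tweet: str) -> bool:
    """Return True if the tweet plausibly self-reports an illness."""
    text = tweet.lower()
    # Discard common non-medical uses of "sick" first.
    if any(phrase in text for phrase in SLANG_PHRASES):
        return False
    words = set(re.findall(r"[a-z']+", text))
    # Require both a symptom term and a first-person word.
    return bool(words & SYMPTOM_TERMS) and bool(words & FIRST_PERSON)
```

A real system would need far richer rules or a trained classifier; the point is only that a single keyword is not enough.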

A recent approach attempts to narrow and refine the Twitter information even further to make sure it is high quality and easier to work with. The work, described in the paper Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks, comes from researchers at American institutions (including Harvard University and the University of California, San Diego) and Spanish institutions.

In what the researchers call the “sensor hypothesis,” the key is to watch a group of highly connected Twitter users and compare it with a group of randomly selected users.

Highly connected tweeters were chosen by first drawing the random control group and then selecting friends of those users. According to the paper: “This procedure generates a sensor group with higher centrality than the control group because of the ‘friendship paradox’: high-degree individuals are more likely to be connected to a randomly chosen person than low-degree individuals. In other words, ‘your friends have more friends than you do.’”
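The selection procedure can be sketched on a toy network. The example below is a minimal illustration, assuming the network is stored as a dictionary mapping each user to a list of friends; the function names, the toy data, and the choice of one friend per control user are assumptions for illustration, not the researchers' code.

```python
import random


def pick_sensor_group(friends, control, rng):
    """For each control user, pick one of that user's friends at random."""
    sensors = set()
    for user in control:
        if friends[user]:
            sensors.add(rng.choice(friends[user]))
    return sensors


def mean_degree(friends, group):
    """Average number of friends per user in the group."""
    return sum(len(friends[u]) for u in group) / len(group)


# Toy network (hypothetical): "hub" is friends with everyone else.
friends = {
    "hub": ["a", "b", "c", "d", "e"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"], "e": ["hub"],
}

rng = random.Random(0)
control = rng.sample(sorted(friends), 3)       # random control group
sensors = pick_sensor_group(friends, control, rng)
print(mean_degree(friends, control), mean_degree(friends, sensors))
```

Because nearly every user in this toy network has "hub" as a friend, picking friends of random users tends to land on the hub, which is the friendship paradox in miniature.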

The study finds that using a sensor group is more effective than studying large random sets of users. For example, the sensor groups provided earlier warning of the usage of certain hashtags, which are a way to mark topics in a tweet.
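One simple way to put a number on "earlier warning" is to compare when each group, on average, first used a given hashtag. The sketch below assumes a dictionary of first-use times per user (in arbitrary units such as days); the function name and the use of a plain mean are illustrative assumptions, not necessarily the measure used in the paper.

```python
from statistics import mean


def lead_time(first_use, sensors, control):
    """Mean first-use time of the control group minus that of the sensors.

    A positive result means the sensor group used the hashtag earlier.
    Users who never used the hashtag are ignored.
    """
    sensor_times = [first_use[u] for u in sensors if u in first_use]
    control_times = [first_use[u] for u in control if u in first_use]
    return mean(control_times) - mean(sensor_times)
```

With hypothetical first-use times of day 1 for a sensor user and days 4 and 6 for two control users, the function reports a lead of 4 days for the sensor group.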

The study also found that there was the potential to distinguish how the hashtags were being spread between users. The study was awaiting peer review when this article went to press. The researchers hope eventually to generalize the conditions under which this sort of analysis would be effective.