Wikipedia Positioned To Track Disease Outbreak: The Model That Could Rival Current Resources
I can still see the faces of my peers the minute after my college English professor said she loved to use Wikipedia. It’s a great place to start, she said. Knowing Wikipedia as an online encyclopedia where news anchors work “part-time as Death” and Spot the Dog “sips whiskey,” my peers and I were confused. But new research published in PLoS Computational Biology suggests it was we, not our professor, who were wrong.
Since its launch in 2001, Wikipedia has become the sixth most visited site in the world. Researchers reported the site contains around 30 million articles in 287 languages and serves roughly 850 million article requests per day. More importantly, it’s a free, open source of data that is gaining traction as a tool for “effective and timely disease surveillance.”
“Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social Internet data, such as social media and search queries, are emerging,” researchers wrote. “These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness.”
To see whether these challenges can be overcome, researchers used two data sources, Wikipedia article access logs and official disease incidence reports from the World Health Organization, to build a linear model to analyze around three years of data for seven diseases (cholera, dengue, Ebola, HIV/AIDS, influenza, plague, and tuberculosis) in nine different locations (Haiti, Brazil, Thailand, Uganda, China, Japan, Poland, United States, and Norway). The premise is that people leave a trail when they look up health information online, and that trail can be captured and used to derive actionable information.
With the WHO's data and online traffic to select Wikipedia articles, researchers were able to forecast disease incidence at least 28 days ahead of time. The one exception was the rate of tuberculosis in China.
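To make the idea concrete, here is a minimal sketch of how a lagged linear model can turn page traffic into a forecast. It is not the study's code: it assumes a single article, weekly counts, a four-week (28-day) lag, and entirely made-up numbers, and the function names `fit_lagged_linear` and `forecast` are hypothetical. It fits incidence in a given week against article views from four weeks earlier, then uses the most recent four weeks of views to predict the next four weeks of incidence.

```python
from statistics import mean


def fit_lagged_linear(views, incidence, lag):
    """Fit incidence[t] = intercept + slope * views[t - lag] by least squares."""
    if len(views) != len(incidence):
        raise ValueError("views and incidence must have the same length")
    if lag < 1 or lag >= len(views) - 1:
        raise ValueError("lag must leave at least two paired observations")

    # Pair each incidence value with the page views seen `lag` periods earlier.
    x = views[:-lag]
    y = incidence[lag:]

    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("page views are constant; slope is undefined")

    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def forecast(views, intercept, slope, lag):
    """Predict incidence for the `lag` periods after the last observation."""
    return [intercept + slope * v for v in views[-lag:]]


# Made-up weekly data for illustration only (not from the study or the WHO).
LAG_WEEKS = 4  # four weeks, i.e. 28 days
weekly_views = [120, 150, 200, 260, 310, 280, 220, 180, 140, 130, 160, 210]

# Synthetic incidence: the first four weeks are arbitrary, the rest follow
# 5 + 0.1 * (views four weeks earlier), so the fit should recover those values.
weekly_incidence = [14, 15, 16, 17] + [5 + 0.1 * v for v in weekly_views[:-LAG_WEEKS]]

intercept, slope = fit_lagged_linear(weekly_views, weekly_incidence, LAG_WEEKS)
predicted = forecast(weekly_views, intercept, slope, LAG_WEEKS)

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print("forecast for the next four weeks:", [round(p, 1) for p in predicted])
```

Real access logs are far noisier than this toy series, and a working system would have to choose which articles to track and check its forecasts against held-out official reports.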
"A global disease-forecasting system will change the way we respond to epidemics," Dr. Sara Del Valle, lead study author at Los Alamos National Laboratory in New Mexico, said in a press release. "In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast. The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal."
Social media's reputation keeps getting turned on its head as scientists look to it as a valuable resource. Just last month, a study published in the journal Trends in Microbiology found that social media users who are unaware of HIV precautions and treatments can learn from more transparent users. Specifically, these users are encouraged to get tested, and the study's authors refer to these accounts as a rich source of psychological and health-related data. Suddenly 140 characters doesn't seem so bad.
Source: Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R. Global Disease Monitoring and Forecasting with Wikipedia. PLoS Computational Biology. 2014.