Wikipedia May Track Flu Levels Better Than The CDC And Google, Based On Articles Being Searched
Thanks to a new model that analyzes search traffic based on statistics from Wikipedia, researchers from Boston Children’s Hospital and Harvard Medical School have discovered a potentially quicker way to spot emerging flu trends in the United States.
The study, published in PLoS Computational Biology, comes on the heels of the newly introduced Google Flu Trends, which tracks incidence rates across the country to predict when the most severe influenza seasons tend to arise in a given year. In addition, the Centers for Disease Control and Prevention (CDC) releases its yearly data to alert Americans when to expect the most contagious months. These tracking systems, however, may not be the best sources of flu data.
Internet-based forms of data collection have become hugely popular as online activity continues to accelerate. Now scientists suggest the neurotic, sometimes hypochondria-fueled searches users regularly perform may actually be good for something: Tracking the flu through Wikipedia searches yielded estimates of flu levels in the U.S. two weeks earlier than the CDC’s efforts and pinpointed the peak of the season 17 percent more accurately than Google’s estimates.
“What we were really looking for was to find a way to make these estimates using data that is completely open and freely available to everybody,” co-researcher Dr. David McIver told Medical Daily. This, he says, is what separates his model from some of the other tools.
The stakes in crafting a more perfect model are decidedly high. Despite the vaccine’s wide availability, influenza continues to cause between 3,000 and 50,000 deaths each year in the U.S. The CDC releases weekly and annual reports; however, as the researchers point out, “because it can take a long time to collect and analyze all of this information, the data that is being reported each week is typically between 1 to 2 weeks old at the time of publishing.”
To optimize the accuracy and timeliness of this data collection, McIver and his co-researcher, Dr. John Brownstein, took to Wikipedia, currently the largest online encyclopedia and one of the largest websites in existence, with nearly 506 million visitors per month, 27 billion total page views, and 17,800 new articles each day. Using publicly accessible data via Wikimedia Statistics, they collected search data on 32 influenza-related articles, including “Avian Influenza,” “Fever,” and “Tamiflu,” between December 2007 and August 2013.
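For readers curious how per-article traffic might be turned into something comparable with weekly flu reports, here is a minimal sketch in Python. It is not the researchers' code: the function name `weekly_totals`, the input layout, and the sample numbers are all assumptions made up for illustration, and the study's actual statistical model is not reproduced here. It simply adds up daily view counts for a handful of articles by calendar week.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_views):
    """Sum daily view counts per ISO week across all tracked articles.

    daily_views maps an article title to a {date: views} dictionary
    (a hypothetical layout chosen for this sketch).
    Returns a {(iso_year, iso_week): total_views} dictionary.
    """
    totals = defaultdict(int)
    for per_day in daily_views.values():
        for day, views in per_day.items():
            iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
            totals[(iso[0], iso[1])] += views
    return dict(totals)


# Made-up numbers for illustration only; these are not data from the study.
sample = {
    "Fever": {
        date(2013, 1, 7): 100,
        date(2013, 1, 8): 120,
        date(2013, 1, 14): 90,
    },
    "Tamiflu": {
        date(2013, 1, 7): 40,
        date(2013, 1, 15): 60,
    },
}

weekly = weekly_totals(sample)
for (year, week), total in sorted(weekly.items()):
    print(f"{year} week {week}: {total} views")
```

Grouping by week is a choice made here so the totals line up with weekly reporting like the CDC's; the same loop could group by day instead if finer-grained figures were wanted.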
Their results proved impressively reliable, and the estimates could potentially be made available in near real time. “In practice, if this Wikipedia-based … surveillance system were to be implemented on a more permanent basis,” the team writes, “it is possible that updates to the Wikipedia-estimated proportion [of] activity in the United States could be available on a daily or even hourly basis, although this application has not yet been explored.”
Seasonal flu isn’t the only avenue open to Wikipedia-based estimates. More lethal forms, such as H1N1, which led to the 2009 pandemic, and the H5N1 avian flu, whose genetic instability still threatens certain populations, may also be compatible with the tracking model.
“We're hoping that with this new method of influenza monitoring, we can harness publicly available data to help people get accurate, near-real-time information about the level of disease burden in the population,” the team explained in a statement.
Ultimately, McIver says, the model isn’t intended to compete with the CDC or Google Flu Trends; it’s to cooperate with them. “It’s important for everyone to understand that these tools are not replacements for traditional things like the CDC,” he said. “They’re meant to be used one alongside the other.”
Source: McIver D, Brownstein J. Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time. PLoS Computational Biology. 2014.