The Parable of Google Flu: Traps in Big Data Analysis

In February 2013, Google Flu Trends (GFT) made headlines, but not for a reason that Google executives or the creators of the flu tracking system would have hoped. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) estimated by the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States (1, 2). This happened even though GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data (3, 4), what lessons can we draw from this error?

2 replies »

  1. Well, I suppose the obvious flaw is that GFT only looks at search terms and assumes a very strong correlation with actual disease spread. This doesn’t account for hysteria and other factors (plain interest in flu) that might have someone searching about flu. Also, what about people who won’t use Google and just go to the doctor? That could be anyone from tech-savvy 20-somethings to OAPs.

