Review - Big Data: A Revolution That Will Transform How We Live, Work, and Think


The concept of big data has revolutionized our outlook on how we can learn from data. As an example, in 2009 an article published by Google[1] suggested that the search engine could track the spread of the flu virus based upon keyword search terms and user locations (see Google Flu Trends). It is a remarkable feat: a seemingly benign piece of data - search terms - can be used to help epidemiologists track the spread of disease. But Google isn't the only one to harness data in large volumes to address a question or solve a problem. From the New York City planning department to corporate offices such as Google and Amazon, the capture of data is increasing at an astounding rate, and the use and value of such data are appreciated more every day.

The book Big Data: A Revolution That Will Transform How We Live, Work, and Think[2] provides a high-level overview of the field of 'Big Data'. Big Data is filled with well-written contemporary examples of how data is captured, how it has been (or can be) used, and how it can be abused. The examples Big Data provides - whether new or known beforehand - are described in a way that makes every topic fresh, and - aside from some passages that seem oversaturated with the phrase "Big Data" - the majority of the book is written in an effective and clear manner. Sometimes you can gauge how effective a book is by what you've learned when you reach the final page, and in this case I learned plenty - from how Google's reCAPTCHA service is used to aid in their optical character recognition to data resources such as data.gov.

Big Data rightly emphasizes a major point that is always in need of further emphasis: the difference between correlation and causation. Machine learning often doesn't address the latter; rather, it looks for the former - figuring out the what and not the why. The distinction between the two is important and points to a phrase used often in statistics: correlation does not imply causation. Just because ice cream sales are correlated with drowning rates does not imply that one causes the other, and just because certain search terms are correlated with flu rates does not imply that those searching have the flu (they may be looking to prevent catching it).
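To make the ice cream example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, temperature is a hypothetical hidden factor that drives both quantities, and the coefficients are invented. None of it comes from the book.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(xs, ys):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature (invented numbers).
temperature = [rng.uniform(0, 35) for _ in range(2000)]

# Both quantities depend on temperature, never on each other.
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Raw correlation between the two outcomes.
raw = pearson(ice_cream, drownings)

# Correlation once the effect of temperature is removed from both.
adjusted = pearson(residuals(temperature, ice_cream),
                   residuals(temperature, drownings))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

In this simulation the raw correlation is strong even though neither variable influences the other, and it largely vanishes once temperature is accounted for. Real data rarely come with the hidden factor labeled, which is exactly why the distinction matters.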

While Big Data offers up story after story and fact after fact, the book can at times be frustrating: not for any lack of good writing or fascinating detail, but for its lack of easy-to-use referencing. An appendix of notes for each chapter provides ample resources for further reading. Yet if one wishes to seek out a reference, read further into a subject, or simply know where the facts come from, one is forced to spend time digging through a pile of references in the appendix (perhaps an easier task in a paper copy than in the Kindle edition).

Big Data is directed towards a very general audience, and for such a difficult subject I found it easy to read. Despite my frustrations with the references and the seemingly keyword-injected passages, I would not hesitate to recommend Big Data to someone wishing to know a bit more about how data is being captured, used, and possibly abused.

  • [1] Ginsberg et al. (2009) Nature, 457, 1012-1014. http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html
  • [2] Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier. ISBN-13: 9780544002692




© 2008-2017 Greg Cope