Big Data isn’t the answer
Posted: November 16, 2012 at 9:38 am in management

I was talking to somebody last week who had recently moved to San Francisco, and she randomly interjected Big Data into the conversation. She said she’d learned that’s what you do in SF – Big Data is a buzzword that can be used at any time on any topic. I found this amusing, because Big Data is becoming almost messianic – all of our problems will be solved once we have Big Data! Everybody should become a statistician or economist or data analyst!

My response? Don’t believe the hype.

The trends are unmistakable – humans are creating and capturing more data than ever before. IBM estimates that 90% of the data in the world today was created in the last two years. And the tools we have to sift through data are becoming ever more powerful, with open-source packages like Hadoop for map/reduce, and R for statistical computing.

This flood of data and the tools to analyze it are creating market opportunities for businesses and career opportunities for individuals – the story of Target identifying a pregnant teenager from her purchases is a founding myth of Big Data. And some credit Obama winning the presidency to data analysis. So why am I skeptical?

There is a belief that if we could only quantify everything, we would be in control. The management saying is “You can’t manage what you don’t measure.” So if we have more data, and can measure everything, we should be able to manage everything! Except that the world is not that simple. Just because something has been quantified doesn’t mean that it is good or meaningful data – how the data is collected can introduce biases or trends that render it useless for making decisions. And just because an analysis gives a numerical output does not make it into useful knowledge or wisdom.

What I am seeing in the rush to Big Data is the urge to quantify things before understanding them. Recording 600 metrics and tossing them all into a database creates a ton of data, and analysts can spend weeks or months looking through the data. But is that really driving value for an organization? Similarly, I’ve seen situations where an analyst uses a standard ARIMA model to forecast a trend with confidence (because it’s data-driven!), and is later surprised that the forecast is wrong because they never really understood the underlying data. Another example is when a consultant creates a 500-line Excel spreadsheet, where every possible variable is quantified and every change ripples through the spreadsheet… but of those 500 lines, 490 are assumptions, so it’s impossible to tell which variables really matter.

Another potential peril is when analysts start their work with a preconceived notion of the result they want to get. With Big Data, you have enough data to support almost any conclusion if you slice the data in the right way. One of my favorite stories about the perils of data analysis came from my time as an intern at CERN – a grad student was looking for a particular energy resonance from the L3 detector data, and displayed this beautiful graph showing that resonance. Dr. Sam Ting, Nobel Prize winner, smelled a rat – the result looked _too_ clean. He told the grad student to show the data with all of the filters removed, and the raw data showed nothing but noise. The student had applied the filters to show what he wanted to see. Note that I’ve seen similar things happen at Google – as a coworker commented to me recently, if even Google (and MIT grad students) can’t consistently get data analysis right, can anybody?
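To make the slicing problem concrete, here is a minimal sketch in Python using only the standard library. Everything in it is made up for illustration: the 600 "metrics" and the "outcome" are pure random noise, the names `pearson` and `spurious_hits` are hypothetical helpers, and the 0.28 cutoff is an assumption (roughly the conventional 5% significance threshold for 50 observations). The point is only that if you test enough unrelated metrics, some of them will look like they matter.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spurious_hits(n_metrics=600, n_obs=50, threshold=0.28, seed=0):
    """Correlate many pure-noise metrics against a pure-noise outcome.

    Returns the correlations whose magnitude exceeds the (assumed)
    threshold, plus the single strongest correlation found.
    """
    rng = random.Random(seed)
    # The "outcome" we want to explain: random noise, no real signal.
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    correlations = []
    for _ in range(n_metrics):
        # Each "metric" is also random noise, unrelated to the outcome.
        metric = [rng.gauss(0, 1) for _ in range(n_obs)]
        correlations.append(pearson(metric, outcome))
    hits = [r for r in correlations if abs(r) > threshold]
    strongest = max(correlations, key=abs)
    return hits, strongest


hits, strongest = spurious_hits()
print(f"{len(hits)} of 600 noise metrics pass the cutoff; strongest r = {strongest:.2f}")
```

With a 5% cutoff you would expect on the order of 30 of the 600 noise metrics to pass purely by chance. An analyst who reports only those, without mentioning the other 570 that were tried, has a data-driven story built on nothing.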

I worry that the quantification of the world in the form of Big Data is being seen by businesspeople as an end in itself, rather than as the tool it is. Like any tool, data analysis can be used well by those who have trained in its use, or it can be used poorly and cause damage by those without experience. Understanding data is hard. It takes time and effort, and while a well-constructed tool can accelerate that process, it doesn’t replace the need to sit and work with the data to understand its quirks and characteristics. After really understanding the data, you may discover that only 3 metrics out of 600 really matter, and so you don’t need Big Data to run your organization – just a dashboard with the 3 things that matter.

Big Data isn’t a silver bullet that will fix everything with your organization. It is a powerful tool that can help you better understand what is going on, but only if you spend the time to use it properly. Just because your analysts create output that is quantitative doesn’t mean it’s right. Trust, but verify. Use your judgment and all of your tools including walking around to figure out what to do, because in the end, you are the one responsible, no matter what the data says.





  1. Beemer commented on November 16th, 2012 at 10:12 am :

    This is funny to me, because in atmospheric science, Big Data isn’t an answer — it’s a looming problem that everyone’s trying to figure out how to deal with! The idea that people would be chasing after it if they didn’t already have to deal with it is a little surprising.

  2. Frank F commented on November 16th, 2012 at 10:27 am :

    Have you read The Signal and the Noise by Nate Silver yet? It goes into a lot of the things you are talking about when it comes to making predictions using data.

    Like most things, Big Data is a tool, and it can be a great tool, but a tool needs to be operated correctly for value to be made.

    And congrats to the MIT hoops teams being preseason number 1!

  3. Eric commented on November 16th, 2012 at 10:45 am :

    Beemer: why is Big Data a looming problem in atmospheric science? If they are collecting more data than they know what to do with, should they be collecting less data?

    And I think other companies think that Big Data is the secret sauce that enables the success of Google and Amazon and Facebook, so they want to match that. But I think the real secret sauce is having hired a lot of really smart people and turning them loose on these hard problems – data alone isn’t the secret.

    Frank: Thanks for the recommendation – I’ve reserved his book from the library and will let you know what I think.

  4. Anca commented on November 16th, 2012 at 11:17 am :

    Big Data is the latest buzzword, following in the steps of “Social”, “Web 2.0” and “dot com”.

    That being said, I think it’s kind of cool that more people are starting to learn how to interpret, acquire, and create data in an organized fashion. That takes time, patience, and mental effort, which is to say that smarter people will be able to do more with Big Data than those who think the data is the end rather than a means.

  5. Beemer commented on November 16th, 2012 at 11:28 am :

    Well, we know that we *need* the data. There’s plenty of research that shows that we simply don’t have sufficient observational density / resolution to answer certain questions, and more computing power means better model resolution means entirely new classes of problem to explore. So there are good reasons for all the efforts that are generating and collecting this oncoming tsunami of data.

    We just haven’t quite figured out how to deal with it all yet…

  6. Seppo commented on November 16th, 2012 at 12:57 pm :

    Interesting. My experience has been that the “instrument everything” approach definitely has major drawbacks – mostly what you said, that if you’ve got enough data you can support anything, as long as you look at it the right way. More, what we’ve found is that people who simply “munge data” can’t usually find things that are particularly important – you need the people who are looking at the data to be experts on the subject matter they’re looking at data for, because you can’t instrument subtlety very well, you have to be looking at the wall of data and have a masterful understanding of where the data is coming from, and what the user experience that *creates* that data leads to.

    Obviously that’s not … not obvious. Just that it seems like whenever you get that buzzword-bingo atmosphere going on, a surprising number of people end up thinking that it’s not just a means to an end, but an end in itself. The giant wall of data may be impressive in the way that big walls of anything are, but it’s not actionable without understanding. For us, the best use of data is when we have a designer or PM asking very specific questions with fairly easily quantifiable answers. So much of the development process used to rely very heavily on assumptions & extrapolation of personal experience, and the best use of data is to find out what people *actually do*, rather than what we expect them to do. And for that, we’re usually looking only at one (maybe two) metrics over a very limited time period.

    Small data. :D

  7. dw commented on November 16th, 2012 at 10:44 pm :

    Your worries about Big Data — that it can be abused, distorted or manipulated — are equally true of the old, qualitative way of doing things.

  8. Eric commented on November 17th, 2012 at 12:44 pm :

    Lots of great comments to respond to – thanks!

    Anca: Agreed that it’s great that data analysis and interpretation are getting more attention. My point is that really understanding the data takes more than just the numbers – it also requires domain expertise to understand how the numbers are generated and used.

    Beemer: Ah, interesting – so it’s kind of a situation where you don’t know what you don’t know, and you have to collect all of the data and deal with it first before you can begin understanding and simplifying it. Fun!

    Seppo: Great examples of why Big Data isn’t always the answer. If you don’t understand the data, you’re better off focusing on small data first to figure out what’s going on. And we’ve talked about how “data munging” alone isn’t enough – all the mathematical and statistical techniques in the world don’t help if you don’t understand the data. Which is a long way of saying Garbage In, Garbage Out.

    dw: Great point that these are not new worries. I would say that these worries, while present in the old ways, are not magically solved by quantifying them. And I’ve seen lots of people who think numbers can replace judgment.

 
