“Big Data” is a technology buzzword. The idea is that we have so much data about people and the way they interact with a company, we should be able to generate new and interesting insights from this data that will solve business problems.
But there’s a catch. Big Data is just another form of analytics. In most companies, each additional piece of data provides less value than the previous piece. In economics, this is called diminishing marginal utility. So if the first piece of data (e.g., has this customer bought this product before) is worth $1, the 50th piece of data (e.g., how old are the customer’s children) may be worth less than a penny.
Unfortunately, many people are unduly optimistic about the value that Big Data can provide. They believe that with so much data, if only they could search all of it, there’s bound to be something useful in there. It reminds me of one of Ronald Reagan’s favorite jokes:
The joke concerns twin boys of five or six. Worried that the boys had developed extreme personalities – one was a total pessimist, the other a total optimist – their parents took them to a psychiatrist.
First the psychiatrist treated the pessimist. Trying to brighten his outlook, the psychiatrist took him to a room piled to the ceiling with brand-new toys. But instead of yelping with delight, the little boy burst into tears. “What’s the matter?” the psychiatrist asked, baffled. “Don’t you want to play with any of the toys?” “Yes,” the little boy bawled, “but if I did I’d only break them.”
Next the psychiatrist treated the optimist. Trying to dampen his outlook, the psychiatrist took him to a room piled to the ceiling with horse manure. But instead of wrinkling his nose in disgust, the optimist emitted just the yelp of delight the psychiatrist had been hoping to hear from his brother, the pessimist. Then he clambered to the top of the pile, dropped to his knees, and began gleefully digging out scoop after scoop with his bare hands. “What do you think you’re doing?” the psychiatrist asked, just as baffled by the optimist as he had been by the pessimist.
“With all this manure,” the little boy replied, beaming, “there must be a pony in here somewhere!”
If you have enough data, you’ll certainly find relationships in it. But then you have a new problem: whether or not these findings will help you run your business. You can find lots of statistically significant correlations that are completely spurious.
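To see how easily chance produces "findings", here is a minimal sketch that correlates purely random series with each other. Everything in it is an assumption made for illustration: the function names, the 50 series of 20 points each, and the cutoff of 0.444, which is roughly the |r| needed for p < 0.05 with 20 points.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spurious_pairs(n_series=50, n_points=20, threshold=0.444, seed=0):
    """Correlate every pair of unrelated random series and collect the 'hits'."""
    rng = random.Random(seed)
    # Each series is independent noise, so any correlation is pure chance.
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    pairs = list(itertools.combinations(range(n_series), 2))
    hits = []
    for i, j in pairs:
        r = pearson(series[i], series[j])
        if abs(r) >= threshold:
            hits.append((i, j, r))
    return len(pairs), hits


total, hits = spurious_pairs()
print(f"{len(hits)} of {total} pairs look 'significant' by chance alone")
```

With 1,225 pairs and a 5% cutoff, you would expect a few dozen of these "discoveries" even though none of the series has anything to do with any other. The more columns you have, the more of them you will find.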
So how should you use Big Data? Start with small, actionable analytics: analytics that matter. That’s what I did when I joined a new group at a large company. Though the business unit had a chief data officer, our team didn’t have the analytics that we needed. So I implemented a simple plan for understanding the users, making sure we had good data and building iteratively over time.
Understand the Users
The first thing we needed to do was understand who was going to use the system. These users were going to define success for the project. We had two user groups:
- Senior Management: This was the easy one to identify. Senior management wanted some basic information to run the business.
- Power Users: However, we wanted to have a robust view of the data, not just high-level reports. So we needed to look for our power users. And that’s when we found him. Let’s just call him Power User. He was based in London. Each month, Power User asked for a download of all the raw data and ran a 10-year-old program in Excel to get the output he needed. The output looked great, but the program was very fragile, hard to read and not extensible. Still, the output of this program gave us a great starting point.
Make Sure We Have Good Data
Analytics, first and foremost, is about the quality of the data. If you don’t have quality data, none of your analytics will be right, whether it’s Big Data or anything else. When I started working on this project, there was a report that showed the volume of payments being processed. Every few months there was a massive spike in volume. It wasn’t hard to see: a spike of a quadrillion dollars going through the system. By comparison, the value of everything in the world is only $241 trillion. This report was a couple of years old and people reviewed it each month, but no one mentioned anything. As it turns out, test data was being included, which is what messed up the numbers. Cleaning all of the data and ensuring its quality took a lot of time but was an integral part of the project.
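A check like the one that would have caught this spike can be very small. The sketch below is only an illustration, not what the project actually ran: the record layout (`month`, `amount_usd`, `is_test`) and the sample numbers are made up, and the sanity ceiling simply reuses the $241 trillion figure quoted above.

```python
from collections import defaultdict

# Sanity ceiling borrowed from the comparison in the text ($241 trillion).
PLAUSIBLE_MONTHLY_MAX_USD = 241e12


def monthly_volume(payments, include_test=False):
    """Sum payment amounts per month, skipping test records by default."""
    totals = defaultdict(float)
    for p in payments:
        if p["is_test"] and not include_test:
            continue
        totals[p["month"]] += p["amount_usd"]
    return dict(totals)


def implausible_months(totals, ceiling=PLAUSIBLE_MONTHLY_MAX_USD):
    """Return the months whose total exceeds the sanity ceiling."""
    return sorted(m for m, v in totals.items() if v > ceiling)


# Made-up sample records; the field names are hypothetical.
payments = [
    {"month": "month-01", "amount_usd": 1_200_000.0, "is_test": False},
    {"month": "month-02", "amount_usd": 950_000.0, "is_test": False},
    {"month": "month-02", "amount_usd": 1e15, "is_test": True},  # test record
]

raw = monthly_volume(payments, include_test=True)
clean = monthly_volume(payments)
print(implausible_months(raw))    # months that fail the sanity check
print(implausible_months(clean))  # the same check after dropping test records
```

The point is less the code than the habit: pick a number that cannot possibly be right, and have the report complain whenever it sees one.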
Build Iteratively Over Time
We spent a lot of time with our users understanding what they wanted. We did most of this on the cheap. Instead of building out real applications, we created PowerPoint and Excel mock-ups to understand their needs. We went through many iterations to pin down what they needed. The real test was with Power User. He was this really smart data geek and had been doing the same thing for 10 years. I remember holding my breath when I asked him for feedback. I was happily surprised with the reply, “Fantastic!!! This is really really good, guys, thank you so much!!!”
Don’t Over Build
Once we’d finished the first phase, we expected to move our work out of Excel to a “real” business intelligence platform like MicroStrategy or Cognos. However, we realized that the user group was so small and the flexibility they needed was so great that there was little value in moving off Excel.
Big Data is just another form of analytics. If you don’t do analytics well, you’re unlikely to find something magical under your pile of Big Data. It’s best to start small and really understand your basic analytics by understanding your users, ensuring your data quality and building iteratively.