Big Data

I recently attended a session on autonomous cars at the law firm Herbert Smith Freehills. It was an insightful session in which the lawyers gave great presentations on the legal issues they advise on, such as M&A, regulation and product liability. However, one non-legal topic they discussed was the ability of car manufacturers to “monetize data.” The idea of monetizing data comes up often, but it’s a lot harder than it sounds.

A decade ago, I was working for a large credit card company, looking at new growth opportunities. We were convinced that we could become the most valuable company in the world. Our reasoning went like this: Google was worth billions of dollars, but Google’s value was based on which web links people clicked. We, as a credit card company, had data on what people actually bought. Because our data was more relevant to advertisers than Google’s, we should clearly have been worth more than Google.

There was just one problem. While we had this data, so did Bank of America, Capital One, JPMorgan and every other bank. And everyone was looking to monetize their data.

Did I say one problem? It wasn’t just financial services companies looking to out-Google Google. The phone companies were in this game too. They were saying, “Hey, we should be the most valuable companies in the world. Google has data on where people go on the web, but we have data on where people actually are in the real world.” Suffice it to say, there was a lot of data around.

This reminded me of an article written about undersea cable capacity in the days of the telegraph. Andy Kessler shared the following cautionary tale:

After undersea telegraph messages were first sent between Newfoundland and Ireland in 1866, a half-dozen companies sprang up to relay messages between London and Paris and New York. Half the traffic was for stock trading. These companies charged up to $5 per word and could transmit 15 to 17 words per minute. Each thought it could generate revenues of $5 million or more per year. It was easy to raise the $2 million it took to lay undersea cable, and investors, who constantly dashed off telegrams themselves, were all too happy to lend money.

Each of these companies assumed that they’d have a monopoly on the market. But when many companies entered the market based on that same assumption, all of the excess capacity created a race to the bottom for telegraph message pricing, forcing many of the companies into bankruptcy.

So what makes Google different? I remember a discussion with stock analysts around that time. I had written a paper on mobile payments with Citi’s equity analysts. The topic of data was very hot, and various analysts asked me, “Who’s going to win the data game? Who has the best data?” I explained that the real differentiator, and what people will pay for, isn’t the data itself but what you can do with it.

As the famous Harvard marketing professor Theodore Levitt said, “People don’t want to buy a quarter-inch drill. They want a quarter-inch hole!” In the data space, this would be, “People don’t want to buy data. They want to buy results!”

How Google Uses Big Data

The goal of a search engine is to find the most relevant documents. In the early days of search engines, things were relatively easy. You could:

Examine Web Pages: Early search engines like Lycos and AltaVista would look at web pages and determine which ones were the most relevant. They did this by looking at factors like the number of times a word was repeated or whether the search term appeared in the document’s title.
Curated Directory: Yahoo, on the other hand, had people curating the web by hand into a giant directory. This was relatively easy when the web had only a few thousand pages.

My Interpretation of the Early Web. With Only a Few Pages, Choosing a Winner Wasn’t That Difficult.
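To make the on-page approach concrete, here is a minimal Python sketch of the kind of scoring described above: count how often the query term repeats in the body and add a bonus when it appears in the title. The function name, the tokenizer and the title bonus of 5 are hypothetical choices for illustration, not what Lycos or AltaVista actually used.

```python
import re

def keyword_score(query: str, title: str, body: str) -> float:
    """Hypothetical early-search-engine score: term frequency plus a title bonus."""
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    score = 0.0
    for term in query.lower().split():
        score += body_words.count(term)  # how many times the term repeats in the page
        if term in title_words:
            score += 5.0                 # arbitrary bonus for a title match
    return score

pages = {
    "ibm.com": ("IBM", "IBM builds computers and software. Learn about IBM products."),
    "spam.example": ("IBM IBM IBM", "ibm ibm ibm ibm ibm ibm ibm ibm cheap deals"),
}
ranked = sorted(pages, key=lambda url: keyword_score("ibm", *pages[url]), reverse=True)
print(ranked)  # ['spam.example', 'ibm.com']
```

The keyword-stuffed page outranks the real one, which is exactly the weakness that web spam exploited once the web grew.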

However, as the web grew, it became more and more difficult to manage search with these methods. Lycos and AltaVista were overwhelmed. Not only was it difficult to distinguish between similar pages based on their text alone, but web spammers were also trying to fool the search engines into promoting their pages. Yahoo couldn’t hire enough people to keep up with the fast-growing web. Both strategies were doomed.

The State of Web Search When Google Entered the Game. As the Web Started Exploding, Finding the Best Pages Became Increasingly Difficult.

Google went down a different path. Using an algorithm called PageRank (named after Larry Page), developed for a search engine originally called BackRub (oh, those Googlers and their funny names), Google was able to make use of data that everyone else was overlooking: the links between pages were just as valuable as the data in the pages themselves. For example, any page can claim to be IBM’s authoritative page. But if 100 other pages link to IBM.com as the right answer, it’s easy to lift that one to the top.

Google Changed the Game by Using Links from Other Sites as a Measure of Quality
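The link-counting idea can be sketched as a simplified PageRank computed by power iteration. This is a textbook version under stated assumptions (a damping factor of 0.85, pages with no outgoing links spreading their score evenly, a fixed number of iterations), not Google’s production ranking, and the tiny link graph is made up for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores that sum to 1
    for _ in range(iterations):
        # Every page gets a small baseline ("random jump") share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: several pages point to ibm.com; the impostor gets no links.
web = {
    "blog1": ["ibm.com"],
    "blog2": ["ibm.com"],
    "news": ["ibm.com", "blog1"],
    "ibm.com": ["news"],
    "fake-ibm.example": [],  # claims to be IBM's page, but nobody links to it
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:18} {score:.3f}")
```

Because a page’s score comes from the scores of the pages linking to it, a page that merely claims to be IBM’s but attracts no links stays near the bottom, and each additional link is more evidence rather than more noise.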

There are a few things to realize about Google’s use of data:

1. Google didn’t have the “best” data. Yahoo had a more accurate method for categorizing the web. Having humans look at content gave better results for each individual page. Unfortunately for Yahoo, that method was too slow and expensive to sustain.

2. The data didn’t cost Google anything. At the time, everyone was concentrating on the web pages themselves — not the linkages between the pages. This kind of information is often called “information exhaust” — information that’s a by-product of what you’re really looking for. It was already out there, free for anyone to use.

3. It’s the capability that made the difference. While the data was free, it was up to Google to organize it and make it useful. Going back to the jobs-to-be-done metaphor, Google put this data to work solving a problem for users.

4. More data is better. While other search engines were overwhelmed by the torrent of new web content, Google’s product actually benefited: the more links that point to a quality web page, the better Google’s search results become.

Google has used this template for various other projects since its founding. It can leverage data in some very creative and useful ways. Take location data, for example. If you have an Android phone or use Google Maps on your phone, Google is keeping track of your location data, and you can look through your own record. The data is useful to me, but it’s a bit odd seeing that Google holds a record of everywhere I’ve been.

An Example of Google Tracking Me Through the Day.

So how can Google use your location, along with that of others, to create value? One way is to aggregate this data to show where there’s road traffic: if a lot of phones on a road are barely moving, you can flag that road as congested. But where else could Google use this data? Google added a feature to Google Maps that shows how crowded a restaurant is at different times of day, based on how many phones it detects at the restaurant.

A Graph of Popular Times at Bubby’s Restaurant Compiled Through Location Data. Note the Popularity of Sunday Brunch.
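Both uses of location data boil down to simple aggregation. The sketch below assumes a hypothetical, already-anonymized feed of phone pings: (segment_id, device_id, speed_kmh) for roads and (place_id, weekday, hour, device_id) for places. The thresholds (at least 3 phones, a median speed of 10 km/h or less) are arbitrary illustrations, not Google’s parameters.

```python
from collections import defaultdict
from statistics import median

def congested_segments(pings, min_phones=3, max_median_speed_kmh=10.0):
    """Flag road segments where enough phones report a low median speed.

    pings: iterable of (segment_id, device_id, speed_kmh); a hypothetical schema.
    """
    latest = defaultdict(dict)
    for segment, device, speed in pings:
        latest[segment][device] = speed  # later pings overwrite earlier ones (assumes time order)
    return sorted(
        segment for segment, speeds in latest.items()
        if len(speeds) >= min_phones and median(speeds.values()) <= max_median_speed_kmh
    )

def popular_times(visits):
    """Count distinct phones per (place_id, weekday, hour).

    visits: iterable of (place_id, weekday, hour, device_id); a hypothetical schema.
    """
    devices = defaultdict(set)
    for place, weekday, hour, device in visits:
        devices[(place, weekday, hour)].add(device)
    return {key: len(seen) for key, seen in devices.items()}

pings = [
    ("I-95 N", "a", 4.0), ("I-95 N", "b", 6.0), ("I-95 N", "c", 2.0),
    ("Main St", "d", 45.0), ("Main St", "e", 50.0), ("Main St", "f", 40.0),
]
visits = [
    ("bubbys", "Sun", 11, "a"), ("bubbys", "Sun", 11, "b"),
    ("bubbys", "Sun", 11, "a"),  # same phone seen twice in one hour
    ("bubbys", "Tue", 15, "c"),
]
print(congested_segments(pings))  # ['I-95 N']
print(popular_times(visits))      # {('bubbys', 'Sun', 11): 2, ('bubbys', 'Tue', 15): 1}
```

Counting distinct devices rather than raw pings keeps a single phone that reports repeatedly from inflating a restaurant’s popularity.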

It’s important to remember that Google did not have the best data for determining busy times at restaurants. Telephone companies and restaurant sites (e.g., Yelp, OpenTable) likely had better data. For example, OpenTable manages the reservation systems for many restaurants and actually knows how busy they are. Yet again, though, Google was the best at putting the data to work to solve this problem.

So let’s sum up. People still talk about monetizing data, but their data isn’t as valuable as they think. There’s a lot of data out there that can solve problems and generate value. The tricky part is extracting that value. Google did this in search and continues to do so in many other ways.

Note: Ben Thompson from Stratechery gave a similar talk about how Google works last week to kick off the University of Chicago Antitrust and Competition Conference.

For decades, the US military tried to create self-driving cars, with little success. Once the private sector got into the game, these cars improved at breakneck speed. In 2004, when the first DARPA Grand Challenge took place, no car in the world was able to complete the 150-mile course through the Mojave Desert. By 2016, self-driving cars had driven over a million miles in regular traffic. The secret was not better computers or better cameras. The secret was better maps.

Peter Norvig, Google’s head of research, told the New York Times that Google Street View is the secret sauce behind Google’s self-driving cars. He said:

It’s a hard problem for computer vision and artificial intelligence to pick a traffic light out of a scene and determine if it is red, yellow or green. But it is trivially easy to recognize the color of a traffic light that you already know is there.
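Norvig’s point can be illustrated with a toy classifier that skips the hard part entirely: it assumes a prior map has already supplied the traffic light’s bounding box in the camera image, so all that is left is comparing the brightness of the three lamp positions. The image format (rows of RGB tuples), the box layout and the vertical red-yellow-green arrangement are all assumptions for this sketch, not a description of Google’s pipeline.

```python
Pixel = tuple[int, int, int]  # (r, g, b)

def light_color(image: list[list[Pixel]], box: tuple[int, int, int, int]) -> str:
    """Classify a traffic light whose position is already known from a prior map.

    box is (top, left, bottom, right) with bottom/right exclusive and a height of at
    least 3 pixels. Assumes a vertical light: red on top, yellow in the middle, green below.
    """
    top, left, bottom, right = box
    third = (bottom - top) // 3

    def mean_brightness(row_start: int, row_end: int) -> float:
        pixels = [image[r][c] for r in range(row_start, row_end) for c in range(left, right)]
        return sum(sum(p) for p in pixels) / len(pixels)

    lamps = {
        "red": mean_brightness(top, top + third),
        "yellow": mean_brightness(top + third, top + 2 * third),
        "green": mean_brightness(top + 2 * third, bottom),
    }
    return max(lamps, key=lamps.get)  # the lit lamp is the brightest region

# A 6x2 crop: dark everywhere except the bottom third, which glows green.
image = [[(10, 10, 10)] * 2 for _ in range(4)] + [[(0, 255, 0)] * 2 for _ in range(2)]
print(light_color(image, (0, 0, 6, 2)))  # green
```

In a real car, lighting, occlusion and camera pose would all complicate this; the sketch only shows how knowing where to look shrinks the problem.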

I remember hearing that and thinking how convenient it was that Google happened to have Street View and could apply it to self-driving cars. This would have been a classic case of “unlocking the power of data.” But then I learned the rest of the story.

Sebastian Thrun is the creator of Google’s self-driving car and the founder of Google’s “X” lab. Google didn’t just “happen” to have Street View data lying around. Street View was created by Thrun after he met Larry Page at the DARPA Grand Challenge, the self-driving car race. Thrun tells the story on CNBC’s The Brave Ones:

“Larry came to the race itself and … came disguised with, like, a hat and sunglasses so he wouldn’t be bothered by everybody. But … he had a keen interest in this. Larry has been a believer in this technology for much longer than I even knew. And so was Sergey (Brin). And they really want to understand what’s going on,” Thrun said.

A later iteration of the car had cameras attached to its roof, so the team could review its progress each day, leading almost by accident to the development of Google Street View.

“We realized the video’s actually amazing. And we went to Google and said ‘we’d love to help you build Street View.’ And we kind of ended up – felt like an acquisition of a little start-up company, kind of Stanford transitioning into Google where me and four of my grad students then became Street View enthusiasts.”

“And we built up Street View and with a singular vision to photograph every street in the world.”

Street View became the first project within the secret Google X. “We had a separate building that no one knew about. At least for a year and a half, no one in Google had a clue we existed,” Thrun said.

So what did we learn? Data was the secret sauce that let self-driving cars progress as far as they have. But it wasn’t a matter of finding a data set and applying it. It was about creating the data set for that specific purpose. Street View wasn’t an existing data set that happened to be applied to self-driving cars; it was the output of a mapping effort, and that mapping is what made self-driving cars work so well.

One final addendum: when talking about Google Street View, I have to mention an early version of Street View created at MIT in 1979. The Aspen Movie Map used laserdiscs to simulate driving through the town of Aspen, Colorado.