Who is Robert Schlaff?

I’m a devoted father and husband with an awesome family, and I work at AIG as Head of Commercial Digital Product. For more information about me, please visit my LinkedIn profile or my Facebook page.


Rob’s Cool Tools

This is a website of my cool tools. A “cool tool” is anything tried and true that makes your life better. Kevin Kelly coined the term on his Cool Tools website, a more modern and digital version of the Whole Earth Catalog. The Whole Earth Catalog might have the best motto ever: “Stay Hungry, Stay Foolish,” which Steve Jobs quoted in his famous Stanford commencement address.

Why I Love My Fitbit Alta HR

The Fitbit Alta HR is one of the few technology products that gives me almost everything I need and very little that I don’t. It’s a tour de force of good design. When I look at what I need on my wrist, it’s not really a smartwatch or even a fitness tracker but something else (maybe a “smart wristband”). Let’s take a look at the 3 features of the Fitbit Alta HR that are most important to me.

3 Features I Love About My Fitbit

  1. EASY Sleep and Exercise Tracking. I need to track how much I sleep and how often I go to the gym. I used to have a Fitbit Flex, which, while a good product, made me manually track my exercise and sleep. For example, to track sleep I was supposed to tap on the band when I went to sleep and when I woke up. This meant that I forgot many days and didn’t have good data on my sleep patterns. The Fitbit Alta HR makes use of its heart rate tracker to guess at when I go to sleep (my heart rate drops by a lot) and when I’m exercising (heart rate goes up). It can even figure out the type of exercise I’m doing (e.g., bicycling, walking, sports).
  2. Vibrating Notifications. For about 20 years, since I got my first flip phone, my friend Seth Gilbert and I talked about how difficult it is to make sure that we got our phone calls. We would put our phones on vibrate in our pocket and occasionally miss calls. Women had it worse because their phones were in their purses. With the Fitbit Alta HR, the wristband connects to my phone and will vibrate when there are calls or text messages. But notifications are limited to ONLY calls and text messages so I’m not bothered or even tempted by anything else (e.g., app notifications).
  3. Tactile Alarm Clock. A wise man once said, “The hardest thing about being married is not having your own alarm clock.” Not really. I like to wake up earlier than my wife so I can relax and let my mind settle. But if I set off an alarm clock, then everyone wakes up. Having a vibrating alarm clock allows me to wake up with a vibration that’s much nicer than the buzzing of the alarm and wakes me up without waking up the rest of the house.

How I Might Improve These Features

As a product person, it’s fun for me to think about how to make these features even better.

  1. Exercise Guidance (building on the feature EASY Sleep and Exercise Tracking). It would be great if my Fitbit could offer me sleep and exercise guidance in addition to tracking. Fitbit’s certainly going in this direction, having bought Fitstar and renamed it Fitbit Coach. Trying to put all of Fitbit Coach into something without a screen is difficult, so I don’t think it’ll be done in the near term. However, I can see something simple like the Seven Minute Workout coming relatively soon.
  2. I Need You NOW (building on the feature Vibrating Notifications). In an emergency, you really want to get someone’s attention. For example, parents want to know where their kids are but the kids aren’t checking their phones. One way that I’ve seen parents get their kids’ attention is to set off the “Find My Phone” alarm on their kids’ iPhones. The “Find My Phone” alarm is normally used when the phone is missing in a large house, so it puts out a piercing screech which cannot be ignored. One way of bringing this to the Fitbit would be priority vibrations. For example, if someone calls multiple times in a row, the vibrations would increase in intensity so that the wearer could not ignore them. If we wanted to bring this to the next level we could add a small shock to the wristband, though I’m not sure how well that would sell 🙂
  3. Wake Up and Relax (building on the feature Tactile Alarm Clock). The first thing I do when I wake up in the morning is turn on my meditation. It would be great if my Fitbit would wake me up and then go right into a 5-minute breathing exercise. The Fitbit Charge 2 HR already has a clever meditation routine for a small screen called breathing sessions. I imagine that my experience would go something like this:
    1. The alarm rings
    2. I wake up and tap it to tell the watch I’m awake
    3. The guided breathing exercise starts. In … Out … In … Out
    4. If I’m not breathing at the guided breathing rate for 30 seconds, the Fitbit assumes I’ve gone back to sleep and tries to wake me up again.
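The back-to-sleep check in the last step could be sketched roughly like this. It is only an illustration: I’m assuming one breathing-rate reading per second, and the function name, the target pace of 6 breaths per minute and the tolerance are all made up. Nothing here is a real Fitbit API.

```python
def fell_back_asleep(breaths_per_minute, target_bpm=6.0, tolerance=1.5, limit_seconds=30):
    """Return True if the wearer is off the guided pace for limit_seconds in a row.

    breaths_per_minute: one hypothetical sensor reading per second.
    """
    off_pace_streak = 0
    for reading in breaths_per_minute:
        if abs(reading - target_bpm) > tolerance:
            off_pace_streak += 1
            if off_pace_streak >= limit_seconds:
                return True  # time to buzz the wrist again
        else:
            off_pace_streak = 0  # back on pace, so reset the counter
    return False
```

A brief wobble resets the counter, so only a sustained 30 seconds off pace would trigger the second alarm.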

Reasons I Like My Fitbit More Than an Apple Watch

The obvious comparison here is to the Apple Watch. “Why not just get a smartwatch?” you might say. Then you’d only have to have one thing on your wrist. Here are the reasons that my Fitbit is better than an Apple Watch — at least for me.

  • It’s Not a Watch. I really like my watch. It’s beautiful and far nicer than an Apple Watch. Wearing both watches looks a bit silly because hey, how many watches does a person need? It’s also a bit annoying because it has the feeling of “Go Apple or Go Home,” with Apple monopolizing my wrist. It’s one thing for the technology items in my life to be Apple. I mean, I love my iPhone and I love my MacBook Air, but I don’t need everything in my life to be Apple.
  • Long Battery Life. Because it’s not trying to do too much, the battery can last a week without charging. This lets me wear it to sleep, which gives me half of the features that I want in my wristband (sleep tracking and alarm clock). The Apple Watch needs to be charged every night, so it misses these features.
  • Small, Light and Fashionable. Because it doesn’t have these Apple Watch features (especially as it doesn’t need a full screen), the Fitbit Alta HR can be small, light and fashionable. I can wear it to sleep and to the gym, and even during vigorous exercise I don’t notice it.
  • Limited Notifications and Features. While there’s another feature or two that I’d love to have, I like the fact that I’m not distracted or even tempted by them. I’m sure if I had an Apple Watch I’d be tempted to read the newspaper or check my appointments on my watch — which I really don’t need to do.

Overall the Fitbit Alta HR is an awesome device. It gets me 90% of the things I need and little else. When I started writing this post I thought about how weird it was that I was talking about a fitness tracker that I don’t really use for fitness very much. Then I came across Wearable’s Top Fitness Trackers of 2017 and saw that they rated the Fitbit Alta HR as their top choice. One of the top features of their favorite “fitness tracker” was sleep tracking!

Big Data vs. Small Actionable Analytics

“Big Data” is a technology buzzword. The idea is that we have so much data about people and the way they interact with a company, we should be able to generate new and interesting insights from this data that will solve business problems.

But there’s a catch. Big Data is just another form of analytics. In most companies, each additional piece of data provides less value than the previous piece of data. In economics, this is called diminishing marginal utility. So while the first piece of data (e.g., has this customer bought this product before) may be worth $1, the 50th piece of data (e.g., how old are the customer’s children) may be worth less than a penny.
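To make the $1 versus a penny comparison concrete, here is a small sketch. The geometric decay (each piece worth 90% of the one before it) is purely my assumption for illustration; real data won’t follow a neat formula.

```python
def value_of_nth_piece(n, first_value=1.00, decay=0.9):
    """Dollar value of the n-th piece of data under an assumed geometric decay."""
    return first_value * decay ** (n - 1)

first = value_of_nth_piece(1)
fiftieth = value_of_nth_piece(50)
total_first_fifty = sum(value_of_nth_piece(n) for n in range(1, 51))

print(f"1st piece:  ${first:.2f}")
print(f"50th piece: ${fiftieth:.4f}")
print(f"First 50 pieces together: ${total_first_fifty:.2f}")
```

Under this assumption the 50th piece is worth well under a penny, and most of the total value comes from the first handful of pieces.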

The First Piece of Data Gives a Lot More Value than the 50th Piece of Data 

Unfortunately, many people are unduly optimistic about the value that big data can provide. They have this idea that they have so much data, if only they could search all this data, there’s bound to be something useful in there. It reminds me of one of Ronald Reagan’s favorite jokes:

The joke concerns twin boys of five or six. Worried that the boys had developed extreme personalities – one was a total pessimist, the other a total optimist – their parents took them to a psychiatrist.

First the psychiatrist treated the pessimist. Trying to brighten his outlook, the psychiatrist took him to a room piled to the ceiling with brand-new toys. But instead of yelping with delight, the little boy burst into tears. “What’s the matter?” the psychiatrist asked, baffled. “Don’t you want to play with any of the toys?” “Yes,” the little boy bawled, “but if I did I’d only break them.”

Next the psychiatrist treated the optimist. Trying to dampen his outlook, the psychiatrist took him to a room piled to the ceiling with horse manure. But instead of wrinkling his nose in disgust, the optimist emitted just the yelp of delight the psychiatrist had been hoping to hear from his brother, the pessimist. Then he clambered to the top of the pile, dropped to his knees, and began gleefully digging out scoop after scoop with his bare hands. “What do you think you’re doing?” the psychiatrist asked, just as baffled by the optimist as he had been by the pessimist.

“With all this manure,” the little boy replied, beaming, “there must be a pony in here somewhere!”

If you have enough data, you’ll certainly find relationships between the data. But then you have a new problem − whether or not these findings will help you run your business. You can find lots of statistically significant correlations that are completely spurious.
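It’s easy to see how this happens with a quick experiment. The sketch below generates 100 completely unrelated random series (stand-ins for business metrics, with only 10 observations each) and looks for the strongest correlation between any two of them. The numbers of series and observations are arbitrary choices of mine.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
# 100 unrelated "metrics", each with only 10 monthly observations
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]

# Compare every pair of series and keep the strongest correlation found
strongest = max(
    abs(pearson(series[i], series[j]))
    for i in range(100)
    for j in range(i + 1, 100)
)
print(f"Strongest correlation among unrelated series: {strongest:.2f}")
```

With almost 5,000 pairs to compare, some pair will look strongly related purely by chance, even though every series is random noise.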

So how should you use Big Data? Start with small actionable analytics − analytics that matter. That’s what I did when I joined a new group at a large company. Though the business unit had a chief data officer, our team didn’t have the analytics that we needed. So I implemented a simple plan for understanding the users, making sure we had good data and building iteratively over time.

Understand the Users

The first thing we needed to do was understand who was going to use the system. These users were going to define success for the project. We had two user groups:

  1. Senior Management: This was the easy one to identify. Senior management wanted some basic information to run the business.
  2. Power Users: However, we wanted to have a robust view of the data, not just high-level reports. So we needed to look for our power users. And that’s when we found him. Let’s just call him Power User. He was based in London. Each month, Power User asked for a download of all the raw data and ran a 10-year-old program in Excel to get the output he needed. The output looked great but the program was very fragile, hard to read and not extensible. Still, the output of this program gave us a great starting point.

Make Sure We Have Good Data

Analytics, first and foremost, is about the quality of the data. If you don’t have quality data none of your analytics will be right, whether it’s big data or anything else.  When I started working on this project there was a report that showed the volume of payments being processed. Every few months there was a massive spike in volume. This wasn’t something hard to see. It was a spike of a quadrillion dollars going through the system. By comparison, all the value of everything in the world is only $241 trillion. This report was a couple of years old and people reviewed it each month but no one mentioned anything. As it turns out, testing data was being included which is what messed up the numbers. Cleaning all of the data and ensuring its quality took a lot of time but was an integral part of the project.
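A sanity check for this kind of problem doesn’t need to be fancy. Here is a minimal sketch that flags any month whose reported volume is larger than all the wealth in the world. The report values and the function name are made up for illustration; this is not the check we actually ran.

```python
WORLD_WEALTH_USD = 241e12  # the $241 trillion figure quoted above

def implausible_months(monthly_volume_usd, ceiling=WORLD_WEALTH_USD):
    """Return the months whose reported volume exceeds the plausibility ceiling."""
    return [month for month, volume in monthly_volume_usd.items() if volume > ceiling]

# Made-up numbers for illustration only
report = {
    "2015-01": 3.1e12,
    "2015-02": 2.9e12,
    "2015-03": 1.0e15,  # a quadrillion-dollar spike, e.g. test data leaking in
    "2015-04": 3.3e12,
}
print(implausible_months(report))
```

A real check would use a much tighter ceiling based on the business’s normal volumes, but even this crude one would have caught the spike.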

Build Iteratively Over Time

We spent a lot of time with our users understanding what they wanted. We did most of this on the cheap. Instead of building out real applications, we created PowerPoint and Excel mock-ups to understand their needs. We went through many iterations to pin down what they needed. The real test was with Power User.  He was this really smart data geek and had been doing the same thing for 10 years. I remember holding my breath when I asked him for feedback. I was happily surprised with the reply, “Fantastic!!! This is really really good, guys, thank you so much!!!”

Don’t Over Build

Once we’d finished the first phase we’d expected to move our work out of Excel to a “real” business intelligence platform like MicroStrategy or Cognos. However, we realized that the user group was so small and the flexibility they needed was so great that there was little value in moving off of Excel.

In Summary

Big Data is just another form of analytics. If you don’t do analytics well, you’re unlikely to find something magical under your pile of Big Data. It’s best to start small and really understand your basic analytics by understanding your users, ensuring your data quality and building iteratively.

The Secret to Google’s Self Driving Cars — Google Street View

For decades the US military was trying to create self-driving cars with little success. Once the private sector got into the game, these cars improved at a breakneck speed. In 2004, when the first DARPA Grand Challenge took place, no car in the world was able to complete the 150 mile course through the Mojave desert.  By 2016, self driving cars had driven over a million miles in regular traffic. The secret was not better computers or better cameras. The secret was better maps.

Peter Norvig, Google’s head of research, told the New York Times that Google Street View is the secret sauce behind Google’s self driving cars. He said:

It’s a hard problem for computer vision and artificial intelligence to pick a traffic light out of a scene and determine if it is red, yellow or green. But it is trivially easy to recognize the color of a traffic light that you already know is there.

I remember hearing that and thinking how convenient it was that Google happened to have Street View and that they could apply it to self driving cars. This would have been a classic case of “unlocking the power of data.” But then I learned the rest of the story.

Sebastian Thrun is the creator of Google’s self driving car and the founder of Google’s “X” lab. Google didn’t just “happen” to have Street View data lying around. Street View was created by Thrun after he met Larry Page at the DARPA Grand Challenge — the self driving car race. Thrun tells the story on CNBC’s The Brave Ones:

“Larry came to the race itself and … came disguised with, like, a hat and sunglasses so he wouldn’t be bothered by everybody. But … he had a keen interest in this. Larry has been a believer in this technology for much longer than I even knew. And so was Sergey (Brin). And they really want to understand what’s going on,” Thrun said.

A later iteration of the car had cameras attached to its roof, so the team could review its progress each day, leading almost by accident to the development of Google Street View.

“We realized the video’s actually amazing. And we went to Google and said ‘we’d love to help you build Street View.’ And we kind of ended up – felt like an acquisition of a little start-up company, kind of Stanford transitioning into Google where me and four of my grad students then became Street View enthusiasts.”

“And we built up Street View and with a singular vision to photograph every street in the world.”

Street View became the first project within the secret Google X. “We had a separate building that no one knew about. At least for a year and a half, no one in Google had a clue we existed,” Thrun said.

So what did we learn? Data was the secret sauce for getting self driving cars to progress as well as they have. But it wasn’t a matter of finding a data set and applying it. It was about creating the data set for that specific purpose. Street View wasn’t a useful data set that was applied to self driving cars. It was the output of the mapping exercise that made self driving cars work so well.

One final addendum: When talking about Google Street View I have to add a link to an early version of Street View from 1979 that was created at MIT. The Aspen Movie Map (movie) used laserdiscs to simulate driving through the town of Aspen.

Smart Audio is Here to Stay: Some Takeaways from NPR’s Smart Audio Report

NPR and Edison Research have been putting together The Smart Audio Report. The study, presented at CES in January, gives a good look into how quickly smart speakers like Alexa and Google Home are entering the home:

  • It’s growing fast: 16% of Americans have a smart speaker − 128% growth since January 2017
  • Usage is growing over time: 84% use their speaker the same amount or more than the first month they owned it
  • They’re becoming embedded into people’s lives: 65% say that they would not like to go back to life without their smart speaker

The most interesting chart is a breakdown of the most frequently used activities by the time of day.

I haven’t done many of these things but I look forward to finding out more about them!

Fun with Patents OR The Possible Future of Amazon Alexa and Google Home

In the article Hey, Alexa, What Can You Hear? And What Will You Do With It?, The New York Times delved into some of the patents that Amazon and Google have filed for the future of their voice assistants (Amazon Alexa and Google Home). The article focused on privacy concerns by the group Consumer Watchdog that may or may not have understood what a patent is. The stuff that really freaked people out was the Amazon patent that focused on an “always on” capability where the assistants are always listening to the discussions around them.

It’s an interesting idea to use the conversations in the room to develop a better understanding of them; however, the language used clearly doesn’t take privacy into account. The patent was filed more as a future idea rather than something with all the kinks figured out.  But I can understand why some phrases from the patent Keyword Determinations From Conversational Data upset people. To paraphrase:

In at least some embodiments, a computing device such as a smart phone or tablet computer can actively listen to audio data for a user, such as may be monitored during a phone call or recorded when a user is within a detectable distance of the device. In other embodiments, voice and/or facial recognition, or another such process, can be used to identify a source of a particular portion of audio content.

I thought some of the other patents might provide a window into how Amazon and Google viewed the future. My favorite one was titled Monitoring And Reporting Household Activities In The Smart Home According To A Household Policy and was written by Tony Fadell, founder of Nest and one of the fathers of the iPod.

This patent talks about various ways to make a home “smart.” Today having a smart home means being able to control various devices, but what if you could set a goal (or policy, in the words of the patent) and the smart home would partner with you to achieve it? To paraphrase the language of the patent:

A method for household policy implementation in a smart home, comprising: monitoring the household, analyzing household activities, taking actions and reporting the information. This system can help a family achieve goals such as how much screen time is used by family members, how often the household eats together and whether mischief might be occurring.

Ignoring the obvious privacy issues, there were some interesting things here. As a father, I found this really interesting because it suggests a way to install parental controls over my entire smart home.

Let’s start with the overall partnership model. As the parent, I get to define a goal and the house will help me achieve it. How will this work? Let’s look at the example of tracking screen time. I’m kind of excited about a future where I can say “Limit my kids to 30 minutes of screen time.”

First, we need to monitor screen time. We need to understand who is in the room and what they’re watching.

Then we need to define our goals.

Finally, we take an action based on whether the goal is met or not.

Other factors may come into play. For example, if the child has been grounded they may lose their TV time.
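Putting those steps together, the monitor, goal and action loop might look something like the sketch below. Everything here is hypothetical: the class and function names are mine, the action labels are invented, and the 5-minute warning is my own addition. None of it comes from the patent.

```python
from dataclasses import dataclass

@dataclass
class ScreenTimePolicy:
    """The goal: 'Limit my kids to 30 minutes of screen time.'"""
    daily_limit_minutes: int = 30

def allowed_minutes(policy, grounded):
    # Other factors: a grounded child loses their TV time entirely
    return 0 if grounded else policy.daily_limit_minutes

def decide_action(policy, minutes_watched, grounded=False):
    """Pick an action based on monitored screen time versus the goal."""
    limit = allowed_minutes(policy, grounded)
    if minutes_watched >= limit:
        return "turn_off_screen"
    if limit - minutes_watched <= 5:
        return "warn"  # assumed courtesy warning near the limit
    return "allow"
```

For example, `decide_action(ScreenTimePolicy(), 27)` returns `"warn"`, while a grounded child gets `"turn_off_screen"` right away.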

Also, just because this was pretty funny, I have to include the patent’s “mischief detector” that detects mischief by (again paraphrasing):

listening for low-level audio signatures (e.g., whispering or silence), while the occupants are active (e.g., moving or performing other actions). Based upon the detection of these low-level audio signatures combined with active monitored occupants, the system may infer that mischief (e.g., activities that should not be occurring) is occurring. Additionally, contextual information such as occupancy location may be used to exclude an inference of mischief. For example, when children are near a liquor cabinet or are in their parents’ bedroom alone, the system may infer that mischief is likely to be occurring.

While I probably won’t be using the mischief tracker any time in the future, the idea of setting goals for the household, and letting Amazon and Google help, is quite appealing.

What Does a Hotel Brand Stand for? OR How Airbnb Changed the Game

I was recently on an airplane with a hotel entrepreneur. His family had immigrated to the US about 20 years ago and they decided to enter the hotel industry. Being new entrants to hospitality, they started with lower quality airport motels (e.g., Econo Lodge) and gradually moved up to more premium hotels (e.g., Marriott).

I had always assumed that a hotel with a better brand made more money for the owner. I was surprised to learn that this wasn’t necessarily true. Premium hotels are priced higher but these higher prices are eaten up by higher costs in service, staffing and quality of amenities (e.g., beds).

However, it’s generally easier to run a premium hotel. For example, the guests are better behaved, despite their bad rap for being overbearing and demanding perfect levels of service.  Guests at economy hotels bring bad behavior to a whole new level.  My new friend told me about having to break up fights between guests or calming down a customer who was threatening one of his front office staff with racist remarks. People are much more likely to treat the hotel property poorly and even break things in an economy hotel. This leads to additional costs.

One of the worst problems in economy hotels was bedbugs — and not the way you’d think. Customers who already have bedbug problems at home would check into his hotel. Then they would smuggle in some of their bedbug-infested linens into the hotel room. Then they’d check out and wait a few days for the bedbugs to entrench themselves in the hotel room. Then they’d sue the hotel and say that their house became infested with bedbugs because of the hotel. So now the hotel has a room with bedbugs and a lawsuit to deal with.

But stuff like that doesn’t happen at a Marriott (at least I hope it doesn’t). Hotels with premium brands set expectations on the customer experience: price, quality and customer behavior. Put another way, the brand provides a level of trust to the traveler that they will have a good experience.

So what does this have to do with Airbnb? For years, staying at a hotel was the only way that a traveler could trust that they would get a good experience. So when Airbnb came along, most people rejected the idea. In fact, Airbnb was rejected for seed funding by the first seven investors that they approached. I remember hearing that Airbnb was a combination of the two worst ideas in Silicon Valley:

  1. Staying in the home of a stranger
  2. Renting out your spare room to a stranger

On his excellent site Stratechery, Ben Thompson talks about how Airbnb (and others) changed the game. It starts with something called the Law of Conservation of Attractive Profits that Clayton Christensen wrote about in his book The Innovator’s Solution.

In short, there are commodity suppliers and integrated suppliers in the value chain. The integrated suppliers are the ones that make the big profits. In the original PC business, IBM was the integrated supplier, with its brand and its proprietary components, and everyone else was a commodity supplier. But it’s possible to change the game: commoditize others in the value chain and take the profits for yourself. This is what Microsoft did to IBM. Who would have thought that the OS provider could commoditize the hardware provider? But it did.

Graphical Depiction of the Law of Conservation of Attractive Profits from Stratechery.com

Now let’s look at Airbnb. Travelers have the same needs that they’ve always had. They want a place to stay that’s comfortable, safe and close to the activities that they want to do. So how can a supplier deliver a great experience to the traveler? Before Airbnb, hotels needed to own the whole building (or have a franchisee own it). They would deliver a consistent experience by having a set of corporate standards that represented the brand. So a traveler knew exactly what to expect when they went to a Marriott Courtyard.

However, with Airbnb, the company can set expectations for the traveler during the booking and reservation process. Instead of focusing on broad standards like bed type, free breakfasts, and free Wi-Fi, Airbnb can focus on individual customer experiences for each room that’s rented out. This lets Airbnb commoditize (sometimes called modularizing) the experience of each individual room and still maintain a consistent Airbnb experience. It also lets Airbnb source from much smaller and more diverse suppliers who have extra rooms. So Airbnb becomes the most important player in the experience and therefore the most valuable component in the value chain.

How Airbnb Altered the Hospitality Value Chain, Allowing It to Take Outsized Profits from Stratechery.com

To learn more about how this all works check out Ben Thompson’s writing on Aggregation Theory at Stratechery.com.

Reader Question: Don’t Chaos Monkeys Slow Things Down?

Today’s reader question comes from Marc about my article on Chaos Monkeys and the Simian Army.

I can’t speak for your mother-in-law, but I find this fascinating. Do the testing of problems actually slow down the system, much like as if I were changing a flat tire every week?

— Marc 

What a good question, Marc! It’s a question that many people have but rarely ask. This type of testing does slow down the system a little, but the benefits outweigh the costs. It’s like asking “Doesn’t sleeping 8 hours a day make you less efficient? Wouldn’t you be more efficient if you worked the whole 24 hours?”

The key is that failure is baked into this model. Imagine you have 1000 wheels on the car instead of 4, and each of these wheels is rated to be replaced every 3 years. So you can expect about one tire to go flat each day. But that’s on average.

One wheel breaking every day is a pretty easy thing to recover from. But if you have 1000 tires, some crazy things could happen that are very hard to predict ahead of time. What happens if multiple tires go out? What happens if three tires go out that are next to each other? What happens if the front right tire and the front left tire go out at the same time?
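Here’s the back-of-the-envelope math behind the tire example, along with a rough estimate of how often more than one tire would fail on the same day. The Poisson model (independent failures at a steady rate) is my simplifying assumption. Real failures tend to cluster, which only makes the multi-failure days more common.

```python
import math

wheels = 1000
lifetime_days = 3 * 365  # each wheel is replaced about every 3 years

expected_flats_per_day = wheels / lifetime_days

def prob_at_least(k, rate):
    """P(at least k failures in a day) under an assumed Poisson model."""
    p_fewer = sum(math.exp(-rate) * rate ** i / math.factorial(i) for i in range(k))
    return 1 - p_fewer

print(f"Expected flats per day: {expected_flats_per_day:.2f}")
print(f"Chance of 2+ flats in one day: {prob_at_least(2, expected_flats_per_day):.0%}")
print(f"Chance of 3+ flats in one day: {prob_at_least(3, expected_flats_per_day):.0%}")
```

So “about one a day” hides a lot of days with two or three failures at once, and those are exactly the combinations that are hard to plan for.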

It’s much more complicated in Netflix’s case because you have many different types of systems that are interdependent. That’s why Netflix tests all these different contingencies. Yes, there’s a slight overhead in doing this but it allows you to ensure that the system is robust. Also, Netflix wants to make sure that if any single component fails, the system degrades gracefully. For example, if the recommendations system goes down, Netflix should display generic recommendations like new movies or fan favorites and everything else should work fine.
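Graceful degradation for the recommendations example might look like this in miniature. This is a toy sketch with made-up function names and row titles. I have no knowledge of how Netflix actually implements its fallbacks.

```python
GENERIC_ROWS = ["New Releases", "Fan Favorites"]

def personalized_rows(user_id, service_up=True):
    """Stand-in for a call to a (hypothetical) recommendations service."""
    if not service_up:
        raise ConnectionError("recommendations service unavailable")
    return [f"Top Picks for {user_id}", "Because You Watched ..."]

def home_page_rows(user_id, service_up=True):
    """Build the home page, falling back to generic rows if recommendations fail."""
    try:
        return personalized_rows(user_id, service_up)
    except ConnectionError:
        # Degrade gracefully: the page still loads, just less personalized
        return GENERIC_ROWS
```

The customer sees a slightly blander home page instead of an error, and everything else keeps working.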

What we’re really talking about is humility in our ability to design a system perfectly up front. In order to run a system at 100% optimal efficiency, you’d need to be able to predict everything that could go wrong and also what may unexpectedly change in the future. For a long time, people have worked hard at making this planning process better. However, trying to make the planning process perfect starts to take more and more time and cuts into the efficiency of the project. Also, and this may be obvious, it’s impossible to plan perfectly for the future.

This is why most software development is moving from a traditional “waterfall” design to a more “agile” design. People used to think that you should build software like you build a building. This was called “waterfall” because you start at the top with your strategy and that design flows down all the way into execution. You make highly detailed plans and then take years to build it. However, we’ve realized over the years that we can solve most key business problems without building the whole software project — we just build the parts that matter. Also, people can start using the software before it’s done — which lets us revise the plans on a regular basis as we see how it’s used. We’re accepting that we don’t know the total plan. We have an idea of where we want to go in five years, but we only know what we’re building over the next few months.

But why does agile work? Isn’t it more efficient to do all the planning first and then build it? Yes, but not in the way you’d think. The waterfall method is a more efficient way to build software; however, it’s not the best way to get the job done. When people start a project, they don’t actually know everything that the product needs to do up front. They wish they could know 100% of the system in the beginning, but they never actually do. Think about trying to design a user interface five years ago. Would you even have considered building a voice interface like Amazon’s Alexa? Of course not. So if you had built a 5-year product roadmap, it wouldn’t have included a voice interface.

This idea of breaking projects down into small parts goes beyond software. Bent Flyvbjerg (researcher in project planning with a super awesome name) has found that larger projects are more likely to have cost overruns. However, it’s not necessarily about the size of the project itself, it’s much more related to the size of the segment of the project. Public works projects like building a bridge or a dam, which can only be built in one large chunk, are more likely to have cost overruns than a road, which can be built in small segments.

Chaos Monkeys and the Simian Army OR How Netflix Plans for Resiliency

In this post, I’m trying to take something technical and make it (mostly) readable for my mother-in-law. Enjoy!

One big trend, especially for internet companies like Facebook, Google and Netflix, is not to have one massive computer anymore. This is an oversimplification but computers used to be one big expensive box. The faster the computer you needed, the more money you spent. But eventually, the computers became too expensive to possibly meet the needs of today’s internet companies. So Netflix (and others) started stitching together these large supercomputers out of many smaller and cheaper computers by connecting them in these clever ways.

The benefits of doing this are pretty amazing because you get a supercomputer that can do incredible things at very low cost. The problem is with each of the smaller computers. Because they’re so cheap, they can fail at any time. This means that Netflix has computers failing constantly. But customers don’t see this happening. So how does Netflix get this to work?

Netflix needs to make sure that all of its computers and systems are resilient. Using a car metaphor, Netflix is always able to swap out a spare tire if one gets a flat. On their blog, Netflix explains how they test this tire changing/computer failing problem:

Imagine getting a flat tire. Even if you have a spare tire in your trunk, do you know if it is inflated? Do you have the tools to change it? And, most importantly, do you remember how to do it right? One way to make sure you can deal with a flat tire on the freeway, in the rain, in the middle of the night is to poke a hole in your tire once a week in your driveway on a Sunday afternoon and go through the drill of replacing it. This is expensive and time-consuming in the real world, but can be (almost) free and automated in the cloud.

This was our philosophy when we built Chaos Monkey, a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption. By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.
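For the more technically inclined, here is a minimal sketch of the idea in Python. It is a toy simulation, not Netflix's actual tool: the names Instance, Cluster, chaos_monkey and run_drill are all made up for illustration. Each round, the "monkey" switches off one random instance, a customer request is served by whichever instance is still up, and the dead instance is then replaced automatically.

```python
import random


class Instance:
    """One cheap computer that can be switched off at any moment."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Cluster:
    """A group of interchangeable instances behind one front door."""

    def __init__(self, size):
        self.instances = [Instance(f"server-{i}") for i in range(size)]

    def serve(self, request):
        # Try each instance in turn and skip the ones that are down.
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("total outage: no instance could serve the request")

    def replace_dead(self):
        # Automatic recovery: swap every dead instance for a fresh one.
        for i, instance in enumerate(self.instances):
            if not instance.alive:
                self.instances[i] = Instance(instance.name)


def chaos_monkey(cluster, rng):
    """Disable one randomly chosen instance and return its name."""
    victim = rng.choice(cluster.instances)
    victim.alive = False
    return victim.name


def run_drill(rounds=20, size=3, seed=0):
    """Kill an instance each round and count the requests still served."""
    rng = random.Random(seed)
    cluster = Cluster(size)
    served = 0
    for n in range(rounds):
        chaos_monkey(cluster, rng)
        cluster.serve(f"request-{n}")  # raises if the whole cluster is down
        served += 1
        cluster.replace_dead()
    return served
```

Calling run_drill() returns the number of requests that were served while instances were being knocked out. In this sketch the customer never notices a failure as long as the cluster has at least two instances and the dead ones get replaced before the next round.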

In addition to Chaos Monkey, Netflix has a number of other members of the Simian Army. The Netflix descriptions of these fellows are a bit technical:

Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.

Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
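If the descriptions above feel abstract, this small Python sketch may help. It is only an illustration under my own assumptions: the function names, the 200 ms timeout, the fallback message and the zone names are all hypothetical and do not come from Netflix. The first function shows what "responding appropriately" to a slow dependency might look like (give up after a timeout and fall back to something simpler). The second shows traffic being re-balanced across the zones that are still working.

```python
def call_service(delay_ms, timeout_ms=200):
    """Return the real answer if the dependency is fast enough, else a fallback.

    delay_ms stands in for the artificial delay a Latency Monkey would inject.
    """
    if delay_ms > timeout_ms:
        return "fallback: cached recommendations"
    return "fresh recommendations"


def rebalance(zones, down=()):
    """Split traffic evenly across the zones that are still up.

    Passing a zone in `down` plays the role of Chaos Gorilla taking it out.
    """
    healthy = [zone for zone in zones if zone not in down]
    if not healthy:
        raise RuntimeError("no healthy zones left")
    share = 1 / len(healthy)
    return {zone: share for zone in healthy}


zones = ["zone-a", "zone-b", "zone-c"]
normal_traffic = rebalance(zones)                     # three zones share the load
after_outage = rebalance(zones, down={"zone-b"})      # two zones share the load
slow_dependency = call_service(delay_ms=5000)         # huge delay looks like an outage
```

A very large delay is treated the same way as a dead service, which is the trick the Latency Monkey description mentions: you can rehearse an outage without actually turning anything off.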

Iatrogenics OR When Doing Nothing Might Be the Best Alternative

i·at·ro·gen·ic /īˌatrəˈjenik/
Relating to illness caused by medical
examination or treatment.
— Google Definitions

I learned about the word iatrogenic when reading the book Writing to Learn by William Zinsser. The book, written in 1984, used the following passage as an example of medical writing. It talks about the link between medical prescriptions and opium addiction:

The medical profession has a long record of treating patients with useless or harmful remedies, often in clinical settings of complete mutual confidence. Iatrogenic diseases, complications and injury have been, in fact, common in the history of medicine. Only look upon addiction to certain dispensed drugs as one variation among the occasional effects of drug therapy.

I thought, “What an interesting new word!” as did Zinsser, who also had to look it up. Then I came across Nassim Nicholas Taleb’s book Antifragile and found that he also fell in love with the word and expanded the idea into a class of issues, which he called iatrogenics, that went beyond medicine.

Iatrogenics is different from malpractice. Malpractice is doing an operation wrong. Iatrogenics is about doing a treatment correctly and still causing harmful side effects. When doctors ignore these side effects, they are far more likely to use all the tools at their disposal, like drugs or surgery, whether or not it’s a good idea in the long term.

Let’s look at a recent example. The New York Times recently published Heart Stents Are Useless for Most Stable Patients. They’re Still Widely Used. While stents have no medical benefit for most stable patients, putting one in makes both doctors and patients feel like they are doing something, that they are in control. And, from both points of view, “they seem to work,” even though they don’t work any better than a placebo.

So what’s the harm in that? Everyone’s happy aren’t they? Well no, they’re not. Doctors are performing an operation that does no better than a placebo so there’s no upside. However, there’s a significant downside in the complications from the operation.

Or take another example from a cruise I went on. Cruises offer Wi-Fi on the ship with tiny data limits (50MB for the whole trip). This is so small that just opening my phone will go over this limit. So a cruise director offered, “Give me your phone and I’ll make it work on the boat.” So I gave him the phone and he started turning off the data-hogging applications. A few months later I realized that one of the things he turned off was my iCloud backup. So the decision that the cruise director made, without telling me, was to give me very limited internet functionality on the boat while turning off my critical backup capability.

Another way of looking at iatrogenics is overvaluing short-term gains relative to long-term risks. Take the example of Thalidomide, the poster child for drug overuse. Thalidomide was a sedative that was prescribed around 1960. While it helped women with morning sickness (a relatively minor problem), it caused more than 10,000 serious birth defects.

Indulge me with one more example. After George Washington left the presidency, he took ill. His treatment was the standard for the day: bleeding. However, taking 5 to 7 pounds of blood from Washington’s body is now widely believed to have accelerated his death. Bleeding stayed around for a while after that. It was still recommended by leading doctors as late as 1909.

Taleb tells one story of how this problem goes beyond medicine and into finance:

One day in 2003, Alex Berenson, a New York Times journalist, came into my office with the secret risk reports of Fannie Mae, given to him by a defector. It was the kind of report getting into the guts of the methodology for risk calculation that only an insider can see—Fannie Mae made its own risk calculations and disclosed what it wanted to whomever it wanted, the public or someone else. But only a defector could show us the guts to see how the risk was calculated.

We looked at the report: simply, a move upward in an economic variable led to massive losses, a move downward (in the opposite direction), to small profits. Further moves upward led to even larger additional losses and further moves downward to even smaller profits.

At its core, this was what caused the financial crisis. It was people adding more and more risk for smaller and smaller gains. They failed to look at the downside risks which kept growing larger and larger because they couldn’t imagine that they would occur.
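To make the shape of that risk concrete, here is a rough Python sketch. The payoff function is purely my own invention for illustration (it is not Fannie Mae's model or anything from Taleb's book). I picked 1 - e^move only because it has the property described: each further move up costs more than the last one, and each further move down earns less than the last one.

```python
import math


def payoff(move):
    """Hypothetical profit (+) or loss (-) for a given move in the variable."""
    return 1.0 - math.exp(move)


# Extra loss from each additional step upward.
extra_loss_first_up = payoff(0) - payoff(1)
extra_loss_second_up = payoff(1) - payoff(2)

# Extra profit from each additional step downward.
extra_profit_first_down = payoff(-1) - payoff(0)
extra_profit_second_down = payoff(-2) - payoff(-1)

for move in (-2, -1, 0, 1, 2):
    print(f"move {move:+d}: payoff {payoff(move):+.2f}")
```

Running it prints a small table where the losses on the way up grow faster and faster while the profits on the way down flatten out. That lopsided curve is the "more and more risk for smaller and smaller gains" pattern in miniature.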

Oddly enough, people don’t get in trouble for doing this. There’s a general sense that the people causing the problems were doing the best they could. The idea of “this is the best modern medicine (or modern finance) has — even if it doesn’t work” is well accepted. This is true even when the procedure is successful but the patient died or the economy collapsed.

A lot of this happens because the people making the decisions don’t have skin in the game. They get the upside benefits without being exposed to the downside risk. Taleb mentions that when Roman engineers built a bridge, they were required to sleep under it. Then, if the bridge fell down, the engineers would feel the pain (or death in this case) of the people who were hurt by the bridge.

So what can you do about all this? Try to get your doctor to put a little skin in the game. The next time you have an important medical decision to make, don’t ask your doctor for her medical opinion, ask her what she would do if she were in your place. This changes her mindset from a “disinterested professional” to someone with a personal stake in the game. You might get a very different answer.

Read this along with my story on back pain.

Prospect Theory in Real Life OR How Losing Feels Bad More than Winning Feels Good

I’m going to do a magic trick with a number. I’m going to take the number 1700 and, by doing nothing more than raising and lowering it, show how the interpretation of the number can dramatically change. Let’s see how that can happen and then I’ll explain how it works.

When my wife was pregnant with our second son, we had a test for Downs Syndrome. This test had three parts:

  1. A “Nuchal” sonogram that measured some key ratios. This was the most important test and set the baseline.
  2. A blood test that measured blood proteins in the mother.
  3. A test of “soft markers” that refined the initial estimates based on other sonogram features.

So we had the initial test. The chance of an issue was 1 in 1700.

“Is that good?” We asked the doctor. “It sounds good to us.”

“Well, in order to be certain, you’d need to have an amniocentesis which has a 1 in 400 chance of serious problems,” said the doctor.

So 1 in 1700 is pretty darn good. Then we got the blood test back. The numbers were even better. Our chances now were 1 in 6800. That was 4 times better than we’d had before!

So we’d finished 2 of the 3 tests. Then, things got tough. We went in for a sonogram and the technician stopped at one point and said, “I need to get the doctor.” That’s never a good sign.

When the doctor came back he said, “Well, your child had 2 soft markers for Downs.”

“What does that mean?” we asked.

“Well, it means that your child has a higher chance of having Downs Syndrome. Maybe you should see a genetic counselor,” he said.

“Before we go down that route, how does this really alter our chances?” we asked.

“Well, we’re not really sure. One soft marker could double the chance of having Downs Syndrome. So 2 soft markers might increase the chance by as much as 4 times but it’s probably less than that,” he said.

“So you’re saying our chances are back to 1 in 1700.”


See. Magic.

How did this happen? Behavioral Economics has an answer. In contrast to typical economic theory, Behavioral Economics looks at situations and sees how people really react — not how they would react in theory. The situation above is an example of Prospect Theory — the finding that losing something causes about twice as much pain as the pleasure you get from gaining something. So gaining and then losing the same amount still feels like a net loss.
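Here is the arithmetic behind the trick as a short Python sketch. The odds come straight from the story. The felt_value function is my own simplification: it assumes a loss hurts exactly twice as much as an equal gain pleases (the "about twice" above), and it treats one fourfold change in the odds as one "unit" of good or bad news. Both of those are assumptions for illustration, not measurements.

```python
from fractions import Fraction

baseline = Fraction(1, 1700)             # after the first sonogram
after_blood_test = baseline / 4          # 4 times better: 1 in 6800
after_soft_markers = after_blood_test * 4  # worst case the doctor described

amnio_risk = Fraction(1, 400)            # chance of serious problems from amniocentesis
amnio_vs_baseline = amnio_risk / baseline  # how much riskier the "certain" test is


def felt_value(change, loss_aversion=2.0):
    """How a gain (+) or loss (-) feels if losses weigh `loss_aversion` times more."""
    return change if change >= 0 else loss_aversion * change


actual_net_change = (+1) + (-1)                    # one unit gained, one unit lost
felt_net_change = felt_value(+1) + felt_value(-1)  # the same two events, as felt
```

On paper nothing changed: after_soft_markers is equal to baseline, and actual_net_change is zero. But felt_net_change comes out negative, which is why ending up back at 1 in 1700 felt so much worse than starting there.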