MOD: Masters of Data

Bringing the human to the data

Alistair Croll: The Importance of Analytics

Founder, Solve for Interesting

August 7, 2018

31:58

We must teach critical thinking to make our population more resilient.

Alistair Croll is a serial entrepreneur, speaker, prolific author, and conference organizer - and a big thinker in the area of Data and Analytics.

Welcome to the Masters of Data podcast, the podcast where we talk to the people on the front lines of the data revolution about how data affects our businesses and our lives.

Our guest today has had a long and varied career in technology. Alistair Croll (@acroll) is a serial entrepreneur, a much sought-after speaker, a prolific author (including the bestselling Lean Analytics, which is a must-read), an organizer of conferences like FWD50, and a consultant to companies large and small.

Alistair is a big thinker in the area of data and analytics, and we got to cover a lot of ground in this episode. So, without any further ado, let’s dig in.

Alistair Croll's Journey into Tech

You’ve actually been described as having a little bit of career ADHD, which I can definitely identify with, doing a lot of different things, a lot of different interests. Tell me a little bit about how you got to where you were. Where did you kind of start? How did you get into the industry? And how did your career kind of develop?

So, I started out with a misspent youth with an Apple II computer and not enough time in the sun. So, that tells you how old I am.

I got a job in the telecommunications industry working for a company called Icon Technology out of school. Actually, it was kind of cool. Every time you dialed one of those AOL modems with that CD-ROM that Steve Case stuck to every magazine he could, you were dialing one of my modems. I was the project manager for the dialup access concentrators for AOL.

But when you’re dealing with networks, especially back in the 90s, everybody wants more performance. You have to have policies that say how you handle different kinds of traffic, and those policies come down to network performance.

So then I got into network performance. And I founded a company called Coradiant with some friends of mine that built what was probably the first-ever real user monitoring product. If any of your listeners are familiar with BMC, which eventually acquired the company, their product line for management is called TrueSight. So, we built TrueSight and sold to BMC. And because that was all about measuring real users, we wound up talking to a lot of people who cared about web analytics.

So web analytics showed you what people did, and TrueSight would show you if they could do it, which is a pretty different but important question. Maybe the conversion rate you’re trying to get is not what you’re getting because people keep getting a 500 error. And maybe you don’t know that because there’s no way the JavaScript loads on the page, so you don’t see it in analytics, for example. We built tools that would show you that stuff.

That got me into the web analytics world, and then I wrote a book called Complete Web Monitoring about how to measure your online presence. Eventually that was a slippery slope towards both running conferences and writing about data and analytics.

It’s kind of come 360 because the idea that you have to drive a differentiated customer experience, and how important that is to succeed in the marketplace—that’s more important today than it’s ever been. So, it’s good work that you guys did.

Yeah. If you go to a store, and you have a bad retail experience, the person on the other side of the counter knows you have that bad retail experience. This idea that you can go to the store online and have a bad retail experience, and nobody knows about it is infuriating to customers.

I was on the phone with a bank today, in fact, trying to get insurance. And this is my bank. I was just trying to get home insurance, right? I’ve exclusively used online systems. They just worked. And this one, I had to go through four different people, and nobody’s aware of how some guy is getting transferred across three different voicemail operators, misclassified as a graduate of some home inspection program, and then getting accused of lying. The user experience is so critical.

The only way you can get there is through data. And so often we ignore that data at our own peril.

"So often we ignore data at our own peril."

Absolutely. That’s a crazy story.

The funniest part is I went and Googled the guy [they thought I was], and I told her. And she’s like, “Oh, so you do know him.” I’m like, “No, I know how to Google things.”

I probably shouldn’t troll service personnel so much.

Data Science and Critical Thinking at Harvard

I want to definitely touch on some of the work you’ve been doing at Harvard—a class that you’ve been teaching over the last couple years. Tell us about why you started it, what have you been teaching, and what’s come out of it for you.

So, full credit: the professor who teaches the course, a guy named Srikant Datar (amazing thinker, teaches in the design school there), asked me to put together a course on data science and critical thinking for MBA students.

Which is refreshing, because we need data skills not only in public policy fields, but also in places like business schools. If you’re managing data, the number of ways you can make stupid mistakes is unbelievable. We try to give the students a framework to understand data and how to analyze it.

And we do get into the difference between random forest and clustering, and stuff like that, so they understand it and they can articulate it. But a lot of this is, “How do you know the data is real before you collect any questions?” You try to answer how you know if you were successful, how should you store it, how should you retrieve it, how should you display it, and what new experiments should you run.

So there’s a whole workflow or life cycle for data. We go through each of those stages and look at the places where things can go horribly wrong. One of my favorite examples is a thing called Benford’s Law:

Data Science and Benford’s Law

If I had nine buckets, numbered one to nine, and I went through your credit card transactions, and I took the first digit of every transaction (one, seven, four, whatever they are) and I put them in those buckets, most people would expect the buckets to fill up roughly evenly. You would expect there to be as many numbers beginning with a one as with a seven as with a four, and so on.

But the reality is that for almost every set of naturally occurring numbers, you get about 30% of them beginning with a one, and about 20% beginning with a two, and it goes down from there. It’s called Benford’s Law.
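For readers who want to try the bucket experiment, here is a minimal Python sketch. It assumes the standard statement of Benford’s Law, in which the expected share of leading digit d is log10(1 + 1/d), which gives roughly 30% for a one and closer to 18% than 20% for a two. The function names and the sample data (powers of two, a sequence commonly used to illustrate the law) are illustrative choices, not anything from the episode.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number (hypothetical helper)."""
    # Strip the sign, then any leading zeros and decimal point: 0.05 -> '5'
    return int(str(abs(x)).lstrip("0.")[0])


def observed_shares(values):
    """Share of values falling into each leading-digit bucket 1-9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Illustrative data only: the first 200 powers of two.
sample = [2 ** k for k in range(1, 201)]

expected = benford_expected()
observed = observed_shares(sample)

for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

To check real figures, such as a list of transaction amounts, pass them to `observed_shares` in place of `sample`. A large gap between the observed and expected shares is a reason to ask more questions, not proof of fabrication.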

That sounds like a neat magic trick, but when Greece wanted to enter the European Union, it filed a bunch of financial data. One economist said, “Hey, wait a minute. This stuff doesn’t match Benford’s Law. I think it was randomly generated.” They ignored him, and you can argue that Brexit is a direct consequence of that kind of oversight.

So these little, tiny things have huge, wide-ranging changes. Just getting people to ask questions about the data they’re looking at, and what it means, and how it’s being used is incredibly important.

So you’re trying to equip these MBA students for that kind of critical thinking.

Data Science and Survivorship Bias

Yeah. I’ll give you one more quick example. During World War II, these bombers were coming back from the front riddled with bullets. The Allies got together and said, “Let’s analyze these and figure out where the bullets are hitting and how we should better armor the planes.” And they had a whole plan for where to put the armor on.

And one statistician, I think it was in the eastern US, said, “Wait a minute. We should put the armor where there are no bullet holes, because those planes don’t make it back.” It’s pretty obvious once you say it, right?

It’s a problem called survivorship bias. We tend to analyze the survivors and over-fit the consequences.
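The bomber story can be made concrete with a toy dataset. The Python sketch below uses invented sortie records (the counts are hypothetical, chosen only to make the effect visible) and compares the bullet holes analysts can see on returning planes with what happened across the whole fleet.

```python
from collections import Counter

# Hypothetical records, invented for illustration: (section_hit, returned)
sorties = (
    [("fuselage", True)] * 40
    + [("wings", True)] * 35
    + [("engine", True)] * 5
    + [("engine", False)] * 20
)


def hit_counts(records):
    """Count hits per aircraft section in the given records."""
    return Counter(section for section, _ in records)


def loss_rate(section):
    """Share of planes hit in this section that did not return."""
    outcomes = [returned for sec, returned in sorties if sec == section]
    return outcomes.count(False) / len(outcomes)


# Analysts on the ground can only inspect the planes that came back.
survivors = [record for record in sorties if record[1]]
seen = hit_counts(survivors)
actual = hit_counts(sorties)

for section in ("fuselage", "wings", "engine"):
    print(section, "seen:", seen[section], "actual:", actual[section],
          "loss rate:", loss_rate(section))
```

In this made-up fleet the engine shows the fewest holes among survivors, yet it is the only section where a hit usually meant the plane was lost. That is the pattern the statistician pointed out.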

So if you’re a business person, you’re making decisions based on, “What are my customers like?” When you’re trying to grow the number of customers you have, yeah, you should look at what your customers are like.

But you should also look at what the people who didn’t choose to buy from you have in common, because that probably tells you about your offering and it’s often more useful. It’s better to analyze the thousand handsome men who went to LA the same month Brad Pitt did and didn’t get an acting career. Ask them what they wish they’d done.

Getting people to think in these critical ways, or be aware of how they have to work with data and with information, is hugely important today.

“Getting people to think in these critical ways, or be aware of how they have to work with data and with information, is hugely important today.”

Big Data and the Marketplace Today

You talk to a lot of different companies. You’re running these conferences. What’s the level of recognition in the marketplace that this is going on? Are people taking it seriously? Are they really trying to leverage data in a critical thoughtful way, or has there been a lot of movement on this in the last couple of years?

I think the biggest problem remains that people aren’t asking good questions. If you ask a good question, you have so much data you can probably find the answer. In the old days, a manager was someone who convinced people to act in the absence of information. In the modern era, a manager or a leader is someone who convinces people to act by asking the right questions.

“In the modern era, a manager or a leader is someone who convinces people to act by asking the right questions.”

Once you have that question, you need data scientists and statisticians to tell you how to collect the data in an unbiased way—whether you can act on the model ethically or not.

A great example of that is in Boston, the city built an app called Street Bump. Street Bump measured when the car hit a pothole. Sounds great, right? And it worked very well.

The problem was that it told them where all the rich, white people’s potholes were. Because if you look at what kind of people drive themselves to work in their own car with a passenger seat they can put a phone on and an unlimited data plan, it’s probably going to be rich, white people.

So was there anything biased about the thing? No, it was great, but there was a systematic problem that needed to be addressed. And only by looking at this data model and going, “Huh, look at that map. That map looks a lot like socioeconomic studies,” can a human step back and say, “I think maybe we have a problem here. Let’s attach these to busses and garbage trucks.” Which is what they did, and they found the rest of the potholes.

So the first step is asking the right question. The next step is recognizing that there’s a cycle of experimentation.

The Street Bump thing was great because they tried it, and then they looked at the model, and they adjusted it. That’s how it’s supposed to be. Experimentation is the norm. You can spin up computer resources for pennies.

And then: deciding how and why to act on it and managing data projects. Often, people go find a data scientist and say, “Spin me some straw into gold.” Data scientists aren’t Rumpelstiltskin. They need a specific task, and they need to work with people who have domain knowledge and so on.

I don’t think I’ve heard it described exactly that way, and it’s really interesting. There’s a lot of talk right now about cognitive bias, and bias in data, and how maybe people aren’t seeing that beforehand.

But part of what you’re saying is that maybe you would never be able to see that until you’ve actually got it out there and you’ve tried it out. And then you expect that there’s going to be some bias, and you look for it, and you correct it. Instead of just spending all this time worrying too much about it beforehand, you actually have to get it out there and try it with real data. Does that sound right to you?

What’s the Mike Tyson quote? “Everyone has a plan until you get punched in the face.”

Lean Startups

I think there’s a slightly more discreet version of that, which is like, “No plan survives contact with the enemy.”

I’m a big proponent of the lean startup model, and that’s why we did Lean Analytics. This idea of iteration doesn’t just apply to the product you’re building—it applies to the experiments you’re running to figure out what product to build. But you should design an experiment, try it out very quickly, and see what happens.

I’ve worked with companies that have done huge surveys. I show up and get five people to take them, and they can’t figure out how to take the survey. I’m like, “You can’t get five people in your office to answer this correctly?” Why on earth would you spend thousands of dollars sending it out? The cost to build a survey is free. Just go fire up Google Forms and put a survey together. It’s so easy.

Randy Bias had a great quote. (And apologies to animal lovers that are listening to this.) He said, “Once upon a time, we had servers. We named them. And when they got sick, we nurtured them back to health just like pets. And now, we have servers, and we give them numbers. And when they break, we kill them just like cattle.” People haven’t internalized that. Many people still think of IT as this unique and precious snowflake instead of this resource that’s disposable and cheap.

I think both when you and I were starting out, people would name their servers after cartoon characters, and they spent all this time lovingly nurturing them. It’s at the point now where you really can’t do that at scale. You have to really think differently.

Lean Analytics

I’ve definitely enjoyed your book, Lean Analytics, but I don’t know if everybody who’s going to be listening to this is aware of it. It’s key to a lot of the discussion, so can you take a minute and kind of describe the book—how you wrote it, how it’s evolved since then, and how you’re using it with your discussions with companies and things like that?

In 2011 (I think) I was part of a team of four people who launched an accelerator called Year One Labs. It was based on the premise of lean startup, and it gave you an entire year to launch your startup rather than the usual 90 days (which turns into a pitch contest in a lot of accelerators).

We told people they weren’t allowed to write code for the first month, which made them incredibly nervous and forced them to go talk to their customers. And every one of them realized they were trying to build a product nobody needed. It was a great experience.

“Every one of them realized they were trying to build a product nobody needed.”

And Ben Yoskovitz (@byosko), my Lean Analytics coauthor, and I would sit down with these companies and say, “How are you doing?” And they’d go, “We’re doing great.” “Okay. How do you know you’re doing great?” “Well, conversions are at 14.” And we’re like, “Is that good?” “We don’t know.”

So Ben and I bought a lot of people beers, and they eventually told us their top secret, internal numbers. Because back then, people weren’t sharing what “good” was. We interviewed about 135 people—some VPs, some founders—and found out what “normal” was for a lot of these metrics.

So the idea behind Lean Analytics is that your company, if you’re smart, will go through five specific stages of growth:

  • Empathy, which is figuring out what the market needs.
  • Stickiness, which is getting the initial users to keep using it—re-subscribing or whatever.
  • Virality, which means they tell others.
  • Revenue, which is where you’re making money and pumping some of it back into customer acquisition.
  • Scale, which is where you’re growing customers and margins disproportionately to your costs.

There are different metrics for each of those stages. The metrics you want to track when you’re looking at virality are about how quickly people are sharing your message and how convincingly they are doing so. That’s different from the stickiness metric, which is about loyalty and “Am I focused on acquiring customers or getting them to return?”

We have this idea in the book of the one metric that matters. If you know anything about math, trying to solve an equation for more than one variable is impossible. You’ve got to isolate one variable, right?

So we urge people to pick the one metric that matters and get it to a place where it’s good enough. If you don’t do that, then you’re probably doing something horribly wrong.

The underlying idea is that you should pick a metric, run an experiment with a hypothesis, see if you move the needle enough, rinse, repeat. Then you’re being much more disciplined about the process of growing your business (of any size) or your product launch if you’re a big company.

What’s happened since the book came out is that a lot of data scientists and analysts used it to explain to their bosses how to talk about analytics and data because we provide some fairly good lexicons.

So a lot of big companies have asked us to come in and help them simplify their dashboards.

Ben and I were in front of a company in Spain a few years ago that showed us their dashboard, and it was in seven-point fonts. We coined the term “many metrics that might matter” or MMTMM. I think they were okay with it, but you could tell they were like, “All right, all right, stop making fun of us.” But a lot of companies have an MMTMM problem. They need to focus on one metric that matters for that project, or that department, or that startup.

It’s so needed to be able to encourage companies, just overall, in using data to back up decisions. Because there is a tendency in human nature to say, “Well, this is my gut instinct, and I’m going to go after this.”

And there’s a place for that—some of the most innovative ideas have been built off of that. But then to not use data and actually not know what success is … then you never know if you’ve succeeded. You never know what you actually have to improve and work on. What you guys did in there is a huge contribution to that.

A Portfolio of Innovation

You gave a keynote at Electronic Arts last week. I’d love to hear what you were talking about and what your impression of that whole process was.

So the challenge for any organization is that analysts traditionally speak when spoken to. We come from a world where data resources are expensive; compute was expensive. The product owner would show up and say to analysts, “Hey, I need to know widgets by country by weather by color.”

And the analysts would go off, and they’d build the schema. They would pour the data into the schema, and they’d bring the reports back like tablets from the mount.

And the project manager would go, “Ah, could you look at religious affiliation?” And they’d be like, “No, but we’re going to do another query now.”

So it was this very bad iteration.

Today, we first collect the data, and then we find emergent schema within it. Once upon a time, we used to find a suspect and collect data on them. Today, we find data and go and look for suspects within it, which is an interesting take on society. We can talk more about that a bit.

The challenge was really to give them reasons why they are necessary for the company’s survival. What I have found, and I’ve been working on this project for a little while under the name “Tilt the Windmill” (which is a spin on the idea of tilting at windmills), is that companies that succeed have a portfolio of innovation. They have sustaining core innovation (what Clay Christensen would call “core innovation”), which is just to keep the lights on.

“Companies that succeed have a portfolio of innovation.”

If you’re Volvo, that’s cup holders in next year’s car.

There’s adjacent innovation, which is where you change one part of the product, or the market, or the go-to-market method. That would be adding the electric car, but you’re still selling it to drivers and you’re still selling it through dealerships.

There’s disruptive innovation, which is something that actually destroys something. If you’re Mercedes Benz, and you own car2go, the more people that use car2go, the fewer people are buying Mercedes. That’s disruptive.

Then there’s discontinuous innovation, and discontinuities are this philosophical idea that Foucault and others talked about, where the future becomes unknowable. The self-driving car is an example of that.

In 20 years’ time, we’re very likely to go, “People used to drive themselves to work? That’s preposterous. Not only did they have accidents, but when did they shave?” We won’t understand that world, and I think that it’s very important for analysts to think in those mindsets.

So one of the things I did was I had some beers with some friends a couple weeks beforehand and said, “What are some crazy ideas that might affect the video game industry?”

There is a company called Riot Games. They have a very popular game called League of Legends. League of Legends came from an idea of a game called Defense of the Ancients put out by Valve, which was the result of a map mod for Warcraft, which is based on a StarCraft map.

So at what point would an analyst see the adoption of StarCraft and that map being played a lot in contests and figure out that Riot Games is going to emerge?

At what point would you decide that the first generation of people who grew up on video games are now in old folks’ homes losing their faculties and cognition, and decide that retirement home video games is a burgeoning market?

There’s this question of “What kind of data would you pick?” And I’ve done this for lots of industries.

Innovation, Resilience, and Risk

I was talking to the department of transportation and some folks in the Pacific Northwest about resilience and risk. If you’re someone in the transportation world, you look at self-driving cars as this huge boon because the average car is parked 93% of the time. Which means that we have at least 10 times as many cars as we need.
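The “at least 10 times” figure can be checked with quick arithmetic. The Python sketch below takes the 93% parked figure from the conversation and assumes, as a deliberate simplification, that demand for cars is spread perfectly evenly over time (it is not, because of rush hours). The variable names are illustrative.

```python
# Share of time the average car sits parked, as quoted in the conversation.
parked_share = 0.93

# Share of time the average car is actually in use.
in_use_share = 1 - parked_share

# Simplifying assumption: demand is spread evenly across the day, so a fully
# shared fleet would only need in_use_share of today's cars.
overprovision_factor = 1 / in_use_share

print(f"Cars are in use about {in_use_share:.0%} of the time")
print(f"Today's fleet is roughly {overprovision_factor:.1f}x what even demand would need")
```

Under that idealized assumption the ratio comes out a little above 14, which is consistent with “at least 10 times.” Peak-hour demand would pull the practical number down.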

So that’s great: self-driving cars spare us all the cars … except when there’s an earthquake and everyone needs to flee Cascadia. Are you going to do an Uber share, or are you just going to get your family out of downtown Seattle? There are all these tradeoffs with what happens when technology and society collide, and we don’t really think the consequences through.

“There are all these tradeoffs with what happens when technology and society collide, and we don’t really think the consequences through.”

I like the way you described that. Because I do love to read about the future of technology, but there is also a tendency with the futuristic to view this as very smooth sailing for all these technologies. In reality, as soon as they hit the average person on the street, it’s sometimes really hard to predict how they’re going to adopt them.

I always think back to in the 80s. I grew up in the 80s, looking at Back to the Future and what they thought the world was going to be like now with flying cars and whatever.

That’s a really important point.

We teach home ec, and we teach basic economics to students. We are going to have to teach critical thinking in order to make our population more resilient. And I mean resilient to attack, resilient to disinformation, etc. Whenever I’m talking to my daughter about marketing and media (she’s seven now), I’m always asking her, “So, what is the objective function?”

In machine learning, an objective function is when you give an AI a goal. “I want you to get the highest score on Pong,” or whatever. “I want you to find the things that aren’t cats.” I think this year is the year when society starts to ask itself, “What’s the objective function of a political party, of a social network, of a marketing campaign?” That’s the first step, to me, in critical thinking: asking yourself, “What is this thing trying to get me to do or say?”

Using Technology to Make Society Better

Among the many things that you are involved in are conferences, and you were telling me about a conference called FWD50. I’d love to hear more about what that is, and what you guys are doing with that.

We started the conference last year. There was a void left by a much more tradeshow-y kind of conference that happened in Canada. It had great participation, but the underlying idea was “How do we use technology to make society better?”

There are some pretty important questions there. For example, in North America, we spend 12 billion dollars a year on tax preparation. In Estonia, they spend zero dollars because it’s all done by the government.

And you can make a pretty good argument that when the government does not keep up with technology, it leaves the gate open for someone else to come in. In some cases, that debate is about what should be. Should the government be making chocolate bars? Probably not. Should the government be collecting its taxes? Probably. So there’s a healthy debate to be had about where you draw the line about a government and its role in society.

There is no organization more built on data than government, and there is no organization more likely to be affected by machine learning. Laws are just codes written in complicated words. When we amend a law or change the text, we’re just debugging, because we found an edge case, right? (That’s Lawrence Lessig (@lessig), that’s not me.) Code is law. So I think digital government is an amazing opportunity to rethink what the role of society is and how society works.

The Veil of Ignorance

There’s a thing in policy called “the veil of ignorance.” It’s a thought experiment by a guy named Rawls. He said that we should put a blindfold on, the veil of ignorance, and then design a society not knowing whether in that society we will be a prince, or a pauper, or if we’ll be a saint or a sinner.

So you may, after you’re done, press a big button saying Launch Society. And now Ben Newton is a known serial killer. Or Ben Newton is a single mom. Or Ben Newton is the president. Without knowing ahead of time, what kind of society would you build? And he called this the “veil of ignorance.”

The Triumph of the Commons

I argue that we’ve had so many fundamental changes because of digital technology, like the fact that we can now create things that are abundant. We used to talk about the tragedy of the commons; now it’s the triumph of the commons that makes Wikipedia. Did you know that Wikipedia was made in the amount of time Americans spend watching commercials in one weekend? Clay Shirky actually did the math. If we took all Americans, and you had them stop watching commercials for a weekend, and they spent that time making Wikipedia, you could get Wikipedia up to about 2011.

That’s pretty crazy. So, this idea of the triumph of the commons, and what can be done, and the ability to create abundant digital stuff—where the marginal cost is zero—means we have to revisit the veil of ignorance. We have to put the blindfold back on and ask, “What can be done? Should we be using direct democracy in an era where every citizen can vote? Or do we still need representation?”

And I think that’s one of the reasons why digital government is a fascinating field. There are so many societal questions. Why is “unemployment” a bad word other than because we use GDP as a scorecard? So there are a lot of really juicy, fundamental policy questions that are wrapped up in technology. And that’s why I’m really excited about FWD50.

The Future of Analytics, Tech, and Society

What’s next for you? What are some of the things that you’re looking at that are kind of just nascent ideas in your head that you think you might chase after in the next couple of years?

1. Scale Tech

One thing I’m spending a lot of time on is in scale tech. Everyone talks about the scale up problem. We have lots of startups, but we don’t have a lot of scale ups. And those scale ups are big enough to catch the attention of their competitors, but still weak enough and small enough that they can be taken out.

So they have to invest heavily in technology that gives them some kind of force multiplier. You know the old parable that the shoemaker’s children are most poorly shod? Oftentimes these tech companies don’t employ the technology internally to scale efficiently.

So I’m spending some time with a few very large-growth, scale-stage VC firms on that.

2. The Dirty Secret of Successful Startups

The other thing is a pet project I’ve been working on for a little while. I wrote the first blog post on my Medium page. It is based on a talk I’ve given a few times that is continually the most controversial talk. It’s called ‘Just Evil Enough.’ The dirty secret of startups is that every successful startup did something evil, slightly evil, that allowed it to get a platform to behave in an unintended way.

A technical example of that would be Farmville. When Farmville started out, they realized that their app could post to your friends’ Facebook feeds. They got 30 million users in a month. Then Facebook went, “Oh, that’s a vulnerability,” and they patched the vulnerability. Script kiddies annoy the hell out of me, but this idea of a zero-day growth exploit is awesome.

I’ll give you one more example: Tupperware. When I say “Tupperware,” you don’t think of the Tupperware box, you think of Tupperware parties.

The party was a way to get a platform called “the dinner party,” and turn it into another—subvert it to your advantage, which is how hackers think.

I like the new definition of marketing (which I’m going to be writing about in this book), which is that the goal of marketing is to create attention you can turn into profitable demand. To do that, you have to be slightly evil. And I make a pretty good case on it. I’ve been collecting lots of dirt on how companies got where they are that I’m going to put together into a book.

I’m definitely going to take a look at that book. It sounds awesome. I think you’re doing a service to us all, to uncover the dirty little secrets.

I think the stuff that you’ve got your hands into is absolutely fascinating. And I look forward to seeing how this book goes and the other things you’re working on. Thank you so much for taking the time.

The guy behind the mic

Ben Newton

Ben is a veteran of the IT Operations market, with a two decade career across large and small companies like Loudcloud, BladeLogic, Northrop Grumman, EDS, and BMC. Ben got to do DevOps before DevOps was cool, working with government agencies and major commercial brands to be more agile and move faster. More recently, Ben spent 5 years in product management at Sumo Logic, and is now running product marketing for Operations Analytics at Sumo Logic. His latest project, Masters of Data, has let him combine his love of podcasts and music with his love of good conversations.
