How to develop statistical intuition with Tim Wilson
In this episode of the Marketing Analytics Show, Tim Wilson shares what statistical intuition is and how it helps marketers make better data-informed decisions.
You'll learn
What statistical intuition is
How it helps you make better decisions
How to nurture statistical intuition
Some useful resources to get started
Anna Shutko:
I’m your host, Anna Shutko. And today, our guest star is Tim Wilson, Senior Director of Analytics at Search Discovery and co-host of Digital Analytics Power Hour. In this episode, you’ll learn how to define statistical intuition and how marketers can develop it. You’ll also hear how statistical intuition can benefit both individuals and organizations, and which tools and languages you can start with to perform statistical analysis. I hope you’ll enjoy this episode.
Hello, Tim, and welcome to the show.
Tim Wilson:
Thanks for having me. I’m excited to be here.
Anna Shutko:
I’m super excited to have you. And today, we have a super, super interesting topic suggested by Tim, which is statistical intuition. So my first question to you would be, how would you define statistical intuition? And why do you think it’s so important for analysts?
Tim Wilson:
So I have been trying to figure out the best way to define it, because it’s one of those things where I have an intuition that there’s this thing called statistical intuition, even though I don’t think it’s a formally defined term. A lot of times, I think about the words “statistic” and “statistics.” They’re only slightly different words, but they have very different meanings.
So if we think about a statistic, which analysts are very comfortable with, that’s just a fact about a set of data: the mean, the median, the mode, whatever. Then we move from statistic to statistics, which you’d normally expect to just be the plural. Okay, that’s multiple of those. But it’s not, really. Statistics is the science of trying to make inferences about a population from some sample of data.
To me, statistical intuition is what happens once you’re doing statistics and really start thinking in terms of samples and populations, independent and dependent variables, and type one and type two errors. You wind up with a profoundly different way of thinking about data and analytics. You stop thinking just, oh, the average conversion rate from paid search was 2.6%. And you start recognizing that what we’re actually trying to do with data is make decisions under conditions of uncertainty, which is a scarier and broader area.
But to me, statistical intuition is really just having a deeper sense of the data: what it can and can’t tell you, and determinism versus probabilism. Michael Ershov always tells me it all goes back to David Hume in the 1700s. I’m not quite there yet, but I’m pretty sure he’s right.
Anna Shutko:
All right. Awesome. I really love how you mentioned that statistical intuition and an understanding of statistics improve decision-making. Can you tell us about this in a bit more depth? How exactly can statistical intuition improve the way marketers and analysts think about and approach decision-making?
Tim Wilson:
Sure. Yeah, to me, it’s a more structured way of thinking. I tend to pick things up from really smart people: they say something a couple of times, and then I find myself repeating it. At Search Discovery, our head of data science, Dr. Joe Sutherland, will so often be in the middle of a conversation about some analysis or some question, and he’ll say, “Wait a minute, what is our dependent variable?”
I’ve been in this space for 20 years, and I’ve realized how long I went without stopping to focus on that. In the world that marketers work in, there are things we are trying to influence. We’re trying to drive more orders, or more leads, or more profit. So it helps to have real clarity in the moment around what our dependent variable is. And then to say, well, if that’s the dependent variable in the statistical way of thinking, what are the independent variables? What is affecting it?
Going even a little deeper, some of my independent variables are ones I can influence. I can influence how much I spend on paid social. I can’t influence what day of the week it is or what the weather is going to be outside. Both of those may still influence my dependent variable.
It also means thinking about things like the unit of analysis and causality. When am I looking at data and just coming up with a story that fits it, versus actually taking the data and figuring out whether I can truly validate a hypothesis? Am I finding a story that fits the data, or is the data actually telling me that story?
So to me, it just gives a much more structured and deeper way of thinking about framing the problem space that we’re trying to work in. And then also pivoting and saying, what’s the decision I’m trying to make? Am I really clear on it? What can I influence? And then how can I use the data to really support it? It’s kind of squishy, and it’s kind of scary, and it’s kind of hard. But I also think it’s a much, much more productive way to actually try to put data to use in an organization.
Anna Shutko:
I really love your example with dependent versus independent variables. I never thought about it this way, that there are some conditions that you cannot actually affect. I think it can be super helpful for marketers.
So now, continuing with marketing analytics, how can statistical intuition help marketers better understand attribution models, which events they can affect and which they cannot, and maybe calculate the probability of a certain event better?
Tim Wilson:
Yeah. Take multi-touch attribution, where, at the highest level, we tend to be looking at marketing channels. It can be comical. I have had cases where a marketer looked at the results of some attribution model and said, “Well, clearly we need to drive more direct traffic. That’s our highest converting channel.” And it’s like, yeah, but you can’t do that. When you’re looking at marketing channels, you have to think about the ones you can influence versus the ones you can’t.
But I’ve been on the naysayer side of multi-touch attribution for pretty much my whole career, and my reasons have changed over time. As I’ve developed my statistical intuition, I’ve gone from one set of valid criticisms to a much broader and more profound set.
So if I look at the history of multi-touch attribution, we started with last-touch attribution, which was fine, and I think pretty valid: what was the last touch that we could identify? That’s useful. Then thought leaders got out there and said, “Well, you’re undervaluing display. You’re thinking about the world too simplistically.” That was the critical point where the entire industry took a wrong turn and said, “Ah, if we could just track every customer across every single touchpoint, then we could just use the model.”
And the industry headed down that path, which is laughable to data scientists. I’ve had two different data scientists independently get into the world of how marketers do multi-touch attribution, and they’re like, “Are you kidding me?” It’s absolutely ridiculous to be picking a heuristic model like last-touch, first-touch, time-decay, or inverse J curve.
And that’s really for two reasons. The one I’ve always had an issue with is that it’s really, really hard to track one person across all of the touchpoints: from an impression, through all their clicks, across channels, across devices. And that’s getting harder with the death of the cookie. It was bad data getting worse. Intuitively, I also knew that it ignored the person’s history with the brand. We weren’t valuing that.
The other reason is the way data scientists look at multi-touch attribution. Again, I’ll point to our head of data science at Search Discovery. The example he used with me was to think about going to a grocery store and standing behind the checkout counter. You watch until somebody approaches the checkout counter with a case of Coca-Cola, then run up to that person and say, “You should buy Coca-Cola.” Then you step back and watch them. Oh, they bought Coca-Cola. I was one of the touches in their journey, so I should get some credit.
But they were going to buy it anyway. That’s the huge challenge with multi-touch attribution: it tends to take the entire pool of revenue or leads or orders, or whatever, and divvy it out across all of your marketing channels. It completely ignores that if you shut off all your marketing, chances are some people would still purchase. Even if you move to algorithmic models, they still do that.
That’s not to say you throw up your hands and say, “Forget it. I can’t do anything.” But it does mean that if you want to get to causality, the real impact of a particular channel, you probably need to do some level of experimentation. If you want to know what email is contributing, and that’s one of the easier ones, you send emails to part of your customers, don’t send emails to the rest, and look at the difference in the mean outcome between those two groups.
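The email holdout described above can be sketched in a few lines of Python. This is a minimal illustration, not Search Discovery’s method: the function names, the simulated 3.0% and 3.6% conversion rates, and the normal-approximation confidence interval are all assumptions chosen for the example. Note that the held-out group still converts, which is exactly the “they were going to buy it anyway” baseline that channel-level attribution ignores.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def holdout_lift(treated, control, confidence=0.95):
    """Return (difference in means, (lower, upper) CI) for treated minus control.

    Uses an unpooled standard error and a normal approximation: a reasonable
    sketch for large groups, not the only valid test.
    """
    diff = fmean(treated) - fmean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return diff, (diff - z * se, diff + z * se)


def simulate_email_test(n_customers=10_000, base_rate=0.030, email_rate=0.036, seed=7):
    """Hypothetical simulation: randomly assign customers, then draw 0/1 conversions."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_customers):
        if rng.random() < 0.5:  # random assignment: a coin flip per customer
            treated.append(1 if rng.random() < email_rate else 0)
        else:  # held out: no email, but some still convert (the baseline)
            control.append(1 if rng.random() < base_rate else 0)
    return treated, control


if __name__ == "__main__":
    treated, control = simulate_email_test()
    diff, (lo, hi) = holdout_lift(treated, control)
    print(f"held-out group converts at {fmean(control):.2%} with no email at all")
    print(f"estimated lift from email: {diff:+.2%} (95% CI {lo:+.2%} to {hi:+.2%})")
```

The comparison is only causal because assignment is random. The same arithmetic applied to customers who chose to open emails would bring back the self-selection problem that attribution models suffer from.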
And you can do that with any form of media, as long as you have enough budget and a broad enough reach. You can run a randomized controlled trial to get there. Increasingly, we see more and more of our clients say, “Okay, we really want to figure out the impact of display advertising.” I can’t track it at an individual level, but I can design an experiment and really get to causality on that front. I could talk for two hours on the topic of attribution.
Anna Shutko:
Right. No, that was amazing, and I really loved the Coca-Cola example. But let’s come back to statistical intuition and how it can help individuals flourish in their organizations.
So another question I had was, how can people without any technical background develop it? For example, I’m a marketer. I studied business management. Is there any easier way for me to develop this statistical intuition? And then how can it help me flourish in my role within the organization?
Tim Wilson:
I wish I could point to the one book, because I feel like I’ve stumbled along in different ways. When I dove into this in earnest, four or five years ago, I was really doing it under the guise of learning the R programming language, knowing that I needed to learn more statistics. I’d had a couple of statistics classes in university, but I got Statistics in Plain English. I got a comic-book guide to statistics, and that really didn’t help.
Looking back, I’ve realized that Nate Silver’s book, The Signal and the Noise, was a really good read, because he talks about the nature of uncertainty, though not really in a business or marketing context. In hindsight, I realize how much of it did sink in.
I think there are some other great… Emily Oster has been in the news again, quite a bit, around COVID and school lockdowns. But she really came to prominence with a couple of books investigating what the data says about pregnancy and what pregnant women can and can’t do. She’s an economist at Brown. And I think economics often looks at data occurring in the wild, which is what marketers are usually working with, and tries to figure out what conclusions we can draw.
So I would say, read anything Emily Oster’s ever written, or hear her speak. And I would also say Cassie Kozyrkov, who’s Head of Data Science or something at Google. Very lofty title, but she is an amazingly prolific and compelling speaker. She has all sorts of videos. She has a podcast as well.
I don’t quite see eye to eye with how she views the world of analytics versus data science, which makes me think I’m probably wrong. But she has lots of examples that actually try to explain prediction versus description. To me, the key is to consume those things and then start really thinking critically about the problems or questions you’re asking in your own day-to-day business world.
Anna Shutko:
Awesome. Thank you. These are really good recommendations. Following up on that, could you share which tools could help me get started with statistical analysis? You may have briefly mentioned that Excel spreadsheets could be enough. What should I learn in addition so that I could master it?
Tim Wilson:
I mean, I am still trying to figure out how to master all of this stuff. I can drop some of the buzzwords that I still don’t fully, fully get. With Excel, going back even to when I was in statistics classes: can you do statistics with Excel? Absolutely. There’s the Analysis ToolPak, which you can add as a plugin. You can start to run regressions. It’s kind of clunky, and it gets kind of scary.
As an analyst, I’m a big, big proponent of actually learning R or Python, either one. If you’re already using SQL, move into a programming language that has a lot of statistical capabilities, where it’s very, very easy to run models. Then start getting familiar with interpreting the results.
So when you look at the result of a regression, recognize that if you throw complete noise into a linear regression model, you will still get a result out. But it’ll show you that there’s very, very low confidence in any of it. Then start actually thinking, what does this really mean? How can I actually turn this into a decision?
But I wish I could figure out a clear six-week prescription for really doing this. I feel like I’ve been on this rambling, meandering journey for a number of years now, and I haven’t quite figured that out. So awareness is the first step, and then start looking for opportunities to dive in a little deeper here and there.
Anna Shutko:
Yeah. I think this is super interesting. Could you share a bit more about your personal experience? What did your statistical learning journey start with? Did you start by looking across the organization to see which problems could be solved with statistical analysis? Did you start with Excel or other spreadsheets? And what language did you start learning? What did that journey look like, and where could people start?
Tim Wilson:
Just for my own career, while working, I wound up taking statistics classes at two different times, ten years apart. In both cases, the class went fine. It was pretty interesting. And then I would literally go back to my day job with no idea how I could do anything with what had been taught, because it used theoretical data sets that weren’t relevant to me. So both times, that went nowhere.
For me, I decided, just because there was a lot of chatter around it and I felt like I was probably missing out on something, that I would learn the R programming language. And pretty quickly, anywhere you start reading up on R, you’ll hear, “Oh, if you’re going to learn R, you will have to learn statistics as you go along.” I don’t know if that’s entirely true, but for me the two have moved in tandem.
So the more I used R, the more comfortable I got pulling and manipulating large data sets. And I started coming across things like, oh, I could just do a linear regression. Well, where would I actually apply that? How do I take web data and break it down so that I can actually run a regression?
Initially, I was just doing the stuff I’d always done, but doing it more efficiently and more robustly because I was programming with data. But then I slowly started to realize, oh, I’m being asked this question about marketing channels or device types, and it turns out if I pull the data at a more granular level, instead of just pulling the top-line facts, I can start asking, what’s the variability within that data? And I can start doing a somewhat richer analysis.
And so my analyses went from horizontal bar charts to horizontal bar charts with error bars on them, to show the variability within the data. But it has been a little bit tough. What really helped was moving to an organization where there was more advanced analytics and more data science going on, and then really partnering with that group to figure it out.
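For readers who want to try the “bar charts with error bars” step, here is one hedged way to compute the error bars for channel conversion rates in Python. It uses the Wilson score interval, a standard interval for proportions that we chose for this sketch (Tim doesn’t say which method he uses), and the channel counts below are made-up numbers, not real data.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion, such as a conversion rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return center - half, center + half


# Hypothetical channel data: (conversions, sessions). Illustrative only.
CHANNELS = {
    "Paid search": (260, 10_000),
    "Email": (45, 1_200),
    "Paid social": (30, 2_500),
}

if __name__ == "__main__":
    for name, (conversions, sessions) in CHANNELS.items():
        lo, hi = wilson_interval(conversions, sessions)
        rate = conversions / sessions
        print(f"{name:<12} {rate:6.2%}  (95% CI {lo:.2%} to {hi:.2%})")
```

The lower and upper bounds are what you would draw as the ends of each error bar. Channels with few sessions get visibly wider bars, which is the variability a plain bar chart hides.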
A lot of times, data science teams within organizations are frustrated when it comes to digital analytics data, because they don’t have the domain expertise. And the analysts are talking about this multi-touch attribution silliness, and the two sides just cannot connect. So for me, a lot of it has been, oh, I can bring my domain expertise to a data scientist. Now we can actually work together to ask what we can say about the data, what we can’t say about it, and how we should approach it. But it has been a winding and inefficient path, and I hope the industry starts to get better at it.
Anna Shutko:
Awesome. I really love your journey, and I also love the fact that you’re trying to encourage the conversation between the data team within the company and your team with your domain expertise. I just love it. Thank you so much for this conversation. And Tim, if the audience would like to learn more about you, where can they find you?
Tim Wilson:
So I float around on social media. I’m on Twitter, @tgwilson. I tweet a couple of times a week, but I’m on Twitter maybe every day, so I’m readily findable there. I also co-host the Digital Analytics Power Hour. We’re about to do a rebranding. So that’s analyticshour.io, or @analyticshour on Twitter. And I’m also readily findable on LinkedIn, as well as in the Measure Slack team.
Anna Shutko:
Awesome. I love Tim’s podcast, so if you’re a fan of analytics, please go ahead and check it out. And yeah, Tim, thank you so much for coming on the show.
Tim Wilson:
Thanks so much for having me. This was fun.
Anna Shutko:
And that’s the end of today’s episode. Thanks for tuning in. Before we go, make sure to hit the subscribe button, and leave us a review or rating on Apple Podcasts, Spotify, or wherever you’re listening. If you’d like to kickstart your marketing analytics, check out the 14-day free trial at supermetrics.com. See you in the next episode of The Marketing Analytics Show.