Marketing mix modeling or MMM is seeing a renaissance, over 60 years since it came into common use. Unlike most marketing attribution methods, MMM doesn’t require user-level data, instead of modeling what channels deserve credit for sales by statistically mapping spikes and dips in spend to actions and events in your marketing channels. Upgrading from simple linear regression to techniques like ridge regression or Bayesian methods, marketing mix modeling is being reinvented for the modern age.
However, there are major hurdles to overcome. Building a model can take 3 to 6 months, according to Meta/Facebook, which has been working on its open-source MMM library since October 2021. By its estimates, about 50% of the time is spent collecting and cleaning data before modeling begins. This matches my experience at Recast—and previously Harry’s—as well as the results of a CrowdFlower study which found that 60% of data science time is spent cleaning and organizing data.
Fast forward >>
Data cleaning is 60% of the job, and how to make it 0%
To build an accurate model, you need your data in a specific format. Getting the data ready is time-consuming, so MMM projects take longer than they need to. This makes MMM a specialized and expensive skill, so most companies can only build one to two models a year. If you can automate the process using a tool like Supermetrics to build an MMM data feed, you can have your model regularly updated, allowing you to better optimize your marketing budget.
Tabular data format
To build a marketing mix model, you must have your data laid out in an unstacked, tabular format. This means one row per observation—usually days or weeks—and one column per model ‘feature’—typically media spend and organic or external variables. Categorical data—for instance, a list of national holidays—need to be encoded to dummy variables—1 when it’s that holiday, 0 when it’s not.
Joined data sources
To build a marketing attribution model, you need to have all of your marketing data in one place. This is what Supermetrics handles for you automatically. With over 90 connectors, all of your marketing spend, events, and activities can be pulled together into one place, manipulated as needed, and then exported to the format and location you need.
Exporting to Google Sheets
Once you have a Supermetrics account, you simply need to go to Extensions > Add-ons > Get add-ons and install it. It’ll ask you to authenticate with your Google account linked to your Supermetrics account, and then the sidebar will appear in the extensions menu.
Once this is done, you can launch the sidebar—if it’s not already launched—and click to create a new query. Queries are how you decide what data to pull and from what accounts. When you select one of the ad platforms like Facebook Ads and Google Ads, it’ll prompt you to authenticate and grant Supermetrics access.
Then you’ll choose the account you want to pull data from and the date range. Finally, choose your metrics—usually cost or impressions for MMM—and dimensions—only select the date to be consistent with the tabular format.
Optionally, you might want to add a filter if you need to select a specific set of campaigns. For example, if you had ‘YT: ’ in the name of your YouTube campaigns, you might want to select those as a separate source, then duplicate the query and filter for each of your other campaign types.
When you’ve finished your query, make sure you’ve selected the cell where you want the data to pull into, and click ‘Get Data to Table’. If you make a mistake, just duplicate the query and put it in the right place, deleting the other one.
I find it helpful to put the name of each source in a cell above the table so I know where I’m pulling the data from. The result should look like this:
Building a marketing mix model in Google Sheets
Marketing mix modeling is a powerful tool for attribution, but it’s actually more accessible than you may think. Most practitioners use custom code and advanced statistics, but you can do the basics in an afternoon with nothing more than Excel or Google Sheets.
Linear regression with the LINEST function
Excel and Google Sheets both provide a simple method, the LINEST function, for doing multi-variable linear regression. LINEST works by passing the column we’re trying to predict, then multiple columns representing the variables we’re using to make the prediction. The final two parameters are whether we want an intercept line—usually 1 for yes—and whether we want the output to be verbose—containing all the statistics for the model, not just the coefficients.
Note that the X variables we’re using to make the prediction need to be consecutive, so I’ve just referenced the columns on the left to repeat the values next to each other.
Re-forecasting with model coefficients
Now that we have a model, we need to use the coefficients to estimate the impact of each channel. If we take the top row of numbers, those are the coefficients, and multiply them by the corresponding input values from our data—we’ll get the contribution of each variable to total sales.
One thing to watch out for is that LINEST outputs the coefficients backward. The first value starting from the left is always the last variable you input, then they continue in reverse order until you get to the last value, which is the intercept. If you add up all of these contribution values, it gives you the predictions from the model, which you can compare against actuals to ensure the model is accurate.
Checking model accuracy metrics
How do we know whether our model is reliable? The model should fit the data well, it should be able to predict new data it hasn’t seen, and it should have plausible coefficients. Several validation metrics capture these requirements.
Check the functions in the template to see how to calculate these metrics.
To use the template, go to ‘File’ > ‘Make a copy’ > ‘Launch Supermetrics’ from the list of add-ons > duplicate this file for another account and then proceed to the account selection.
R2 or R-Squared is a measure of how much of the variance in the data is explained by the model, and it’s between 0 and 1: a good model would be above 0.7, but anything approaching 1 is probably suspicious. Close to 0, like our model is, is a sign we’re not including enough variables in our model and need to incorporate things like organic channels, holidays, and macro-economic factors.
‘Normalized Root Mean Square Error’ is how we measure accuracy, and it’s found by taking the difference between model predictions and the actuals, then finding the root of the squared values as a percentage of the actual value. Ideally, this is done based on unseen data—a holdout group—but in our simple model, we just calculated the error against in-sample data.
The root and squaring procedure handles negative values for us and acts to penalize really big errors. This can be interpreted as the percentage of the model is off on any given day, so it’s a useful, intuitive measure.
Plausibility is a big topic, and it’s usually something that an analyst should have the final say over. However, it’s helpful to have a metric you can programmatically calculate so that you have an understanding of how far the model deviates in terms of its findings from your current channel mix.
Decomp RSSD is a metric invented by the Robyn team at Facebook which measured the difference between your current spend allocation and what channels drove the largest effects, as predicted by the model. If the model said your biggest channel didn’t actually drive that many sales, then you would have a high Decomp RSSD.
In our case, we have a high value of 0.6 because the model gives too much credit to Facebook, which represents a small amount of spend.
Delivering MMMs automatically and at scale
Marketing mix modeling is one of those activities that’s infinitely scalable. You can get decent results in an afternoon with Excel or Google Sheets and Supermetrics, as we’ve done here, but you could also spend 3 months with a team of 6 data scientists writing custom code with sophisticated algorithms like Bayesian MCMC to build something more robust and accurate.
There’s a checklist of features that go into building an advanced model, some of which require advanced statistics knowledge. Add into the mix several expensive data engineers for building data pipelines if you don’t use Supermetrics to automate that part for you.
Be warned: MMM is hard. You could spend $500, $5,000, or $50k on modeling and see wildly different outcomes in accuracy and robustness. What really matters is the opportunity cost of getting your marketing spend allocation wrong.
If you spend $10k a month, then a spreadsheet model once a quarter will be fine. However, if you’re spending over $100,000 per month, even being off by 5% can cost you tens of thousands of dollars over a year.
That’s when it makes sense to invest in more advanced modeling. Conduct a build vs. buy analysis to decide between a custom solution built on open-source libraries like Facebook’s Robyn or advanced attribution software like what we’ve built at Recast.
About the author
Michael Kaminsky is a trained econometrician with a background in healthcare and environmental economics. He previously built the marketing science team at men’s grooming brand Harry’s before co-founding Recast.