How to build a data warehouse with Sebastian Mehldau
In this episode, we catch up with Sebastian Mehldau, Head of Growth Marketing at Vanmoof to learn his secret to successfully rolling out a data warehouse.
Here's what you'll learn
Subscribe to the Marketing Analytics Show newsletter
Be the first to know when a new episode drops.
Subscribe to the Marketing Analytics Show podcast
Learn everything you need to know about marketing analytics.
Hello, Sebastian, and welcome to the show.
Hi, Anna. Happy to be here.
Awesome. Thank you so much for coming. I’m super excited for this episode. Now, let’s talk more about marketing. Marketing data warehouse is the topic for today. So my first question to you, Sebastian, is why homegrown Excel databases don’t work as a solution anymore? And when should marketers actually start thinking about moving from spreadsheets to a data warehouse, and what are the key benefits of using a data warehouse to store marketing data?
Sure. I think before I answer that question, I would quickly touch upon the concept of homegrown databases. It can be Excel types, it can be Google Sheets, but it could also be any other tool that is not optimized for storing large amounts of data. So it could be something else. And I think the moment these homegrown databases in most organizations that are on Excel or Google Sheet stop working is when the amount of data becomes just too large for such a file to contain them efficiently and effectively. But not only that, but also when an organization starts to grow and these homegrown databases start to emerge in different departments and different silos, then it’s just convenient to find a proper database solution or a data warehouse that is optimized for holding large amounts of data.
All right. I really love how you gave such a precise explanation of this. Maybe you could elaborate more on the benefits of using a data warehouse to store marketing data and what is the creation process there? So how long does it usually take to set it up, and maybe you could talk more about the stakeholders involved and what each of the stakeholders does within the process.
Of course. So maybe first about the benefits of using a data warehouse, I would say that the modern cloud-based data warehouses are a very good product, finished products that are easy to use, easy to set up, but the main benefits basically are query speed, so they handle large amounts of data with incredible speed and they’re always available. So they are solutions that are managed with little effort. A small team can set up the database, the data warehouse in the cloud. And so the process depends really on the organization. So the process to set up such a data warehouse is different from organization to organization, but I can talk about our experience at VanMoof. I joined VanMoof around three years ago when it was still a very small organization, so we didn’t have a dedicated data engineering team.
And so the process was very simple. I looked at the different products and offers that were in the market, and I will cut it short, but we chose BigQuery for different reasons. And one of them is that it’s easy to set up. And once we had identified the solution, BigQuery, we prepared a business case for decision-makers, where we explained why it was important to move as much data as possible, starting with marketing, to a modern data warehouse.
And in that process, of course, we identified power users from different departments that could have an interest in developing the solution and implementing the solution. And in general, we decided to go for a very entrepreneurial style where we just learned on the go. Having said that, Most of the time, implementing a data warehouse also requires expertise from people who are used to managing databases. And so we also constantly checked with somebody who helped us with the data models, with defining the tables, the structure of our data, and so on. So it is advisable to have somebody next to you who knows a bit about databases. However, also more and more, there are really good tools to close those gaps.
Great. And you mentioned that while you were preparing the business case, you also prepared the arguments of why you should use a marketing data warehouse. Could you please elaborate a bit more on that? What did the arguments look like in your case, and how did you collaborate with the data team and other stakeholders who knew more about marketing data warehousing.
What were the pieces of information you provided for them so that they could make sound decisions on how to move the data, what data they should move, which databases they should pay. So basically from a marketer’s perspective, what should the marketing team communicate to the data team and what arguments should they give to the C-level people when they realize that okay, we would like to build a marketing data warehouse.
Yeah, I think one of the very strong arguments was that we were spending large amounts of time on building reports in Google Sheets. And so one of the key elements, in that case, was the time we would save by automating certain processes of data capture and data visualization, for example. So that was one of the arguments at that time, that we would save by implementing a modern data warehouse.
The other very strong argument was that in order to really combine different sources, to correlate different data sources, we needed to have one central database where we could have all the needed information and the different tables. And so this is a very common problem, and the solution is also typically a data warehouse. When you start having several data silos or data islands, then it makes sense to look into data warehouse solutions.
And that was probably the second strongest argument. Then others would be that we of course have the other databases available in the organization, but they are not all designed for analytical purposes. So we were looking for a solution that was tailored for analytical purposes, and two key factors are usability and also query speed.
So it is very important that the data model is easy to understand, that it is independent of operational databases so that analysts feel free to play with the datasets available and that they also know how to navigate the database. So it has to be easy to use, easy to retrieve the needed information.
Yeah, no, I just wanted to add how to involve the different stakeholders. Probably, you really need to be very clear with your objective, right? So in our case, we said that our objective was to build the datasets that would allow us to do a deeper analysis of our marketing activities. And so as soon as we described that objective, and we said that that was our first and most important interest as a marketing team, then most people wanted to support it. And yeah, that’s the only thing I wanted to add.
Yeah. Thank you so much for sharing. I also really like how you mentioned that the solution should also be tailored for analysts and optimized for usability and query speed. And can you please add a bit more to that? So are there any best practices for creating a marketing data warehouse?
For example, you previously mentioned in your webinars that you have two instances, one for testing, another one for production, loads from data sources that are batched, so dispatching and integrations with different database platforms like Google Data Studio, for example. So in addition to these, are there any other stocks the teams should consider while they’re moving their data to the data warehouse?
Yeah. So I also have to be very honest at this point that I have some experience in data, of course, but I was more of a product owner, so gluing the different teams together and finding the expertise and making sure the communication objectives and so were clear. However, I can say a little bit about those best practices, for sure.
For example, let’s start very simply with production and testing environments. Designing a schema, or understanding the data model, and so on. Sometimes a bit of an experimentation process, right? So what we would do is we would just start doing things in a bigger instance that we called the sandbox, or a testing environment where we would have the freedom to test different tables and to run queries against them, maybe connect them to other systems, build certain data pipelines without the fear of breaking anything.
And in parallel, every time we found the right solution, we would then implement that in a separate, bigger instance that is a production instance. And that’s where we try to … Let’s say that is the product that we constantly try to improve, but only when we are certain about the decision or about the right solution. So I think just because you alluded to the testing and production environment, I would always recommend having some sort of sandbox or testing environment. And then when you have found the right solution when you know that things are accurate and that they work the way you intend to, then you move into the production environment.
And so other best practices would be … I don’t know if this is a best practice, but a recommendation is that people involved in the design of a data warehouse should definitely focus on the 20% of the data that they use most and really, really focus on that end, and be very detail-oriented for those datasets. And then you have a long tail of other data sources and data points, and yeah, the information that organizations collected that are less relevant. So just spend less time on that 80% of that long tail. But that is something that worked for us. It doesn’t mean that they will work for everybody.
Right. And you mentioned that you were working on improving the collaboration between different stakeholders during this migration process. And I heard that VanMoof also created the so-called data collaboration model, where you use this marketing data warehouse not just for storing data for growth marketing activities, but also for other departments and their activities and their results.
So can you please talk a bit more about that model and how can an organization collaborate together to create a marketing data warehouse, where in addition to the marketing data, there would be data from other departments as well.
Yes, I think that is a really good question. And again, the answer is probably different from organization to organization or from company to company, but what worked in our case, and I think it would work in other places as well, is that to highlight the shared interests. So all organizations have a common interest, right? They work towards similar goals, to the same goals. And so we always focused on what matters most for VanMoof, for the company and we tried to align on those objectives.
So that was the first thing that I always made sure that everybody agreed upon, that we have the same goals and that we are working towards the same direction. Then, of course, something that I learned early on in the process is that it made sense at the beginning to have the marketing data warehouse, but the real value in having a data warehouse is when you can start correlating different data sources.
For example, we don’t want only Facebook, Google Ads, and Google Analytics data. We also want real-time data from our ecommerce database. So in order for that to happen, we had to talk to the ecommerce team or the digital team, and explain to them very clearly why we believed it was useful not only for the marketing team but also for other teams such as the ecommerce team to have the same database or to use the same place for analytical purposes.
So we started expanding to other departments of VanMoof, and we made this collaboration model where we identified common interests, where we also identified what is unique to each department. And I started importing more and more datasets to BigQuery. And it really never was a problem because when we started doing this was also a moment of big changes for VanMoof, and we all just identified the same needs, which was to move away from homegrown databases.
All right. On speaking of data, you’ve also previously mentioned that there were three stages of marketing data warehouse maturity, which are crawl, walk and run. So can you please tell us more about these and maybe elaborate more on what happens during each stage?
Sure. So I’m not sure if this is exclusively for data, but this is a framework that we use a lot, which is essentially acknowledging that you cannot get everything right from the first day. So in the beginning, we did baby steps. We were crawling. So we said, okay, we have to be happy at the beginning with creating our first dataset in BigQuery. Or running very simple sales analytics, or having a very simple dashboard in Data Studio, for example. So we said, let’s take baby steps, let’s get the basics right. And once you have the basics right then you can build upon that and then you can start walking. So for a person to be able to walk, that person has to crawl first.
But when we were in the walking stage, we started to do more complex things, but that was only after we had the basics right. So for example, in the walk phase, we were not talking about simple BI dashboards and sales analytics, but instead, we were also talking about real-time insights, maybe product analytics, and even simple machine learning models. And then of course after that, once you’re walking, you can also start running. You can go faster, you can go further. And so that is when the model is helpful. So what do we see in the future? Running, right? So maybe machine learning and production where we can do predictions in real-time or forecasts in real-time about our sales or about our stock levels.
But just in general, more advanced analytics. And so we sometimes talk about another phase, which is after running, maybe there’s flying and that’s basically where we dream what we could do next is not a specific timeframe, but let’s say a year or two from now. We always have that end goal in mind, which is really what we want to do in the future, improve cities and city livability. And we are constantly finding ways to do that with data. And I know it sounds a bit abstract, but it’s always also good to have such an end goal in mind that keeps people aligned and motivated.
I really liked that model, and I also really like how you added the fourth stage of flying because I do think it’s super important to have bigger goals and truly align your team to achieve something bigger. And now let’s talk a bit more about the reporting part of this whole process. So now that you have your marketing data warehouse or company data warehouse set up, you have access to a lot more data sources and larger volumes of data, meaning a lot of historical data. So what kind of reports can you actually create with all this historical data? And another question here would be, what kind of reports can you create when you combine marketing data with the data coming from other departments, and what are the key benefits of doing so?
Yes. Good question. So maybe first, what kind of reports it allows and enables? So we have fully automated reports, both on dashboards, but also even still today in Google Sheets. For dashboards, we have several dashboards. Nowadays, we use Looker, but we started with … What is the Google product called again? Data Studio. So we started with Data Studio, but now we use Looker. And so we have, for example, our sales dashboard, we have a marketing dashboard, we have our operations dashboard, we have our supply chain dashboard, and many others. So we have per department, we have at least one dashboard that is fully automated. This also enables people who are not necessarily very proficient with SQL, or SQL, to do their own data exploration. So that is important.
But then we also know that people keep using Google Sheets, but what is really cool about, for example, Google BigQuery, so the data warehouse we use is that it connects directly with Google Sheets. So people can create reports in Google Sheets and fully automate them. We know that there are other solutions, for example, the Supermetrics plugin for Google Sheets, but we have moved more towards using BigQuery live connection with Google Sheets. And really, what kind of reports you can produce depends on the data that you have available. So you asked for an example, what correlations we can do between marketing data and other data sources or other departments.
And a clear example, a very simple one maybe is that previously we were only able to report on gross sales as registered by Google Analytics. And nowadays we are able to correlate, for example, marketing spends with net sales in our e-commerce database. And we also are able to see, for example, cancellations. And we can do analysis, for example, what are the channels that are most profitable? Are there any marketing channels where we see a higher cancellation rate?
This is a very simple example that probably most e-commerce companies have available already, but we also have more unique examples at VanMoof where we start correlating test rides that people take in different cities with sales. So we have one data source that test rides that people take in our stores or in our test ride locations. And then we correlate that information with sales and we can see the conversion rate from test rides to sales online. And then the possibilities are endless, right? So we do many more things, but I think these are two very simple and concrete examples that I can share today.
These sound awesome. And thank you so much for breaking down all the data processes and sharing all your useful insights in such a clear and organized way. And now Sebastian, if our listeners would love to learn more about you, where can they find you?
They can find me on LinkedIn, I think that would be the first place to find me. My name is Sebastian Mehldau. So pretty sure you will share my name somewhere how it’s spelled, and they can always find me there and send me a message if they need anything.
All right. Thank you so much for coming today.
Thank you, Anna, for having me.
Try Supermetrics for free
Get full access to Supermetrics with a 14-day free trial.
No credit card required.