How to overcome the limitations of data blending in Google Data Studio
If you’re a Google Data Studio advanced user, chances are you’ve already used the data blending feature.
It’s a great feature that allows you to enrich and unlock the potential of your data quickly. Especially if you don’t have the time to pull data from multiple sources and combine them in spreadsheets.
However, data blending also comes with some limitations that could slow your report down at best and affect your data accuracy at worst.
In this article, we teamed up with two experts from our team, Bartosz Schneider and Evan Kaeding, to discuss the good and the bad of data blending in Data Studio, and how you can avoid all the headaches.
To make sure we’re on the same page, let’s look at the basics first.
The basics of data joining
What is data joining?
Let’s say you’re managing an online store. You’re running paid ads across popular social media platforms. You want to know what channels bring in the most revenue. To do this, you need to combine your paid social data with data from Shopify.
Or you want to see how your ecommerce funnel looks. For example, what pages customers visited and what products they added to their shopping cart before purchasing. In this case, you can connect Google Analytics data with Shopify data.
That’s a rough description of data joining. Whenever you join data from multiple data sources into a single dataset, you’re performing data joining. Data joining works when your joined data sources share at least one common dimension, or a ‘join key’.
Typically, business accumulates data from different sources. Without combining all the data, you’re missing the whole picture of your performance. Data joining helps you:
- Uncover valuable insights from separate data sources.
- Discover meaningful relationships among data sets.
- Make better data-informed decisions.
Different types of joins
- Inner join means combining data from both sources — matching it where the join keys are the same and dropping the data that doesn’t match.
- Outer join means taking all the data from both sources — matching it where the join keys are the same. And finally, in the joined table, padding the non-matching columns with empty values.
- Left join means taking all the data from the left table and the matching data from the right table where the join keys are the same.
- Right join, similarly, means taking all the data from the right table and the matching data from the left table where the join keys are the same.
So, does data joining have anything to do with data blending?
Spoiler alert: It does. Data blending in Data Studio is a left outer join.
Data blending in Google Data Studio
By default, when you create a chart in Google Data Studio, you’re pulling data from a single data source. However, you can connect multiple data sources and visualize them together in a chart or a table with data blending.
Data blending is a left outer join
To blend data, you need to choose:
- A primary source: the first (or leftmost) data source that you add to the ‘Blend data panel.’
- At least one secondary source: any data sources added to the right of the primary source
- A join key
Since data blending in Data Studio is a left outer join, the blended data will include all data from the primary data source and matching data from secondary sources that share the same join key.
Let’s take a look at the example below.
Here, Google Ads has conversion data from five countries: the United States, Germany, Finland, France, and Australia.
Facebook Ads has conversion data from seven countries: the United States, Germany, France, Ireland, India, Singapore, and Spain.
If you pick Google Ads as your primary data source, the blended result will show conversions from the United States, Germany, Finland, France, and Australia only. Since Ireland, India, Singapore, and Spain aren’t included in Google Ads (your primary source), the data will be excluded from the blended table.
You can also see Finland, which has data in the Google Ads table and not in the Facebook Ads table, will stay in the blended table. However, its Facebook Ads conversion value will be ‘null’.
Alternatively, if Facebook Ads is your primary source, your results will be conversion data from the United States, Germany, France, Ireland, India, Singapore, and Spain.
In Google Data Studio, the first data source you bring into the ‘Blend data’ view is your primary source. Changing the order of the data source is pretty straightforward. All you have to do is drag and drop the data source to the position you want.
How to create a blended data source
There are two approaches you can use to blend your data.
The first approach is quite quick and easy. If you have two tables with a common dimension, you can select both tables, right-click, and choose ‘Blend data’. Data Studio will quickly combine two tables into one. Then, automatically generate a blended data view based on the fields provided in the source tables.
The second approach requires more steps, but it gives you a little more control of your data.
To start, click on ‘Resource’ → ‘Manage blended data’.
Next, open your ‘Blend data’ view by clicking on ‘Add a data view’.
Then, add the data sources you want to blend. Remember, the first data source you add to the view will become your primary source.
From here, you can choose the join keys, dimensions, and metrics you want to blend.
Tip: Give your blended data source a name so it’s easy to distinguish from other sources later on.
After you’re happy with the setting, click ‘Save’.
Start building charts with your blended data source by adding it to the ‘Data source’ field.
The limitations of data blending in Google Data Studio
Traditionally, when you join data in a spreadsheet, you can use different formulas to tell the computer precisely what data you want to retrieve. This lets you see what’s happening with your data in each step. If an error occurs, you can always go back to the raw data and trace the problem.
However, with Data Studio, the join happens under the hood, so if the blended data has errors, you wouldn’t know what caused the problem.
Supports only left outer join
As mentioned above, data joining in Google Data Studio is always a left outer join. This can be somewhat limiting if you’re used to using different types of joins to enrich your data.
You have to be extra careful when blending data, especially the order in which you join them. One problem with the primary sources can harm the accuracy of your blended results.
You probably notice Google Data Studio can take its sweet time loading your reports. Things get worse when you bring data blending to the picture.
Whenever you create a blended data source, Google has to go through different APIs to retrieve data. And that process requires quite a bit of computational power.
The more blended data sources you add, the slower your dashboard will be.
A limited number of blended sources
Another frustrating limitation is that you can blend a maximum of five data sources. While this number sounds like a lot, it isn’t. Occasionally, in many advanced and in-depth reports, you need to blend data from more than five sources. You’ll easily cross the limit if you want to create a very detailed table with many columns.
So, should you just save yourself from all the trouble and avoid data blending?
In fairness, Google Data Studio does a splendid job with a simple and light blending. So if you want to blend one to two data sources with a simple join key like date, you can stick with Data Studio.
On the other hand, if you’re looking to gain more control over your data and do more advanced blending, Google Sheets is the way to go.
Data blending in Google Sheets
When data blending in Google Data Studio becomes a bit of a hassle, you can blend your data in Google Sheets and bring it back together in Data Studio for reporting.
This approach gives you more flexibility with your data. You can take advantage of the Google Sheets formulas to enrich your data. Additionally, it’s much faster to load blended data from a Google Sheet than from several sources.
In addition, you can use Supermetrics to pull data into Google Sheets automatically. You’ll have more time to do what you’re good at — analyzing the data and getting meaningful insights.
Let’s take a look at some tips for joining data in Google Sheets.
Manage your data in Google Sheets
It can get messy quickly when you bring data from different sources to Google Sheets for blending. A good way to stay organized with your data is to divide them into separate tabs.
The ‘raw data’ tab is where you store all your unformatted raw data from your data sources. In this example report, we use Supermetrics to pull data from Facebook, Microsoft, and Google Ads into three separate tabs.
The ‘blended data’ tab is where the magic happens. You can match your data together and perform some calculations to get more insights from your data.
The ‘reporting data’ tab is where you put the last piece of the puzzle. When you’re done enriching and transforming the data, you can present them in a separate tab where it’s easier to monitor.
Additionally, you can connect the ‘reporting data’ tab to Google Data Studio to bring the final results to your dashboard. You can find the Google Sheets connector in the connector gallery.
Next, let’s take a look at some functions you need to know when blending data in Google Sheets.
Three useful functions for joining data in Google Sheets
VLOOKUP is one of the most used functions for data joining. It lets you search for a value in one table and use it in another table.
The syntax for VLOOKUP is:
VLOOKUP (search_key, range, index, [is_sort])
- search_key: the value you want to look up.
- range: the range that contains the value you want to look up. Note that VLOOKUP will search from the first column in your range.
- index: the column number (within your chosen range) that contains the returning value.
- is_sort: this parameter is optional. Here, you can specify if you want to receive an exact match (FALSE) or the nearest match value (TRUE). In the case of data joining, you’ll want to set it to an exact match.
You’re telling Google Sheets what value you want to search for, where you want to search for it, the column number in the range that has the value to return, and finally, if you want to receive an exact match (FALSE) or the nearest match (TRUE).
Let’s say you have two tables:
- A marketing table with data about date, source, medium, campaign, impressions, cost, and clicks
- A conversion table with data about date, source, medium, transactions, and revenue.
According to Bartosz, there are two steps to connecting the puzzles.
First, you need to create composite keys for two tables using the TEXTJOIN function. Each composite key can be used to uniquely identify each row of the table. Without the composite keys, you’re likely to run into one-to-many relationships. Additionally, you can use them as join keys for VLOOKUP.
Your composite keys will include campaigns’ date, source, medium, and campaign (which means campaign name in this case). It’ll look something like this.
Next, use VLOOKUP to join two tables. For example, the formula for combining transaction data with the marketing table is:
Tip: Using absolute reference makes it easier for Google to search for the value and for you to drag the formula across your spreadsheet.
Simply put, Google searches the first column for the composite keys and returns the corresponding transactions.
IF + REGEXPMATCH
The first step is to remap the campaign name to new values with an IF function (columns F and N). That new cleaned-up name is then used as a join-key to generate the metrics table on the right side of the sheet, where metrics from two sources are aggregated together where the previously remapped campaign name matches.
The function we’re looking at next is a nested function — IF + REGEXPMATCH, where
Bartosz finds that this function comes in handy when he needs to remap campaign names from one or many different data sources.
Let’s take a look at the table below. As you can see, it has different naming conventions, for example, ‘Google Data Studio’ and ‘googledatastudio’ or ‘Enterprise’ and ‘enterprise’.
You can put all your Google Data Studio campaigns in one basket and Enterprise campaigns in one basket using this formula:
REGEXMATCH(A7,"Data Studio|datastudio"),"Data Studio Campaigns",
In simpler terms, your function searches in column A7 for ‘Data Studio’ or ‘datastudio’ and returns ‘Data Studio Campaigns’. If there is no such value, search for ‘Enterprise’ or ‘enterprise’ and returns ‘Enterprise campaigns’.
You can remap campaign names from different sources and use them as your join key.
In Google Sheets, you can use different aggregation functions to summarize your data — calculating the sum, average, or counting the number of data points. However, in reality, you may not want to aggregate all the data you have. In that case, you can use conditional aggregation to specify which data you want to aggregate.
Conditional aggregation is a function that tells Google to perform data aggregation over a set of data when it meets certain criteria. We’ll take a look at some common conditional aggregation functions.
The SUMIF function tells Google to calculate the sum of the data that meets a predefined condition in a range. The syntax for the SUMIF function is:
SUMIF (range, criterion, [sum_range])
- range: you want to specify the data range you want to apply the condition to.
- criterion: you should specify the condition that defines which cells will be summed.
- sum_range: you should specify the range to be summed if different from ‘range’. This is optional.
Take the table below as an example. Let’s say you want to calculate the impressions from the US. You can do so by using SUMIF
(B3:J12, “US”, D3:D12).
The AVERAGEIF function returns the average value of data that meets certain criteria in a range. The syntax for the AVERAGEIF function is:
AVERAGEIF (criteria_range, criterion, [average_range])
- criteria_range: you should choose the data range you want to apply the condition to.
- criterion: specify the condition that defines which cells will be averaged.
- average_range: you should specify the range to be averaged if different from ‘criteria_range’. This is optional.
For example, if you want to calculate the average cost from the US, you can use
AVERAGEIF(B3:J12, “US”, E3:E12).
Similarly, the COUNTIF function performs a conditional count over your data. The syntax for COUNTIF is:
COUNTIF (range, criterion)
- range: the range you want to count.
- criterion: the condition you want to apply.
For example, you want to count how many countries have CPC greater than 1. You can do so by using
Different ways to use data blending
There are many ways you can put data blending into practice. We’ll take a look at some examples in this section. Additionally, you’ll also find some ready-made templates with blended data that you can use right away.
Note that connecting the data sources to the templates will automatically start your 14-day free Supermetrics trial.
Compare your Facebook Ads vs. Google Ads performance
Google Ads and Facebook Ads are among the most popular advertising platforms. Even though it isn’t exactly an exact comparison, combining Facebook Ads data with Google Ads data can tell you which types of campaigns work best on which channels.
For example, in the Google Ads vs. Facebook Ads dashboard below, you can easily see:
- A side-by-side performance of each channel.
- The split in your cost, impressions, clicks, and conversions between two channels.
- Your best-performing campaigns by channels.
Organic social media
Managing your company’s social media accounts isn’t a walk in the park. For one thing, you have to manage at least three different accounts, all of which have different algorithms and requirements for content.
Blending data from social media platforms helps you manage your performance easily and stay on top of your social game.
For example, in the dashboard below, we combine data from four popular social media channels — Facebook, Instagram, Twitter, and LinkedIn. This dashboard is great for:
- Monitoring your performance across channels.
- Comparing the performance of different channels.
- Diving deeper into each channel’s performance.
Paid channel mix
You probably have performance marketing data in paid channel platforms and sessions and conversion data in Google Analytics. Blending paid ad data with web analytics data helps you understand which campaigns and channels drive high-quality traffic.
For example, in this paid channel mix dashboard, we blend paid data from Google, LinkedIn, Twitter, Facebook, Microsoft with Google Analytics data. With it, you’ll see:
- A general overview of your paid campaigns.
- A comparison of your performance with the previous period.
- A table showing the key metrics of your best-performing channels.
Organic search vs. paid search analytics
It’s not about organic search versus paid search. To grow your business, you need both. For example, performance marketers can look at the high-ranking search phrase and decide if it makes sense to bid on those keywords.
Similarly, content marketers can also use paid search data to fuel their content strategy.
In this organic search vs. paid search analytics template, our friends at OIKIO agency combine data from Google Ads and Google Search Console. It helps you drive conversions in both channels. With it, you can determine:
- The Google Ads phrase performance that you haven’t included in your organic search.
- The organic search phrase performance that you haven’t bid on with Google Ads yet.
- The PPC/SEO ratio for a particular search phrase.
Organic traffic and keyword analysis
It was pretty frustrating when Google removed search phrase data from Google Analytics. But worry not because there’s a workaround. By combining data from Google Search Console with Google Analytics data, you’ll figure out which organic keywords bring in traffic to your website.
Let’s take a look at this organic traffic and keyword analysis template by our friends at OIKIO. With this template, you can:
- Figure out which search phrases bring in the most traffic.
- Compare the performance of different landing pages.
Over to you
Congrats, you’ve made it to the end of this article. Now pat yourself on the back.
After all, data blending helps you make the most of your data and uncover more meaningful insights.
If you’re working with a small and manageable amount of data, you can totally take advantage of the data blending feature in Google Data Studio.
On the other hand, if you’re handling a much bigger dataset and want to have better control over your data, Google Sheets is a better solution.
And remember, whenever you need help with moving your data to Google Sheets, you can start your 14-day Supermetrics free trial.
Turn your marketing data into opportunity
We streamline your marketing data so you can focus on the insights.