Mar 10, 2021

How to load marketing data into a MySQL database with Python

15-MINUTE READ | By David Magdolen

Data Integration · Data Storage · Supermetrics API

[ Updated Feb 28, 2023 ]

In this guide, I’ll show you how to use the Supermetrics API with Python to query data from marketing and digital advertising platforms (like Facebook, LinkedIn, Twitter, and many more) in JSON format, how to convert that data into a DataFrame, and how to load it into a local MySQL database on your device.

But before we get to the fun part (aka the how-to), let’s quickly talk about when this approach may come in handy.

Why should you move marketing data into a database rather than directly into a BI tool like Tableau or Power BI?

While you could use the Supermetrics API to extract marketing data and load it directly into your BI tool of choice, that approach poses a few challenges:

  • BI tools aren’t intended for long-term data storage, which means that each refresh either generates suboptimal increments (Tableau) or overwrites the previously queried dataset (Power BI).
  • Combining data from “web connectors” with other data sources in the same report is either very labor-intensive (Power BI) or borderline impossible (Tableau).

To circumvent storage issues, you could, of course, query a sufficiently long date range directly from the API, e.g. the last three years. But the problem is, this might make your reports slow to refresh. Besides, you might even run into API or data source limitations. And anyway, wouldn’t it feel rather inefficient to retrieve the last three years of data on a daily basis for a number of ad platforms?

Instead, the more efficient way is to query only a few days' worth of data each day and insert it into a single location where you can easily retrieve it and use it in many different ways. After all, you don't want your BI tool to dictate what you can and can't do with your data.

And this is where the Supermetrics API, Python, and a MySQL database come in handy. Here’s how you can benefit from this powerful combo:

  1. Supermetrics API allows you to query data from multiple sources with an intuitive user interface. In other words, you don’t need to deal directly with each and every ad platform’s unique API.
  2. Python, on the other hand, allows you to query the data, transform it in any way you want, and store the desired output in the destination of your choice, e.g. a MySQL database, BigQuery, Amazon S3, etc.
  3. You can also easily automate and schedule your Python scripts. As soon as you have your initial setup done, you don’t need to manually run the script every time you want to refresh your data.
  4. Finally, you can store your data permanently in your MySQL database and then use it for reporting and BI, and even combine your ad data with data from other sources you might have in the same database.

And psst! If you don’t have access to Supermetrics API just yet, you can start your free 14-day trial today to try it out. No credit card required.

How to write a Python script to move your marketing data into a MySQL database

Let’s move on to the good stuff.

Now, let’s assume you’re using the Supermetrics API to extract the marketing data you want to replicate in your MySQL database. (For instructions on how to do that, check out the getting started guide for Supermetrics API.)

Step 1: Import packages

First, you’ll want to import the packages you need to execute the script:

# IMPORT PACKAGES
import requests
import urllib.parse as up
import json
import pandas as pd  # json_normalize, IF NEEDED, IS AVAILABLE AS pd.json_normalize IN PANDAS >= 1.0
import numpy as np

Some of the packages, e.g. “urllib.parse”, are optional; they’re only there to make the URL for the Supermetrics API calls more readable.

Step 2: Load the API key into your script

If you’re using git or similar version control, you might want to retrieve the API key that’s inside the API call URL and store it in a separate text file. 

Then, you’ll need to run the following command to load the API key into your script:

# READ API KEY (COPIED FROM API URL): STORE A TXT FILE WITH THE API KEY FROM THE URL GENERATED IN THE QUERY MANAGER
# IN THE SAME PATH AS THIS SCRIPT. OTHERWISE YOU NEED TO ADD THE PATH TO THE TXT FILE BELOW
# (NOTE: ADD QUOTATION MARKS " AT THE START AND END OF THE API KEY IN THE TXT FILE)
with open("YOUR_SM_API_KEY.txt", "r") as f:
    api_key = f.read().strip()  # strip() REMOVES A TRAILING NEWLINE THAT WOULD BREAK THE JSON PAYLOAD
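
For reference, the text file itself contains nothing but the key, wrapped in quotation marks so that it can be spliced directly into the JSON payload later in the script. The key below is just a placeholder:

"api_XXXX"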

Step 3: Retrieve the API key

If you’re wondering how to retrieve the API key from the URL, here’s an example URL where the key you need (in this case: “api_XXXX”) appears near the end:

https://api.supermetrics.com/enterprise/v2/query/data/json?json=%7B%22ds_id%22%3A%22GA%22%2C%22ds_accounts%22%3A%5B%2220619257%22%5D%2C%22date_range_type%22%3A%22yesterday%22%2C%22fields%22%3A%5B%7B%22id%22%3A%22Date%22%7D%2C%7B%22id%22%3A%22Sessions%22%7D%5D%2C%22max_rows%22%3A1000%2C%22api_key%22%3A%22api_XXXX%22%7D

One tip at this stage: If you find the URL above hard to read, you can decode the URL inside Python using the “urllib.parse” package (see code snippet below). This will generate the following URL that I personally find easier to read and work with:

https://api.supermetrics.com/enterprise/v2/query/data/json?json={"ds_id":"GA","ds_accounts":["20619257"],"date_range_type":"yesterday","fields":[{"id":"Date"},{"id":"Sessions"}],"max_rows":1000,"api_key":"api_XXXX"}

Here’s the Python code to generate a readable URL:

# SUPERMETRICS API URL EXAMPLE (WITH API KEY REMOVED)
url = "https://api.supermetrics.com/enterprise/v2/query/data/json?json=%7B%22ds_id%22%3A%22GA%22%2C%22ds_accounts%22%3A%5B%2220619257%22%5D%2C%22date_range_type%22%3A%22yesterday%22%2C%22fields%22%3A%5B%7B%22id%22%3A%22Date%22%7D%2C%7B%22id%22%3A%22Sessions%22%7D%5D%2C%22max_rows%22%3A1000%2C%22api_key%22%3A%22api_XXXX%22%7D"
 
# DECODE URL: HELPFUL TO IMPROVE READABILITY OF QUERY PARAMETERS => ALLOWS FOR EASIER EDITING
# Decode the URL generated by the SM QUERY MANAGER: https://team.supermetrics.com/query-manager
print(up.unquote(url))
# PRINT ORIGINAL URL FOR COMPARISON
print(url)

Here’s the output of the command “print(up.unquote(url))”, i.e. the readable URL:

https://api.supermetrics.com/enterprise/v2/query/data/json?json={"ds_id":"GA","ds_accounts":["20619257"],"date_range_type":"yesterday","fields":[{"id":"Date"},{"id":"Sessions"}],"max_rows":1000,"api_key":"api_XXXX"}
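
As a side note, if you’d rather extract the key programmatically than copy it by hand, here’s a small sketch that assumes the imports from step 1 and the “url” variable from the snippet above:

# EXTRACT THE API KEY FROM THE DECODED URL
# PARSE THE JSON PAYLOAD AFTER "?json=" AND READ ITS "api_key" FIELD
payload = json.loads(up.unquote(url).split('?json=', 1)[1])
print(payload['api_key'])  # => api_XXXX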

At this stage, you can decide whether you’d like to work with the original URL generated by the Supermetrics API or the decoded version. 

For the remainder of this guide, I’ll use the decoded URL in Python as it allows me to modify the query directly in the Python script more easily. But the following steps will work just the same with the original URL.

First, the decoded URL can be split into multiple parts across multiple lines to improve editability:

# SPLIT UP DECODED URL AND MODIFY PARAMETERS, E.G. "date_range_type"
url_decoded = ('https://api.supermetrics.com/enterprise/v2/query/data/'
# CHOOSE OUTPUT FORMAT: JSON, KEYJSON, CSV, TSV, ETC: https://supermetrics.com/docs/product-api-output-formats/
# EXAMPLE: 'keyjson?json='
+'[INSERT_OUTPUT_FORMAT]?json='
# CHOOSE DATA SOURCE, e.g. "GA" or "FB" or "IG"
+'{"ds_id":"[INSERT_DATA_SOURCE]"'
# CHOOSE ACCOUNTS AND PARSE AS LIST, e.g. GA views: ["1234","5678"]
+',"ds_accounts":["INSERT_YOUR_ACCOUNTS"],'
# DEFINE DATE RANGE
# +'"start_date":"2020-06-01","end_date":"2020-06-30",'
+'"date_range_type":"last_week_mon_sun",'
# CHOOSE YOUR FIELDS, i.e. DIMENSIONS AND METRICS. SEE EXAMPLE FOR Google Analytics BELOW
+'"fields":[{"id":"Date"},{"id":"Sessions"},{"id":"Pageviews"},{"id":"Users"}],'
+'"max_rows":1000000,'
+'"api_key":'+api_key+'}')

Step 4: Fill in the parameters

You will need to fill in the following parameters:

  • [INSERT_OUTPUT_FORMAT]: Values can be “JSON”, “KEYJSON”, “CSV”, etc. You can find all output formats at https://supermetrics.com/docs/product-api-output-formats/.
  • [INSERT_DATA_SOURCE]: If you’d like to query data from Google Analytics, the value would be “GA”. If you’d like to query data from Facebook Ads, the value would be “FA”. You can find all data source IDs in the Supermetrics documentation.
  • [“INSERT_YOUR_ACCOUNTS”]: You can specify a list of accounts from which you want to retrieve data, e.g. {“ds_accounts”: [“567890”, “1358618”]}. The Supermetrics documentation has more details on this parameter.
  • Define a date range: Use either the “start_date” and “end_date” parameters or “date_range_type”. Relative values are supported, e.g. “start_date”:”2019-01-01”, “end_date”:”yesterday”. The Supermetrics documentation covers dates and date ranges in more detail and offers a tool for testing relative date expressions.
  • Select your fields: In the snippet above, the fields “Date”, “Sessions”, “Pageviews”, and “Users” are selected as examples (for querying data from Google Analytics). You can find the available fields per data source in the Supermetrics documentation.
  • Define the maximum number of rows: You can use a value of up to 1 million rows, as in the snippet above.
  • “api_key”: This points to the variable created earlier by reading the API key from a separate text file in the same path as the Python script. If you prefer not to separate the API key from the script, you can paste in the API key that’s in the URL (see the part of the URL above that says “api_XXXX”).

Please note that you don’t need to split the URL as shown above. You can also simply copy and paste the URL for the API call generated in Supermetrics API, either the “normal” URL or the “Short URL”, into the variable “url_decoded”, and the remainder of the script will still work.
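
To make the split-URL version concrete, here’s a hypothetical filled-in example for a Google Analytics query, reusing the placeholder account ID from earlier in this guide:

# EXAMPLE: FILLED-IN URL FOR A GOOGLE ANALYTICS QUERY (ACCOUNT ID IS A PLACEHOLDER)
url_decoded = ('https://api.supermetrics.com/enterprise/v2/query/data/'
+'json?json='
+'{"ds_id":"GA"'
+',"ds_accounts":["20619257"],'
+'"date_range_type":"last_week_mon_sun",'
+'"fields":[{"id":"Date"},{"id":"Sessions"},{"id":"Pageviews"},{"id":"Users"}],'
+'"max_rows":1000000,'
+'"api_key":'+api_key+'}')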

Step 5: Call the API

Now we’ll call the API with the following command and print the response:

# CALL SUPERMETRICS API
response = requests.get(url_decoded)
 
print(response)

If the response is “<Response [200]>”, it means the API call was successful and your data was returned.
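
If you’d rather have the script fail fast on a bad response than continue with unusable data, you can add a check at this point (this is plain “requests” functionality, not something specific to the Supermetrics API):

# OPTIONAL: RAISE AN HTTPError FOR 4XX/5XX RESPONSES BEFORE PARSING THE PAYLOAD
response.raise_for_status()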

If you chose the output format “JSON” or “KEYJSON”, you can now convert the JSON data in the “response” variable to a DataFrame, with the following snippet:

# IF STATEMENT TO CHECK FORMAT
if 'data/json?' in url_decoded:
    # CONVERT JSON INTO A PANDAS DATAFRAME
    # THE "data" FIELD IS A LIST OF ROWS WHERE THE FIRST ROW HOLDS THE COLUMN HEADERS
    data = response.json()
    df = pd.DataFrame.from_dict(data['data'])
    headers = df.iloc[0]
    df2 = pd.DataFrame(df.values[1:], columns=headers)
    print(df2.head())
elif 'data/keyjson?' in url_decoded:
    # CONVERT THE JSON KEY-VALUE PAIR OUTPUT TO A PANDAS DATAFRAME
    df2 = pd.read_json(json.dumps(response.json()))
    print(df2.head())
else:
    # EXIT WITH AN ERROR MESSAGE FOR UNSUPPORTED OUTPUT FORMATS
    import sys
    sys.exit("DEFINE JSON OR KEYJSON AS OUTPUT FORMAT: https://supermetrics.com/docs/product-api-output-formats/")

This snippet checks which format was chosen in the variable “url_decoded” and builds the DataFrame accordingly, since the two JSON output formats we offer need to be handled differently when creating the DataFrame.

Please note that if you choose an output format other than “JSON” or “KEYJSON”, the script will exit with the message “DEFINE JSON OR KEYJSON AS OUTPUT FORMAT”.

Step 6: Load the data into MySQL

Once the DataFrame is created, simply load the data into a MySQL database table:

# IF YOU PLAN ON LOADING THE DATAFRAME INTO A DATA WAREHOUSE,
# YOU WILL NEED TO ADD THE REQUIRED CREDENTIALS TO WRITE THE DATA INTO YOUR DWH
 
# MySQL EXAMPLE CODE CAN BE FOUND BELOW
# LOAD DATA TO MYSQL DB
from sqlalchemy import create_engine
import pymysql  # MySQL DRIVER USED BY SQLALCHEMY UNDER THE HOOD
 
# USE ONE OF THE TWO CONNECTION OPTIONS BELOW AND COMMENT OUT THE OTHER
 
# CONNECTION OPTION 1: READ THE CONNECTION STRING FROM A TXT FILE (STORED IN THE SAME PATH AS THIS SCRIPT)
# YOU NEED TO CREATE THE TXT FILE YOURSELF. IT CONTAINS THE SAME CONNECTION STRING SHOWN IN OPTION 2 BELOW
with open("[INSERT_FILE_NAME].txt", "r") as f:
    sql_conn = f.read().strip()
conn = create_engine(sql_conn)
 
# CONNECTION OPTION 2: PUT THE CONNECTION STRING DIRECTLY INSIDE THE CODE
# conn = create_engine("mysql+pymysql://[INSERT_USER]:[INSERT_PASSWORD]@[INSERT_HOST]/[INSERT_DATABASE_NAME]")
# EXAMPLE: conn = create_engine("mysql+pymysql://root:PassWord123!@localhost/my_database")

As you can see, two connection options are described in the snippet above:

  • Option 1 follows the same logic as the API key earlier. If you’re using version control, don’t expose any kind of sensitive information, such as usernames and passwords, inside the Python code.
  • Option 2 simply shows how you can pass your MySQL credentials directly inside the code.
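
For reference, the text file for option 1 would contain a single line with the full connection string. With hypothetical values, it would look like this:

mysql+pymysql://my_user:my_password@localhost/my_database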

Once you have established the connection to MySQL, all that’s left to do is to load the data from the DataFrame into a MySQL DB table. 

At this stage you need to specify the [TABLE_NAME]:

# WRITE DATAFRAME INTO DB TABLE
df2.to_sql(name='[TABLE_NAME]', con=conn, if_exists='replace', index=False)
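
A quick note on this design choice: “if_exists='replace'” drops and recreates the table on every run. pandas also accepts “append” (add rows to an existing table) and “fail” (raise an error if the table already exists), which becomes relevant once you move to the incremental loads discussed at the end of this guide:

# ALTERNATIVE: KEEP THE EXISTING TABLE AND ADD NEW ROWS INSTEAD OF REPLACING IT
df2.to_sql(name='[TABLE_NAME]', con=conn, if_exists='append', index=False)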

After that, all that’s left for you to do is to run a SQL query to double-check that the data was actually loaded into MySQL and print the output. To do that, simply replace [INSERT_DATABASE_NAME] and [TABLE_NAME] accordingly:

# SQL QUERY TO VALIDATE DATA LOAD
frame = pd.read_sql("select * from [INSERT_DATABASE_NAME].[TABLE_NAME]", con=conn)
print(frame.head())

Here’s the entire script for you to copy and use:

# IMPORT PACKAGES
import requests
import urllib.parse as up
import json
import pandas as pd  # json_normalize, IF NEEDED, IS AVAILABLE AS pd.json_normalize IN PANDAS >= 1.0
import numpy as np
 
# READ API KEY (COPIED FROM API URL): STORE A TXT FILE WITH THE API KEY FROM THE URL GENERATED IN THE QUERY MANAGER
# IN THE SAME PATH AS THIS SCRIPT. OTHERWISE YOU NEED TO ADD THE PATH TO THE TXT FILE BELOW
# (NOTE: ADD QUOTATION MARKS " AT THE START AND END OF THE API KEY IN THE TXT FILE)
with open("YOUR_SM_API_KEY.txt", "r") as f:
    api_key = f.read().strip()  # strip() REMOVES A TRAILING NEWLINE THAT WOULD BREAK THE JSON PAYLOAD
 
# SUPERMETRICS API URL EXAMPLE (WITH API KEY REMOVED)
url = "https://api.supermetrics.com/enterprise/v2/query/data/json?json=%7B%22ds_id%22%3A%22GA%22%2C%22ds_accounts%22%3A%5B%2220619257%22%5D%2C%22date_range_type%22%3A%22yesterday%22%2C%22fields%22%3A%5B%7B%22id%22%3A%22Date%22%7D%2C%7B%22id%22%3A%22Sessions%22%7D%5D%2C%22max_rows%22%3A1000%2C%22api_key%22%3A%22api_XXXX%22%7D"
 
# DECODE URL: HELPFUL TO IMPROVE READABILITY OF QUERY PARAMETERS => ALLOWS FOR EASIER EDITING
# ALTERNATIVE: GO TO https://www.urldecoder.org/ and decode the URL generated by the SM QUERY MANAGER: https://team.supermetrics.com/query-manager
print(up.unquote(url))
# PRINT ORIGINAL URL FOR COMPARISON
print(url)
 
# SPLIT UP DECODED URL AND MODIFY PARAMETERS, E.G. "date_range_type"
url_decoded = ('https://api.supermetrics.com/enterprise/v2/query/data/'
# CHOOSE OUTPUT FORMAT: JSON, KEYJSON, CSV, TSV, ETC: https://supermetrics.com/docs/product-api-output-formats/
# EXAMPLE: 'keyjson?json='
+'[INSERT_OUTPUT_FORMAT]?json='
# CHOOSE DATA SOURCE, e.g. "GA" or "FB" or "IG"
+'{"ds_id":"INSERT_DATA_SOURCE"'
# CHOOSE ACCOUNTS AND PARSE AS LIST, e.g. GA views: ["1234","5678"]
+',"ds_accounts":["INSERT_YOUR_ACCOUNTS"],'
# DEFINE DATE RANGE
# +'"start_date":"2020-06-01","end_date":"2020-06-30",'
+'"date_range_type":"last_week_mon_sun",'
# CHOOSE YOUR FIELDS, i.e. DIMENSIONS AND METRICS. SEE EXAMPLE FOR Google Analytics BELOW
+'"fields":[{"id":"Date"},{"id":"Sessions"},{"id":"Pageviews"},{"id":"Users"}],'
+'"max_rows":1000000,'
+'"api_key":'+api_key+'}')
 
# CALL SUPERMETRICS API
response = requests.get(url_decoded)
 
print(response)
 
# IF STATEMENT TO CHECK FORMAT
if 'data/json?' in url_decoded:
    # CONVERT JSON INTO A PANDAS DATAFRAME
    # THE "data" FIELD IS A LIST OF ROWS WHERE THE FIRST ROW HOLDS THE COLUMN HEADERS
    data = response.json()
    df = pd.DataFrame.from_dict(data['data'])
    headers = df.iloc[0]
    df2 = pd.DataFrame(df.values[1:], columns=headers)
    print(df2.head())
elif 'data/keyjson?' in url_decoded:
    # CONVERT THE JSON KEY-VALUE PAIR OUTPUT TO A PANDAS DATAFRAME
    df2 = pd.read_json(json.dumps(response.json()))
    print(df2.head())
else:
    # EXIT WITH AN ERROR MESSAGE FOR UNSUPPORTED OUTPUT FORMATS
    import sys
    sys.exit("DEFINE JSON OR KEYJSON AS OUTPUT FORMAT: https://supermetrics.com/docs/product-api-output-formats/")
 
 
# IF YOU PLAN ON LOADING THE DATAFRAME INTO A DATA WAREHOUSE,
# YOU WILL NEED TO ADD THE REQUIRED CREDENTIALS TO WRITE THE DATA INTO YOUR DWH
 
# MySQL EXAMPLE CODE CAN BE FOUND BELOW
# LOAD DATA TO MYSQL DB
from sqlalchemy import create_engine
import pymysql  # MySQL DRIVER USED BY SQLALCHEMY UNDER THE HOOD
 
# USE ONE OF THE TWO CONNECTION OPTIONS BELOW AND COMMENT OUT THE OTHER
 
# CONNECTION OPTION 1: READ THE CONNECTION STRING FROM A TXT FILE (STORED IN THE SAME PATH AS THIS SCRIPT)
# YOU NEED TO CREATE THE TXT FILE YOURSELF. IT CONTAINS THE SAME CONNECTION STRING SHOWN IN OPTION 2 BELOW
with open("[INSERT_FILE_NAME].txt", "r") as f:
    sql_conn = f.read().strip()
conn = create_engine(sql_conn)
 
# CONNECTION OPTION 2: PUT THE CONNECTION STRING DIRECTLY INSIDE THE CODE
# conn = create_engine("mysql+pymysql://[INSERT_USER]:[INSERT_PASSWORD]@[INSERT_HOST]/[INSERT_DATABASE_NAME]")
# EXAMPLE: conn = create_engine("mysql+pymysql://root:PassWord123!@localhost/my_database")
 
# WRITE DATAFRAME INTO DB TABLE
df2.to_sql(name='[TABLE_NAME]', con=conn, if_exists='replace', index=False)
 
# SQL QUERY TO VALIDATE DATA LOAD
frame = pd.read_sql("select * from [INSERT_DATABASE_NAME].[TABLE_NAME]", con=conn)
print(frame.head())

Over to you

As you can see, using the Supermetrics API with Python is fairly straightforward. With the script we just walked through, you can create multiple queries for your data sources and store the data in MySQL or any other database.

This way, you can keep your API queries small and fast while building up your database, and still use the API for any ad hoc reporting you might need to do.

By storing data in a MySQL database or any data warehouse or data lake, you can cater to many different needs in your organization, such as:

  • Connecting the stored data to a BI tool for high-performance reports
  • Connecting the stored data to an ETL tool like Talend to perform transformations e.g. combining website, CRM, and financial data
  • Enabling advanced analytics, e.g. applying machine learning models on your ad data to predict the effectiveness of your advertising campaigns, or building a marketing mix model on top of historical data from your ad platforms.

You can also automate the whole process we just walked through and have the script run on a schedule. For example, you can use cron jobs locally on macOS or Linux (or Task Scheduler on Windows), or schedule the jobs in a cloud services platform such as GCP, AWS, or Azure if you’re planning to load the data into a cloud data warehouse or data lake.
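
As a sketch, a crontab entry like the one below would run the script every morning at 6 AM. The interpreter and script paths are placeholders you’d adapt to your own setup:

# RUN THE LOAD SCRIPT EVERY DAY AT 06:00
0 6 * * * /usr/bin/python3 /path/to/load_marketing_data.py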

Additionally, you could modify the script for more advanced data loading, for example, by overwriting the last X days of data in your database with the latest values retrieved from the API. This is useful if the historical data in your data sources is prone to change, such as cost or conversion data in Facebook Ads.
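
Here’s a minimal sketch of that pattern, assuming the table has a “Date” column, that “df2” now holds only the last seven days of data, and using the hypothetical names “my_database” and “my_table”:

# SKETCH: REFRESH THE LAST 7 DAYS INSTEAD OF REPLACING THE WHOLE TABLE
from datetime import date, timedelta
from sqlalchemy import text

cutoff = (date.today() - timedelta(days=7)).isoformat()

# DELETE THE ROWS THAT ARE ABOUT TO BE RE-LOADED...
with conn.begin() as connection:
    connection.execute(text("DELETE FROM my_database.my_table WHERE Date >= :cutoff"),
                       {"cutoff": cutoff})

# ...THEN APPEND THE FRESH DATA FOR THE SAME WINDOW
df2.to_sql(name='my_table', con=conn, if_exists='append', index=False)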

If this seems like a lot to handle and manage, you’re right. Because it is. 

Scheduling, retries, upkeep, and maintenance take up more time than you might think. If you’d rather have Supermetrics take care of that work for you, check out our data warehouse and cloud storage products, which come with built-in scheduling, automatic retries, and sophisticated error handling and notifications.

If you’re interested in seeing any of those features in action, feel free to start your free 14-day trial of any Supermetrics product.
