8-MINUTE READ · By Dino Mahi
The proliferation of marketing channels in the digital age has led to an explosion of customer and marketing data. To make timely sense of this information, marketers need technologies that shorten the time to insight without inflating operational budgets or demanding an extensive skills overhaul. Data pipelines fit this role well: they let marketers quickly assemble data from multiple sources and then analyze it at scale using next-generation self-service business intelligence tools. In most cases, all of this can be done with minimal IT involvement and very low capital investment, making the technology an ideal fit for data-savvy marketing teams.
In this article, we provide a brief overview of what data pipelines are, and how they enable faster, more efficient data analysis cycles with minimal IT involvement.
Data pipelines vs. traditional data integration approaches
Consider the case of an eCommerce company running multiple campaigns on the Facebook platform. The marketing team recognizes the shortcomings of constantly downloading campaign performance data for exploratory data analysis and would like to automate the entire process. Before data pipelines, doing this would involve the following steps:
- Provision a development machine and install specialized data integration software (e.g. Talend, Informatica, Pentaho). Ensure that all installation dependencies are met.
- Provision the target database software/hardware (e.g. a MySQL database running on a dedicated machine). Set up the test and production databases.
- Use the ETL tool from step 1 to build the integration: pull data via the Facebook API, transform it per business rules, and push it into the target database.
- Manually implement all non-functional capabilities, such as error reporting when the script fails, notifications on successful completion, logging and audit trails for troubleshooting, version control, and access control.
- When the script is ready and tested in a development environment, deploy it on a production machine from which it can pull data from the source and move it to the target.
- Plug in a BI tool such as Tableau, QlikView, or other self-service BI software to slice and dice the data.
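To make the steps above concrete, here is a minimal sketch of what such a hand-built ETL script amounts to. The `fetch_campaign_stats` function and its field names are hypothetical stand-ins for a real Facebook Ads API call, and `sqlite3` stands in for the dedicated MySQL target; every non-functional concern (retries, alerts, logging, scheduling) would still have to be bolted on by hand.

```python
import sqlite3

# Hypothetical stand-in for a Facebook Ads API call; a real script would
# page through the Marketing API with an access token and handle errors.
def fetch_campaign_stats():
    return [
        {"campaign": "spring_sale", "clicks": 120, "spend": 45.50},
        {"campaign": "retargeting", "clicks": 80, "spend": 30.00},
    ]

def run_etl(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaign_perf "
        "(campaign TEXT, clicks INTEGER, spend REAL, cpc REAL)"
    )
    for row in fetch_campaign_stats():
        # Transform step: derive cost-per-click per the business rules.
        cpc = row["spend"] / row["clicks"] if row["clicks"] else 0.0
        conn.execute(
            "INSERT INTO campaign_perf VALUES (?, ?, ?, ?)",
            (row["campaign"], row["clicks"], row["spend"], cpc),
        )
    conn.commit()
    return conn

conn = run_etl()
print(conn.execute("SELECT campaign, ROUND(cpc, 4) FROM campaign_perf").fetchall())
```

Even this toy version shows where the effort goes: the business logic is two lines, while everything around it is plumbing that a managed pipeline would provide out of the box.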
This approach is riddled with inefficiencies and, in any case, requires extensive IT involvement. Imagine running a campaign whose performance plateaus within two to three days, so any optimizations must happen within that window. Even the most well-coordinated team is unlikely to convert data into meaningful insights in time using the approach above.
With a data pipeline, however, the possibilities of in-flight campaign optimization are very real.
Using a data pipeline, an analyst can reach the same end result without getting bogged down in software/hardware installation, data mapping, scheduling, and other non-functional concerns. First, because pipelines are cloud-hosted, hardware (storage and processing power) scales automatically and inexpensively. Second, non-functional capabilities (access control, version management, auto-scaling, error handling and notifications, failover support, etc.) come built in, which not only saves significant time and cost but also greatly reduces IT involvement.
These differentiators provide a very compelling business case for data engineers to contemplate deploying data pipelines for marketing data analysis. The next section provides a more in-depth discussion of some of the specific benefits.
5 Reasons to consider using data pipelines for marketing data analysis
Lower capital costs: Data pipelines are typically cloud-based and offer pay-per-use pricing. This means almost no upfront capital investments in hardware, networking, license fees, and vendor maintenance contracts.
Faster deployment times: As mentioned in the previous section, one of the key USPs of a data pipeline is faster deployment. There is no software to install, no dependency libraries to set up, and no need for dedicated hardware to run the integration code. The vast majority of non-functional capabilities are pre-built into the tool. AWS Data Pipeline, for example, provides drag-and-drop integration with other AWS services for access control, version management, alerts, failover, and auto-scaling. This frees developers to focus on the business logic for transforming data rather than non-functional plumbing.
Shorter learning curves: Before data pipelines, developers had to learn dedicated ETL tools such as Talend, Informatica, and Pentaho. These tools have steep learning curves even for skilled data engineers, let alone marketing analysts. With a data pipeline, it is entirely possible to code the majority of transformations in plain SQL, something familiar to many analysts. Supermetrics for BigQuery, for example, offers completely plug-and-play integration with Google BigQuery, so data engineers do not have to write any code to pipe data from source systems into BigQuery. Other capabilities, such as fetching data from the source system, setting up execution schedules, and error handling, can be implemented through basic configuration settings rather than code.
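To illustrate the point about plain SQL, here is a sketch of an analyst-style transformation. The `sqlite3` module stands in for a cloud warehouse such as BigQuery, and the `ad_stats` table and its columns are illustrative, not a real Supermetrics schema; the SQL itself is the only "transformation logic" an analyst would need to write.

```python
import sqlite3

# sqlite3 as a lightweight, local stand-in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_stats (day TEXT, campaign TEXT, clicks INT, spend REAL)"
)
conn.executemany(
    "INSERT INTO ad_stats VALUES (?, ?, ?, ?)",
    [
        ("2023-05-01", "spring_sale", 100, 40.0),
        ("2023-05-02", "spring_sale", 150, 50.0),
        ("2023-05-01", "retargeting", 60, 24.0),
    ],
)

# The "transform" is just SQL: aggregate clicks and derive cost-per-click.
query = """
SELECT campaign,
       SUM(clicks)                        AS total_clicks,
       ROUND(SUM(spend) / SUM(clicks), 3) AS cpc
FROM ad_stats
GROUP BY campaign
ORDER BY total_clicks DESC
"""
for row in conn.execute(query):
    print(row)
```

No ETL tool, no mapping canvas, no proprietary expression language: anyone comfortable with GROUP BY can express the business rule directly.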
Data available in its original form: There are two competing approaches to preparing data for analysis:
- ETL (Extract, Transform, Load): use a dedicated machine to transform the original data for specific analysis requirements and store the aggregated results in a general-purpose database (e.g. Oracle, MySQL). When hardware resources (storage, memory) are scarce, this is the only option available to engineers to keep costs low while still processing ad-hoc queries within reasonable response times.
- ELT (Extract, Load, Transform): skip most of the transformation and load the data as-is into purpose-built databases that do not suffer from the same capacity constraints. These databases also connect to next-generation business intelligence tools such as Tableau, which can pull in and process large data volumes with sub-second query response times.
A data pipeline leverages the second approach and allows marketers access to raw data in its most granular form. In exploratory data analysis or data science, where marketers often do not know what they are looking for, this capability is huge.
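The value of keeping raw, granular data can be shown in a small sketch. Again `sqlite3` stands in for the warehouse and the event schema is hypothetical; the point is that because ELT loads rows untouched, a question nobody anticipated at load time can still be answered later, whereas a pre-aggregated ETL table would have discarded the detail needed to ask it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ELT: load raw, event-level rows untouched (schema is illustrative).
conn.execute(
    "CREATE TABLE raw_events (ts TEXT, campaign TEXT, device TEXT, clicks INT)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?)",
    [
        ("2023-05-01T09:00", "spring_sale", "mobile", 30),
        ("2023-05-01T13:00", "spring_sale", "desktop", 70),
        ("2023-05-02T09:00", "spring_sale", "mobile", 90),
    ],
)

# Because nothing was aggregated away, an ad-hoc question that was never
# anticipated ("do mobile clicks skew toward the morning?") is still
# answerable directly from the raw rows.
q = """
SELECT device, SUM(clicks) FROM raw_events
WHERE ts LIKE '%T09:%'
GROUP BY device
"""
print(conn.execute(q).fetchall())
```

Had the data been rolled up to daily campaign totals on the way in, the timestamp and device columns, and with them this entire line of inquiry, would be gone.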
Plug-and-play integration with leading tools: Data pipelines benefit from templated integration solutions, whereby ISVs and vendors create solution templates to quickly load data from source to target. For example, with the Supermetrics for BigQuery pipeline, which runs on the Google Cloud Platform, clients can automate the loading of their Facebook Ad performance data into Google BigQuery through simple configuration steps. There is no code to write, and clients get an import of their entire historical data in addition to periodic refreshes of new rows. Neither storage capacity nor query response time is an issue, since the pipeline is hosted on Google Cloud Platform and therefore scales almost without limit.
The capabilities required for effective data analysis in marketing have changed drastically in a digital age of complex, multi-touch, multi-channel user journeys. Thousands of dollars of marketing investment can turn into sunk costs without the tools to analyze and act on all this data in time. Data pipelines offer marketing analysts new ways to take much greater control of how cross-channel data is assembled, processed, and stored. The cost, operational, and technical advantages of this new architectural paradigm are simply too compelling to ignore.
About Dino Mahi
Dino Mahi is a Marketing Technology Specialist with over 15 years' experience working with major brands on high-level marketing technology landscapes and data strategies. A passionate enterprise architect with a background in marketing data analysis, Dino currently focuses on identifying innovative ways to blend online and offline data to build advanced customer and marketing intelligence solutions.