The data ecosystem has changed drastically over the last six years and we’ve witnessed the rise and fall of several different technologies. However, there’s one constant that’s remained the same, the cloud data warehouse.
Thanks to modern data platforms like Snowflake and Google BigQuery, consolidating your data into a centralized platform is easier than ever. The problem is that your data warehouse is only accessible to your technical users who know how to write SQL, so the platform you purchased to eliminate data silos has inevitably become a data silo itself. This is exactly why Reverse ETL is so important.
In this post you'll learn:
Reverse ETL is the process of copying data from your central data warehouse to your operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.
The definition above is packed so let’s briefly take a step back. Why would you want to move data out of your warehouse after spending so many resources to get it in there in the first place? The answer is relatively straightforward – all of your core business definitions live in the warehouse.
Think about your core metrics like items in cart, subscription type, workspaces created, playlists, pages viewed, purchase date, lifetime value, and annual recurring revenue. This list could go on and on, but the key takeaway is that all of your important data models live in the warehouse.
Data Being Synced to Salesforce
Sure, you could easily create reports and visualizations using this data in your BI tool, but these insights are much more powerful if they drive everyday operations for your business teams across sales, marketing, finance, and support – in tools like Hubspot, Salesforce, Netsuite, SAP, Workday, Gainsight, or Zendesk. Reverse ETL, therefore, has emerged as a key part of the modern data stack to close the loop between analysis and activation.
The traditional ETL process has been around since the 1970s and data pipelines have remained largely unchanged. For those unfamiliar, ETL stands for extract, transform, and load. It’s the process of automatically extracting, transforming, and loading data into your desired destination (e.g. a data warehouse or data lake).
The ETL Process
Fully managed SaaS platforms have made this process even easier by offering pre-built connectors to extract and load your data. Dedicated transformation tools like dbt have given rise to a new process known as ELT where data is transformed after it is loaded into the warehouse. The question is, why can’t you just use conventional ETL or ELT to move data out of your warehouse?
Most people don’t know this, but Reverse ETL actually requires you to write reverse SQL, so moving data out of the warehouse and back into your operational systems and SaaS tools can be very complex.
The Reverse ETL Process
An ELT tool like Fivetran or Stitch is largely used to power dashboards, whereas Reverse ETL is mainly used to power workflows, marketing campaigns, and general business processes where time sensitivity is extremely important.
Since ELT is mainly focused on merging data based on “updated_at” fields, time is the only parameter to account for. However, since Reverse ETL queries directly against your warehouse, there are no “updated_at” fields and that means you have to diff the changes between sync runs to ensure you are only syncing fresh data.
If you make a mistake with conventional ELT, you can simply delete your table and re-ingest your data. This is not the case with Reverse ETL, because a platform like Hubspot does not have a time machine feature that lets you roll back to a previous state.
Ultimately, there are a ton of technical differences between ELT and Reverse ETL, and if you’re interested in getting into the weeds you can read our post here.
For the most part, every modern data stack has several core components that are the same across analytics teams, and usually, it looks something like this.
The Modern Data Stack 2.0
While the technologies for each of these layers might differ from company to company, the overall components largely remain the same across industries. However, the modern data stack has left a gaping hole that companies and other technologies have tried to fill over the years but have failed in doing so. Reverse ETL fits nicely into this stack as the solution to the “last-mile problem” to help you activate your data.
Reverse ETL is not a new concept by any means. Companies have been trying to activate their data for years. In the past, moving data out of the warehouse required you to either manually download/upload CSV files or build custom integrations and pipelines for every single one of your SaaS applications and end systems. Neither option is scalable.
Point-to-point tools or iPaaS platforms like Zapier, Tray, and Workato can be an attractive option for tackling Reverse ETL use cases because they let you send data from one platform to another without code. However, these types of platforms don’t scale or integrate well with your current data stack. If you have just four applications, you’ll quickly find yourself with 16 different pipelines (e.g. 4x4 = 16).
All of the platforms in this space work in a similar manner, they perform actions based on a trigger that you define (e.g. sending a marketing email in Hubspot when a lead is created in Salesforce). Inevitably this means you have to build custom workflows for every integration in your stack and this can become an absolute nightmare as you weave in various dependencies, triggers, if/then clauses, and fail-safes (just look at this example of a workflow in Tray.)
Reverse ETL creates a hub and spoke approach, where the warehouse is your central source of truth, completely eliminating the complex web of pipelines and workflows that come with conventional point-to-point solutions.
Point-to-Point vs. Hub & Spoke
In the world of customer data, you’re probably familiar with customer data platforms (CDPs). Platforms like Segment made a name for themselves in the marketing world by creating a single platform where you can house all of your customer data and activate it in your various tools across all of your operational teams.
The main advantage of these types of platforms is that they offer built-in event collection for your behavioral data, identity resolution, and audience management tools. CDPs have several flaws though. Firstly, you don’t actually own the data. CDPs force you to store data outside of your own cloud infrastructure. In many cases, all of this data already lives in your warehouse, so you inevitably end up paying for storage twice. In addition to this, CDPs are extremely rigid. They force you to use proprietary data models that only represent users and accounts. This is not helpful if you have unique objects like workspaces, subscriptions, and playlists.
At their core, CDPs are rigid black boxes that are not easily configurable in the context of a modern data stack. To truly own your data from end to end, you need to leverage Reverse ETL.
While on the surface it can seem like Reverse ETL is simply just focused on syncing data, there are three primary use cases for Reverse ETL: data activation, data automation, and data infrastructure.
Data Activation is the method of unlocking the knowledge sorted within your data warehouse, and making it actionable by your business users in the end tools that they use every day. In doing so, Data Activation helps bring data people toward the center of the business, directly tying their work to business outcomes.
It’s no longer enough to simply understand past behavior. You need to be able to predict and identify common patterns and attributes in your customers to take action immediately.
Since Reverse ETL is focused on syncing real-time customer data into your business applications, you can rest assured that your teams have a holistic view of your customer data and the right data to make the right decision.
Every company wants to be more data-driven. Yet the most daunting question for every lead is “how”. Deriving insights from data is part one, but the last mile of “analytics enablement” (e.g. translating those insights into action) is a different ball game.
Analytics enablement is typically seen as a people problem, and this is true to some extent – but how data is presented can play an equally large role.
Imagine you’re a B2B company trying to figure out which accounts your sales reps should focus on. In most scenarios, your data analyst would use SQL to derive characteristics of high-value leads and present them to you in a BI report. The problem is that this data isn’t actionable and to your analyst’s dismay, the report is rarely even opened.
A traditional analytics enablement outlook to this problem would be to train sales reps on how to leverage BI reports as part of their day-to-day workflow. In practice, this is tough because data enablement is the reason most data projects fail.
Instead of training your sales reps to use BI reports, what if you could just empower your analysts to just feed lead scores from your data warehouse into a custom field in Salesforce? This same thought process can be applied to basically any business team.
Data Activation is flashy, but in reality, companies are filled with far less glamorous problems when it comes to data. In any sizable organization, there are tons of manual requests for data floating around and with any manual process, there’s always the question of how to automate it.
Here are a few common examples of everyday data requests from various teams:
How Reverse ETL Powers Business Teams
There’s a high probability that you’ve had to deal with at least one of these requests. The data is likely already available in your data warehouse. With Reverse ETL SQL is really all you need to extract and sync that data to your external tools – thus making it the simplest solution.
Reverse ETL has also emerged as a general-purpose solution in data infrastructure and software engineering and there are two primary use cases powering this:
The most obvious use case for Reverse ETL in software engineering is activating your analytics and data science models to build personalized customer experiences.
E-commerce is a great example. Pretend your data science team calculates a score on top of your data warehouse or data lake to define a user’s likelihood of buying a product. And your growth team wants to drive more purchases by offering discounts to users who are deemed unlikely to make a purchase. Since your warehouse is too slow to serve user-facing experiences, your engineers could simply use a Reverse ETL tool to sync the propensity score in your warehouse to your production database, giving you the ability to serve customers with personalized in-app experiences in real-time.
Today, customer data is spread across dozens – if not hundreds of disparate systems. Sometimes, your cloud applications need to access information from disparate data sources.
Pretend you’re a B2B SaaS company with enterprise customers on a contract. Every time a new enterprise is onboarded, your sales deal desk records each customer’s credit allotment in Salesforce. Your customers keep asking to see their credit allotment inside your web app, but your developers don’t want to integrate with Salesforce. However, chances are that Salesforce data is already available in your data warehouse (via an ELT tool like Fivetran).
With Reverse ETL, you can sync relevant Salesforce information from your warehouse to the production database that powers your app, giving your customers direct access to their billing information.
The possibilities are endless, but the key takeaway is that Reverse ETL is becoming a core part of the software engineering toolkit and it isn’t limited to “data projects.”
Although it’s relatively easy to see why you need Reverse ETL, it’s not always as straightforward to see what that actually entails. There are basically a near-limitless number of use cases for Reverse ETL but in many scenarios, the use cases tend to be centered around your data teams, marketing teams, and sales teams.
Advertising is arguably the backbone of any marketing team and figuring out how to increase match rates, improve return on ad spend (ROAS), and decrease customer acquisition costs (CAC) is at the forefront of every decision. Enriching your advertising platforms with rich behavioral data about your customers for retargeting campaigns and lookalike audiences can be somewhat challenging though, as the typical process involves manual downloading and uploading individual data sets.
Lucid, the visual collaboration suite and visual diagramming platform, faced this exact problem. Since adopting Reverse ETL, Lucid has seen a 56% increase in ROAS in Google and a 37% increase in new users.
Imperfect Foods, the leading online grocer at the forefront of eliminating food waste, saw similar benefits, reducing the CAC by 15% and increasing customer reactivations by 53%.
The dream of every sales rep is to have a centralized platform with a holistic view of the customer. Customer relationship management platforms (CRMs) arose as the first platform to tackle this challenge, but these platforms really only capture sales and marketing interactions.
In reality, your sales team needs and wants access to the unique behavioral data and product usage data that lives in your warehouse so they can track every step in the customer journey and accelerate deal cycles.
Gorgias, the e-commerce helpdesk platform, uses Reverse ETL for this exact purpose and syncs important behavioral data about prospects and accounts directly to Hubspot so individual sales reps can prioritize high-value accounts, enroll contacts in customized email sequences in real-time, and monitor product usage spikes. Since implementing Reverse ETL, Gorgias has grown its outbound pipeline by 60-70% and seen a 2x in quarterly revenue.
Your business teams are not the only ones to benefit from Reverse ETL, it can also have a huge impact on your data teams. No data engineer wants to build or maintain custom integrations. They’d much rather be building data models and optimizing your current technology stack.
Nando’s (the world-famous peri-peri chicken restaurant) experienced a similar problem, but since implementing Reverse ETL, engineering time has dropped from 80% to 20%.
Seesaw, the online learning platform experienced lowered the time it took to integrate with Salesforce from days to minutes.
While at face value it seems like most of the use cases for Reverse ETL are largely focused on syncing data, this is not the case. Reverse ETL is also used to power notifications in messaging tools like Slack and workflows in various SaaS applications like Hubspot.
Vendr, the SaaS platform for buying SaaS products uses Reverse ETL for this exact use case. Vendr leverages Reverse ETL to escalate messages directly to customers in Slack, and also notify employees every time a sales rep saves customers over 25% in savings. Every time a transformation job finishes running in the warehouse (via dbt), Vendr syncs data directly to Hubspot to trigger workflows for specific emails to both buyers and sellers.
If you’ve ever bought enterprise software, then you’ll know that there are always pros and cons to buying a purpose-built solution and building one in-house. If you’re leaning toward the DIY camp you’ll likely need spare data engineering resources (if you have spare resources you are one of the few).
Building Reverse ETL in-house can get complicated very quickly because every application or tool in your stack is equipped with a unique API that is constantly updating and changing and that means you’ll either have to download/upload manual CSV files or be forced to build a unique integration for every tool in your data stack.
Third-party APIs & CSVs
You’ll also be forced to monitor and manage each integration because a single change in an API can easily break your data flows and this isn’t even mentioning all of the factors you have to consider when integrating with third-party APIs including:
Needless to say, there are a lot more complexities involved with Reverse ETL. Depending on your resources it might make sense to go this route. Either way, you’re going to need to learn how to integrate with third-party APIs.
A fully managed Reverse ETL platform gives you the ability to schedule your data syncs from your warehouse in the same way that you would schedule ELT jobs to ingest data into your warehouse. Implementing a dedicated Reverse ETL enables your engineers to focus on tasks that actually impact your business rather than doing the boring job of building and maintaining Reverse ETL pipelines.
If you’re evaluating Reverse ETL platforms there are a lot of factors to consider:
Reverse ETL is a brand new category in the data space and like any hot category, many companies will try and ride this wave. If you prefer investing in best-in-class tools and want to have a fully managed Reverse ETL solution up and running in a matter of minutes, the first integration with Hightouch is completely free.
You’ve likely heard about ELT — Extract Load and Transform… the Modern Data Stack’s evolution on ETL. This is a game changer by nature in that it enables organizations to ingest raw data into the data warehouse and transform it later. ELT gives end-users access to the entirety of the datasets they need by circumventing downstream issues of missing data that could prevent a specific business question from being answered.
As data became more and more available to companies, data integration became one of the most crucial challenges for organizations. In the past decades, ETL (extract, transform, load) and later ELT (extract, load, transform) emerged as data integration methods that transfer data from a source to a data warehouse.