Evolution in ETL: How Skipping Transformation Enhances Data Management
This article provides an overview of two new data preparation techniques that enable data democratization while minimizing transformation burdens.
Few data concepts are more polarizing than ETL (extract-transform-load), the preparation technique that has dominated enterprise operations for several decades. Developed in the 1970s, ETL shined during an era of large-scale data warehouses and repositories. Enterprise data teams centralized data, layered reporting systems and data science models on top, and enabled self-service access to business intelligence (BI) tools. However, ETL has shown its age in an era of cloud services, data models, and digital processes.
Searches such as “Is ETL still relevant/in-demand/obsolete/dead?” are common on Google. The reason is that enterprise data teams are groaning under the weight of preparing data for widespread use across employee roles and business functions. ETL doesn’t scale easily to handle the vast volumes of historical data stored in the cloud, nor does it deliver the real-time data required for rapid executive decision-making. It’s not uncommon for modern enterprises to have 500 to 1,000 pipelines in place as they seek to transform data and equip users with self-service access to BI tools. In addition, building custom APIs to provide applications with data creates significant management complexity: these APIs are in a constant state of evolution, as they must be reprogrammed whenever the data they pull changes. It’s clear this process is too brittle for many modern data requirements, such as edge use cases.
In addition, application capabilities have evolved. Source systems provide business logic and tools to enforce data quality, while consuming applications enable data transformation and provide a robust semantic layer. As a result, teams have less incentive to build point-to-point interfaces that move data at scale, transform it, and load it into the data warehouse.
Two innovative techniques point the way to enabling data democratization while minimizing transformation burdens. Zero ETL makes data available without moving it, whereas reverse ETL pushes rather than pulls data to the applications that need it as soon as it is available.
Zero ETL Reduces Data Movement and Transformation Requirements
Zero ETL optimizes the movement of smaller data sets. With data replication, data is moved to the cloud in its current state for use with data queries or experiments.
But what if teams don’t want to move data at all?
Data virtualization abstracts the underlying servers away from end users. When users query a single source, the output is returned directly to them. With query federation, users can query multiple data sources at once; the tool combines the results and presents them as one integrated data set.
These techniques are called zero ETL because there is no need to build a pipeline or transform data. Users handle data quality and aggregation needs on the fly.
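As a rough illustration of query federation, the sketch below uses Python's built-in sqlite3 module to join two separately attached databases in a single query, without first copying either one into a warehouse. The `sales` and `crm` databases, their tables, and the sample rows are hypothetical stand-ins for real source systems. A real zero ETL tool would federate across different engines rather than two SQLite databases, so treat this as a toy model of the idea, not a reference implementation.

```python
import sqlite3


def build_federated_sources() -> sqlite3.Connection:
    """Attach two independent in-memory databases to one connection.

    Both databases and their contents are hypothetical sample sources.
    """
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate, independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (1, 40.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    conn.commit()
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple]:
    """Join across both sources and aggregate at query time."""
    query = """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM sales.orders AS o
        JOIN crm.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()
```

Calling `revenue_by_region(build_federated_sources())` returns one row per region. The aggregation happens when the query runs, which mirrors the point that users handle aggregation on the fly instead of relying on a prebuilt pipeline.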
Zero ETL is ideally suited for ad-hoc analysis of near-term data, as executing large queries on historical data can harm operational performance and increase data storage costs. For example, many retail and consumer packaged goods executives use zero ETL to query daily transactional data to focus marketing and sales strategies during times of peak demand, such as the holidays.
Google Cortex provides accelerators, enabling zero ETL on SAP enterprise resource planning system data. Other companies, such as one of the world’s largest retailers and a global food and beverage company, have also adopted zero ETL processes.
Zero ETL gains include:
- Providing speed to access: Using zero ETL processes to provision data for self-service queries saves 40-50% of the time it takes using traditional ETL processes since there’s no need to build pipelines.
- Reducing data storage requirements: Data does not move with data virtualization or query federation. Users only store query results, decreasing storage requirements.
- Delivering cost savings: Teams that use zero ETL processes save 30-40% on data preparation and storage costs compared to traditional ETL.
- Improving data performance: Since users query only the data they want, results are delivered 25% faster.
To get started with zero ETL, teams should evaluate which use cases are best suited for this technique and identify the data elements they need to execute it. They also should configure their zero ETL tool to point to the desired data sources. Teams then extract data, create data assets, and expose them to downstream users.
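The getting-started steps above can be sketched in a few lines of Python. This is a minimal model, assuming a hypothetical `ZeroEtlCatalog` class invented for this example (it is not the API of any real zero ETL product): the catalog is pointed at a source, a data asset is defined as a named query, and downstream users read the asset on demand. The `pos` source, the `transactions` table, and the sample rows are also made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """A named query that is resolved against its source at read time."""
    name: str
    source: str  # name of a configured source
    sql: str     # parameterized query that runs in place on that source


class ZeroEtlCatalog:
    """Hypothetical catalog: points at sources and exposes assets, stores no data."""

    def __init__(self) -> None:
        self._sources: dict[str, sqlite3.Connection] = {}
        self._assets: dict[str, DataAsset] = {}

    def add_source(self, name: str, conn: sqlite3.Connection) -> None:
        self._sources[name] = conn

    def register(self, asset: DataAsset) -> None:
        if asset.source not in self._sources:
            raise KeyError(f"unknown source: {asset.source}")
        self._assets[asset.name] = asset

    def read(self, asset_name: str, params: tuple = ()) -> list[tuple]:
        # Nothing is copied ahead of time; the query runs when a user asks.
        asset = self._assets[asset_name]
        return self._sources[asset.source].execute(asset.sql, params).fetchall()


def build_demo_catalog() -> ZeroEtlCatalog:
    """Create a sample point-of-sale source and expose one asset on it."""
    pos = sqlite3.connect(":memory:")
    pos.execute("CREATE TABLE transactions (sale_date TEXT, amount REAL)")
    pos.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [
            ("2023-11-01", 99.0),
            ("2023-12-22", 10.0),
            ("2023-12-23", 25.0),
            ("2023-12-23", 5.0),
        ],
    )
    pos.commit()

    catalog = ZeroEtlCatalog()
    catalog.add_source("pos", pos)
    catalog.register(
        DataAsset(
            name="daily_sales_since",
            source="pos",
            sql=(
                "SELECT sale_date, SUM(amount) FROM transactions "
                "WHERE sale_date >= ? GROUP BY sale_date ORDER BY sale_date"
            ),
        )
    )
    return catalog
```

A downstream user would then call something like `build_demo_catalog().read("daily_sales_since", ("2023-12-01",))`. The date parameter keeps the query on near-term data, the kind of ad-hoc analysis zero ETL suits best.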
Using Reverse ETL to Feed Applications with Data On-Demand
Reverse ETL techniques simplify data flows to downstream applications. Instead of using REST APIs or endpoints and writing scripts to pull data, teams leverage reverse ETL tools to push data into business processes on time and in full.
Using reverse ETL provides the following benefits:
- Reducing time and effort: Using reverse ETL for key use cases reduces the time and effort needed to access data by 20-25%. A leading cruise line, for example, leverages reverse ETL for digital marketing initiatives.
- Improving data availability: Teams have greater certainty they’ll have access to the data they need for key initiatives, as 90-95% of target data is delivered on time.
- Decreasing costs: Reverse ETL processes reduce the need for APIs, which require specialized programming skills and increase management complexity. As a result, teams reduce data costs by 20-25%.
To get started with reverse ETL, data teams should evaluate use cases that require on-demand data. Next, they determine the frequency and volume of data to be delivered and choose the proper tooling to handle these data volumes. Then, they point data assets in the data warehouse to their destination consumption systems. Teams should prototype with one data load to measure efficiency, then scale processes.
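To make the push model concrete, here is a minimal sketch of a single prototype data load, assuming a SQLite database stands in for the data warehouse and a hypothetical `FakeCrm` class stands in for the destination application. Real reverse ETL tools add scheduling, change detection, and retries, none of which are modeled here. The table name, columns, and rows are invented for the example.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FakeCrm:
    """Hypothetical stand-in for a downstream application's ingest interface."""
    records: dict = field(default_factory=dict)
    batches_received: int = 0

    def upsert(self, batch: list[dict]) -> None:
        # Keyed by customer_id, so re-sending a row overwrites it.
        for row in batch:
            self.records[row["customer_id"]] = row
        self.batches_received += 1


def read_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[list[dict]]:
    """Yield warehouse rows in fixed-size batches as dictionaries."""
    cursor = conn.execute(
        "SELECT customer_id, segment FROM customer_segments ORDER BY customer_id"
    )
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield [dict(zip(columns, row)) for row in rows]


def sync_to_destination(conn: sqlite3.Connection, destination: FakeCrm,
                        batch_size: int = 2) -> int:
    """Push every warehouse row to the destination; return rows delivered."""
    delivered = 0
    for batch in read_batches(conn, batch_size):
        destination.upsert(batch)
        delivered += len(batch)
    return delivered


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a sample warehouse table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_segments (customer_id INTEGER, segment TEXT)")
    conn.executemany(
        "INSERT INTO customer_segments VALUES (?, ?)",
        [(1, "new"), (2, "loyal"), (3, "at_risk")],
    )
    conn.commit()
    return conn
```

Running `sync_to_destination(build_demo_warehouse(), FakeCrm())` performs one load. The returned row count and the destination's `batches_received` counter give a simple starting point for measuring that prototype load before scaling up.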
To Succeed with Data, Use a Variety of Preparation Techniques
Zero ETL and reverse ETL tools provide teams with fresh options for serving data to users and applications. They can analyze factors such as use case requirements, data volumes, delivery timeframes, and cost drivers to select the best option for delivering data, whether traditional ETL, zero ETL, or reverse ETL.
Partners support these efforts by providing insight into the best techniques and tools to meet functional and non-functional requirements, providing a weighted scorecard, conducting a proof of value (POV) with the winning tool, and then operationalizing the tool for more use cases.
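A weighted scorecard of the kind mentioned above can be computed in a few lines. The sketch below is only illustrative: the criteria names are borrowed from the selection factors listed earlier, while the weights, the 1-to-5 ratings, and the `tool_a` and `tool_b` labels are entirely hypothetical and would come from a team's own evaluation.

```python
def weighted_score(weights: dict[str, float], ratings: dict[str, float]) -> float:
    """Return the weight-normalized average rating for one option."""
    if set(weights) != set(ratings):
        raise ValueError("ratings must cover exactly the weighted criteria")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_options(weights: dict[str, float],
                 options: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Score every option and sort from highest to lowest score."""
    scores = {name: weighted_score(weights, ratings)
              for name, ratings in options.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: higher means the criterion matters more.
EXAMPLE_WEIGHTS = {
    "use_case_fit": 4,
    "data_volume": 3,
    "delivery_timeframe": 2,
    "cost": 1,
}

# Hypothetical ratings on a 1-5 scale for two made-up candidate tools.
EXAMPLE_RATINGS = {
    "tool_a": {"use_case_fit": 5, "data_volume": 2, "delivery_timeframe": 4, "cost": 3},
    "tool_b": {"use_case_fit": 3, "data_volume": 5, "delivery_timeframe": 3, "cost": 4},
}
```

Calling `rank_options(EXAMPLE_WEIGHTS, EXAMPLE_RATINGS)` returns the candidates ordered by score; the top entry would be the one taken into a proof of value.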
With zero ETL and reverse ETL, data teams achieve their goals of empowering users and applications with the data they need where and when they need it, driving cost and performance gains while avoiding transformation headaches.
Arnab Sen is an experienced professional with a career spanning over 16 years in the technology and decision science industry. He presently serves as the VP-Data Engineering at Tredence, a prominent data analytics company, where he helps organizations design their AI-ML/Cloud/Big-data strategies. With his expertise in data monetization, Arnab uncovers the latent potential of data to drive business transformations across B2B & B2C clients from diverse industries. Arnab's passion for team building and ability to scale people, processes, and skill sets have helped him successfully manage multi-million-dollar portfolios across various verticals, including Telecom, Retail, and BFSI. He has previously held positions at Mu Sigma and IGate, where he played a crucial role in solving clients’ problems by developing innovative solutions. Arnab's exceptional leadership skills and profound domain knowledge have earned him a seat on the Forbes Tech Council.