SeattleDataGuy’s Newsletter

A Zero ETL Future

SeattleDataGuy
Dec 13, 2022

AWS has jumped on the bandwagon of removing the need for ETLs. Snowflake made similar announcements with both their hybrid tables and their partnership with Salesforce.

Now, I do take a little issue with the name “Zero ETL”, because on the surface the functionality described is often closer to a zero integration future, which probably doesn’t come across as ‘sexy’ enough. This may also only be phase one of AWS and Snowflake’s plan to remove the need for ETLs.

Overall, I do agree with the idea of reducing the amount of duplicate logic and data that exist. So if there is some form of path that leads to a zero ETL world, we should make it happen.

But what would it take?

In this article, I will go through a Zero ETL future and how we might get there.

The Problem Of Perception

When I read or hear about announcements like this, I assume there are undiscussed nuances. But I have found when the business reads these types of announcements, they take them at face value.

They come back and tell their team, “We want to move to this no-code, zero ETL, serverless future.” It all sounds good from a business perspective: costs will be reduced, headcount can be slashed, and value from data can be gained immediately.

But this will skip over all the other nuances that are unavoidable.


Before diving into a Zero ETL future, let’s review some of the reasons ETLs exist.

Why We ETL

Simply duplicating data from point A to point B is not an ETL. If that’s all we needed to do, we could just create replica databases and report off of those. So why create complex systems and use tools like Airflow or Prefect at all?

Why hire expensive data engineers?

Why even ETL?

Historical Data - Generally speaking, most operational databases don’t track historical data. Specifically, they don’t track entity data that changes over time, like where a customer lives.

So when data gets updated or deleted, if you’re not using CDC (change data capture), you’re going to lose information. This is why the concept of slowly changing dimensions exists: if a customer moves states or an employee changes jobs, we can accurately reflect that change over time.
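To make the slowly changing dimension idea concrete, here is a minimal sketch of what is usually called a Type 2 dimension, written in plain Python instead of a warehouse. The names `CustomerRow`, `apply_change`, and `state_as_of` are made up for illustration, and a real implementation would live in SQL or a transformation tool. The point is only that an update closes out the old row and adds a new one instead of overwriting it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    # Hypothetical dimension row: one version of a customer's state of residence.
    customer_id: int
    state: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerRow], customer_id: int,
                 new_state: str, changed_on: date) -> None:
    """Close the customer's current row (if the value changed) and append a new one."""
    current = next(
        (r for r in history if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.state == new_state:
            return  # nothing changed, so no new version is needed
        current.valid_to = changed_on  # end-date the old version
    history.append(CustomerRow(customer_id, new_state, changed_on))


def state_as_of(history: List[CustomerRow], customer_id: int,
                as_of: date) -> Optional[str]:
    """Return the state that was valid for the customer on the given date."""
    for r in history:
        if r.customer_id != customer_id:
            continue
        if r.valid_from <= as_of and (r.valid_to is None or as_of < r.valid_to):
            return r.state
    return None


history: List[CustomerRow] = []
apply_change(history, 1, "WA", date(2021, 1, 1))
apply_change(history, 1, "CA", date(2022, 6, 1))  # the customer moves states
```

With this history, asking where customer 1 lived at the end of 2021 still returns the old state, which is exactly the information an in-place update in the operational database would have destroyed.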

Now don’t get me wrong. Instead of a traditional SCD (slowly changing dimension), you could simply create a date partition and load data into an ever-expanding table. We did this from time to time at Facebook because it’s a simpler design.
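The date partition approach can be sketched in a few lines as well. This is an illustration under my own assumptions, not how any particular company implements it: the `snapshots` dictionary stands in for a date-partitioned table, and `load_snapshot` and `customer_state_on` are hypothetical helper names.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical stand-in for a date-partitioned table: one full copy per day.
snapshots: Dict[date, List[dict]] = {}


def load_snapshot(snapshots: Dict[date, List[dict]], ds: date,
                  source_rows: List[dict]) -> None:
    """Copy the full current state of the source into the partition for day ds."""
    snapshots[ds] = [dict(row) for row in source_rows]


def customer_state_on(snapshots: Dict[date, List[dict]], ds: date,
                      customer_id: int) -> Optional[str]:
    """Look up a customer's state in the partition for a given day."""
    for row in snapshots.get(ds, []):
        if row["customer_id"] == customer_id:
            return row["state"]
    return None


# The operational table only ever holds the latest value.
source = [{"customer_id": 1, "state": "WA"}]
load_snapshot(snapshots, date(2022, 12, 1), source)

source[0]["state"] = "CA"  # in-place update; the old value is gone from the source
load_snapshot(snapshots, date(2022, 12, 2), source)
```

The trade-off is storage for simplicity: every partition repeats every row, but there is no logic for closing out old versions, and any past day can be queried by filtering on the partition date.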
