Why AWS is investing in a zero-ETL future
Data is at the center of every application, process, and business decision. When data is used to improve customer experiences and drive innovation, it can lead to business growth. According to Forrester, advanced insights-driven businesses are 8.5 times more likely than beginners to report at least 20% revenue growth. However, to realize this growth, managing and preparing data for analysis has to get easier.
That’s why AWS is investing in a zero-ETL future so that builders can focus more on creating value from data, instead of preparing data for analysis.
Challenges with ETL
What is ETL? Extract, transform, load (ETL) is the process data engineers use to combine data from different sources. ETL can be challenging, time-consuming, and costly.
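The three stages can be sketched in a few lines. This is a minimal, illustrative example, not any AWS service's implementation: the "orders" and "customers" sources and the in-memory "warehouse" are all hypothetical stand-ins for real operational databases and an analytics store.

```python
def extract():
    # Extract: pull raw records from two separate sources.
    # (Hypothetical data; in practice these would be database or API reads.)
    orders = [
        {"order_id": 1, "customer_id": "c1", "amount_usd": "19.99"},
        {"order_id": 2, "customer_id": "c2", "amount_usd": "5.00"},
    ]
    customers = {"c1": "Alice", "c2": "Bob"}
    return orders, customers

def transform(orders, customers):
    # Transform: cast types and join the two sources into one schema.
    return [
        {
            "order_id": o["order_id"],
            "customer": customers[o["customer_id"]],
            "amount_usd": float(o["amount_usd"]),
        }
        for o in orders
    ]

def load(rows, warehouse):
    # Load: write the combined rows into the analytics store.
    warehouse.extend(rows)

warehouse = []
orders, customers = extract()
load(transform(orders, customers), warehouse)
print(warehouse[0]["customer"])  # prints "Alice"
```

Even in this toy form, the pipeline hard-codes schema, join logic, and type casts; every upstream change means editing and redeploying code, which is the maintenance burden zero-ETL integrations aim to remove.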
The time required to build or change pipelines makes the data unfit for near-real-time use cases such as detecting fraudulent transactions, placing online ads, and real-time supply chain analysis. In these scenarios, the opportunity to improve customer experiences, address new business opportunities, or lower business risks can simply be lost.
AWS is bringing its zero-ETL vision to life
Zero-ETL makes data available to data engineers at the point of use through direct integrations between services and direct querying across a variety of data stores. This frees data engineers to focus on creating value from the data, instead of spending time and resources building pipelines.
We have been making steady progress towards bringing our zero-ETL vision to life so organizations can quickly and easily connect to and act on their data. Here are just two examples:
And just last week we announced the public preview of Aurora zero-ETL integration with Amazon Redshift, to enable near-real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora. With this launch, customers can ingest hundreds of thousands of transactions per minute into Aurora and analyze them in near real time in Amazon Redshift, without having to build labor-intensive and expensive ETL pipelines. To learn more, read my full blog.
When organizations can quickly and seamlessly integrate data that is stored and analyzed in different tools and systems, they can make data-driven predictions with more confidence, improve customer experiences, and promote data-driven insights across the business.
Sr. Solution Architect & GM - Amazon EBU
1y: "Zero-ETL" definitely sounds good
Principal Software Engineer @ Tyler Technologies | AWS Community Builder
1y: What a great vision statement
Open source pipelines - dlthub.com
1y: None of these are really zero-ETL solutions; they are just product features that assist with the ETL or use something that was already ETL'd. A zero-ETL future doesn't mean 99/100 ETLs still running because you can solve one on your cloud. Zero-ETL can only be a marketing misnomer. However, dlt gets as close as possible to simplifying EL in open source, tech-agnostic. It works with Redshift or S3 too, but also with Google Cloud, Snowflake, Parquet files in storage, etc. The dlt library is an open-source data loading solution like nothing before: with built-in schema inference and evolution, and scalable extraction, it makes data loading as simple as can be. https://dlthub.com/docs/getting-started/try-in-colab https://dlthub.com/docs/reference/explainers/schema-evolution
Head - Data Engineering, Quality, Operations, and Knowledge
1y: Because Swami Sivasubramanian and the coders in AWS perhaps understand very little beyond the full form of the acronym ETL? Amazon does not have a single coherent solution for enterprise data architecture. If AWS is interested in knowing what ETL stands for beyond its abbreviation and implied technicalities, let me know. Only a person with very little knowledge would argue that throwing millions of pens and paper will make a country educated, just like pushing more data makes a company suddenly intelligent.
AWS Cloud Superfan!
1y: Sounds fabulous, but sadly I am not seeing much progress with the functionality and reliability of DataBrew. It feels like an abandoned product, unfortunately, which is a shame as it has huge untapped potential.