Data Warehousing on Apache Spark

eBay's data warehouse has been running for over a decade. Millions of batch queries run on it daily, processing more than 60 PB of data every day. The data services and products built on it drive eBay's business decisions and site features, so it has to be always available and accurate.

Since 2017, we have been migrating the entire relational processing workload to Apache Spark. We built a fully automated framework with key components such as an RDBMS SQL converter and metadata-driven data flow optimization. It covers end-to-end ETL from code to process and has enabled more than 90% of the workload to be migrated automatically.

At the same time, to scale Apache Spark for such a large enterprise data warehouse, we have been enhancing and extending key features of native Spark, such as Adaptive Execution for dynamic job optimization, Indexed Bucket for data layout optimization, and ACID...

Two topics will be presented at Spark Summit in October.

Max Shen

Engineering Manager at Facebook

6 years ago

Impressive!
