Data Engineering Open Forum at Netflix. Was. A. Blast!
Pawel Mikler
Leading Clurgo global expansion by delivering data-intensive applications used by millions
Last Thursday, April 18th, the Netflix campus in Los Gatos, California, turned into the epicenter for experienced data practitioners from well-known organizations such as Tesla, Airbnb, OpenAI, LinkedIn, Meta, and Warner Bros. Discovery, who came to connect with fellow professionals and to bridge the gap between technical depth and accessible discussion in the data engineering field.
The event was kicked off and orchestrated by the incredible (and hardworking!) Xinran Waibel, who introduced the idea behind the first-ever Data Engineering Open Forum at Netflix and stressed the importance of an ongoing dialogue within the data professional community.
The event was very useful, as it revealed an impressive set of Netflix tools that together use neural networks to refine Spark configs for more reliable pipeline retries: Pensive (an error classification service that leverages a rule-based classifier), Nightingale (a service that runs an ML model trained using Metaflow and is responsible for generating a retry recommendation), Scheduler (a service that schedules jobs; the current implementation uses Netflix Maestro), and ConfigService. The recommended configurations are saved in ConfigService as a JSON patch, with a scope that specifies which jobs can use them.
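To make the flow above more concrete, here is a minimal sketch of how the four pieces could fit together. It is not Netflix's implementation: the function and class names, the regex rules, the scope keys (`workflow_id`, `step_id`), and the "double the executor memory" heuristic that stands in for the ML model are all assumptions made for illustration. Only the overall shape (classify the error, get a recommendation, store it as a scoped JSON patch, apply it on retry) comes from the talk as summarized here.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rules: a regex on the error log mapped to an error class.
RULES = [
    (re.compile(r"OutOfMemoryError"), "memory_config_error"),
    (re.compile(r"Table or view not found"), "user_error"),
]


def classify_error(log: str) -> str:
    """Rule-based classification (the role the post attributes to Pensive)."""
    for pattern, error_class in RULES:
        if pattern.search(log):
            return error_class
    return "unclassified"


def recommend_patch(current_conf: dict) -> list:
    """Placeholder for the ML recommendation: simply doubles executor memory."""
    gb = int(current_conf["spark.executor.memory"].rstrip("g"))
    return [{"op": "replace", "path": "/spark.executor.memory", "value": f"{gb * 2}g"}]


@dataclass
class ConfigStore:
    """In-memory stand-in for a config service: JSON patches stored with a scope."""
    entries: list = field(default_factory=list)

    def save(self, scope: dict, patch: list) -> None:
        self.entries.append({"scope": scope, "patch": patch})

    def patches_for(self, job: dict) -> list:
        # A scope matches when every key/value in it equals the job's attribute.
        return [e["patch"] for e in self.entries
                if all(job.get(k) == v for k, v in e["scope"].items())]


def apply_patch(conf: dict, patch: list) -> dict:
    """Apply top-level 'add'/'replace' operations and return a new dict."""
    patched = dict(conf)
    for op in patch:
        if op["op"] not in ("add", "replace"):
            raise ValueError(f"unsupported op: {op['op']}")
        patched[op["path"].lstrip("/")] = op["value"]
    return patched


def conf_for_retry(job: dict, error_log: str, store: ConfigStore) -> Optional[dict]:
    """Return the config to retry with, or None if a retry would not help."""
    error_class = classify_error(error_log)
    if error_class == "user_error":
        return None  # retrying with different Spark settings cannot fix this
    if error_class == "memory_config_error":
        scope = {"workflow_id": job["workflow_id"], "step_id": job["step_id"]}
        store.save(scope, recommend_patch(job["conf"]))
    conf = job["conf"]
    for patch in store.patches_for(job):
        conf = apply_patch(conf, patch)
    return conf


job = {"workflow_id": "wf_daily_agg", "step_id": "spark_step",
       "conf": {"spark.executor.memory": "4g", "spark.executor.cores": "2"}}
retry_conf = conf_for_retry(job, "java.lang.OutOfMemoryError: Java heap space", ConfigStore())
print(json.dumps(retry_conf))  # spark.executor.memory is now "8g"
```

Keeping the recommendation as a scoped patch rather than a full config means the job's original settings stay untouched, and the same recommendation can be reused by any later run that matches the scope.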
Jide O. from Context Data caught everyone's attention with an AI agent that builds a data ontology model (logical and physical) and serves it to LLMs via a knowledge graph. A case study of a manufacturing firm was a great example of how Jide's solution can be incorporated into a complex domain such as supply chain operations.
The presented solution is an interesting example that showcases the scale of complexity an enterprise client faces in data integration for optimization analysis, as well as how challenging it is to incorporate such a solution into a domain-specific context.
Jason Reid, co-founder of Tabular and former Director of Data Science & Engineering at Netflix, beautifully demonstrated the flexibility of Apache Iceberg. He explained the versatility of Iceberg and the reliability this open table format brings to big data, while making it possible for engines like Spark, Trino, Flink, and Presto to work with the same tables.
One of my favorite talks of the event was Clark Wright explaining Airbnb's innovative method for assessing data warehouse quality, which has huge potential to set a new standard for data productivity in the industry. It was very interesting to hear about the gold standards for defining data quality dimensions at Airbnb and about how two types of users relate to the score: the Data Producer (well-built data rises to the top of data consumer demand) and the Data Consumer (data quality becomes something data consumers are demanding).
To better address the specific needs of Airbnb customers and create data-sensitive products and services, Airbnb's data engineering team has developed an approach for scaling data quality.
The event at Netflix took just one full day, and it was one of the most productive and well-spent days of my entire professional career. Long lines of attendees with so many sharp questions for the speakers, the beautiful Netflix campus bathed in the Californian sun, and enlightening sessions from the gold-standard data engineering community make me want to come back very soon!
I'm thankful for being able to share know-how and to introduce Clurgo to the #dataengineering family (I think I was the only attendee who came from Europe for this event).
I met such wonderful people (Karl Eden, Stephanie Vezich Tamayo, Jessica Larson, Tulika Bhatt, Martin Franco, and so many more) and reconnected with my old friend Maciej Kaziród.
Special thanks go to Xinran Waibel, Rashmi Shamprasad, Chris Colburn, Jai Balani, and Patricia Ho for the hard work of organizing such a remarkable event! You're the true hidden heroes!
Until next time,
Pawel
Software Engineer @ Google | Bachelor's in Computer Science
10 months ago: Thanks for summarizing it so well, Pawel! Wish I could attend.
Building at the intersection of Data Engineering and AI
11 months ago: Thank you very much for feedback and review Pawel Mikler. Had a great discussion with you and hope we can have more follow ups
CTO, Builder, Trailblazer, Technologist and entrepreneur.
11 months ago: great piece! I wish I could attend. thanks for summarizing, i now have a checklist of new technologies to research and get up to speed.
Sales Manager, Banking Industry | Finance Advisory | MBA, Public Sector
11 months ago: Great summary of the special event, one day and so much information given in good quality, congratulations Pawel!
CTO & co-Founder at Clurgo
11 months ago: Great Post! I wish I hadn't been there ;-(