What a beautiful day! Excellence certification received from Zach Wilson's DataExpert.io January bootcamp, including three mentorship sessions – Woohoo! Valuable data engineering knowledge gained, and most importantly, I made amazing friends here. Thank you to all of them! This was absolutely one of the best decisions ever! Keep grinding! #DataEngineering
DataExpert.io
Education
San Francisco, California · 30,109 followers
Data Engineering education, solutions, and evangelism
About us
EcZachly Inc is a company dedicated to inspiring and educating the next generation of data talent!
- Website
-
https://www.dataexpert.io
External link for DataExpert.io
- Industry
- Education
- Company size
- 2-10 employees
- Headquarters
- San Francisco, California
- Type
- Privately held
- Founded
- 2023
Locations
-
Primary
US, California, San Francisco, 94103
Employees at DataExpert.io
-
Preeti Prajapati
Data Engineering | Adtech
-
Zach Wilson
Zach Wilson is a LinkedIn Top Voice · Founder of DataExpert.io | Senior Data Engineer | FAANG engineer with 7 years of experience
-
Arockia Nirmal Amala Doss
Founder, Data Engineer @ ZippyTec GmbH | Data Migration & Data Engineering Consulting | Data Migration Coaching | AWS Community Builder
-
Samuel Lederman
Data Scientist | Mathematician | Investor | Traveler
Posts
-
Don't miss this amazing meetup in Bengaluru!
The last thing I'll be doing in Bengaluru before I leave is a data engineering meetup on March 22nd at 3PM IST with all your favorite data engineering creators. I'm teaming up with Ankit Bansal, Darshil Parmar and Deepak Goyal to deliver you a meetup that you will not forget! Seats are very limited! Secure your spot here: bit.ly/blr-de-meetup First 20 people to use code MEETZACH can get 50% off!
-
-
A majority of those with ADHD and autism are unemployed. The amount of untapped potential in these brilliant people is immense. I'm going to be live with Jhillika tomorrow at 3 PM Pacific on Riverside, YouTube, LinkedIn, and X, talking about how to grow your career while having ADHD! I'll be talking about:
- how creating a "working with me" document can be life-changing for those who are neurodivergent
- things I do to manage my ADHD so it doesn't get in the way as much
- how you can use resources like Mentra to land jobs
Make sure to share this with your neurodivergent friends!
-
Everybody and their dog is claiming to be an AI engineer nowadays! Delia and Madeline understand that wrapping ChatGPT with Python makes you a full stack developer, NOT an AI engineer. If you use ChatGPT to make data quality checks for your pipelines, you're a data engineer, not an AI engineer! If you're fine-tuning models, you're an AI engineer. If you're building models, you're an AI engineer. If you're building evaluation sets, you're an AI engineer. If you aren't doing any of those activities, you're a spicy full stack developer!
-
Successfully started Week 1: Dimensional Data Modeling of the "Finish the YouTube Boot Camp" hosted by Zach Wilson and DataExpert.io, and completed the Day 1 lecture and lab sessions. Here's a gist of the topics and my insights from the lesson:

Day 1: Working with Complex Data Types - Struct, Array, etc.

1. Know Your Consumer
Before beginning any data modeling, it is imperative to understand in depth who the end user of the data is. Whether it is used for analytics by a data analyst or data scientist, consumed by downstream jobs managed by data engineers to curate master data with other pipeline dependencies, or fed into ML models or executive dashboards, the data model varies distinctly in its use of complex vs. flat data types, storage and compression, and ease of query and accessibility.

2. OLTP vs. Master Data vs. OLAP Continuum
Understanding the differences when modeling a transactional system, such as an application database requiring low latency and low volume, versus an analytical system, such as cubes used for quick analysis of aggregations, while also finding the sweet spot in between where the master data sits: deduped and optimized for completeness of entity definitions, from which other datasets can be created.

3. Cumulative Table Design
Cumulative table designs are very commonly used to create master data, where you hold on to all of the dimensions that existed right up until a specific time (until purged or hibernated). Such designs are beneficial for state-transition tracking of different metrics, e.g. for growth accounting, which can be used to analyze and model patterns. The design serves especially well in computing cumulative metrics, using complex data types such as an array of structs to combine the changing values.

4. Complex Data Types
Usage of complex data types depends on the type of modeling and the end user, ranging from most compact for transactional purposes to most usable for analytics, with upstream staging or master data residing somewhere in between. Complex data types such as struct, map, array, and nested arrays (e.g. array of struct) are quite commonly used to compact datasets.

5. Temporal Cardinality Explosion, Compression & Run-length Encoding
Explored the importance of considering cardinality when working with dimensions that have a time aspect, and the need to sort data correctly before compressing, such as when using the Parquet format with run-length encoding. Complex data types such as an array of structs can also be used to combine the temporal dimension values, which prevents a Spark shuffle from ruining compression in distributed environments.

Thank you, Zach Wilson & DataExpert.io, for the incredible session! Day 2, loading! #bootcamp #zachwilson #dataexpertio #dataengineering #freeyoutubebootcamp #finishtheyoutubebootcamp #rampup #upskilling #onwardsandupwards
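The cumulative table idea in point 3 can be sketched in plain Python. This is a toy stand-in for the Spark/SQL version taught in the course; the user IDs and values are invented for illustration:

```python
from datetime import date

def cumulate(yesterday, today, today_date):
    """One daily run of a cumulative table pipeline.

    `yesterday` is the prior cumulative state: user_id -> list of
    (date, value) tuples, the "array of struct" history.
    `today` is today's snapshot: user_id -> value.
    Users absent today keep their history; new users start one.
    """
    result = {}
    for user_id in set(yesterday) | set(today):
        history = list(yesterday.get(user_id, []))
        if user_id in today:
            history.append((today_date, today[user_id]))
        result[user_id] = history
    return result

# Day 1: only user "a" is active; Day 2: "a" and "b" are active.
cum = cumulate({}, {"a": 5}, date(2025, 1, 1))
cum = cumulate(cum, {"a": 3, "b": 7}, date(2025, 1, 2))
# cum["a"] -> [(date(2025, 1, 1), 5), (date(2025, 1, 2), 3)]
```

The key property is that each run only touches yesterday's cumulative state plus today's data, yet every row carries the full history up to today.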
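The sorting effect from point 5 is easy to demonstrate: run-length encoding only helps when equal values are adjacent. A small illustrative check, using `itertools.groupby` as a stand-in for Parquet's RLE (the data is made up):

```python
from itertools import groupby

def rle(values):
    """Run-length encode a sequence into [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# The same six rows of a low-cardinality dimension, two orderings.
unsorted_rows = ["US", "IN", "US", "IN", "US", "IN"]
sorted_rows = sorted(unsorted_rows)

n_unsorted = len(rle(unsorted_rows))  # 6 runs: nothing to collapse
n_sorted = len(rle(sorted_rows))      # 2 runs: compresses well
```

This is why re-sorting after a join or shuffle matters before writing out Parquet: a shuffle that scrambles row order can silently destroy the compression the sort had bought.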
-
I’m happy to share that I’ve obtained a new certification: DataExpert.io Free Data Engineering Bootcamp Certificate from DataExpert.io!
-
Kunmi put in the work and is improving his situation!
I spent most of the past 7 weeks on DataExpert.io's free Data Engineering Bootcamp led by Zach Wilson! This incredible course provided me with hands-on experience and a deeper understanding of core data engineering concepts. Over the last 7 weeks, I've focused on optimizing systems, managing data pipelines, and developing scalable solutions that bring real impact to businesses. Here's a look at some of the key takeaways for me:

- Flink & Apache Spark: From sessionization logic in Flink to optimizing joins and aggregations in Apache Spark, I implemented solutions that enhanced data pipeline performance and ensured data integrity. It was also a great opportunity to experiment with different data partitioning and aggregation techniques for more efficient query execution.
- Experimentation & Metrics: I designed and executed A/B tests to enhance user engagement in a music streaming app (e.g. Apple Music). My focus on testing personalized playlists, onboarding processes, and social features provided valuable insights into how user experience and retention can be significantly improved through data-driven decisions.
- SQL & PySpark: Converting PostgreSQL queries to Spark SQL and building PySpark jobs to handle Slowly Changing Dimension (SCD) transformations was an exciting challenge. My work on backfill query conversions and unit testing ensured the integrity of the data transformation processes.
- Data Pipeline Ownership: I took ownership of multiple data pipelines, ensuring smooth operations, monitoring, and troubleshooting through comprehensive runbooks and on-call schedules. I'm committed to maintaining robust systems and ensuring data flows seamlessly, even during unforeseen challenges.

Each assignment has taught me something new, whether it's refining my approach to data pipelines, improving collaboration with teams, or driving product-improvement experiments. I'm excited to continue expanding my skill set and exploring more innovative ways to harness the power of data to solve real-world challenges. Big thanks to Zach for making this a free resource, and of course to the Discord community for providing troubleshooting tips every step of the way!
-
-
Dibyanshu is committing the necessary energy to actually upskill and become better!
Microsoft Certified Data Platform Engineer | Lakehouse | Big Data | Spark | Python | Kafka | DWH | CICD | Docker | Azure, AWS | EXL, Deloitte
This journey felt different - it took late nights and a bit of sweat to complete all the assignments, but it was absolutely worth it! Thanks to Zach Wilson for the great content and clear deadlines that kept me motivated to stay on track. If you're still thinking about joining, give it a try - it's totally worth it! https://lnkd.in/gn58ARjJ #bootcamp #dataengineering #learning
-
-
Hey folks! After seven weeks of a lot of hard work, I have successfully completed the DataExpert.io Data Engineering Bootcamp! The topics covered were:
- Dimensional Data Modeling
- Fact Data Modeling
- Apache Spark Fundamentals
- Applying Analytical Patterns
- Real-time pipelines with Flink and Kafka
- Data Visualization and Impact
- Data Pipeline Maintenance
- KPIs and Experimentation
- Data Quality Patterns
Out of 30,100 participants, only 65 of us (0.2%) successfully completed all the homework assignments! I'm proud to be one of them! A big shoutout to Zach Wilson and the amazing DataExpert.io team for creating such a comprehensive and impactful program. I would also like to thank the Discord community, who were always available to answer questions and share knowledge. Finally, I would like to thank the people who liked my posts with class notes and helped disseminate my content across the network. If you want to review my class notes, you can access them at the address below: https://lnkd.in/dW5h7HKd
-
-
One of the best things I accomplished this January was completing Zach Wilson's Free Data Engineering Bootcamp, a challenging yet incredibly rewarding experience. This bootcamp required around 40 hours of dedication, packed with learning core concepts, engaging in hands-on labs, and tackling homework assignments. It wasn't easy, but every bit of effort was absolutely worth it. A huge thank you to Zach Wilson for making this bootcamp accessible to all. His ability to explain concepts through his work experience at Facebook, Airbnb, and Netflix made the content engaging, relatable, and easy to grasp. Through this workshop, I gained hands-on experience with essential data engineering topics, including:
- Dimensional Data Modeling
- Fact Data Modeling
- Apache Spark Fundamentals
- Analytical Patterns and KPIs
- Unit Testing PySpark Pipelines
- Real-Time Pipelines with Flink and Kafka
Out of 30,100 participants, only 65 of us (0.2%) successfully completed all the homework assignments! I'm proud to be one of them! Every module brought exciting new challenges, and I couldn't wait to dive into the next one. If you're looking to enhance your data engineering skills, I highly recommend this bootcamp! (It's open until February 7th.) DataExpert.io
I'm now working on a personal project inspired by what I've learned and will share it once it's complete. Stay tuned! Here's to more growth, learning, and victories in 2025!
-