Introducing English SDK for Spark
Sarah Floris, MS
Senior Data & ML Engineer | dutchengineer.substack.com | Host of Ask A Data Mentor Podcast
1B+ downloads of Spark per year.[1]
The popularity of Spark is not surprising, given how many data engineering, data science, and data analytics tools have Spark capabilities. I have personally used Spark to pull in terabytes of data in both batch and streaming modes without having to change much of my code.
The biggest challenge when I was learning the Spark framework was the coding language: looking up the slight differences between PySpark and plain Python, or figuring out how to transform a DataFrame and plot it. Fortunately, the English SDK for Spark should soon make this much less of an issue.
In this tutorial, I will show you how to use the English SDK for Spark with the California Housing sample dataset. This is only a small glimpse of what the future holds for this tool.
The code can be found here and was run in Google Colaboratory with the provided sample dataset.
Read more below.