Data Reading Club #10
Ivanna Jurkiv Ditlevsen
Senior Data Engineer | Novo Nordisk Engineering
My focus these past few weeks has been on preparing for the #dp203 certification from Microsoft. This meant reading a lot of Microsoft documentation and training material. I also read some articles from other sources, some of them not really related to the certification.
Here we go:
Nodes, workers, executors: this Spark jargon can be intimidating when you start working with Spark.
Since Spark is the engine behind Azure Synapse Analytics and (Azure) Databricks, two of the services tested in detail in the Microsoft certification, I spent some extra time understanding it.
The #medium article I link to starts at a high level and then goes into considerable depth to explain how to configure parameters for Spark applications.
I cannot speak to the quality of the configuration discussion in this article, because all of this is abstracted away when working in Synapse and because the article is a few years old. But I did find the explanations at the beginning of the article helpful and easy to understand.
Side note: skimming through the in-depth part about the configuration Spark needs makes me realize how much is abstracted away in Synapse.
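To give a feel for the kind of configuration that gets abstracted away, here is a minimal sketch of a commonly quoted rule of thumb for sizing Spark executors on a YARN cluster. The function name and every constant in it are assumptions and are not taken from the linked article. The rules it assumes: reserve 1 core and 1 GB per node for the OS and daemons, use 5 cores per executor, keep one executor slot for the YARN ApplicationMaster, and budget off-heap overhead as max(384 MB, 10% of executor memory). The returned keys are real Spark configuration properties.

```python
import math

# Assumed rule-of-thumb constants (not taken from the linked article).
RESERVED_CORES_PER_NODE = 1  # left for the OS and cluster daemons
RESERVED_MEM_GB_PER_NODE = 1  # left for the OS and cluster daemons
MIN_OVERHEAD_GB = 384 / 1024  # minimum executor memory overhead (384 MB)


def size_executors(
    nodes,
    cores_per_node,
    mem_gb_per_node,
    cores_per_executor=5,
    overhead_fraction=0.10,
):
    """Suggest executor settings for a cluster of identical worker nodes."""
    usable_cores = cores_per_node - RESERVED_CORES_PER_NODE
    usable_mem_gb = mem_gb_per_node - RESERVED_MEM_GB_PER_NODE

    executors_per_node = usable_cores // cores_per_executor
    # Keep one executor slot free for the YARN ApplicationMaster.
    total_executors = nodes * executors_per_node - 1

    # Each executor's container must hold its heap plus off-heap overhead,
    # where overhead = max(384 MB, overhead_fraction * heap).
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = container_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < MIN_OVERHEAD_GB:
        heap_gb = container_gb - MIN_OVERHEAD_GB

    return {
        "spark.executor.instances": str(total_executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{math.floor(heap_gb)}g",
    }


# Example: 10 worker nodes, each with 16 cores and 64 GB of RAM.
print(size_executors(10, 16, 64))
# {'spark.executor.instances': '29', 'spark.executor.cores': '5', 'spark.executor.memory': '19g'}
```

In Synapse, choices like these are largely made for you when you pick a Spark pool's node size.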
In my old Data Analyst job I was used to working with drag-and-drop tools. Whenever I needed to filter some values or join a table with another table, I found the right tools and they did the job for me. No code and little configuration.
Now that I work in PySpark, code formatting and style have definitely become a priority for me. I want to write readable and consistent code.
So one of my colleagues recommended Black. It's a code formatter that can be integrated with different editors. Since I do not work in those editors, I am instead taking some pointers from it and applying them when writing code.
Apparently, Black follows the PEP 8 format, which is the official Python style guide.
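As an illustration of those pointers, here is a small filter-and-join written by hand in the style Black produces. It is a sketch: the data and the function name are hypothetical, and it uses plain Python lists and dicts instead of PySpark DataFrames so it runs without a Spark installation. The Black habits it shows are double-quoted strings, four-space indentation, lines under Black's default limit of 88 characters, two blank lines around top-level functions, and a trailing comma after the last item, which tells Black to keep one item per line.

```python
# Hypothetical sample data: a list of orders and a customer lookup table.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 75.5},
]

customers = {
    10: "Alice",
    11: "Bob",
}


def large_orders_with_names(orders, customers, min_amount=50.0):
    """Keep orders of at least min_amount and attach the customer name."""
    return [
        {**order, "customer_name": customers[order["customer_id"]]}
        for order in orders
        if order["amount"] >= min_amount
    ]


result = large_orders_with_names(orders, customers)
print(result)
```

The comprehension is split at `for` and `if` because it would not fit on one 88-character line, which is how Black breaks long comprehensions.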
#datareadingclub series is a weekly LinkedIn newsletter that shares a short list of thought-provoking material I came across that week. The list is not limited to reading material; it also includes podcasts, video clips, etc.
Software/Data Engineer at Novo Nordisk Digital Data & IT
Maybe not relevant for you at this point, but I have started reading Spark: The Definitive Guide by Bill Chambers to get a deeper understanding of Spark technology.