Data Reading Club #10
Source: https://unsplash.com/photos/HauxSOFvh6k?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyT

My focus these past few weeks has been on preparing for the #dp203 certification from Microsoft. This meant reading a lot of Microsoft documentation and training material. I also read some articles from elsewhere, some of them not really related to the certification.

Here we go!


Spark Jargon For Starters

Nodes, workers, executors: this Spark jargon can be intimidating when you begin working with Spark.

Since Spark is the engine behind Azure Synapse Analytics and (Azure) Databricks, two of the services tested in detail in the Microsoft certification, I spent some extra time understanding it.

The #medium article that I link to begins at a high level and then goes into considerable depth on how to configure parameters for Spark applications.

I cannot speak to the quality of the configuration discussion in this article because this is all abstracted away when working in Synapse, and also because the article is from a few years back. But I did find the explanations at the beginning of the article helpful and easy to understand.

Side note: skimming through the in-depth part about configurations needed for Spark makes me realize how much is abstracted away in Synapse.
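
One way to keep the jargon straight is a toy model of how the terms relate. The sketch below is plain Python, not Spark code, and every number in it is made up for illustration. It assumes Spark's default of one core per task, so the number of tasks that can run at the same moment is the number of executors times the cores per executor.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a Spark cluster; the numbers are hypothetical."""

    worker_nodes: int  # machines that host executors
    executors_per_node: int  # executor processes on each worker node
    cores_per_executor: int  # each core runs one task at a time

    @property
    def total_executors(self) -> int:
        return self.worker_nodes * self.executors_per_node

    @property
    def max_parallel_tasks(self) -> int:
        # One task per executor core, so this is the upper bound on
        # how many partitions can be processed at the same moment.
        return self.total_executors * self.cores_per_executor


cluster = Cluster(worker_nodes=3, executors_per_node=2, cores_per_executor=4)
print(cluster.total_executors)  # 6
print(cluster.max_parallel_tasks)  # 24
```

In a managed service such as Synapse these values are mostly chosen through the pool settings rather than set by hand, which is the abstraction mentioned above.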


Black code style

In my old Data Analyst job I was used to working with drag-and-drop tools. Whenever I needed to filter some values or join a table with another table, I found the right tools and they did the job for me. No code and little configuration.

Now that I work in PySpark, code formatting and style have definitely become a priority for me. I want to write readable and consistent code.

And so one of my colleagues recommended Black. It's a code formatter that can be integrated with different editors. Since I do not work in editors, I take some of its pointers and apply them myself when writing code.

Apparently Black follows PEP 8, which is the official Python style guide.
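
As an illustration of the kind of pointers one can take from Black, here is a small sketch laid out roughly the way Black formats code: double quotes for strings, lines kept within its default limit of 88 characters, a collection that does not fit on one line split into one item per line with a trailing comma, and two blank lines around top-level functions. The function and the data are hypothetical, and plain Python is used so the example runs without a Spark session.

```python
def summarize(name: str, scores: list[int]) -> dict[str, object]:
    """Return a small summary of the given scores."""
    return {
        "name": name,
        "count": len(scores),
        "highest": max(scores),
        "average": sum(scores) / len(scores),
    }


summary = summarize("spark-quiz", [70, 85, 100])
print(summary)
```

The trailing comma after the last dictionary entry is worth copying when formatting by hand: adding a new entry later changes only one line.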


The #datareadingclub series is a weekly LinkedIn newsletter that shares a short list of thought-provoking material I came across that week. The list is not limited to reading material; it also includes podcasts, video clips, etc.

Georgios (George) Zefkilis

Software/Data Engineer at Novo Nordisk Digital Data & IT

1 year ago

Maybe not relevant for you at this point, but I have started reading Spark: The Definitive Guide by Bill Chambers to get a deeper understanding of Spark technology.
