Data Reading Club #10
Source: https://unsplash.com/photos/HauxSOFvh6k?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyT

My focus these past few weeks has been on preparing for the #dp203 certification from Microsoft. This meant reading a lot of Microsoft documentation and training material. I also read some articles published elsewhere, and some that are not really related to the certification.

Here we go!


Spark Jargon For Starters

Nodes, workers, executors: this Spark jargon can be intimidating when you begin working with Spark.

Since Spark is the engine behind Azure Synapse Analytics and (Azure) Databricks, two of the services tested in detail in the Microsoft certification, I spent some extra time understanding it.

The #medium article that I link to begins at a high level and then goes into considerable depth to explain how to configure parameters for Spark applications.

I cannot speak to the quality of the configuration discussion in this article because this is all abstracted away when working in Synapse, and also because the article is from a few years back. But I did find the explanations at the beginning of the article helpful and easy to understand.

Side note: skimming through the in-depth part about the configuration Spark needs made me realize how much is abstracted away in Synapse.
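As a rough illustration of the kind of settings that get abstracted away, here is a minimal sketch in plain Python. It is not taken from the linked article. The property names are standard Spark configuration keys, but the values are made up for illustration and are not recommendations, and the two helper functions are hypothetical.

```python
# Illustrative values only; real sizing depends on the cluster and workload.
spark_conf = {
    "spark.executor.instances": "4",  # number of executors
    "spark.executor.cores": "4",      # cores per executor
    "spark.executor.memory": "8g",    # memory per executor
    "spark.driver.memory": "4g",      # memory for the driver
}


def to_submit_args(conf):
    """Hypothetical helper: render the dict as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def total_executor_cores(conf):
    """Hypothetical helper: executors multiplied by cores per executor."""
    return int(conf["spark.executor.instances"]) * int(conf["spark.executor.cores"])


print(" ".join(to_submit_args(spark_conf)))
print("Total executor cores:", total_executor_cores(spark_conf))
```

With Spark's default of one core per task, the total executor cores figure is an upper bound on how many tasks can run at the same time. In Synapse, choices like these are mostly made by picking a pool and node size rather than by setting each property by hand.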


Black code style

In my old Data Analyst job I was used to working with drag-and-drop tools. Whenever I needed to filter some values or join a table with another table, I found the right tools and they did the job for me. No code and little configuration.

Now that I work in PySpark, code formatting and style have definitely become a priority for me. I want to write readable and consistent code.

And so one of my colleagues recommended Black. It's a code formatter that can be integrated with different editors. Since I do not work in an editor myself, I take some pointers from it and apply them by hand when writing code.

Apparently Black follows PEP 8, which is the official Python style guide.
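To make those pointers concrete, here is a small before-and-after sketch. The function names and example strings are hypothetical. The "after" versions show roughly what Black-style formatting looks like: consistent spaces around operators, double quotes, and one argument per line with a trailing comma when a call is split.

```python
# Before: valid Python, but with inconsistent spacing and quoting.
def full_name_before(first,last,  sep = ' '):
    return first+sep+last


# After: spacing and quotes normalised in the Black style.
def full_name_after(first, last, sep=" "):
    return first + sep + last


# A call split one argument per line, each ending in a comma.
# Black keeps this layout because of the comma after the last argument.
greeting = full_name_after(
    "Data",
    "Reading Club",
    sep=" ",
)

print(greeting)
```

Both functions behave identically; only the layout differs, which is the point of a formatter. Black's default line length is 88 characters, so shorter calls normally stay on one line.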


The #datareadingclub series is a weekly LinkedIn newsletter aimed at sharing a short list of thought-provoking material that I came across that week. The list is not limited to reading material, but also includes podcasts, video clips, etc.

Georgios (George) Zefkilis

Software/Data Engineer at Novo Nordisk Digital Data & IT

1 year ago

Maybe not relevant for you at this point, but I have started reading Spark: The Definitive Guide by Bill Chambers to get a deeper understanding of Spark technology.
