Apache Spark Transformations and Actions

In this guide, we'll explore Transformations and Actions in detail, breaking down the complexities and providing simple examples to make these concepts easier to understand.

Understanding Spark Transformations

What are Transformations?

Transformations in Apache Spark are operations that create a new DataFrame (or RDD) from an existing one. Think of transformations as the recipe steps in cooking: you have your raw ingredients (data), and with each transformation you mix, filter, or reshape them to create a new dish (a new DataFrame).

There are two types of transformations:

  • Narrow Transformation
  • Wide Transformation

Narrow Transformation

Transformations that do not result in data movement between partitions are called Narrow transformations. Each output partition depends on exactly one input partition, so every partition can be processed independently.

Some Examples:

  1. map
  2. flatMap
  3. filter
  4. union

Wide Transformation

Transformations that involve data movement between partitions are called Wide transformations or shuffle transformations.

Some Examples:

  1. groupByKey
  2. reduceByKey
  3. join
  4. repartition

Complete list of transformations - https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations

Understanding Spark Actions

What are Actions?

Actions, on the other hand, are operations that trigger the execution of transformations, producing a result or side effect. Going back to our cooking analogy, actions are like turning on the oven to bake the final dish after all the preparation.

Some Examples:

  1. show
  2. count
  3. collect
  4. first
  5. head

Complete list of actions - https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions

When Spark Consumes Resources

Lazy Evaluation

Spark operates on a principle called lazy evaluation. It doesn't execute transformations immediately but rather keeps track of them in a plan. It only springs into action when an action is called. Imagine creating a shopping list for your recipes – you plan everything first before hitting the store.

How Spark Remembers Transformations During Actions

Spark maintains a logical execution plan, known as the Directed Acyclic Graph (DAG). When an action is invoked, Spark refers to this plan to understand the sequence of transformations required to produce the final result. It's similar to following a cooking recipe step by step to create a delicious dish.

Conclusion

Apache Spark's Transformations and Actions are like following a cooking recipe to prepare a delightful feast. Transformations are the recipe steps, and actions are the moments you put the plan into motion. Spark's lazy evaluation and DAG ensure efficient resource usage, making big data processing straightforward.
