#25: Transformation and Action in Apache Spark

In Apache Spark, there are two types of operations that can be applied to RDDs (Resilient Distributed Datasets): transformations and actions. Here's a breakdown of each:

Transformations:

  • Transformations create a new RDD from an existing RDD.
  • Transformations are lazy: they do not compute their results immediately. Instead, each one adds a step to a lineage graph (a DAG) that records the sequence of transformations applied to the base dataset.
  • Spark keeps track of these transformations and only computes them when an action is called.
  • This lazy evaluation allows Spark to optimize the execution plan, for example by pipelining consecutive narrow transformations such as map() and filter() into a single stage.

Common transformations include map(), filter(), flatMap(), reduceByKey(), join(), groupByKey(), and sortByKey(). These transformations cover data processing tasks such as filtering, mapping, aggregating, joining, and sorting data.
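To make the lazy behaviour concrete, here is a minimal pure-Python sketch. It is a toy model, not the Spark API: the `LazyDataset` class, its `lineage()` helper, and the `double` function are hypothetical names introduced only for illustration. It assumes a single in-memory list instead of a distributed dataset, and shows that calling a transformation only records a step, while the work happens when an action is finally called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source  # base data, here a plain list
        self._steps = steps    # recorded (name, function) pairs: the lineage

    def _extend(self, name, step):
        # A transformation returns a NEW dataset with a longer lineage.
        # Nothing is computed here.
        return LazyDataset(self._source, self._steps + ((name, step),))

    def map(self, fn):
        return self._extend("map", lambda rows: (fn(r) for r in rows))

    def filter(self, pred):
        return self._extend("filter", lambda rows: (r for r in rows if pred(r)))

    def lineage(self):
        return [name for name, _ in self._steps]

    def collect(self):
        # The action: replay every recorded step over the base data.
        rows = iter(self._source)
        for _, step in self._steps:
            rows = step(rows)
        return list(rows)


calls = []  # records every invocation of the mapped function


def double(x):
    calls.append(x)
    return x * 2


base = LazyDataset([1, 2, 3, 4])
pipeline = base.map(double).filter(lambda x: x > 4)

calls_before_action = list(calls)  # still empty: only lineage was recorded
result = pipeline.collect()        # the action runs map, then filter

print(pipeline.lineage())   # ['map', 'filter']
print(calls_before_action)  # []
print(result)               # [6, 8]
```

Note that `base` is left untouched: each transformation produces a new dataset object, which mirrors the idea that a transformation creates a new RDD from an existing one.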

Actions:

  • Actions, on the other hand, trigger the execution of the transformations and produce some output.
  • When an action is called on an RDD, Spark evaluates the lineage graph and computes the result, which involves running the recorded transformations on the distributed dataset across the cluster.
  • Actions are the operations that initiate the actual computation and return the results to the driver program or write data to external storage.
  • Examples of actions include collect(), count(), reduce(), take(), saveAsTextFile(), and foreach().
  • These actions perform tasks such as collecting data to the driver, counting elements in an RDD, reducing elements to a single result, taking the first n elements, saving RDDs to external storage, or executing a function on each element of the RDD.
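The actions listed above can be sketched in the same spirit. Again this is a pure-Python toy model under stated assumptions, not the Spark API: `LazyDataset`, `_run`, and `square` are hypothetical names, and the data lives in one local list. The sketch shows four actions (collect, count, reduce, take) that each trigger evaluation of the recorded steps and return a plain value.

```python
from functools import reduce as fold
from itertools import islice


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps

    def map(self, fn):
        # Transformation: only records the step.
        step = lambda rows: (fn(r) for r in rows)
        return LazyDataset(self._source, self._steps + (step,))

    def _run(self):
        # Shared by every action: replay the recorded steps.
        rows = iter(self._source)
        for step in self._steps:
            rows = step(rows)
        return rows

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, fn):
        return fold(fn, self._run())

    def take(self, n):
        # First n elements in order, not a random sample.
        return list(islice(self._run(), n))


evaluated = []  # records every call to the mapped function


def square(x):
    evaluated.append(x)
    return x * x


squares = LazyDataset([1, 2, 3, 4, 5]).map(square)
calls_before_actions = len(evaluated)  # 0: nothing has run yet

collected = squares.collect()               # [1, 4, 9, 16, 25]
total = squares.reduce(lambda a, b: a + b)  # 55
size = squares.count()                      # 5
first_two = squares.take(2)                 # [1, 4]

print(calls_before_actions, collected, total, size, first_two)
```

In this toy model every action replays the recorded steps from the base data. Spark behaves similarly by default: an RDD's lineage is recomputed for each action unless the RDD has been kept in memory or on disk with cache() or persist().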

In summary, transformations are used to build a directed acyclic graph (DAG) of computation, describing how data is transformed from one RDD to another, while actions execute the computations and produce final results or write data to external storage.
