#25: Transformation and Action in Apache Spark
Mohammad Azzam
In Apache Spark, there are two types of operations that can be applied to RDDs (Resilient Distributed Datasets): transformations and actions. Here's a breakdown of each:
Transformations:
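Transformations create a new RDD from an existing one. They are lazy: Spark records the operation but does not run it until an action needs the result. Common transformations include map(), filter(), flatMap(), reduceByKey(), join(), groupByKey(), and sortByKey(). They cover typical data processing tasks such as mapping, filtering, aggregating, joining, and sorting data.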
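To make the data flow concrete, here is a minimal sketch of what a few of these transformations compute, written in plain Python on a local list. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins invented for this illustration, not the PySpark API. In Spark the same steps would be methods called on an RDD and would run across a cluster.

```python
from itertools import chain

# Hypothetical local stand-ins for RDD transformations (plain Python, not PySpark).

def flat_map(func, items):
    """Apply func to each item and flatten the results, like flatMap()."""
    return list(chain.from_iterable(func(item) for item in items))

def reduce_by_key(func, pairs):
    """Merge the values of each key with a binary function, like reduceByKey()."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

lines = ["spark makes rdds", "rdds are lazy", "spark is fast"]

words = flat_map(lambda line: line.split(), lines)   # one line becomes many words
long_words = [w for w in words if len(w) > 2]        # filter-style step: drop short words
pairs = [(w, 1) for w in long_words]                 # map-style step: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)    # aggregate the 1s per word

print(counts)
```

The shape of the pipeline (split, filter, pair up, aggregate by key) is the classic word count. Only the helper names and the sample sentences are assumptions made for this sketch.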
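Actions:
Actions trigger execution of the transformations recorded so far and either return a result to the driver program or write data to external storage. Common actions include collect(), count(), first(), take(), reduce(), and saveAsTextFile().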
In summary, transformations are used to build a directed acyclic graph (DAG) of computation, describing how data is transformed from one RDD to another, while actions execute the computations and produce final results or write data to external storage.
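The split between building a plan and executing it can be sketched with a toy model. `ToyRDD` below is a hypothetical class written only for this illustration. It is not how Spark is implemented, and it records a simple linear chain of steps, which is the simplest case of a DAG. The method names mirror the RDD ones so the parallel is easy to follow.

```python
class ToyRDD:
    """Toy model of lazy evaluation; a hypothetical class, not Spark's RDD."""

    def __init__(self, source, steps=()):
        self._source = source   # the original data
        self._steps = steps     # recorded lineage: tuple of (name, function)

    # Transformations: record a step and return a new ToyRDD. Nothing runs yet.
    def map(self, func):
        return ToyRDD(self._source, self._steps + (("map", func),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def lineage(self):
        return [name for name, _ in self._steps]

    # Actions: replay the recorded steps and return a concrete result.
    def collect(self):
        data = iter(self._source)
        for name, func in self._steps:
            # Inside a method body, map/filter here are the Python built-ins.
            data = map(func, data) if name == "map" else filter(func, data)
        return list(data)

    def count(self):
        return len(self.collect())


calls = []

def square(x):
    calls.append(x)   # side effect that reveals when the function actually runs
    return x * x

numbers = ToyRDD([1, 2, 3, 4, 5])
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)   # 0: only the plan exists so far
result = evens_squared.collect()   # the action runs the recorded steps
calls_after_action = len(calls)    # 5: square ran once per input element

print(evens_squared.lineage(), calls_before_action, result, calls_after_action)
```

Calling `collect()` or `count()` again replays the whole chain. Spark behaves similarly unless an RDD is persisted with cache() or persist().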