#23 RDD Transformation and Action Operations Example with PySpark -B

Continuing from the previous post, using the same RDD created there. If you haven't gone through post A yet, here is the link => post 21 link

  • sortBy() operation to sort the RDD by value in descending order (see the first sketch after this list).

  • count() to count the number of elements in the RDD.

  • Printing the results using collect().

  • take(n) retrieves the first n elements of an RDD, suitable for inspecting small subsets (contrasted in the second sketch below).
  • collect() gathers all elements of the RDD back to the driver, but it is inefficient for large datasets due to potential memory issues.


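And a quick contrast of the two retrieval actions, using a hypothetical RDD of numbers purely for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("take-vs-collect").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical RDD of 1,000 numbers, only to contrast the two actions.
    numbers = sc.parallelize(range(1, 1001))

    # take(n) ships only the first n elements to the driver: cheap and safe
    # for peeking at a large RDD.
    print(numbers.take(5))       # [1, 2, 3, 4, 5]

    # collect() ships every element to the driver; on a genuinely large RDD
    # this can exhaust driver memory, so reserve it for small results.
    all_values = numbers.collect()
    print(len(all_values))       # 1000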