Spark code to create random sample data

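In this article, you will learn how to create sample data with Spark. The function below builds a DataFrame with id, name, age, salary, and doj (presumably date of joining) columns, all derived from the id column.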

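import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, SparkSession}

// Builds a sample DataFrame with id, name, age, salary and doj columns.
// Ids run from fromRange (inclusive) to toRange (exclusive), as in spark.range.
def getRandomData(spark: SparkSession, fromRange: Long = 1, toRange: Long = 1000): DataFrame = {

	val inputDF = spark.range(fromRange, toRange)

	val outputDF = inputDF
		// col("id") is used instead of $"id" so that spark.implicits._ is not required
		.withColumn("name", concat(lit("name "), col("id")))
		// age equals id up to 100, then wraps around with id % 100
		.withColumn("age", when(col("id") < 101, col("id")).otherwise(col("id") % 100))
		// <= 60 closes the gap that left age 60 in the default branch
		.withColumn("salary", when(col("age") > 60, 500000f).when(col("age") > 40 and col("age") <= 60, 350000f).otherwise(200000f))
		.withColumn("doj",
			// "MM" is the month pattern; lowercase "mm" would be read as minutes
			when(col("age") > 60, to_date(lit("2010-20-01"), "yyyy-dd-MM")).
			when(col("age") > 40 and col("age") <= 60, to_date(lit("2017-20-01"), "yyyy-dd-MM")).
			otherwise(current_date()))

	outputDF
}

// In spark-shell, `spark` is the predefined SparkSession
val df1 = getRandomData(spark)
df1.show(105)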
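
The function above derives every value from the id column, so the rows are not random in the statistical sense. If randomized values are wanted, one possible variation is sketched below. It is not part of the original gist: the helper name getSeededRandomData, the seed parameter, and the chosen ranges (ages 18 to 65, salaries centred on 300000, joining dates within roughly the last ten years) are all assumptions made for illustration. It relies on the rand and randn functions from org.apache.spark.sql.functions and on Dataset.sample.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Reuses the existing session in spark-shell, or starts a local one otherwise
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("random-sample-data")
  .getOrCreate()

// Hypothetical helper (an assumption, not from the original gist):
// same columns as getRandomData, but filled with seeded random values.
def getSeededRandomData(spark: SparkSession, fromRange: Long = 1,
                        toRange: Long = 1000, seed: Long = 42L): DataFrame = {
  spark.range(fromRange, toRange)
    .withColumn("name", concat(lit("name "), col("id")))
    // rand(seed) is uniform in [0, 1); scaled and truncated to an age of 18..65
    .withColumn("age", (rand(seed) * 48 + 18).cast("int"))
    // randn(seed) is standard-normal noise; salary is centred on 300000
    .withColumn("salary", round(randn(seed + 1) * 50000 + 300000, 2))
    // joining date: today minus a random number of days (0..3649)
    .withColumn("doj",
      expr(s"date_sub(current_date(), cast(rand(${seed + 2}) * 3650 as int))"))
}

val randomDF = getSeededRandomData(spark)
randomDF.show(10)

// Draw roughly 10% of the rows at random, without replacement
val sampledDF = randomDF.sample(withReplacement = false, fraction = 0.1, seed = 42L)
sampledDF.show(10)
```

Fixing the seed should make the output repeatable as long as the partitioning stays the same. The fraction passed to sample is a probability per row, so the number of sampled rows is only approximately 10% of the input.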

Link:

https://gist.github.com/rangareddy/bd006b1c91c2288aef051c2bc3e44151#file-spark_to_create_random_sample_data-md
