The SparkSession
Manoj Chandrashekar
You control your Spark Application through a driver process called the SparkSession. The SparkSession instance is the way Spark executes user-defined manipulations across the cluster, and there is a one-to-one correspondence between a SparkSession and a Spark Application. In Scala and Python, the variable is available as spark when you start the console. Let’s go ahead and look at the SparkSession in both Scala and Python:
spark
In Scala, you should see something like the following:
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@...
In Python, you’ll see something like this:
<pyspark.sql.session.SparkSession at 0x7efda4c1ccd0>
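The console creates this variable for you, but in a standalone application you construct the SparkSession yourself through its builder. Here is a minimal sketch in Python (the application name is just an illustrative placeholder):

# in Python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("my-app") \
    .getOrCreate()  # returns the existing session if one is already active

Using getOrCreate preserves the one-to-one correspondence described above: if a session already exists, you get that one back instead of a second one.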
Let’s now perform the simple task of creating a range of numbers. This range is just like a named column in a spreadsheet:
// in Scala
val myRange = spark.range(1000).toDF("number")
# in Python
myRange = spark.range(1000).toDF("number")
We created a DataFrame with one column containing 1,000 rows with values from 0 to 999. This range of numbers represents a distributed collection. When run on a cluster, each part of this range of numbers exists on a different executor. This is a Spark DataFrame.
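Because the collection is distributed, a couple of quick calls make this concrete. A short sketch, assuming the myRange DataFrame created above:

# in Python
myRange.rdd.getNumPartitions()  # how many partitions Spark split the range into
myRange.count()                 # an action that runs across the cluster and returns 1000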