Self-Learn Yourself Apache Spark in 21 Blogs – #8
Kumar Chinnakali
In this blog, let's discuss how to load data, what lambdas are, how to transform data, and more on transformations. If you'd like a quick read of the other blogs in this learning series, do check them out.
Apache Spark can load data from input sources like HDFS, S3, Cassandra, an RDBMS, Parquet, and Avro, as well as from in-memory collections. Let's see how we can use these from the command line; a short sketch follows each list below.
Memory Loading Methods
- parallelize
- makeRDD
- range
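For instance, here is a minimal sketch of the in-memory methods from the spark-shell, where the SparkContext is available as sc (the sample values are placeholders, just for illustration):
// Distribute a local collection as an RDD
val rdd1 = sc.parallelize(Seq(1, 2, 3, 4, 5))
// makeRDD is an alias of parallelize for Scala collections
val rdd2 = sc.makeRDD(Seq("a", "b", "c"))
// range creates an RDD of Longs from start (inclusive) to end (exclusive)
val rdd3 = sc.range(0, 10)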
External Loading Methods
- textFile
- wholeTextFiles
- sequenceFile("file:///Data/SampleSequenceFile", classOf[Text], classOf[IntWritable])
- objectFile
- hadoopFile
- newAPIHadoopFile
- hadoopRDD
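As a sketch, assuming a local file at file:///Data/Sample.txt and a directory file:///Data/ (placeholder paths), the file-based methods look like this in the spark-shell:
import org.apache.hadoop.io.{IntWritable, Text}
// textFile: one record per line of the file(s)
val lines = sc.textFile("file:///Data/Sample.txt")
// wholeTextFiles: one (filename, content) pair per file in a directory
val files = sc.wholeTextFiles("file:///Data/")
// sequenceFile: key/value records stored as Hadoop Writable types
val seq = sc.sequenceFile("file:///Data/SampleSequenceFile", classOf[Text], classOf[IntWritable])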
Now let's discuss what a lambda expression is; lambdas have already appeared in a few examples and will appear in later examples too. Lambda expressions are also known as anonymous functions. Below is a lambda expression:
rdd.flatMap(line => line.split(" "))
Let us now discuss how to convert a named method to a lambda expression.
Named method:
def addOne(item: Int) = {
  item + 1
}
val intList = List(1, 2)
for (item <- intList) yield {
  addOne(item)
}
Lambda:
def addOne(item: Int) = {
  item + 1
}
val intList = List(1, 2)
intList.map(x => {
  addOne(x)
})
It can still be fine-tuned by inlining the function body into the lambda:
val intList = List(1, 2)
intList.map(item => item + 1)
One more note: Scala supports multi-line lambdas via curly braces, as sketched below.
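For example, with curly braces the lambda body can span several statements, and the last expression is the result:
val intList = List(1, 2)
intList.map { item =>
  val doubled = item * 2  // intermediate statement
  doubled + 1             // result: List(3, 5)
}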
Now let's discuss how to apply transformations to derive meaningful information.