Self-Learn Yourself Apache Spark in 21 Blogs – #8

In this blog, let us discuss how to load data, what lambdas are, and how to transform data, with more on transformations to come. You may also want to have a quick read of the other blogs in this learning series.

Apache Spark can load data from input sources such as HDFS, S3, Cassandra, an RDBMS, Parquet, and Avro, and also from memory. Let's see how we can use these from the command line; a short sketch follows each list below.

Memory Loading Methods

  1. parallelize
  2. makeRDD
  3. range
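
Here is a minimal spark-shell sketch of these methods (assuming sc is the SparkContext the shell provides; the sample values are illustrative):

// parallelize: distribute a local collection as an RDD
val rdd1 = sc.parallelize(Seq(1, 2, 3, 4, 5))

// makeRDD: behaves like parallelize for simple collections
val rdd2 = sc.makeRDD(Seq("a", "b", "c"))

// range: an RDD of Longs from start (inclusive) to end (exclusive)
val rdd3 = sc.range(0, 10)

rdd1.collect()   // Array(1, 2, 3, 4, 5)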

External Loading Methods

  1. textFile
  2. wholeTextFiles
  3. sequenceFile("file:///Data/SampleSequenceFile", classOf[Text], classOf[IntWritable])
  4. objectFile
  5. hadoopFile
  6. newAPIHadoopFile
  7. hadoopRDD
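
And a hedged sketch of a few of the external methods (the file paths are illustrative placeholders):

import org.apache.hadoop.io.{IntWritable, Text}

// textFile: one record per line of the file
val lines = sc.textFile("file:///Data/Sample.txt")

// wholeTextFiles: (fileName, fileContent) pairs, one per file
val files = sc.wholeTextFiles("file:///Data/")

// sequenceFile: typed key/value records (Text keys, IntWritable values here)
val seq = sc.sequenceFile("file:///Data/SampleSequenceFile", classOf[Text], classOf[IntWritable])

// objectFile: reads an RDD previously saved with saveAsObjectFile
val objs = sc.objectFile[String]("file:///Data/SampleObjectFile")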

Now let's discuss what a lambda expression is; it has already appeared in a few of the examples above and will be used in future examples too. A lambda expression is also known as an anonymous function. Below is a lambda expression,

rdd.flatMap(line => line.split(" "))
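
For context, here is that lambda at work in a minimal sketch (the file path is an illustrative placeholder):

// Split every line of a text file into words;
// (line => line.split(" ")) is the lambda passed to flatMap
val rdd = sc.textFile("file:///Data/Sample.txt")
val words = rdd.flatMap(line => line.split(" "))
words.take(5)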

Let us now discuss how to convert a named method to a lambda expression,

Named method:

def addOne(item: Int) = {
  item + 1
}

val intList = List(1, 2)

for (item <- intList) yield {
  addOne(item)
}

Lambda:

def addOne(item: Int) = {
  item + 1
}

val intList = List(1, 2)

intList.map(x => {
  addOne(x)
})

This can still be fine-tuned further,

val intList = List(1, 2)

intList.map(item => item + 1)

One more note: Scala supports multi-line lambdas via curly braces, as in the sketch below.
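
A quick sketch of a multi-line lambda (reusing the intList from above):

// Curly braces let the lambda span multiple statements;
// the value of the last expression is the result
intList.map { item =>
  val doubled = item * 2
  doubled + 1
}
// Returns List(3, 5)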

Next, let's discuss how to apply transformations to derive meaningful information.

Keep Reading... 
