Big Data Cloudera Certifications: Everything you need to know

Objective

This is a comprehensive guide to the various Big Data Cloudera certifications. In this Cloudera certification tutorial we will cover all the key aspects: the different certifications Cloudera offers, the pattern of each certification exam, the number of questions, the passing score, the time limit, the required skills, and the weightage of each topic. We will discuss all the certifications offered by Cloudera: “CCA Spark and Hadoop Developer (CCA175)”, “Cloudera Certified Administrator for Apache Hadoop (CCAH)”, “CCP Data Scientist”, and “CCP Data Engineer”.

1. CCA Spark and Hadoop Developer Exam (CCA175)

For the CCA Spark and Hadoop Developer certification, you prove your skills by writing code in Scala and Python and running it on a cluster. The exam can be taken from any computer, anywhere in the world, at any time.

CCA175 is a hands-on, practical exam using Cloudera technologies. Each candidate is given their own CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many of the other tools they will need.

a. CCA Spark and Hadoop Developer Certification Exam (CCA175) Details:

  • Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH5 cluster
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English (Japanese forthcoming)
  • Cost: USD $295

b. CCA175 Exam Question Format

Each CCA question requires you to solve a particular scenario. In some cases a tool such as Impala or Hive may be used; in other cases, coding is required. For Spark problems, a template (in Scala or Python) is often provided that contains a skeleton of the solution, and the candidate fills in the missing lines with functional code, along the lines of the sketch below.
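
To make the format concrete, here is a hypothetical PySpark sketch of what such a template might look like once the blanks are filled in; the paths and the word-count task are invented for illustration, not taken from the exam.

from pyspark import SparkContext

sc = SparkContext(appName="TemplateSketch")

# Lines like these would be provided by the exam template:
lines = sc.textFile("/user/exam/input")

# ...and the candidate fills in the missing transformation, e.g. a word count:
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# The template would typically also provide the output step:
counts.saveAsTextFile("/user/exam/output")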

c. Prerequisites

There are no prerequisites for any Cloudera certification exam.

d. Exam selection and related topics

I. Required Skills

Data Ingest: these are the skills required to transfer data between external systems and your cluster. They include the following (a scripted sketch follows the list):

  • Import data from a MySQL database into HDFS using Sqoop, changing the delimiter and file format of the data during import
  • Export data to a MySQL database using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands
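
As a rough sketch only, the tasks above might be scripted from Python with the standard subprocess module; in the exam these Sqoop and FS commands are typed straight into a terminal, and the connection details, credentials, table names, and paths below are all invented. Flume is configured through an agent properties file rather than code, so it is omitted here.

import subprocess

def run(cmd):
    # Run a CLI command and fail loudly on a non-zero exit code.
    subprocess.run(cmd, check=True)

# Sqoop import from MySQL into HDFS with a custom delimiter and file format.
run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/retail_db",   # hypothetical host/database
    "--username", "retail_user",
    "--password-file", "/user/exam/.mysql_pw",      # keeps the password off the command line
    "--table", "orders",
    "--target-dir", "/user/exam/orders_txt",
    "--as-textfile",                                # file format (--as-avrodatafile also works)
    "--fields-terminated-by", "|",                  # custom delimiter
])

# Sqoop export from HDFS back to a MySQL table.
run([
    "sqoop", "export",
    "--connect", "jdbc:mysql://dbhost/retail_db",
    "--username", "retail_user",
    "--password-file", "/user/exam/.mysql_pw",
    "--table", "order_totals",
    "--export-dir", "/user/exam/order_totals",
    "--input-fields-terminated-by", "|",
])

# Hadoop FS commands to move data into and out of HDFS.
run(["hadoop", "fs", "-put", "/tmp/local_orders.csv", "/user/exam/raw/"])
run(["hadoop", "fs", "-get", "/user/exam/orders_txt/part-m-00000", "/tmp/"])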

II. Transform, Stage, and Store

These skills convert a set of data values in a given format stored in HDFS into new data values and/or a new data format, and write the results back into HDFS. This includes writing Spark applications in Scala or Python for the tasks below (a PySpark sketch follows the list):

  • Load data from HDFS and store results back to HDFS
  • Join disparate datasets together
  • Calculate aggregate statistics (e.g., average or sum)
  • Filter data into a smaller dataset
  • Write a query that produces ranked or sorted data
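
A minimal PySpark sketch covering all five tasks end to end might look like the following; the file paths and field layouts are assumptions for illustration, not exam data.

from pyspark import SparkContext

sc = SparkContext(appName="TransformSketch")

# Load two delimited datasets from HDFS:
# orders:    order_id,customer_id,amount
# customers: customer_id,city
orders = (sc.textFile("/user/exam/orders")
            .map(lambda line: line.split(","))
            .map(lambda f: (int(f[1]), float(f[2]))))      # (customer_id, amount)
customers = (sc.textFile("/user/exam/customers")
               .map(lambda line: line.split(","))
               .map(lambda f: (int(f[0]), f[1])))          # (customer_id, city)

# Join the disparate datasets on customer_id, then aggregate total sales per city.
sales_by_city = (orders.join(customers)                      # (customer_id, (amount, city))
                       .map(lambda kv: (kv[1][1], kv[1][0]))  # (city, amount)
                       .reduceByKey(lambda a, b: a + b))

# Filter down to a smaller dataset, then produce sorted (ranked) output.
ranked = (sales_by_city.filter(lambda kv: kv[1] > 1000.0)
                       .sortBy(lambda kv: kv[1], ascending=False))

# Store the results back to HDFS.
ranked.map(lambda kv: "%s\t%.2f" % kv).saveAsTextFile("/user/exam/sales_by_city")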

III. Data Analysis

Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala (a DDL sketch follows the list):

  • Read and/or create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of data files
  • Create a table in the Hive metastore using the Avro file format and an external schema file
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
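
As one illustrative example, several of these tasks come together in a single statement: creating a partitioned external table in the Hive metastore, stored as Avro, with its schema read from an external .avsc file (so the schema can later be evolved by editing that JSON file). The sketch below issues the DDL through PySpark's HiveContext, as shipped with CDH5's Spark 1.x; all table names and paths are invented.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="HiveDDLSketch")
sqlContext = HiveContext(sc)

# Partitioned, Avro-backed external table; the columns are derived from the
# external .avsc (JSON) schema file, so evolving the schema means editing
# that file rather than altering the table.
sqlContext.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_avro
    PARTITIONED BY (order_month STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS
        INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/exam/warehouse/orders_avro'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/exam/schemas/orders.avsc')
""")

For the schema-extraction task, the avro-tools jar provides a getschema command (java -jar avro-tools.jar getschema file.avro) that prints the schema embedded in an Avro data file.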

Comments

Harsh Mishra · Lead Tech | Generative AI | Java, Python and Kafka | GCP, AWS Certified Cloud Technology · 8 years ago

Could you tell me what laptop configuration would be required for a smooth certification experience? I have 4 GB of RAM, and for training I am using a cluster on the cloud.
