Big Data Cloudera Certifications: Everything you need to know
Malini Shukla
Objective
This is a comprehensive guide to the various Big Data Cloudera certifications. In this Cloudera certification tutorial, we will cover the different certifications offered by Cloudera, the pattern of each certification exam, the number of questions, the passing score, the time limit, the required skills, and the weightage of each topic. We will discuss all the certifications offered by Cloudera: “CCA Spark and Hadoop Developer Exam (CCA175)”, “Cloudera Certified Administrator for Apache Hadoop (CCAH)”, “CCP Data Scientist”, and “CCP Data Engineer”.
1. CCA Spark and Hadoop Developer Exam (CCA175)
For the CCA Spark and Hadoop Developer certification, you need to write code in Scala or Python and run it on a cluster to prove your skills. The exam can be taken from any computer, at any time, anywhere in the world.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each candidate is given their own CDH5 (currently 5.3.2) cluster that is pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many of the other tools candidates need.
a. CCA Spark and Hadoop Developer Certification Exam (CCA175) Details:
- Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH5 cluster
- Time Limit: 120 minutes
- Passing Score: 70%
- Language: English, Japanese (forthcoming)
- Cost: USD $295
b. CCA175 Exam Question Format
In each CCA question, you are required to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used; in other cases, coding is required. For a Spark problem, a template (in Scala or Python) is often provided that contains a skeleton of the solution, and the candidate must fill in the missing lines with functional code. A hypothetical example of this fill-in-the-blank format is sketched below.
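As an illustration only (the application name, HDFS paths, and column positions here are invented, not taken from a real exam), a fill-in-the-blank PySpark template might look something like this, with the marked lines being the ones a candidate would have to supply:

```python
# Hypothetical CCA175-style template: the two lines marked "candidate fills in"
# are the kind of gaps the provided skeleton would leave blank.
from pyspark import SparkContext

sc = SparkContext(appName="OrdersByStatus")

# Provided: load the input data from HDFS
orders = sc.textFile("/user/cert/problem1/orders")

# Candidate fills in: extract (order_status, 1) pairs from each CSV line
status_pairs = orders.map(lambda line: (line.split(",")[3], 1))

# Candidate fills in: count the orders per status
status_counts = status_pairs.reduceByKey(lambda a, b: a + b)

# Provided: save the result back to HDFS
status_counts.saveAsTextFile("/user/cert/problem1/solution")
```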
c. Prerequisites
There are no prerequisites for taking any Cloudera certification exam.
d. Exam sections and related topics
I. Required Skills
Data Ingest: These are the skills required to transfer data between external systems and your cluster (a hedged sketch of the commands involved follows this list). They include:
- Import data from a MySQL database into HDFS using Sqoop, changing the delimiter and file format of the data during import
- Export data from HDFS to a MySQL database using Sqoop
- Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
- Load data into and out of HDFS using Hadoop File System (FS) commands
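Sqoop and the HDFS FS shell are command-line tools, so on the exam these tasks are normally carried out directly from a terminal. Purely as a hedged sketch (the JDBC connection string, credentials, table names, and paths below are invented for illustration), the kinds of commands involved can also be assembled and run from Python:

```python
# Illustrative ingest commands wrapped in Python; on the exam you would typically
# type the sqoop/hdfs commands straight into a shell. All names and paths are hypothetical.
import subprocess

# Import a MySQL table into HDFS, changing the field delimiter and file format.
subprocess.run(
    "sqoop import "
    "--connect jdbc:mysql://dbhost/retail_db --username cert --password secret "
    "--table orders --target-dir /user/cert/orders "
    "--fields-terminated-by '\\t' --as-textfile",
    shell=True, check=True)

# Export a result directory from HDFS back into a MySQL table.
subprocess.run(
    "sqoop export "
    "--connect jdbc:mysql://dbhost/retail_db --username cert --password secret "
    "--table order_totals --export-dir /user/cert/order_totals",
    shell=True, check=True)

# Hadoop File System (FS) commands to move data into and out of HDFS.
subprocess.run("hdfs dfs -put local_data.csv /user/cert/raw/", shell=True, check=True)
subprocess.run("hdfs dfs -get /user/cert/solution/part-00000 .", shell=True, check=True)
```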
II. Transform, Stage, Store:
These are the skills needed to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them back into HDFS. This includes writing Spark applications in Scala or Python for the tasks below (a PySpark sketch follows the list):
- Load data from HDFS and store results back to HDFS
- Join disparate datasets together
- Calculate aggregate statistics (e.g., average or sum)
- Filter data into a smaller dataset
- Write a query that produces ranked or sorted data
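A minimal PySpark sketch of these tasks, assuming the RDD API that ships with the CDH5-era Spark on the exam cluster; the file paths, field positions, and the filter threshold are all invented for illustration:

```python
# Minimal sketch covering load, join, aggregate, filter, sort, and store.
# Paths and field positions are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="TransformStageStore")

# Load data from HDFS
orders = sc.textFile("/user/cert/orders").map(lambda l: l.split(","))
customers = sc.textFile("/user/cert/customers").map(lambda l: l.split(","))

# Join disparate datasets together on a shared key (customer id)
orders_by_cust = orders.map(lambda o: (o[2], float(o[3])))   # (cust_id, amount)
cust_names = customers.map(lambda c: (c[0], c[1]))           # (cust_id, name)
joined = orders_by_cust.join(cust_names)                     # (cust_id, (amount, name))

# Calculate aggregate statistics: total spend per customer
totals = joined.map(lambda kv: (kv[1][1], kv[1][0])).reduceByKey(lambda a, b: a + b)

# Filter to a smaller dataset and produce ranked/sorted output
top_spenders = totals.filter(lambda kv: kv[1] > 1000.0) \
                     .sortBy(lambda kv: kv[1], ascending=False)

# Store results back to HDFS
top_spenders.map(lambda kv: "%s\t%.2f" % kv).saveAsTextFile("/user/cert/top_spenders")
```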
III. Data Analysis
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala (a hedged example of such DDL follows this list).
- Read and/or create a table in the Hive metastore in a given schema
- Extract an Avro schema from a set of data files
- Create a table in the Hive metastore using the Avro file format and an external schema file
- Improve query performance by creating partitioned tables in the Hive metastore
- Evolve an Avro schema by changing JSON files
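For the DDL side, a hedged sketch follows. On the exam this would more likely be typed into the Hive or Impala shell, but the same statements can also be issued from PySpark through HiveContext (the Spark 1.x entry point found on a CDH5 cluster). The table, partition, and schema-file names are hypothetical:

```python
# Hypothetical Hive DDL issued through PySpark's HiveContext; the same statements
# could be pasted into the Hive or Impala shell. All table and path names are made up.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="HiveAvroDDL")
hc = HiveContext(sc)

# Create an external, partitioned table in the Hive metastore backed by Avro files.
# The column definitions come from an external .avsc schema file on HDFS, so the
# schema can later be evolved by editing that JSON file.
hc.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS orders_avro
  PARTITIONED BY (order_year INT)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/user/cert/warehouse/orders_avro'
  TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cert/schemas/orders.avsc')
""")

# Register a partition so Hive and Impala can query its data.
hc.sql("ALTER TABLE orders_avro ADD IF NOT EXISTS PARTITION (order_year=2017)")
```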