Preparation Strategy and Resources for Databricks Certified Developer for Apache Spark 3.0 (Python)

Hello Everyone,

This article focuses on the preparation strategies and resources needed to clear the Databricks Certified Developer for Apache Spark 3.0 (Python) exam.

Note :- The emphasis of this article is solely on PySpark (Python + Spark). However, the same preparation strategy can be used if you are taking the exam in Scala.

Please refer to the following resource for the certification details and the weightage given to each section of the exam.




1) First and foremost, intermediate knowledge of SQL is expected as a prerequisite for anyone willing to take this exam. You should be comfortable performing basic data transformations using SQL, such as SELECT, WHERE, aggregate functions (GROUP BY, SUM, COUNT, MAX, MIN), JOINs, and UNIONs.


RESOURCES :-

1) You can refer to this resource for SQL theory.

2) You can refer to this resource for SQL hands-on practice (use the virtual labs available inside it).


Note :- You don't have to memorize SQL syntax; rather, you need to understand how the above functions work in SQL. That's enough as a prerequisite. (Feel free to skip this first step if you are already well versed with SQL.) A small sketch below shows the level of SQL fluency expected.
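
To make the prerequisite concrete, here is a minimal sketch expressed through Spark SQL in PySpark, so it runs on the same stack you will use later. The orders and customers tables and their contents are hypothetical, invented purely for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-prereq-demo").getOrCreate()

    # Hypothetical sample data standing in for two small tables
    orders = spark.createDataFrame(
        [(1, 101, 250.0), (2, 102, 100.0), (3, 101, 75.0)],
        ["order_id", "customer_id", "amount"],
    )
    customers = spark.createDataFrame(
        [(101, "Alice"), (102, "Bob")],
        ["customer_id", "name"],
    )
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    # SELECT / WHERE / JOIN / GROUP BY with aggregates, all in one query
    spark.sql("""
        SELECT c.name,
               COUNT(*)      AS num_orders,
               SUM(o.amount) AS total_spent
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        WHERE o.amount > 50
        GROUP BY c.name
    """).show()

If you can read and write a query like this without looking anything up, you are ready for step 2.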


2) Once you are comfortable with the above SQL functions, you are ready to start PySpark (the Python API for the Apache Spark framework).

You can start with the following PySpark course :-


The course explains all the PySpark content in depth. I would request you to download the course materials and get extensive hands-on practice by working through the notebooks on the Databricks platform.

You can't clear the exam by just watching the videos. Hands-on practice is a MUST.

Please don't hesitate to re-watch the lectures in case you need to; a few videos may not be fully grasped at first viewing.
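
As a taste of what the hands-on practice looks like, here is a small, self-contained DataFrame API sketch covering the kinds of transformations (withColumn, filter, groupBy, orderBy) the exam tests heavily. The product data is hypothetical, made up just for this example:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dataframe-api-demo").getOrCreate()

    # Hypothetical product data, invented for illustration
    df = spark.createDataFrame(
        [("laptop", "electronics", 1200.0),
         ("phone", "electronics", 800.0),
         ("desk", "furniture", 300.0)],
        ["product", "category", "price"],
    )

    result = (
        df.withColumn("price_with_tax", F.col("price") * 1.1)        # derive a new column
          .filter(F.col("price") > 400)                              # keep matching rows
          .groupBy("category")                                       # aggregate per category
          .agg(F.avg("price_with_tax").alias("avg_price_with_tax"))
          .orderBy(F.col("avg_price_with_tax").desc())               # sort descending
    )
    result.show()

Notice that every SQL construct from step 1 has a direct DataFrame API counterpart; internalizing that mapping makes the syntax-focused exam questions much easier.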


3) Once you have good context and clarity on the syllabus from the above course, you can brush up on the content using the following course material. Additionally, please practice the skills you have learnt using the notebook available in the attachment section of the course :-

URL :- https://customer-academy.databricks.com/learn/course/63/apache-spark-programming-with-databricks


4) By now, you should be comfortable with how the different PySpark APIs work and should be able to perform data transformations using PySpark. Now it's time to get a feel for the actual exam. Please refer to the following resources for practice tests.

a) Official Practice Test from Databricks

b) Practice Tests which simulate the real exam.

The aforementioned resources should suffice for a solid understanding of the PySpark APIs and Spark internals. With serious preparation, 6-8 months should be good enough to clear the exam (if you are a beginner to SQL and PySpark). A small sketch of the internals-related APIs worth knowing follows below.
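
Since the exam also tests Spark architecture concepts, here is a short sketch of the partitioning, shuffle, and caching APIs that the conceptual questions tend to revolve around. The partition counts are arbitrary, chosen only for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("internals-demo").getOrCreate()

    df = spark.range(1_000_000)

    # How many partitions back this DataFrame
    print(df.rdd.getNumPartitions())

    # repartition() performs a full shuffle; coalesce() reduces partitions without one
    df_wide = df.repartition(8)
    df_narrow = df_wide.coalesce(2)

    # Inspect the physical plan; questions often ask where shuffles (Exchange) occur
    df_wide.groupBy((df_wide.id % 10).alias("bucket")).count().explain()

    # cache() uses the default MEMORY_AND_DISK storage level for DataFrames;
    # nothing is materialized until an action such as count() runs
    df.cache()
    df.count()

Questions about shuffles, storage levels, and lazy evaluation appear regularly, so make sure you understand why each line above behaves the way it does, not just what it prints.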


Feel free to reach out to me via DM in case you have more queries or need additional clarifications.


Best wishes for your exam :)


