Google Cloud Professional Machine Learning Engineer Certification: Post Exam Impressions

Note: This article is feedback on top of the Exam Guide written by Dmitri Lerko and his colleague Steven MacManus.

The exam has a huge emphasis on engineering ML solutions. Most of the questions are on the engineering side. The Data Science portion is more focused on technique than on algorithm details, implementation and limitations. For that reason, you will not find questions asking you to propose a very complex model architecture or details on how to improve specific DNN architectures. I would estimate 60% of the questions are devoted to engineering, architecture, optimization and DevOps. Knowing all the AI offerings on GCP in detail is a must. In addition, I recommend you also know the Big Data Engineering solutions on GCP.

Python and SQL are the default languages in which you may find source code. TensorFlow 2.0 is the framework you need to be good at to answer some questions. Also focus on the TensorFlow ecosystem, how to connect TF to GCP solutions, and how to use it in production.

General Insights after taking the exam

All the free material provided is very important. When it comes to problem framing and defining business metrics, it is very important to understand that when monitoring and evaluating ML solutions in production/the real world, you will always assess and monitor using a measurable business metric or KPI. When it comes to training and model evaluation before deploying the model to production, you are going to use the traditional metrics and loss functions, depending on whether you have a regression or classification problem in hand.

In regards to the optimization task, you have to understand how SGD works and the relationship between batch size and learning rate to maximize the performance of the learning algorithm.
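
As an illustration of that relationship, a common heuristic (the linear scaling rule) is to increase the learning rate proportionally when you increase the batch size. A minimal Keras sketch, with made-up base values:

```python
import tensorflow as tf

# Hypothetical base settings; the linear scaling rule suggests scaling the
# learning rate in proportion to the batch-size increase.
base_batch_size, base_learning_rate = 32, 0.01
batch_size = 256  # e.g. a larger batch for better accelerator utilization
learning_rate = base_learning_rate * (batch_size / base_batch_size)

optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)
```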

You also have to understand that although TF.Estimator was the first high-level API implemented by the TF team, beginning with TF 2.0 the Keras API is the preferred API for multiple situations, from converting low-level TF code to high-level code, to adapting local on-prem custom model code to distributed training on the cloud.
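
A minimal sketch of that adaptation, assuming a recent TF 2.x release: the same Keras code runs distributed once the model is built inside a tf.distribute strategy scope (MirroredStrategy locally, MultiWorkerMirroredStrategy or TPUStrategy on AI Platform Training). The model below is only a placeholder:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates across local GPUs

with strategy.scope():
    # Any Keras model built inside the scope is replicated across devices.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(train_dataset, epochs=10)  # the training call itself is unchanged
```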

You need to understand the difference between overfitting and underfitting, and to know ways to prevent both of them from happening.

In regards to splitting the data into training and testing datasets, make sure you know how to split data for different scenarios. For example, for general regression and classification problems, you can randomly split in a 60/40 or 80/20 proportion. But in time-series use cases that ingest sequences of data, you cannot split randomly. Instead, you have to split by datetime to avoid data leakage.
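
For example, a minimal pandas sketch of a datetime-oriented split; the column names and data are hypothetical:

```python
import pandas as pd

# Hypothetical events table; in practice you would load your own data.
df = pd.DataFrame({
    "event_time": pd.date_range("2021-01-01", periods=1000, freq="h"),
    "value": range(1000),
})
df = df.sort_values("event_time")

cutoff = df["event_time"].quantile(0.8)       # first 80% of the timeline
train_df = df[df["event_time"] <= cutoff]     # older data for training
test_df = df[df["event_time"] > cutoff]       # most recent data for testing
```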

You should also understand the main causes of the different situations involving performance on the training, testing and evaluation (continuous evaluation) datasets:

  1. Higher performance on training compared to testing.
  2. Lower performance on training compared to testing.
  3. Lower performance on evaluation compared to training/testing.

Never train on test data. If you are seeing surprisingly good results on your evaluation metrics, it might be a sign that you are accidentally training on the test set. For example, high accuracy might indicate that test data has leaked into the training set.

Understand that cross-validation helps prevent overfitting. For that you will need training, validation and testing sets. Avoiding overfitting promotes model generalization to unseen data. There are ML models that work better after cross-validation, for example tree-based models.
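
A minimal scikit-learn sketch of k-fold cross-validation on a tree-based model, using a toy dataset for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: each fold is held out once for validation.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```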

In regards to feature engineering, you need to know what good features are. You need to understand that features that are highly correlated with each other are not good; instead, features must be highly correlated with the target variable.

You also need to understand that feature transformations must be the same for training and for inference/serving. You need to understand how you can guarantee that.
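
One way to guarantee it, assuming a recent TF 2.x release, is to make the transformation part of the model itself with a Keras preprocessing layer, so the exact same statistics are applied at serving time (tf.Transform in a TFX pipeline is another option). A minimal sketch with placeholder data:

```python
import numpy as np
import tensorflow as tf

train_features = np.random.rand(1000, 3).astype("float32")  # placeholder data

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(train_features)  # learns mean/variance from the training data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    normalizer,                       # the transformation travels with the model
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
# Exporting with model.save() ships the normalization statistics to serving.
```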

You need to know what to do with categorical and numerical variables, and whether they are good predictors (highly correlated with the target variable) or not. What to do with missing values, whether there are many or only a few. What to do with data at different magnitudes? Normalize! What to do with data that shows a trend. How to handle outliers.

Talking about feature crosses, understand why people use them, why it is important to have nonlinearity in a model, and other methods to accomplish nonlinearity, like using better activation functions. Know the traditional feature-cross example on the house pricing dataset: binned "latitude" X binned "longitude" X binned "roomsPerPerson".
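
A sketch of that classic cross using the (now legacy) tf.feature_column API; the bucket boundaries are made up for illustration:

```python
import tensorflow as tf

latitude = tf.feature_column.numeric_column("latitude")
longitude = tf.feature_column.numeric_column("longitude")

# Bucketize each coordinate, then cross the buckets so the model can learn
# per-neighborhood effects rather than a single linear trend per coordinate.
lat_buckets = tf.feature_column.bucketized_column(latitude, boundaries=list(range(32, 43)))
lon_buckets = tf.feature_column.bucketized_column(longitude, boundaries=list(range(-125, -113)))
lat_x_lon = tf.feature_column.crossed_column([lat_buckets, lon_buckets], hash_bucket_size=1000)
cross_feature = tf.feature_column.indicator_column(lat_x_lon)
```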

In terms of categorical/textual values, you basically need to know how to manipulate the data using sparse or dense representations, with or without vocabularies. Know what tokenization is, what word2vec is, bag of words, one-hot vectors.
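
A minimal bag-of-words sketch with scikit-learn (assuming a recent version), showing tokenization, a learned vocabulary, and sparse vs. dense views of the same counts:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog ate my homework"]

# Bag of words: tokenize, build a vocabulary, count term occurrences.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)           # sparse representation
print(vectorizer.get_feature_names_out())      # the learned vocabulary
print(bow.toarray())                           # dense view of the same counts

# Setting binary=True instead gives one-hot style presence/absence vectors.
```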

You should also understand why regularization is used and what the final results of L1 and L2 regularization are. L1 is responsible for zeroing weights, which is the same thing as not using that input. L2 is responsible for reducing the weights, pushing them close to zero. Weights close to zero have little effect on model complexity, while outlier weights can have a huge impact. https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/lambda
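
A minimal Keras sketch of attaching L1/L2 penalties to a layer; the lambda values are purely illustrative:

```python
import tensorflow as tf

# L1 pushes some weights exactly to zero (implicit feature selection);
# L2 shrinks all weights towards zero without zeroing them out.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01),  # illustrative lambdas
)
```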

Observe that we can use early stopping for continuous learning and to prevent overfitting, together with regularization.
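
A minimal Keras sketch of early stopping; the monitored metric and patience value are illustrative choices:

```python
import tensorflow as tf

# Stops training when the validation loss has not improved for `patience`
# epochs and restores the best weights seen so far.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stopping])
```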

Understand what vanishing and exploding gradients are and how to deal with them. For the former, use ReLU activation functions, residual connections and batch normalization. For exploding gradients, normalize the data, reduce the batch size and use batch normalization, or even change the optimizer or tweak it. Understand that this is a problem intertwined with DNN complexity and gradient calculation (derivatives) through certain activation functions.

You need to know when you are going to use logistic regression to calculate probabilities instead of values. What are the DNN architecture tweaks to output probabilities instead of values? You need to know that there are benefits promoted by regularization and early stopping, and that there are suitable activation functions like sigmoid (at the output) and loss functions like log loss.

Moving forward, you need to understand the different objectives with classification: classifying inputs into only one class (highest probability wins all), into more than one class (probability ranking), and binary classification. Which loss functions would be used in each case?
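
A sketch of the usual output-layer/loss pairing for each case in Keras; the number of classes is hypothetical:

```python
import tensorflow as tf

num_classes = 5  # hypothetical

# Multi-class, single label: softmax output + categorical cross-entropy.
single_label_head = tf.keras.layers.Dense(num_classes, activation="softmax")
single_label_loss = tf.keras.losses.CategoricalCrossentropy()  # or the Sparse variant for integer labels

# Multi-label: an independent sigmoid per class + binary cross-entropy.
multi_label_head = tf.keras.layers.Dense(num_classes, activation="sigmoid")
multi_label_loss = tf.keras.losses.BinaryCrossentropy()

# Binary classification: a single sigmoid unit + binary cross-entropy (log loss).
binary_head = tf.keras.layers.Dense(1, activation="sigmoid")
binary_loss = tf.keras.losses.BinaryCrossentropy()
```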

You also need to understand that classification models in the real world can perform differently from the training setup; you control precision, recall, accuracy, etc. using the decision threshold, which is something manually tuned when serving predictions.

You need to know the tradeoffs between TP, TN, FP and FN for multiple use cases. What is the damage of giving less attention to one outcome than the other? What are the best evaluation metrics to use, in general? Accuracy.

Accuracy alone doesn't tell the full story when you're working with a class-imbalanced data set, where there is a significant disparity between the number of positive and negative labels. To fully evaluate the effectiveness of a model, you must examine both precision and recall. Unfortunately, precision and recall are often in tension: improving precision typically reduces recall and vice versa. When optimizing for recall, you want to decrease FN; when optimizing for precision, you want to decrease FP. Last, you need to understand the benefit of using AUC as an evaluation metric.
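
A small scikit-learn sketch with toy numbers, showing how precision and recall depend on the decision threshold while AUC does not:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])                   # imbalanced toy labels
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.9, 0.6, 0.5, 0.3, 0.2, 0.1])
y_pred = (y_prob >= 0.5).astype(int)                                 # decision threshold at 0.5

print("precision:", precision_score(y_true, y_pred))  # penalized by FP
print("recall:   ", recall_score(y_true, y_pred))     # penalized by FN
print("AUC:      ", roc_auc_score(y_true, y_prob))    # threshold-independent
```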

Understand that with imbalanced data, you may have prediction bias. For example, let's say we know that on average, 1% of all emails are spam. If we don't know anything at all about a given email, we should predict that it's 1% likely to be spam. Similarly, a good spam model should predict on average that emails are 1% likely to be spam. (In other words, if we average the predicted likelihoods of each individual email being spam, the result should be 1%.) If instead, the model's average prediction is 20% likelihood of being spam, we can conclude that it exhibits prediction bias.
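
A tiny numpy sketch of that check, with simulated labels and scores standing in for a real model:

```python
import numpy as np

y_true = np.random.binomial(1, 0.01, size=100_000)               # ~1% spam base rate
y_prob = np.clip(np.random.normal(0.01, 0.005, 100_000), 0, 1)   # hypothetical model scores

prediction_bias = y_prob.mean() - y_true.mean()
print(f"average prediction: {y_prob.mean():.4f}, "
      f"average label: {y_true.mean():.4f}, bias: {prediction_bias:+.4f}")
```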

You should know that there is another problem: dead ReLU units. Why it happens and how to prevent it; look at ReLU-based activation functions. There are many regularization methods; one used sometimes is dropout regularization. It is useful for neural networks. It works by randomly "dropping out" unit activations in a network for a single gradient step.
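
A minimal Keras sketch of dropout regularization; the layer sizes and dropout rate are illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),  # randomly zeroes 30% of activations per step (training only)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```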

You need to know the motivation for collaborative filtering instead of using any other regression method that does not take past interactions and embeddings into account. You also need to know embeddings: how they work and why they're useful.
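
A minimal Keras sketch of an embedding layer mapping sparse item ids to dense vectors; the catalogue size and dimensionality are made up:

```python
import tensorflow as tf

num_movies, embedding_dim = 10_000, 32  # hypothetical catalogue size

# Maps each sparse movie id to a dense 32-dimensional vector; similar items
# end up close together in the learned embedding space.
movie_embedding = tf.keras.layers.Embedding(input_dim=num_movies,
                                            output_dim=embedding_dim)
vectors = movie_embedding(tf.constant([7, 42, 9983]))  # shape (3, 32)
```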

You need to know the difference between online and batch prediction and when to use each.

In regards to fairness, you need to know the kinds of bias and how to prevent them. How can you identify bias? Which components of the training pipeline help you identify data bias? I would answer data validation and model validation. Here is an example of how to evaluate bias for a trained model: https://developers.google.com/machine-learning/crash-course/fairness/evaluating-for-bias

How can you evaluate bias for predictions? How can you explain each prediction value, according to its features?

In regards to problem framing, you also need to understand what you can do with the available data and the business question. Can you do regression or classification? Only if you have variables that can work as labels. If not, what can you do? Clustering, segmentation. Also, understand that some business questions don't need an ML solution. As a complement, I would also consider looking at hard problems like determining causation, anomaly detection and clustering.

When framing a problem, decide on a good metric or use proxies. Also, you need your outputs to be actionable. In summary, https://developers.google.com/machine-learning/problem-framing/formulate.

You need to know what to do with features that contain PII, whether they are good predictors or not, and how to use DLP to deal with PII.

You need to know techniques to deal with imbalanced data, like boosting, and downsampling with upweighting.
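
A small pandas sketch of downsampling the majority class and upweighting it to keep predictions calibrated; the data and the factor are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": np.random.binomial(1, 0.01, 100_000)})  # ~1% positives

downsample_factor = 10  # keep 1 in 10 negatives
negatives = df[df["label"] == 0].sample(frac=1 / downsample_factor, random_state=42)
positives = df[df["label"] == 1]

balanced = pd.concat([positives, negatives]).sample(frac=1, random_state=42)
# Upweight the downsampled class so prediction calibration is preserved.
balanced["weight"] = np.where(balanced["label"] == 0, downsample_factor, 1)
```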

You need to know good randomization techniques, mostly in conjunction with BigQuery. For example, avoid RAND() when you need a repeatable split.
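
The usual BigQuery pattern is to hash a stable key (e.g. with FARM_FINGERPRINT) instead of calling RAND(), so the split is the same run after run. The same idea sketched in Python, with hypothetical records:

```python
import hashlib

def split_bucket(key: str, buckets: int = 10) -> int:
    """Deterministically maps a record key to a bucket, unlike a random draw."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

rows = [{"id": f"user_{i}", "value": i} for i in range(1000)]  # placeholder records
train_rows = [r for r in rows if split_bucket(r["id"]) < 8]    # ~80% train
test_rows = [r for r in rows if split_bucket(r["id"]) >= 8]    # ~20% test
```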

You might have two different features with widely different ranges (e.g., age and income), causing the gradient descent to "bounce" and slow down convergence. Optimizers like Adagrad and Adam protect against this problem by creating a separate effective learning rate per feature. Also depending on the task, you have different ways to prepare features. For Clustering, check the following link: https://developers.google.com/machine-learning/clustering/prepare-data

Talking about recommendation systems, you need to understand how the solution works and also the three major candidate-generation approaches: content-based filtering, collaborative filtering, and a DNN with a softmax last layer ranking probabilities. https://developers.google.com/machine-learning/recommendation/overview/candidate-generation

Also, you need to understand the difference between parameters, hyperparameters and meta-parameters. https://developers.google.com/machine-learning/testing-debugging/common/model-errors

You need to understand how to interpret loss curves, https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic

You need to know how to test the solution in production: https://developers.google.com/machine-learning/testing-debugging/pipeline/production

Learning resources used

Certification Guide

I prepared for this exam following Dmitri's blog post; you can check it here: https://deploy.live/blog/google-cloud-professional-machine-learning-engineer-certification-preparation-guide/

In the next sections, I give my feedback on very specific points described by Dmitri in his blog post. Therefore, don't expect me to repeat the content of Dmitri's post; instead, I append extra information and the number of questions I found for some of the topics.

Appending my Feedback on top of Dmitri's blog post

Courses

I think these are the most important courses to take so that you can learn more about how an ML Engineer uses GCP. But they are not enough. You need to know a lot of TensorFlow and the newer solutions for AI and Data Engineering like Data Fusion, Data Catalog, AI Platform Evaluation, KubeFlow and DLP.

Tutorials

I had one question on TFX; indirectly you could see that they wanted you to answer that it is best to use TFX, although there were also other valid answers.

Google Cloud Solutions (Free)

Learning these solutions is very, very important; there is no online training material that gives you the insight into which components to use. I had about 4 or 5 questions asking which components to use in a specific architecture.

You also need to understand the differences between a serverless architecture, a managed-services architecture, an API-based architecture, a Cloud Native/Kubernetes-based architecture, and a SQL-based architecture using BigQuery end to end, in terms of cost, performance, scalability and limitations.

Relevant Open Source tools

Google Cloud Professional Machine Learning Engineer Certification Preparation Guide

Section 1: ML Problem Framing

1.1 Translate business challenge into ML use case. Considerations include:

  • Defining business problems
  • identifying Good Problems for ML
  • Cast as ML problem - In basic terms, ML is the process of training a piece of software, called a model, to make useful predictions using a data set. This predictive model can then serve up predictions about previously unseen data. We use these predictions to take action in a product; for example, the system predicts that a user will like a certain video, so the system recommends that video to the user.
  • Does business problem satisfy above criteria?
  • What is being predicted? What is being classified?
  • What data is needed?
  • ML problem in question of software
  • What is the API for the problem during prediction?
  • Who will use this service? How are they doing it today?
  • Data problem
  • What data are we analyzing?
  • What data are we predicting?
  • What data are we reacting to?
  • Identifying non ML solutions
  • Don’t be afraid to launch a product without machine learning
  • Machine learning is cool, but it requires data. Theoretically, you can take data from a different problem and then tweak the model for a new product, but this will likely underperform basic heuristics. If you think that machine learning will give you a 100% boost, then a heuristic will get you 50% of the way there. For instance, if you are ranking apps in an app marketplace, you could use the install rate or number of installs as heuristics. If you are detecting spam, filter out publishers that have sent spam before. Don’t be afraid to use human editing either. If you need to rank contacts, rank the most recently used highest (or even rank alphabetically). If machine learning is not absolutely required for your product, don’t use it until you have data.

I had one question where I answered with a solution that doesn’t use ML, because there was no need of it.

1.2 Define ML problem. Considerations include:

  • Defining problem type (classification, regression, clustering, etc.)
  • Classification - Pick one of N labels - Cat, dog, horse, or bear
  • Regression - Predict numerical values - Click-through rate
  • Clustering - Group similar examples - Most relevant documents (unsupervised)
  • Association Rule Learning - Infer likely association patterns in data - If you buy hamburger buns, you’re likely to buy hamburgers (unsupervised)
  • Structured outputs - Create complex output - Natural language parse trees, image recognition bounding boxes
  • Ranking - Identify position on a scale or status - Search result ranking
  • Defining outcome of model predictions
  • Problem Framing
  • Don’t overthink which objective you choose to directly optimize
  • Choose a simple, observable and attributable metric for your first objective.
  • Good idea to set accuracy benchmark before ever creating the model, then start with the simplest solution as a baseline.

I had a couple of questions asking me to define the best metric to measure how effective or useful the ML solution is. I always chose the one with direct business impact.

  • Defining the input (features) and predicted output format
  • Which features are actually important? Which features seek to only add noise? Are there any Linear dependencies between features? What is the fewest number of features required for good performance? What is the maximum number of features we are willing to use?
  • What is the target audience/platform for the output? What format do they expect?
  • Is the output data streamed? Published at set intervals? Published adhoc?

I had a question where the input was streamed, you needed to aggregate a variable over the last two weeks, and the output didn't need to be streamed. So the solution uses Dataflow in streaming mode with windowing, calls the model from an online endpoint hosted on AI Platform, and saves the predictions to BigQuery. That was the most cost-efficient solution.

1.4 Identify risks to feasibility and implementation of ML solution. Considerations include:

  • Aligning with Google AI principles and practices (e.g. different biases)
  • Reporting bias - occurs when the frequency of events, properties, and/or outcomes captured in a data set does not accurately reflect their real-world frequency. This bias can arise because people tend to focus on documenting circumstances that are unusual or especially memorable, assuming that the ordinary can “go without saying.”
  • Automation bias - is a tendency to favour results generated by automated systems over those generated by non-automated systems, irrespective of the error rates of each.
  • Selection bias - occurs if a data set’s examples are chosen in a way that is not reflective of their real-world distribution. Selection bias can take many different forms:
  • Coverage bias: Data is not selected in a representative fashion.
  • Non-response bias (participation bias): Data ends up being unrepresentative due to participation gaps in the data-collection process.
  • Sampling bias: Proper randomization is not used during data collection.

I had one question on how to prevent selection bias.

  • Group attribution bias is a tendency to generalize what is true of individuals to an entire group to which they belong. Two key manifestations of this bias are:
  • In-group bias: A preference for members of a group to which you also belong, or for characteristics that you also share.
  • Out-group homogeneity bias: A tendency to stereotype individual members of a group to which you do not belong, or to see their characteristics as more uniform.
  • Implicit bias occurs when assumptions are made based on one’s own mental models and personal experiences that do not necessarily apply more generally.

1.5 Design of experiments:

Defining experiment setup to experiment a ML solution for the first time.

  • Consider a basic heuristic vs. an ML solution for a randomly chosen subset of users under the same conditions (same geographic region) to minimize uncertainty.
  • Split into control and alternative groups and evaluate both options.

Defining experiment to deploy new version of models in production.

  • Start with canary, check requisites. 10% traffic.
  • Promote to A/B and evaluate performance metrics. 50/50% traffic.
  • Direct 100% of traffic to the winner

Defining experiment to improve user experience.

  • Collect more data from user interactions and have a better success metric.
  • Segment users to understand preferences depending on how experienced they are with the solution
  • Divide into groups, run the experiments and draw conclusions to understand causal impact.

Section 2: ML Solution Architecture

2.1 Design reliable, scalable, highly available ML solutions. Considerations include:

  • Optimizing data use and storage
  • There are ways to optimise data for faster ingestion and cheaper storage. Think of ways to avoid ingestion pipeline bottlenecks.
  • I had some questions on where it would be better to store the data, where it would be better to store the model, how it would be better to serve the model.
  • Data connections
  • Think of all the ways data can travel to a ML model
  • On prem sources
  • Proprietary formats
  • Events streaming from IoT
  • GCS
  • Cloud SQL
  • BigQuery
  • For all of the above, there are various ways to ingest the data, pre-process it and make it available for current or future training. This is a vast topic you should become familiar with.

I had many questions involving these technologies.

  • Automation of data preparation and model training/deployment
  • KubeFlow, TFX, Dataflow, PubSub, BigQuery and GCS are likely to be core components of this.
  • It is well worth knowing that GCS can send you events when you place new files into the bucket.

I had questions where they told me that you would need many experiments, keeping track of things, hyperparameter tuning, working with multiple models, managing metadata and artifacts, and that you would be looking for a tool to do it: Kubeflow.

  • SDLC best practices = CI/CD automation
  • Source control changes
  • Reproducible builds by automation
  • Reproducible deployments by automation
  • Version models
  • Choosing best deployment strategy: A/B, canary deployment.

You need to be familiar with DevOps in the context of ML, proposing solutions with less manual intervention. You also need to know how to set up deployment experiments.

2.2 Choose appropriate Google Cloud software components. Considerations include:

  • A variety of component types - data collection and data management: know what they do and why they exist. Know these products really well!
  • AI Hub - is an enterprise-grade hosted repository for discovering, sharing, and reusing artificial intelligence (AI) and ML assets. To store trained and validated models, plus their relevant metadata, you can use AI Hub as a model registry.
  • AI Platform Training not only supports TensorFlow, scikit-learn and XGBoost models, but also supports models implemented in any framework using a user-provided custom container. In addition, a scalable, Bayesian-optimization-based service for hyperparameter tuning is available.
  • AI Platform Prediction - Trained models can be deployed to AI Platform Prediction as a microservice that has a REST API.
  • AI Platform Notebooks
  • AI Platform Pipelines
  • AI Platform Evaluation
  • Cloud SQL
  • Cloud Bigtable
  • Cloud Spanner
  • Cloud Data Fusion
  • DLP
  • BigQuery
  • PubSub
  • Dataproc
  • Cloud Dataflow - is a fully managed, serverless, and reliable service for running Apache Beam pipelines at scale on Google Cloud. Dataflow is used to scale the following processes: Computing the statistics to validate the incoming data. Performing data preparation and transformation. Evaluating the model on a large dataset. Computing metrics on different aspects of the evaluation dataset.
  • Cloud Storage - is a highly available and durable storage for binary large objects. Cloud Storage hosts artifacts produced throughout the execution of the ML pipeline, including the following: Data anomalies (if any). Transformed data and artifacts. Exported (trained) model. Model evaluation metrics.
  • Cloud Data Loss Prevention API

Exploration/analysis. Some of the tools available for the task:

  • KMS: Using AES with a salt or other encryption methods
  • Logging/management
  • Using Cloud Monitoring, KubeFlow metrics on the experiments page, or writing predictions to BigQuery and evaluating them.
  • Automation
  • Cloud Build
  • Container Registry
  • TensorFlow Extended (TFX)
  • KubeFlow
  • AI Platform
  • Monitoring
  • Cloud Monitoring (formerly Stackdriver)
  • Tensorboard
  • TF Profiler
  • KubeFlow
  • When to retrain or not, and why retrain.
  • BigQuery ML serving. Importing ML models or using prebuilt ones. Automatic feature engineering and hyperparameter tuning using BQML

2.3 Choose appropriate Google Cloud hardware components. Considerations include:

  • Selection of quotas and compute/accelerators with components
  • When is a GPU enough, when is a TPU required, when are you working with large or small models, and when should you use distributed training or not?

2.4 Design architecture that complies with regulatory, security, cost and performance concerns.

Considerations include:

  • Building secure ML systems
  • Privacy implications of data usage
  • Considerations for Sensitive Data within Machine Learning Datasets
  • Identifying potential regulatory issues
  • How to deal with PII: DLP, removing features?
  • I had no questions on GDPR, but in case you get one: you need to retrain the model from scratch; fine-tuning isn't enough.

Section 3: Data Preparation and Processing

3.1 Data ingestion. Considerations include:

  • Dataflow also reads from Kafka, so that is not a problem.

3.5 Feature engineering. Considerations include:

  • Class imbalance - The learning phase and the subsequent predictions of machine learning algorithms can be affected by an imbalanced data set. The balancing issue corresponds to the difference in the number of samples in the different classes. With a greater imbalance ratio, the decision function favours the class with the larger number of samples, usually referred to as the majority class.
  • Imbalanced data. You can address it by using proper validation measures for the data, such as balanced accuracy, precision-recall curves or F1-score.
  • More on this: over-sampling, Imbalanced Classes, 4 Tips for Advanced Feature Engineering and Preprocessing

I had some questions on class imbalance. I also had two or three questions on how to choose the best loss function for a classification problem.

Section 4: ML Model Development

4.2 Train a model. Considerations include:

  • Retraining/redeployment evaluation: When to retrain, when to deploy and how to rollback.

4.3 Test a model. Considerations include:

  • Unit tests for model training and serving. Test the infrastructure independently from the machine learning.
  • Model performance against baselines, simpler models, and across the time dimension
  • Try to quantify observed undesirable behaviour.
  • Model explainability on Cloud AI Platform. Explainability in the training and serving phases. Explaining images or structured data as inputs, in aggregate or case by case.
  • Continuous Evaluation on AI Platform. Ground-truth dataset labelling. Why and when to update the ground truth. How frequently to evaluate. How to submit an evaluation job.

Explainability and Continuous Evaluation are very important; I had a few questions on them.

Section 5: ML Pipeline Automation & Orchestration

5.5 Use CI/CD to test and deploy models. Considerations include:

Conclusion

In my opinion, the certification is a good one. It points in the right direction and proves useful for understanding whether the applicant has the analytical capability to propose a solution that satisfies many requirements, for problems in several industries and at several stages of a project.

Would I recommend this certification? Yes. It doesn't prove that you're a good ML Engineer, but it shows that you went through the analytical thinking and really understand how to put a solution together.
