GCP Machine Learning Engineer Certification Preparation Guide

Professional Machine Learning Engineer Certification

I recently passed the Google Professional Machine Learning Engineer Certification. During my preparation I went through a lot of resources about the exam. The exam is relatively easier than the Data Engineer certification exam, as the questions are more direct (almost no ambiguous questions), but it has 60 questions instead of the typical 50. It focuses on the following areas: ML problem framing, ML solution architecture, data preparation and processing, ML model development, ML pipeline automation and orchestration, and ML solution monitoring, optimization, and maintenance.

I could not find a comprehensive resource that covers all aspects of the exam when I started preparing. I had to go over a lot of Google Cloud product pages and general machine learning resources, and at no point did I feel ready, as both topics are huge. Here I will try to provide a summary of the resources I found helpful for passing the exam.

Machine Learning

A big part of the exam is general ML questions that touch on concepts not specific to Google. This is a huge topic by itself, but going over most of the material in the Google ML Crash Course should be enough for the exam.

You should also get familiar with privacy in machine learning - link


You need to know which metrics you can use and what kind of ML problem each applies to. For instance, for a classification problem you can use accuracy, precision, recall, F1 score, and ROC AUC.
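As a quick illustration of how those classification metrics relate to each other, they can all be derived from the confusion-matrix counts. A minimal pure-Python sketch (the counts below are made up):

```python
# Compute common classification metrics from confusion-matrix counts.
# The counts passed in the example are made up for illustration.
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many we caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m["precision"])  # 0.8
```

The exam tends to test exactly this kind of trade-off reasoning, e.g. preferring recall over precision when false negatives are costly.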

Regularization techniques

Neural Networks

Some common issues with neural network training (e.g. vanishing/exploding gradients, dead ReLU units, overfitting) and how to address them:

To summarize:
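As an illustration of addressing one common training issue, overfitting, here is a minimal pure-Python sketch of early stopping: training halts once the validation loss stops improving for a number of epochs (the patience value and loss values are arbitrary):

```python
# Illustrative early-stopping logic: stop training when validation loss
# has not improved for `patience` consecutive epochs.
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    since_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return epoch
    return None

# Validation loss plateaus after epoch 2, so training stops at epoch 5.
stop = early_stop_epoch([0.9, 0.7, 0.6, 0.6, 0.65, 0.61], patience=3)
print(stop)  # 5
```

In Keras the same behavior is available out of the box via the `EarlyStopping` callback.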

AI Explanations

Explainable AI is another topic to know about, along with the different techniques available to explain a model (e.g. feature attribution methods such as sampled Shapley, integrated gradients, and XRAI).

Also, an important tool to know about is the What-If Tool: when to use it, how to use it, how to explore different outcomes, and how to conduct experiments with it.


You need to know basic model architectures, the common layer types (e.g. dense, dropout, convolutional, pooling), and which of them carry trainable parameters. Knowing the Keras API is also important.
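To illustrate which layers carry trainable parameters: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while dropout and pooling layers have none. A pure-Python sketch (the example architecture and sizes are arbitrary):

```python
# Count trainable parameters per layer type. Pure-Python illustration;
# the example architecture and layer sizes are arbitrary.
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per unit
    return n_in * n_out + n_out

model = [
    ("dense", dense_params(784, 128)),  # 784*128 + 128 = 100480
    ("dropout", 0),                     # dropout has no trainable parameters
    ("dense", dense_params(128, 10)),   # 128*10 + 10 = 1290
]
total = sum(params for _, params in model)
print(total)  # 101770
```

This matches what `model.summary()` reports for an equivalent Keras `Sequential` model.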


You need to know the differences between CPUs, GPUs and TPUs and when to use each one. The general answer is that GPU training is faster than CPU training, and GPUs usually don't require any additional setup. TPUs are faster than GPUs but they don't support custom TensorFlow operations.

You may also want to learn about basic troubleshooting - link

Distributed training

You need to know the differences between the distributed training strategies in TensorFlow - link.

| Strategy | Synchronous / Asynchronous | Number of nodes | Number of GPUs/TPUs per node | How model parameters are stored |
| --- | --- | --- | --- | --- |
| MirroredStrategy | Synchronous | one | many | On each GPU |
| TPUStrategy | Synchronous | one | many | On each TPU |
| MultiWorkerMirroredStrategy | Synchronous | many | many | On each GPU on each node |
| ParameterServerStrategy | Asynchronous | many | one | On the parameter server |
| CentralStorageStrategy | Synchronous | one | many | On CPU, could be placed on a GPU if there is only one |
| Default Strategy | No distribution | one | one | On any GPU picked by TensorFlow |
| OneDeviceStrategy | No distribution | one | one | On the specified GPU |

Also make sure to know the components of a distributed training architecture (master, worker, parameter server, evaluator) and how many of each you can get.
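The strategy table above can be condensed into a small decision helper. This is not a TensorFlow API, just the table's rules restated as code to make the choices easier to memorize:

```python
# Illustrative helper mirroring the strategy table above: pick a
# tf.distribute strategy name from the training topology.
# Not a real TensorFlow API -- a mnemonic sketch only.
def pick_strategy(nodes, accelerators_per_node, asynchronous=False, tpu=False):
    if asynchronous:
        return "ParameterServerStrategy"   # async, many nodes, parameter server
    if tpu:
        return "TPUStrategy"               # sync, one node, many TPU cores
    if nodes > 1:
        return "MultiWorkerMirroredStrategy"  # sync across nodes
    if accelerators_per_node > 1:
        return "MirroredStrategy"          # sync across GPUs on one node
    return "OneDeviceStrategy"             # no distribution

print(pick_strategy(nodes=1, accelerators_per_node=4))  # MirroredStrategy
print(pick_strategy(nodes=4, accelerators_per_node=8))  # MultiWorkerMirroredStrategy
```

In real code the equivalent choice is e.g. `strategy = tf.distribute.MirroredStrategy()` and building the model inside `strategy.scope()`.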



You have to know TFX (TensorFlow Extended) and its limitations (it can be used to build pipelines for TensorFlow models only), its standard components (e.g. ingestion, validation, transform), and how to build a pipeline out of them.


You need to know Kubeflow and that you should use it when your modeling framework is not TensorFlow (e.g. PyTorch, XGBoost) or when you want to dockerize every step of the flow - link


Here is a flow chart to help decide which Google ML product to use depending on the situation:


BigQuery ML

BigQuery is a managed data warehouse service that also has ML capabilities. So if you see a question where the data is in BigQuery and the output will also live there, the natural answer is to use BigQuery ML for modeling.
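To give a feel for what that looks like, a model in BigQuery ML is trained with a `CREATE MODEL` SQL statement. The sketch below only builds the statement as a string (the dataset, table, model, and column names are all hypothetical); running it would require a BigQuery client and a real project:

```python
# Build an illustrative BigQuery ML training statement. The dataset,
# table, model, and column names below are hypothetical examples.
model_name = "mydataset.churn_model"     # hypothetical model name
training_table = "mydataset.customers"   # hypothetical source table

create_model_sql = f"""
CREATE OR REPLACE MODEL `{model_name}`
OPTIONS(model_type='logistic_reg', input_label_cols=['churned']) AS
SELECT age, tenure_months, monthly_spend, churned
FROM `{training_table}`
"""
print(create_model_sql)
```

Predictions can then be read back with `ML.PREDICT`, keeping both the input data and the output inside BigQuery.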

AI Platform

You need to know AI Platform: built-in algorithms, hyperparameter tuning, distributed training, and which container images to use based on your modeling framework (e.g. TensorFlow, PyTorch, XGBoost, scikit-learn). The following resources cover most of what you need to know for the exam:

Natural Language


Train your own high-quality machine learning custom models to classify, extract, and detect sentiment with minimum effort and machine learning expertise using Vertex AI for natural language, powered by AutoML. You can use the AutoML UI to upload your training data and test your custom model without a single line of code. - link

Natural Language API

The powerful pre-trained models of the Natural Language API empower developers to easily apply natural language understanding (NLU) to their applications with features including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis. - link

Healthcare Natural Language AI

Gain real-time analysis of insights stored in unstructured medical text. Healthcare Natural Language API allows you to distill machine-readable medical insights from medical documents, while AutoML Entity Extraction for Healthcare makes it simple to build custom knowledge extraction models for healthcare and life sciences apps—no coding skills required. - link


Cloud Translation API helps with: translating text, discovering supported languages, detecting the language of a text, and creating and using glossaries when translating.

Vision AI

Create a dataset of images, train a custom AutoML model for Cloud or Edge, then deploy it. If Edge is the target, you can export the model as TF Lite, TF.js, CoreML, or Coral Edge TPU.

Video AI

Other products

Certification SWAG

After passing the exam, you can choose one of the official certification swag items: