GCP Data Engineer Certification Preparation Guide

Professional Data Engineer Certification

I recently passed the Google Professional Data Engineer certification. During the preparation I went through a lot of resources about Google Cloud. I also read this book, but because Google updates its services very often, a lot of the information in the book has become outdated. The book is still a good read if you have little knowledge of Google Cloud services, but make sure to also read the official documentation.

In this article, I compile the resources I found most useful and accurate while preparing for the exam, in the hope that they help others who are preparing for it. The exam itself is not very tough, but many of the questions are ambiguous, so you need to be well prepared. You can learn more about the certification on the official page - link.

Storage

The Data Engineer role is all about data, hence the focus on the storage technologies Google Cloud provides.

BigQuery

BigQuery takes up a big chunk of the exam. A lot of questions are about how to design your tables, optimize performance, migrate data into BigQuery, and use other Google Cloud resources along with BigQuery.

Overview

Pseudo columns

Security

Data Transfer

BigQuery ML

Cloud Spanner

Cloud Bigtable

Cloud Datastore

Pub/Sub

Data migrations

Processing

Data processing also takes up a big part of the exam. Good knowledge of Dataflow/Beam operators may be required, and to a lesser extent of Dataproc/Hadoop/Spark.

Cloud Dataflow

Cloud Dataproc

Machine Learning

There were a few ML questions, but knowing AI Platform (now called Vertex AI) may save you from surprises.

ML Concepts

Cloud AutoML

Cloud Data Loss Prevention (DLP)

Kubeflow

Edge TPU

Cloud services

The remainder of the exam can touch, to a greater or lesser degree, on the following services. You should at least read the overview of each one:

Cloud Composer

Data Catalog

Cloud Dataprep

Data Studio

IAM

Other topics