Google Sample Question 20 of 27

You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers’ information, such as their preferences, is hosted on a Cloud SQL for MySQL database. Your CRM database is hosted on a Cloud SQL for PostgreSQL instance. The marketing team wants to use your customers’ information from the two databases and the customer behavioral data to create marketing campaigns for yearly active customers. You need to ensure that the marketing team can run the campaigns over 100 times a day on typical days and up to 300 during sales. At the same time you want to keep the load on the Cloud SQL databases to a minimum. What should you do?

Source: Google Cloud OFFICIAL

Official sample question published by Google Cloud. WiseOwlLearns is not affiliated with Google LLC.

All explanations and Option Analyzer™ content are generated by WiseOwlLearns and are not endorsed by Google Cloud.

A Create BigQuery connections to both Cloud SQL databases. Use BigQuery federated queries on the two databases and the Google Analytics data on BigQuery to run these queries.
B Create a Dataproc cluster with Trino to establish connections to both Cloud SQL databases and BigQuery, to execute the queries.
C Create streams in Datastream to replicate the required tables from both Cloud SQL databases to BigQuery for these queries. ✓ Correct
D Create a job on Apache Spark with Dataproc Serverless to query both Cloud SQL databases and the Google Analytics data on BigQuery for these queries.
🦉 Explanation by WiseOwl Tutor™ — not endorsed by Google

A is not correct because using BigQuery federated queries will increase the load to the databases significantly as the queries for these queries involve full table scan. B is not correct because using Trino on Dataproc might require a large cluster to query the data in BigQuery which is not going to be performant loading large amount of data from BigQuery alongside with the data from 2 databases, therefore query execution should be pushed down to BigQuery as much as possible. Does not meet the requirement of keeping load to Cloud SQL databases to a minimum. C is correct because the required data that reside on the transactional databases are replicated in BigQuery which will minimize the load on the transactional databases as it is handled by change data capture and allow efficient queries for the campaigns queries. D is not correct because using serverless Spark might require high amount of Data Compute Unit or Shuffle storage to query the data in BigQuery (using Storage Read API) which is not going to be performant and will be costly to maintain, therefore query execution should be pushed down to BigQuery as much as possible. Does not meet the requirement of keeping load to Cloud SQL databases to a minimum.

Ready to practice?

These 27 official sample questions are free to practice on WiseOwlLearns — no account required. Get real-time tutoring from WiseOwl Tutor™ and step-by-step elimination reasoning from Option Analyzer™.