You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers’ information, such as their preferences, is hosted on a Cloud SQL for MySQL databa...

Question

You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers’ information, such as their preferences, is hosted on a Cloud SQL for MySQL database. Your CRM database is hosted on a Cloud SQL for PostgreSQL instance. The marketing team wants to use your customers’ information from the two databases and the customer behavioral data to create marketing campaigns for yearly active customers. You need to ensure that the marketing team can run the campaigns over 100 times a day on typical days and up to 300 during sales. At the same time you want to keep the load on the Cloud SQL databases to a minimum. What should you do?

Accepted Answer

A is not correct because using BigQuery federated queries will increase the load to the databases significantly as the queries for these queries involve full table scan.
B is not correct because using Trino on Dataproc might require a large cluster to query the data in BigQuery which is not going to be performant loading large amount of data from BigQuery alongside with the data from 2 databases, therefore query execution should be pushed down to BigQuery as much as possible. Does not meet the requirement of keeping load to Cloud SQL databases to a minimum.
C is correct because the required data that reside on the transactional databases are replicated in BigQuery which will minimize the load on the transactional databases as it is handled by change data capture and allow efficient queries for the campaigns queries.
D is not correct because using serverless Spark might require high amount of Data Compute Unit or Shuffle storage to query the data in BigQuery (using Storage Read API) which is not going to be performant and will be costly to maintain, therefore query execution should be pushed down to BigQuery as much as possible. Does not meet the requirement of keeping load to Cloud SQL databases to a minimum.

Ready to practice?