Google Sample Question 7 of 27

You have an upstream process that writes data to Cloud Storage. This data is then read by an Apache Spark job that runs on Dataproc. These jobs are run in the us-central1 region, but the data could be stored anywhere in the United States. You need to have a recovery process in place in case of a catastrophic single region failure. You need an approach with a maximum of 15 minutes of data loss (RPO=15 mins). You want to ensure that there is minimal latency when reading the data. What should you do?

Source: Google Cloud OFFICIAL

Official sample question published by Google Cloud. WiseOwlLearns is not affiliated with Google LLC.

All explanations and Option Analyzer™ content are generated by WiseOwlLearns and are not endorsed by Google Cloud.

A First, create a Cloud Storage bucket in the US multi-region. Then run the Dataproc cluster in a zone in the us-central1 region, reading data from the US multi-region bucket. In case of a regional failure, redeploy the Dataproc cluster to the us-central2 region and continue reading from the same bucket.
B First, create a dual-region Cloud Storage bucket in the us-central1 and us-south1 regions. Enable turbo replication. Then run the Dataproc cluster in a zone in the us-central1 region, reading from the bucket in the same region. In case of a regional failure, redeploy the Dataproc clusters to the us-south1 region and read from the same bucket. ✓ Correct
C First, create two regional Cloud Storage buckets, one in the us-central1 region and one in the us-south1 region. Have the upstream process write data to the us-central1 bucket. Use the Storage Transfer Service to copy data hourly from the us-central1 bucket to the us-south1 bucket. Then run the Dataproc cluster in a zone in the us-central1 region, reading from the bucket in that region. In case of regional failure, redeploy your Dataproc clusters to the us-south1 region and read from the bucket in that region instead.
D First, create a dual-region Cloud Storage bucket in the us-central1 and us-south1 regions. Enable turbo replication. Then run the Dataproc cluster in a zone in the us-central1 region, reading from the bucket in the us-south1 region. In case of a regional failure, redeploy your Dataproc cluster to the us-south1 region and continue reading from the same bucket.
🦉 Explanation by WiseOwl Tutor™ — not endorsed by Google

A is incorrect because (1) multi-region buckets do not provide the lowest latency or sufficient bandwidth, and (2) multi-region buckets have an RPO of 1 hour (i.e., data from the last hour could be lost).
B is correct because dual-region buckets with turbo replication have an RPO of 15 minutes, as required. After a regional failure, the Dataproc cluster is redeployed to the region that is still available, where it remains colocated with the bucket.
C is incorrect because the Storage Transfer Service job copies data only hourly, so up to an hour of data could be lost, which does not meet the 15-minute RPO.
D is incorrect because the primary Dataproc cluster in us-central1 reads from the us-south1 region rather than its own, which increases latency.
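
The elimination reasoning above can be sketched as a small filter over the four options. This is only an illustration: the `Option` class, the field names, and the `acceptable` helper are hypothetical, the RPO figures are the ones quoted in the explanation, and treating read latency as a yes/no "colocated" flag is a simplifying assumption.

```python
from dataclasses import dataclass

RPO_TARGET_MIN = 15  # requirement stated in the question


@dataclass(frozen=True)
class Option:
    label: str
    worst_case_rpo_min: int  # figure quoted in the explanation above
    reads_colocated: bool    # assumption: True if the us-central1 cluster
                             # reads from a bucket location in us-central1


# Values transcribed from the explanation; not measured or official numbers.
OPTIONS = [
    Option("A", 60, False),  # multi-region: 1-hour RPO, not lowest latency
    Option("B", 15, True),   # dual-region + turbo replication, same region
    Option("C", 60, True),   # hourly Storage Transfer Service copy
    Option("D", 15, False),  # turbo replication, but reads from us-south1
]


def acceptable(opt: Option, rpo_target_min: int = RPO_TARGET_MIN) -> bool:
    """An option survives only if it meets the RPO and keeps reads local."""
    return opt.worst_case_rpo_min <= rpo_target_min and opt.reads_colocated


surviving = [opt.label for opt in OPTIONS if acceptable(opt)]
print(surviving)  # ['B']
```

Both conditions are needed: the RPO check alone would leave B and D, and the colocation check alone would leave B and C.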
