You need to modernize your existing on-premises data strategy. Your organization currently uses:
- Apache Hadoop clusters for processing multiple large data sets, including on-premises Hadoop Distributed File System (HDFS) for data replication.
- Apache Airflow to orchestrate hundreds of ETL pipelines with thousands of job steps.

You need to set up a new architecture in Google Cloud that can handle your Hadoop workloads and requires minimal changes to your existing orchestration processes. What should you do?
A is not correct because the scenario does not call for converting the ETL pipelines, only for orchestrating them in the cloud. Dataflow is not an orchestration tool.

B is correct because the organization's Hadoop workloads can move to Dataproc, with Cloud Storage replacing HDFS for their data. Cloud Composer, which is built on Apache Airflow, is a good fit for moving their Airflow orchestration onto the cloud with minimal changes.

C is not correct. Bigtable is a great choice for large analytical and operational workloads and could hold this organization's many large datasets, but it is better suited to an HBase migration, that is, a NoSQL database migration. This answer therefore does not solve the problem of migrating Hadoop clusters to the cloud. Bigtable could easily be part of the solution, but nothing in the scenario indicates that the organization is interested in a database move.

D is not correct. This solution does correctly choose Dataproc and Cloud Storage, but Cloud Data Fusion is not the right choice for orchestration: it is not an orchestration tool, and the scenario has no requirement to translate the ETL pipelines into a different product or to do so in a code-free fashion.
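To make the reasoning behind answer B more concrete, here is a minimal Python sketch of what "HDFS data moves to Cloud Storage, Hadoop jobs move to Dataproc" can look like in practice. It rewrites `hdfs://` paths to `gs://` paths and builds a Dataproc Hadoop job specification as a plain dictionary. The bucket name, project ID, cluster name, and the helper names `hdfs_to_gcs` and `build_hadoop_job` are all hypothetical, and the sketch assumes the data keeps the same directory layout after it is copied to the bucket.

```python
from urllib.parse import urlparse

# Hypothetical bucket name; replace with a real Cloud Storage bucket.
BUCKET = "example-migrated-data"


def hdfs_to_gcs(uri: str, bucket: str = BUCKET) -> str:
    """Rewrite an hdfs:// URI to a gs:// URI, keeping the path unchanged."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        return uri  # leave flags and non-HDFS paths untouched
    return f"gs://{bucket}{parsed.path}"


def build_hadoop_job(project_id: str, cluster_name: str,
                     main_jar: str, args: list[str]) -> dict:
    """Build a Dataproc Hadoop job spec with HDFS paths mapped to Cloud Storage."""
    return {
        "reference": {"project_id": project_id},
        "placement": {"cluster_name": cluster_name},
        "hadoop_job": {
            "main_jar_file_uri": hdfs_to_gcs(main_jar),
            "args": [hdfs_to_gcs(arg) for arg in args],
        },
    }


if __name__ == "__main__":
    # Hypothetical project, cluster, and paths, for illustration only.
    job = build_hadoop_job(
        project_id="example-project",
        cluster_name="example-cluster",
        main_jar="hdfs://namenode:8020/jars/etl.jar",
        args=["hdfs://namenode:8020/data/input", "--verbose"],
    )
    print(job)
```

In an Airflow DAG running on Cloud Composer, a dictionary of this shape could be passed as the `job` argument of `DataprocSubmitJobOperator` from the Google provider package, so the existing DAG structure and scheduling stay largely as they are while each job step targets Dataproc instead of the on-premises cluster.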