BigQuery, BigLake, and Dataplex: The PDE Exam's New Core | WiseOwlLearns
BigQuery, BigLake, and Dataplex represent three layers of the modern GCP data platform: analytics, unified storage, and governance. On the PDE exam, they appear together across Sections 3, 4, and 5. On the renewal exam, these three services account for much of the content that gained weight in the reweighting.
Here’s what each one does, how they interconnect, and exactly how the exam tests them.
BigQuery: the analytics engine
BigQuery is the most heavily tested service on the PDE exam. It appears in every section. You already know what it does — serverless data warehouse, SQL analytics, columnar storage. What the exam tests is your judgment about when and how to use its advanced features:
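Materialized views for pre-aggregated query optimization. The exam trap: BI Engine is an in-memory cache whose capacity is limited to the BI Engine reservation you buy, so it can’t hold petabyte-scale tables. For very large tables that are queried repeatedly with the same aggregations, materialized views, which BigQuery keeps up to date incrementally, are the answer, not BI Engine.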
BigQuery ML for in-database model training. On the renewal exam, this extends to embedding generation and vector search for RAG applications — preparing unstructured data for retrieval-augmented generation without leaving BigQuery.
BigQuery Editions with reservations for cost-predictable workloads. The exam tests whether you know the difference between on-demand pricing (billed per TiB of data scanned) and capacity pricing (billed per slot-hour through reservations, optionally under a commitment), and when each makes financial sense.
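The on-demand vs. capacity decision is mostly arithmetic. The Python sketch below compares the two, assuming a flat on-demand price per TiB scanned and a flat price per slot-hour. The function names and every rate in the example are hypothetical placeholders rather than current list prices, and real Editions billing adds autoscaling and commitment discounts that this ignores.

```python
"""Rough monthly cost comparison: on-demand vs. capacity (slot) pricing.

All prices are hypothetical inputs; look up current BigQuery rates for
your region and edition before using this for a real decision.
"""

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def on_demand_cost(tib_scanned_per_month: float, price_per_tib: float) -> float:
    """On-demand pricing: you pay for the bytes your queries scan."""
    return tib_scanned_per_month * price_per_tib


def capacity_cost(slots: int, price_per_slot_hour: float,
                  hours: float = HOURS_PER_MONTH) -> float:
    """Capacity pricing: you pay for reserved slots over time,
    regardless of how many bytes your queries scan."""
    return slots * price_per_slot_hour * hours


def breakeven_tib(slots: int, price_per_slot_hour: float,
                  price_per_tib: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly TiB scanned above which the reservation is cheaper."""
    return capacity_cost(slots, price_per_slot_hour, hours) / price_per_tib


def cheaper_model(tib_scanned_per_month: float, slots: int,
                  price_per_slot_hour: float, price_per_tib: float) -> str:
    """Return whichever pricing model costs less for this workload."""
    od = on_demand_cost(tib_scanned_per_month, price_per_tib)
    cap = capacity_cost(slots, price_per_slot_hour)
    return "on-demand" if od <= cap else "capacity"
```

With placeholder rates of 5.0 per TiB and 0.05 per slot-hour, a 100-slot reservation costs 3,650 a month, so `breakeven_tib(100, 0.05, 5.0)` returns 730.0: below roughly 730 TiB scanned per month, on-demand is cheaper.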
BigLake: the unified storage layer
BigLake is the service most likely to be new if you certified before 2024. It provides a unified storage API that lets BigQuery and open-source engines (Spark, Presto) query data across:
- Google Cloud Storage
- Amazon S3
- Azure Blob Storage
…with fine-grained IAM down to the table, row, and column level, without moving the data.
When the exam tests BigLake
The typical scenario: “Your organization has data in both Cloud Storage and AWS S3. Analysts need to query it from BigQuery with column-level access control. What do you use?”
The distractor is BigQuery external tables. External tables can also query Cloud Storage, but they lack fine-grained IAM and don’t support multi-cloud sources. BigLake is the answer when any of these conditions apply:
- Data lives in multiple clouds (GCS + S3 + Azure Blob)
- Column-level or row-level security is required on external data
- You need Apache Iceberg or Delta Lake table format support
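Those three triggers are easy to encode. Here’s a minimal Python sketch of the rule of thumb; `Scenario` and `recommend_table_type` are hypothetical names, and the logic captures this exam heuristic, not any Google API.

```python
"""Sketch of the BigLake-vs-external-table exam heuristic.

Encodes the three BigLake triggers listed above; not an official Google API.
"""

from dataclasses import dataclass, field

CLOUD_STORAGE = "gcs"
OPEN_TABLE_FORMATS = {"iceberg", "delta"}


@dataclass
class Scenario:
    # Where the data lives, e.g. {"gcs", "s3", "azure_blob"}
    sources: set[str] = field(default_factory=set)
    # True if row-level or column-level security is required
    needs_row_or_column_security: bool = False
    # File or table format, e.g. "parquet", "iceberg", "delta"
    table_format: str = "parquet"


def recommend_table_type(s: Scenario) -> str:
    """Return "BigLake table" if any trigger applies, else "external table"."""
    # Any source outside Cloud Storage (S3, Azure Blob) means multi-cloud.
    multi_cloud = any(src != CLOUD_STORAGE for src in s.sources)
    open_format = s.table_format.lower() in OPEN_TABLE_FORMATS
    if multi_cloud or s.needs_row_or_column_security or open_format:
        return "BigLake table"
    return "external table"
```

The example question above, `Scenario(sources={"gcs", "s3"}, needs_row_or_column_security=True)`, comes back as `"BigLake table"`.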
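🚨 Exam Trap: BigLake vs External Tables. External tables are NOT BigLake. A standard external table connects BigQuery to Cloud Storage with basic access: whoever queries it also needs read access to the underlying files. BigLake tables delegate storage access to a connection’s service account and add fine-grained IAM, multi-cloud support, and open table formats. The exam tests this distinction explicitly.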
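In BigQuery DDL the difference shows up in a single clause: a BigLake table is created `WITH CONNECTION`, which hands storage access to the connection’s service account, while a standard external table reads storage with the querying user’s own credentials. The Python helper below only builds the DDL strings; the project, dataset, bucket, and connection names are hypothetical placeholders.

```python
"""Build BigQuery DDL for a standard external table vs. a BigLake table.

All names (my-project, analytics, my-bucket, us.lake-conn) are placeholders.
"""

from typing import Optional


def external_table_ddl(table: str, uris: list[str], fmt: str = "PARQUET",
                       connection: Optional[str] = None) -> str:
    """Return CREATE EXTERNAL TABLE DDL.

    Without a connection, this is a standard external table: queries read
    storage with the caller's own credentials. With WITH CONNECTION it
    becomes a BigLake table: storage access is delegated to the
    connection's service account.
    """
    uri_list = ", ".join(f"'{u}'" for u in uris)
    lines = [f"CREATE EXTERNAL TABLE `{table}`"]
    if connection:
        lines.append(f"WITH CONNECTION `{connection}`")
    lines.append(f"OPTIONS (format = '{fmt}', uris = [{uri_list}]);")
    return "\n".join(lines)


if __name__ == "__main__":
    uris = ["gs://my-bucket/sales/*.parquet"]
    print(external_table_ddl("my-project.analytics.sales_ext", uris))
    print()
    print(external_table_ddl("my-project.analytics.sales_lake", uris,
                             connection="my-project.us.lake-conn"))
```

Row access policies and column policy tags are attached to the BigLake table afterwards; the sketch doesn’t cover them.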
Dataplex: the governance layer
Dataplex is GCP’s data mesh and governance platform. It manages decentralized data across multiple storage systems without requiring you to centralize everything into one data warehouse.
What Dataplex does
- Data discovery: Dataplex Catalog provides a unified metadata catalog across BigQuery, Cloud Storage, and other data sources.
- Data quality: Define and run quality checks (rules) across distributed data, covering dimensions such as completeness, uniqueness, and freshness.
- Lineage tracking: Trace how data flows from source to downstream consumers.
- Federated governance: Apply consistent policies across teams and storage systems without centralizing data ownership.
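Lineage is what lets you answer “what breaks downstream if this table changes?” The Python sketch below does that with a breadth-first search over a hand-written lineage graph; the asset names and the graph itself are hypothetical, and the code does not call any Google Cloud API.

```python
"""Downstream impact analysis over a lineage graph.

A hypothetical, in-memory stand-in for what lineage tracking provides;
asset names are made up and no Google Cloud API is called.
"""

from collections import deque

# Edges point from a source asset to the assets derived from it.
LINEAGE = {
    "gs://raw-zone/orders/": ["bq:staging.orders"],
    "bq:staging.orders": ["bq:marts.daily_revenue", "bq:marts.customer_ltv"],
    "bq:marts.daily_revenue": ["looker:exec_dashboard"],
    "bq:marts.customer_ltv": [],
}


def downstream(asset: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first search returning every asset that depends on `asset`."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:  # skip assets already reached via another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    print(downstream("bq:staging.orders", LINEAGE))
```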
How the exam tests Dataplex
The typical scenario: “Your company has data distributed across 5 teams, each using different BigQuery datasets and Cloud Storage buckets. You need to implement governance, lineage, and quality checks without centralizing all data into one project.”
The distractor is “build a custom governance tool on GKE.” The exam consistently favors managed services over custom builds, so Dataplex is the answer when governance needs to span multiple storage systems.
🚨 Exam Trap: Dataplex vs Dataform. Dataplex governs data across systems. Dataform manages SQL transformations inside BigQuery and can attach assertions (data quality checks) to them. They’re complementary, not competitors. The exam tests whether you know which to use where: in-pipeline quality → Dataform assertions; cross-system governance → Dataplex.
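A Dataform-style assertion is, at heart, a query that should return zero rows: any row it returns is a violation. The Python sketch below builds such queries for a uniqueness check and a non-null check. The helper names are hypothetical, and Dataform compiles its own SQL, which will differ; in Dataform itself you would declare the equivalent checks under `assertions` in a SQLX `config` block, for example with `uniqueKey` and `nonNull`.

```python
"""Build 'zero rows means pass' assertion queries for a BigQuery table.

Hypothetical helpers that mirror the idea behind Dataform's uniqueKey and
nonNull assertions; Dataform generates its own SQL, which will differ.
"""


def unique_key_assertion(table: str, key_cols: list[str]) -> str:
    """Rows returned = key values that appear more than once."""
    cols = ", ".join(key_cols)
    return (
        f"SELECT {cols}, COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        f"GROUP BY {cols}\n"
        f"HAVING COUNT(*) > 1"
    )


def non_null_assertion(table: str, cols: list[str]) -> str:
    """Rows returned = rows where any listed column is NULL."""
    condition = " OR ".join(f"{c} IS NULL" for c in cols)
    return f"SELECT *\nFROM `{table}`\nWHERE {condition}"


def passed(violating_rows: list) -> bool:
    """An assertion passes when its query returns no rows."""
    return len(violating_rows) == 0
```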
How they work together
In the modern GCP data platform pattern the exam tests:
- Data lands in Cloud Storage (raw zone) or directly in BigQuery
- BigLake provides unified access across Cloud Storage, S3, and Azure Blob with fine-grained IAM
- Dataplex governs the entire estate — discovery, quality, lineage, access policies
- BigQuery serves as the analytics engine — querying BigLake tables, running ML models, serving materialized views
- Dataform handles in-BigQuery transformations with data quality assertions
- Analytics Hub enables zero-copy sharing of curated datasets to other organizations
This is the architecture the PDE exam rewards. Knowing each service individually isn’t enough — the exam tests whether you can assemble them into a coherent, governed data platform.
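As a study aid, the same architecture can be written down as a lookup table from layer to service. This Python sketch just encodes the list above; the `Layer` enum and `service_for` function are hypothetical names, not an official mapping.

```python
"""The layered platform described above as a layer -> service lookup.

A study aid that encodes this article's architecture, not an official mapping.
"""

from enum import Enum


class Layer(Enum):
    STORAGE = "storage"
    ACCESS = "unified access"
    GOVERNANCE = "governance"
    ANALYTICS = "analytics"
    TRANSFORMATION = "transformation"
    SHARING = "sharing"


# Each layer maps to (service, responsibility), in the order data flows.
PLATFORM = {
    Layer.STORAGE: ("Cloud Storage / BigQuery", "land raw data"),
    Layer.ACCESS: ("BigLake", "query GCS, S3, and Azure Blob with fine-grained IAM"),
    Layer.GOVERNANCE: ("Dataplex", "discovery, quality, lineage, access policies"),
    Layer.ANALYTICS: ("BigQuery", "SQL, BigQuery ML, materialized views"),
    Layer.TRANSFORMATION: ("Dataform", "in-BigQuery transformations with assertions"),
    Layer.SHARING: ("Analytics Hub", "zero-copy sharing of curated datasets"),
}


def service_for(layer: Layer) -> str:
    """Return the service responsible for a given layer."""
    return PLATFORM[layer][0]


if __name__ == "__main__":
    for layer, (service, role) in PLATFORM.items():
        print(f"{layer.value:>15}: {service:<26} {role}")
```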
Preparing for these services
For the full exam: Study these services in the context of all 5 sections. Our 8-Week PDE Study Plan covers BigLake and Dataplex in Weeks 4–5.
For the renewal exam: These three services are the renewal’s core. Our 2-Week Refresher front-loads them in Week 1.
In both cases, Option Analyzer™ walks you through the service-differentiation logic that the exam actually tests — not just what each service does, but why the other three options are wrong for the specific scenario.