8-Week PDE Study Plan (2026)

This 8-week plan maps directly to the PDE exam's 5 sections: Designing (~22%), Ingesting (~25%), Storing (~20%), Analysis (~15%), and Automating (~18%). Each week targets 8–10 hours of hands-on study. By Week 8, you'll have practiced with case studies, worked through scenario-based questions, and built the production-judgment skills the exam actually tests.

Already certified? Use the 2-Week Refresher instead.

If you're renewing your PDE certification, our 2-Week Refresher Plan focuses only on what changed — modern services, reweighted sections, scenario-only format.

Overview Calendar

Week | Focus Area | Hours | Key Services
1 | Data Architecture & Security | 9 | IAM, Cloud KMS, VPC-SC, Cloud DLP
2 | Batch & Streaming Pipelines | 10 | Dataflow, Pub/Sub, Datastream
3 | Orchestration & Processing | 9 | Cloud Composer, Dataproc, Dataform
4 | Storage & Data Modeling | 10 | BigQuery, BigLake, AlloyDB, Bigtable, Spanner
5 | Data Lakes & Governance | 9 | Dataplex, Dataplex Catalog, Cloud Storage
6 | Analytics, Sharing & ML | 10 | BigQuery ML, BI Engine, Analytics Hub
7 | Automation & Monitoring | 9 | Cloud Monitoring, BigQuery Editions, Memorystore
8 | Case Studies & Simulation | 8 | Flowlogistic, MJTelco, WiseOwl Practice Exam

Week-by-Week Details

Week 1: Data Architecture & Security Foundations

Build the security mental model the exam tests throughout. Understand IAM at the project, dataset, and table level. Know when to use CMEK vs CSEK vs Google-managed keys. Study Cloud KMS key rotation (it does NOT re-encrypt existing data — a known exam trap). Learn VPC Service Controls for serverless services like Pub/Sub where firewall rules don't apply.
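The rotation trap is easier to remember with a toy model. The sketch below is not the Cloud KMS API: `ToyKeyRing` is a hypothetical class, and the XOR "cipher" is for illustration only. It shows only the behavior described above, where rotation adds a new primary version and leaves existing ciphertext under the old one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKeyRing:
    """Toy model of a key with versions; NOT the Cloud KMS API."""
    versions: dict = field(default_factory=lambda: {1: 0x5A})
    primary: int = 1

    def rotate(self, new_secret: int) -> None:
        # Rotation adds a version and makes it primary; nothing is re-encrypted.
        self.primary = max(self.versions) + 1
        self.versions[self.primary] = new_secret

    def encrypt(self, plaintext: bytes) -> tuple:
        # New writes always use the current primary version.
        secret = self.versions[self.primary]
        return self.primary, bytes(b ^ secret for b in plaintext)

    def decrypt(self, blob: tuple) -> bytes:
        # Each ciphertext records which version encrypted it.
        version, ciphertext = blob
        secret = self.versions[version]
        return bytes(b ^ secret for b in ciphertext)


ring = ToyKeyRing()
old_blob = ring.encrypt(b"order-123")   # written before rotation
ring.rotate(new_secret=0x3C)
new_blob = ring.encrypt(b"order-456")   # written after rotation
```

After rotation, `old_blob` is still tagged with version 1 and only decrypts while that version exists. Moving old data onto the new version takes an explicit re-encrypt or rewrite step.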

Practice Cloud DLP — specifically data masking that preserves analytical utility. Study data sovereignty requirements and dual-region vs multi-region storage for disaster recovery. Understand Cloud EKM for when keys must stay on-premises.
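To see what "masking that preserves analytical utility" means in practice, here is a minimal sketch of deterministic pseudonymization using only the Python standard library. It does not call Cloud DLP; `pseudonymize`, `SECRET`, and the sample rows are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
from collections import Counter

SECRET = b"demo-only-key"  # hypothetical key; a real one would live in a key manager


def pseudonymize(value: str, key: bytes = SECRET) -> str:
    """Deterministically replace a value with a keyed token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


rows = [
    {"email": "ana@example.com", "amount": 40},
    {"email": "bo@example.com", "amount": 15},
    {"email": "ana@example.com", "amount": 25},
]

# Mask the identifier but keep the numeric column untouched.
masked = [{"email": pseudonymize(r["email"]), "amount": r["amount"]} for r in rows]

# Joins and group-bys still work because equal inputs map to equal tokens.
orders_per_customer = Counter(r["email"] for r in masked)
```

Because the token is deterministic, analysts can still count orders per customer without ever seeing the email address. Replacing every value with a fixed string such as "REDACTED" would hide the data but destroy that grouping.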

Week 2: Batch & Streaming Pipelines

This is the heaviest exam section (~25%). Build a Dataflow pipeline end-to-end. Understand fusion optimization and when to insert Reshuffle operations to break fused stages. Study windowing: tumbling (fixed), hopping (overlapping, called sliding windows in Apache Beam), and session windows. Know that "aggregate last 30 seconds every 2 seconds" = hopping window with a 30-second size and a 2-second period.
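The windowing rule above can be checked with a few lines of plain Python. This is a sketch of the window-assignment arithmetic only, not Dataflow or Beam code; `hopping_windows` is a hypothetical helper.

```python
def hopping_windows(ts: int, size: int, period: int) -> list:
    """Return every [start, end) window of length `size`, opened every
    `period` seconds, that contains the timestamp `ts`."""
    start = (ts // period) * period  # latest window start at or before ts
    windows = []
    while start > ts - size:         # window still covers ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


# "Aggregate the last 30 seconds every 2 seconds": size=30, period=2.
overlapping = hopping_windows(31, size=30, period=2)

# A tumbling window is the special case where size == period.
tumbling = hopping_windows(31, size=30, period=30)
```

An event at second 31 falls into 15 overlapping windows in the hopping case (30 / 2), but into exactly one window in the tumbling case. In the Beam Python SDK these two shapes correspond to `SlidingWindows` and `FixedWindows`.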

Study Pub/Sub for high-throughput event ingestion and Datastream for serverless CDC replication (Oracle/MySQL/PostgreSQL → BigQuery). Know that Kafka + Debezium require VMs while Datastream is serverless.

Week 3: Orchestration & Processing

Study Cloud Composer (Airflow) for DAG-based orchestration. Key trap: worker pod evictions usually mean the workers are running out of memory. The fix is to increase worker memory and max workers, not environment size, because environment size only affects the backend database/queue. Use BigQueryInsertJobOperator for SQL transformations, NOT BigQueryUpsertTableOperator.
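BigQueryInsertJobOperator takes a job configuration dict in the shape of the BigQuery jobs API. The sketch below builds that dict in plain Python so it can run without Airflow installed. `build_query_job_config`, the table name, and the task ID are hypothetical, and the write disposition is an assumption made for the example.

```python
def build_query_job_config(sql: str, destination: str = "") -> dict:
    """Build a BigQuery job configuration dict for a standard-SQL query job.

    `destination` is an optional "project.dataset.table" string.
    """
    query = {"query": sql, "useLegacySql": False}
    if destination:
        project, dataset, table = destination.split(".")
        query["destinationTable"] = {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
        # Assumption for this sketch: overwrite the target on each run.
        query["writeDisposition"] = "WRITE_TRUNCATE"
    return {"query": query}


config = build_query_job_config(
    "SELECT order_date, SUM(amount) AS total FROM sales.orders GROUP BY order_date",
    destination="my-project.reporting.daily_totals",  # hypothetical table
)

# In a DAG file (assumes apache-airflow-providers-google is installed):
# from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# BigQueryInsertJobOperator(task_id="daily_rollup", configuration=config)
```

Keeping the configuration in a small function makes the SQL transformation testable outside the scheduler, which is useful when a DAG has many similar rollup tasks.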

Study Dataproc for Spark/Hadoop workloads — persistent vs ephemeral clusters. Learn Dataform for SQL-based transformations with assertions for data quality (uniqueness, null checks). Know when to use Workflows for lightweight serverless orchestration vs Cloud Composer for complex DAGs.

Week 4: Storage Selection & Data Modeling

The exam tests your ability to choose the right storage for specific access patterns. Use BigQuery for analytics, Bigtable for sub-10ms NoSQL (time-series and IoT, NOT analytics), Spanner for globally distributed relational data, Cloud SQL for transactional workloads, and AlloyDB for high-performance PostgreSQL with AI/ML extensions.
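The selection rules above can be written down as a small decision function. This is a deliberately simplified sketch: `choose_storage` and its flags are hypothetical, and real questions add constraints (cost, existing skills, migration effort) that a few booleans cannot capture.

```python
def choose_storage(
    *,
    analytical: bool,
    relational: bool,
    global_scale: bool = False,
    postgres_high_perf: bool = False,
) -> str:
    """Map a coarse access pattern to the service named in the study plan."""
    if analytical:
        return "BigQuery"      # analytics workloads
    if not relational:
        return "Bigtable"      # low-latency NoSQL: time-series, IoT
    if global_scale:
        return "Spanner"       # globally distributed relational
    if postgres_high_perf:
        return "AlloyDB"       # high-performance PostgreSQL
    return "Cloud SQL"         # regional transactional workloads


iot_choice = choose_storage(analytical=False, relational=False)
ledger_choice = choose_storage(analytical=False, relational=True, global_scale=True)
```

The order of the checks mirrors the usual elimination order: rule out analytics first, then decide relational versus NoSQL, then look at scale.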

Study BigLake — it lets BigQuery and open-source engines query data across Cloud Storage, S3, and Azure Blob with fine-grained IAM. Design data warehouse schemas, understand normalization trade-offs, and practice lifecycle management (storage class transitions, retention policies).

Week 5: Data Lakes & Governance

Study Dataplex for data mesh governance — decentralized governance across multiple storage systems with lineage tracking and quality validation. Know Dataplex Catalog for metadata discovery. This is the data platform section the exam emphasizes for modern GCP patterns.

Practice building federated governance models. Understand why building custom governance tools on GKE is overengineering when Dataplex exists. Study dataset-level IAM isolation (give viewers Data Viewer on shared dataset only, not project-level).

Week 6: Analytics, Sharing & ML Readiness

Study BigQuery ML for in-database model training and feature engineering. Understand BI Engine for in-memory acceleration (it can't cache 50TB tables). Know materialized views for pre-aggregated query optimization with incremental refresh. Study Connected Sheets for non-technical users and Looker Studio for dashboards.

Master Analytics Hub for zero-copy data sharing — authorized datasets at subscription time. Study RAG and embeddings: preparing unstructured data for retrieval-augmented generation with BigQuery ML vector search. Know Cloud DLP masking for analytics — preserving utility while protecting PII.

Week 7: Automation, Monitoring & Disaster Recovery

Study resource optimization: BigQuery Editions with reservations for predictable costs, persistent vs job-based Dataproc clusters, batch vs interactive query modes. Understand Cloud Monitoring and Cloud Logging for pipeline observability.

Design for fault tolerance. Use dual-region Cloud Storage with turbo replication for a 15-minute RPO (multi-region is a 1-hour RPO). Run Dataflow across multiple zones by specifying --region rather than --zone. Use Cloud SQL failover replicas and Memorystore Redis clusters for data replication.
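RPO questions usually reduce to a comparison against a stated constraint. The sketch below encodes only the two figures quoted above; `RPO_MINUTES` and `storage_options_for_rpo` are hypothetical names, and the numbers are the plan's, not values looked up from an API.

```python
# RPO figures as stated in this study plan, in minutes.
RPO_MINUTES = {
    "dual-region + turbo replication": 15,
    "multi-region (default replication)": 60,
}


def storage_options_for_rpo(required_rpo_minutes: int) -> list:
    """Return the configurations whose RPO meets the stated requirement."""
    return sorted(
        name
        for name, rpo in RPO_MINUTES.items()
        if rpo <= required_rpo_minutes
    )


strict = storage_options_for_rpo(15)    # only turbo replication qualifies
relaxed = storage_options_for_rpo(60)   # both configurations qualify
```

When both options qualify, the scenario's other constraints (typically cost) decide between them.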

Week 8: Case Studies & Exam Simulation

Work through the Flowlogistic case study (logistics IoT data, real-time tracking, global scale) and MJTelco case study (telecom CDR processing, subscriber analytics). These appear on the full exam as multi-question scenarios testing your ability to design end-to-end architectures.

Run full-length practice exams. Use Option Analyzer™ to understand elimination logic. Analyze your performance breakdown across all 5 sections. Re-study sections where you score below 75%. Focus on constraint-driven elimination — the correct technical solution might be wrong if it violates the stated constraint.
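The re-study rule can be turned into a short script. This sketch uses the approximate section weights from the top of this plan and the 75% threshold above; `WEIGHTS`, `review_plan`, `weighted_score`, and the sample scores are hypothetical.

```python
# Approximate section weights from this plan's introduction.
WEIGHTS = {
    "Designing": 0.22,
    "Ingesting": 0.25,
    "Storing": 0.20,
    "Analysis": 0.15,
    "Automating": 0.18,
}
RESTUDY_BAR = 75.0  # the plan's re-study threshold, not an official passing score


def review_plan(scores: dict) -> list:
    """Return sections scoring below the bar, highest exam weight first."""
    weak = [section for section, pct in scores.items() if pct < RESTUDY_BAR]
    return sorted(weak, key=lambda section: WEIGHTS[section], reverse=True)


def weighted_score(scores: dict) -> float:
    """Overall percentage, weighting each section by its share of the exam."""
    return sum(WEIGHTS[section] * pct for section, pct in scores.items())


practice = {"Designing": 80, "Ingesting": 70, "Storing": 90, "Analysis": 60, "Automating": 74}
todo = review_plan(practice)   # ['Ingesting', 'Automating', 'Analysis']
overall = weighted_score(practice)
```

Sorting weak sections by weight puts the remaining hours where they move the overall score most, so a 70% in Ingesting gets attention before a 60% in Analysis.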

Start Your PDE Prep

Practice with questions verified against current Google Cloud documentation. Chat with WiseOwl Tutor™ to clarify BigQuery, Dataflow, and Dataplex scenarios in real time.