You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also notice that Dataflow has automatically optimized the pipeline graph and merged it into a single step. You want to identify where the potential bottleneck is occurring. What should you do?
A is not correct: it would create a lot of unnecessary output, and it is not a practical way to identify processing bottlenecks in the pipeline. B is not correct: this approach can generate an excessive volume of logs, and it does not make it easier to identify the step that is causing the bottleneck. C is correct: a Reshuffle operation prevents Dataflow from fusing adjacent steps, so each step runs as its own stage and its throughput can be observed individually in the Dataflow pipeline. D is not correct: the issue concerns the processing of incoming data, not the writing of output data.