Question

Your company manages an ecommerce website. You developed an ML model that recommends additional products to users in near real time based on items currently in the user’s cart. The workflow will include the following processes:
1. The website will send a Pub/Sub message with the relevant data, and then receive a message with the prediction from Pub/Sub.
2. Predictions will be stored in BigQuery.
3. The model will be stored in a Cloud Storage bucket and will be updated frequently.
You want to minimize prediction latency and the effort required to update the model. How should you reconfigure the architecture?

Accepted Answer

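Use the Apache Beam RunInference API with WatchFilePattern in a Dataflow job. RunInference loads the model locally on the workers, so a prediction needs no extra network round trip, which minimizes latency. WatchFilePattern makes model updates seamless: the pipeline watches the Cloud Storage location for new model files and switches to the latest one without being redeployed. The alternatives fall short. Cloud Functions will run into limits on request rate and model size. Exposing the model as a Vertex AI endpoint and calling it from Dataflow adds a remote call to every prediction, which increases total latency. Vertex AI Pipelines is slow to provision and adds significant latency, making it unsuitable for near-real-time cart predictions.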
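
The accepted approach could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the bucket, subscription, topic, and table names are hypothetical, the model is assumed to be a pickled scikit-learn estimator, and the Pub/Sub payload is assumed to be JSON with a numeric "features" list. None of these details come from the question.

```python
import json

# Hypothetical resource names, chosen only for illustration.
MODEL_PATTERN = "gs://example-bucket/models/*.pkl"
INITIAL_MODEL = "gs://example-bucket/models/model_v1.pkl"
REQUEST_SUBSCRIPTION = "projects/example-project/subscriptions/cart-requests"
RESPONSE_TOPIC = "projects/example-project/topics/cart-predictions"
BQ_TABLE = "example-project:recs.predictions"
BQ_SCHEMA = "features:STRING,recommendation:STRING,model_id:STRING"


def parse_cart(message: bytes) -> list:
    """Decode a Pub/Sub payload into a feature vector (assumed JSON schema)."""
    return [float(x) for x in json.loads(message.decode("utf-8"))["features"]]


def to_row(features, inference, model_id) -> dict:
    """Flatten one prediction into a row usable for BigQuery and Pub/Sub."""
    return {
        "features": json.dumps(list(features)),
        "recommendation": str(inference),
        "model_id": model_id or "",
    }


def build_pipeline(options=None):
    """Assemble the streaming pipeline; the caller runs it."""
    # Imported here so the helpers above work without Beam installed.
    import numpy as np
    import apache_beam as beam
    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
    from apache_beam.ml.inference.utils import WatchFilePattern

    p = beam.Pipeline(options=options)

    # Emits metadata for the newest file matching the pattern (polled every 60 s).
    model_updates = p | "WatchModels" >> WatchFilePattern(
        file_pattern=MODEL_PATTERN, interval=60)

    rows = (
        p
        | "ReadCarts" >> beam.io.ReadFromPubSub(subscription=REQUEST_SUBSCRIPTION)
        | "Parse" >> beam.Map(lambda m: np.array(parse_cart(m)))
        # The model is loaded on the worker; the side input triggers hot swaps.
        | "Predict" >> RunInference(
            model_handler=SklearnModelHandlerNumpy(model_uri=INITIAL_MODEL),
            model_metadata_pcoll=model_updates)
        | "ToRow" >> beam.Map(
            lambda r: to_row(r.example.tolist(), r.inference, r.model_id))
    )

    # Process 1: send the prediction back to the website through Pub/Sub.
    (rows
     | "Encode" >> beam.Map(lambda row: json.dumps(row).encode("utf-8"))
     | "Reply" >> beam.io.WriteToPubSub(topic=RESPONSE_TOPIC))

    # Process 2: store the prediction in BigQuery.
    rows | "Store" >> beam.io.WriteToBigQuery(
        BQ_TABLE,
        schema=BQ_SCHEMA,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    return p
```

To try it, one would pass streaming options, for example build_pipeline(PipelineOptions(streaming=True)).run(), with PipelineOptions imported from apache_beam.options.pipeline_options. With this layout, updating the model only requires uploading a new file that matches MODEL_PATTERN to the bucket.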
