# GCP Batch Prediction Infrastructure with Terraform, Vertex AI, Artifact Registry, Kubeflow, and PubSub
This repository contains Terraform code for provisioning a batch prediction infrastructure on Google Cloud Platform (GCP): Cloud Scheduler publishes to a PubSub topic, which triggers a Vertex AI (Kubeflow) pipeline. Pipeline components run on custom images built locally and pushed to GCP Artifact Registry, and Python dependencies are managed with Poetry.
## Features

- Automates the creation of GCP resources for batch prediction using Terraform
- Sets up a Vertex AI pipeline that processes sample data upon PubSub trigger
- Utilizes GCP Artifact Registry for running Kubeflow pipeline components on custom images
- Employs Poetry for package management
## Prerequisites

- GCP account with billing enabled
- Google Cloud SDK installed and configured
- Terraform v0.13 or higher
- Docker installed
- Poetry installed
- Kubeflow Pipelines SDK v1.8.0 or higher
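Before proceeding, you can sanity-check the local tooling against the versions listed above:

```bash
gcloud version
terraform version   # should report v0.13 or higher
docker --version
poetry --version
poetry run python -c "import kfp; print(kfp.__version__)"  # 1.8.0 or higher, once `poetry install` below has run
```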
## Setup

Clone this repository to your local machine:
```bash
git clone https://github.com/yourusername/gcp-batch-prediction-infrastructure.git
cd gcp-batch-prediction-infrastructure
```
Ensure billing is enabled on your GCP account, then create a new project or select an existing one and set it as the active project (replace PROJECT_ID throughout):
```bash
gcloud projects create PROJECT_ID
gcloud config set project PROJECT_ID
```
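The services used by this stack must be enabled on the project before Terraform can create resources in it. A typical set for this infrastructure (adjust to match your configuration) is:

```bash
gcloud services enable \
  aiplatform.googleapis.com \
  pubsub.googleapis.com \
  artifactregistry.googleapis.com \
  cloudscheduler.googleapis.com
```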
Create a service account for Terraform, grant it the required role (roles/owner here, for simplicity), and download a JSON key:
```bash
gcloud iam service-accounts create terraform --display-name "Terraform Service Account"
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:terraform@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/owner"
gcloud iam service-accounts keys create terraform-key.json \
  --iam-account terraform@PROJECT_ID.iam.gserviceaccount.com
```
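Terraform's Google provider can pick up these credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch of the provisioning run, assuming the Terraform configuration sits at the repository root:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/terraform-key.json"
terraform init    # download providers and initialize state
terraform plan    # review the resources to be created
terraform apply   # provision the infrastructure
```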
Install the Python dependencies using Poetry:
```bash
poetry install
```
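Poetry installs the dependencies into a project-local virtual environment, so subsequent Python commands should be run through it:

```bash
# Run a one-off command inside the project's virtual environment
poetry run python --version
# ...or spawn a shell with the environment activated
poetry shell
```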
Configure Docker to authenticate with GCP Artifact Registry (replace LOCATION with your repository's region, e.g. europe-west1):

```bash
gcloud auth configure-docker LOCATION-docker.pkg.dev
```
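If your Terraform configuration does not already create the Artifact Registry repository, you can create one manually; REPOSITORY and LOCATION below are placeholders:

```bash
gcloud artifacts repositories create REPOSITORY \
  --repository-format=docker \
  --location=LOCATION \
  --description="Custom images for Kubeflow pipeline components"
```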
Build and push your custom image to GCP Artifact Registry (REPOSITORY is the Docker repository name created above):

```bash
docker build -t LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/YOUR_IMAGE_NAME:YOUR_TAG -f Dockerfile .
docker push LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/YOUR_IMAGE_NAME:YOUR_TAG
```
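Once the infrastructure is deployed, you can exercise the pipeline without waiting for Cloud Scheduler by publishing a message to the trigger topic yourself; the topic name below is a placeholder for whatever your Terraform configuration creates:

```bash
gcloud pubsub topics publish YOUR_TRIGGER_TOPIC --message '{"trigger": "batch-prediction"}'
```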