Vertex AI

Vertex AI is Google Cloud's unified ML platform. It provides managed infrastructure for every stage of the ML lifecycle — from interactive notebooks to automated training to online serving. Instead of managing your own GPU servers, you let Vertex AI handle the infrastructure while you focus on your models.

Vertex AI Workbench

Vertex AI Workbench provides managed Jupyter notebooks that are deeply integrated with GCP services. Unlike a local Jupyter instance, Workbench notebooks come with pre-installed ML libraries, automatic auth to GCP services, and the ability to scale to GPU/TPU instances.

Creating a Workbench Instance

```bash
# Create a Workbench instance (the location is a zone, e.g. us-central1-a;
# there is no "gcloud ai workbench" group -- the command lives under
# "gcloud workbench")
gcloud workbench instances create my-notebook \
    --location=us-central1-a

# Create an instance with a GPU for training
gcloud workbench instances create gpu-notebook \
    --location=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator-type=NVIDIA_TESLA_T4 \
    --accelerator-core-count=1
```

Why Workbench over Local Notebooks?

| Feature | Local Jupyter | Vertex AI Workbench |
| --- | --- | --- |
| GPU access | Your hardware | On-demand GPU/TPU |
| GCP auth | Manual key management | Automatic IAM |
| Pre-installed libs | Manual setup | ML stack pre-loaded |
| Collaboration | Share files manually | Shared instances |
| Idle shutdown | None (wastes $$) | Configurable idle shutdown |
| Data access | Manual GCS mount | Native GCS/BigQuery access |

AutoML vs Custom Training

Vertex AI offers two training paths: AutoML (no code) and Custom Training (full control).

AutoML

AutoML automatically selects features, tunes hyperparameters, and trains an optimized model. You provide the data — Vertex AI handles the rest.

```bash
# Create a tabular dataset (the schema URI declares it as tabular data;
# the BigQuery source bq://project.dataset.table is attached via the
# dataset metadata or, more simply, with the Python SDK below)
gcloud ai datasets create \
    --display-name="churn-dataset" \
    --metadata-schema-uri="gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml" \
    --region=us-central1

# AutoML training itself is launched from the console or the Python SDK;
# there is no dedicated gcloud command for submitting AutoML training jobs.
```

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Create a tabular dataset backed by the BigQuery table
dataset = aiplatform.TabularDataset.create(
    display_name="churn-dataset",
    bq_source="bq://project.dataset.table",
)

# Create an AutoML tabular training job. The target column ("churn")
# is not listed in column_specs; it is passed to run() instead.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-prediction-automl",
    optimization_prediction_type="classification",
    column_specs={
        "customer_id": "auto",
        "tenure": "numeric",
        "monthly_charges": "numeric",
        "total_charges": "numeric",
    },
)

model = job.run(
    dataset=dataset,
    target_column="churn",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=1000,  # ~1 node-hour
)
```
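The `budget_milli_node_hours` unit is easy to misread: it is thousandths of a node-hour, so `1000` buys one node-hour. A small local sketch of the conversion and the resulting cost estimate (the $3.78/node-hour rate is the AutoML Tabular price from the pricing table later on this page, so treat the dollar figures as approximate):

```python
def automl_budget(node_hours: float, price_per_node_hour: float = 3.78) -> tuple[int, float]:
    """Convert a node-hour budget to milli-node-hours and estimate its cost."""
    milli_node_hours = int(node_hours * 1000)
    estimated_cost = node_hours * price_per_node_hour
    return milli_node_hours, estimated_cost

budget, cost = automl_budget(1)    # the example above: 1000 milli-node-hours
budget8, cost8 = automl_budget(8)  # a longer run: 8000 milli-node-hours
print(budget, round(cost, 2))
print(budget8, round(cost8, 2))
```

Note that the budget is a cap, not a guarantee: training can finish (and stop billing) early.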

When to use AutoML:

  • You have tabular, image, text, or video data
  • You want a strong baseline quickly
  • You don't need a specific model architecture
  • Your team has limited ML expertise

Custom Training

Custom training gives you full control over the training code, environment, and hardware. You write the training script and package it as a Docker container.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Define a custom training job from a packaged training script.
# Dependencies (e.g. xgboost==2.0.0, pandas==2.1.0) are declared in the
# package's setup.py -- CustomPythonPackageTrainingJob has no
# "requirements" parameter.
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="custom-xgboost-training",
    python_package_gcs_uri="gs://your-bucket/training/trainer.tar.gz",
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-0:latest",
    # A serving container is required for job.run() to return a Model
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-7:latest",
)

model = job.run(
    dataset=dataset,  # an aiplatform.TabularDataset (optional for custom jobs)
    model_display_name="xgboost-churn-model",
    args=[
        "--n_estimators=500",
        "--max_depth=6",
        "--learning_rate=0.01",
    ],
    replica_count=1,
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```
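The `args` list above is handed straight to the training module's command line. A minimal sketch of what the `trainer.task` entry point might look like (the flag names mirror the `args` above; `AIP_MODEL_DIR` is the environment variable Vertex AI sets to the GCS path where the job should write its model; the actual XGBoost training call is elided):

```python
import argparse
import os


def parse_args(argv=None):
    # Flags mirror the args=[...] passed to job.run()
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Vertex AI tells the job where to save the model via AIP_MODEL_DIR
    model_dir = os.environ.get("AIP_MODEL_DIR", "model/")
    # ... load data, train an XGBoost model with args, write it to model_dir ...
    return args, model_dir


if __name__ == "__main__":
    main()
```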

When to use Custom Training:

  • You need a specific model architecture
  • You have custom preprocessing logic
  • You want to use frameworks beyond TensorFlow/PyTorch
  • You need distributed training

Vertex AI Endpoints for Serving

Once you have a trained model, you deploy it to a Vertex AI Endpoint for online prediction.

Deploying a Model

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Get the model by its full resource name
model = aiplatform.Model("projects/your-project/locations/us-central1/models/12345")

# Create an endpoint
endpoint = aiplatform.Endpoint.create(
    display_name="churn-endpoint",
    location="us-central1",
)

# Deploy the model to the endpoint
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="churn-v1",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
    traffic_percentage=100,
)
```
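`traffic_percentage=100` sends every request to this deployment. When a second version goes out, the endpoint's traffic split maps each deployed model ID to a percentage, and the percentages must sum to 100. A local sketch of building and sanity-checking such a split before passing it to a deploy call (the IDs are made up for illustration):

```python
def make_traffic_split(splits: dict[str, int]) -> dict[str, int]:
    """Validate a deployed-model-ID -> percentage mapping for an endpoint."""
    total = sum(splits.values())
    if total != 100:
        raise ValueError(f"traffic split must sum to 100, got {total}")
    return splits


# Canary rollout: 90% to the existing v1 deployment, 10% to the new v2
split = make_traffic_split({"1234567890": 90, "2345678901": 10})
print(split)
```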

Getting Predictions

```python
# Online prediction
response = endpoint.predict(
    instances=[
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
        {"tenure": 48, "monthly_charges": 55.0, "total_charges": 2640.0},
    ]
)

# Assumes the model returns per-class probabilities as
# [p_no_churn, p_churn] for each instance
for prediction in response.predictions:
    print(f"Churn probability: {prediction[1]:.4f}")

# Batch prediction (for large datasets)
batch_job = model.batch_predict(
    job_display_name="churn-batch-predict",
    gcs_source="gs://your-bucket/input/data.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    predictions_format="jsonl",
    machine_type="n1-standard-2",
)
```
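Batch prediction reads one JSON instance per line from the `gcs_source` file. A sketch of producing that JSONL file locally before copying it to GCS (the field names reuse the churn features from the online-prediction call above):

```python
import json

instances = [
    {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
    {"tenure": 48, "monthly_charges": 55.0, "total_charges": 2640.0},
]


def write_jsonl(rows, path):
    # One JSON object per line -- the format batch prediction expects
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


write_jsonl(instances, "data.jsonl")
# then: gsutil cp data.jsonl gs://your-bucket/input/
```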

Managing Endpoints with gcloud

```bash
# List endpoints
gcloud ai endpoints list --region=us-central1

# Get endpoint details
gcloud ai endpoints describe ENDPOINT_ID --region=us-central1

# Predict using gcloud
gcloud ai endpoints predict ENDPOINT_ID \
    --region=us-central1 \
    --json-request=predict_request.json

# Undeploy a model
gcloud ai endpoints undeploy-model ENDPOINT_ID \
    --deployed-model-id=DEPLOYED_MODEL_ID \
    --region=us-central1
```
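The `--json-request` file wraps instances in the same shape the REST API expects: a top-level `"instances"` array. A sketch of generating `predict_request.json` (field names reuse the churn features from earlier):

```python
import json

request = {
    "instances": [
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
    ]
}

# Write the request body that gcloud ai endpoints predict will send
with open("predict_request.json", "w") as f:
    json.dump(request, f, indent=2)
```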

Pricing and Free Tier

Understanding GCP pricing is critical to avoid surprise bills.

| Resource | Free tier | Price (us-central1) |
| --- | --- | --- |
| Workbench (n1-standard-4) | None | ~$0.19/hr |
| AutoML Tabular | 1 node-hour free | $3.78/node-hour |
| Custom Training (n1-standard-4) | None | $0.19/hr |
| Custom Training (T4 GPU) | None | $0.35/hr VM + $0.95/hr GPU |
| Online Prediction (n1-standard-2) | None | $0.095/hr |
| Batch Prediction | None | Same rates as training |
| Model Storage | 10 GB free | $0.05/GB/month |

Cost Control Tips

  • Always enable idle shutdown on Workbench instances so forgotten notebooks stop billing
  • Remember that online endpoints bill per replica around the clock (min_replica_count must be at least 1), so undeploy low-traffic models and prefer batch prediction for sporadic workloads
  • Set budget alerts in GCP Billing
  • Use Spot VMs for custom training (up to ~80% cheaper, though jobs can be preempted)
  • Delete endpoints you're not actively using
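Putting the table's rates together gives a quick back-of-the-envelope check before launching anything. The figures below are the us-central1 rates from the table above and will drift over time, so treat this as a sketch, not a quote:

```python
# Hourly rates from the pricing table above (us-central1)
RATES = {
    "training_t4_vm": 0.35,       # n1 VM hosting the T4
    "training_t4_gpu": 0.95,      # the T4 itself
    "prediction_n1_standard_2": 0.095,
}


def gpu_training_cost(hours: float) -> float:
    """Cost of a single-T4 custom training run: VM rate plus GPU rate."""
    return hours * (RATES["training_t4_vm"] + RATES["training_t4_gpu"])


def endpoint_monthly_cost(replicas: int = 1, hours: float = 730) -> float:
    """An always-on n1-standard-2 endpoint, per month (~730 hours)."""
    return replicas * hours * RATES["prediction_n1_standard_2"]


print(round(gpu_training_cost(3), 2))     # a 3-hour GPU training run
print(round(endpoint_monthly_cost(), 2))  # one replica, 24/7
```

The second number is the one that surprises people: a single small endpoint left deployed all month costs more than many training runs combined.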

Useful gcloud Commands

```bash
# List all training jobs
gcloud ai custom-jobs list --region=us-central1

# Check a training job's status
gcloud ai custom-jobs describe JOB_ID --region=us-central1

# List models
gcloud ai models list --region=us-central1

# Upload a custom model from GCS
gcloud ai models upload \
    --region=us-central1 \
    --display-name="my-custom-model" \
    --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest \
    --artifact-uri=gs://your-bucket/models/sklearn/

# Cancel a running job
gcloud ai custom-jobs cancel JOB_ID --region=us-central1

# Delete a model
gcloud ai models delete MODEL_ID --region=us-central1
```