Vertex AI
Vertex AI is Google Cloud's unified ML platform. It provides managed infrastructure for every stage of the ML lifecycle — from interactive notebooks to automated training to online serving. Instead of managing your own GPU servers, you let Vertex AI handle the infrastructure while you focus on your models.
Vertex AI Workbench
Vertex AI Workbench provides managed Jupyter notebooks that are deeply integrated with GCP services. Unlike a local Jupyter instance, Workbench notebooks come with pre-installed ML libraries, automatic auth to GCP services, and the ability to scale to GPU/TPU instances.
Creating a Workbench Instance
# Create a Workbench instance (location is a zone, e.g. us-central1-a)
gcloud workbench instances create my-notebook \
  --location=us-central1-a

# Create an instance with a GPU for training
gcloud workbench instances create gpu-notebook \
  --location=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator-type=NVIDIA_TESLA_T4 \
  --accelerator-core-count=1
Why Workbench over Local Notebooks?
| Feature | Local Jupyter | Vertex AI Workbench |
|---|---|---|
| GPU access | Your hardware | On-demand GPU/TPU |
| GCP auth | Manual key management | Automatic IAM |
| Pre-installed libs | Manual setup | ML stack pre-loaded |
| Collaboration | Share files manually | Shared instances |
| Idle shutdown | None (wastes $$) | Configurable idle shutdown |
| Data access | Manual GCS mount | Native GCS/BigQuery access |
AutoML vs Custom Training
Vertex AI offers two training paths: AutoML (no code) and Custom Training (full control).
AutoML
AutoML automatically selects features, tunes hyperparameters, and trains an optimized model. You provide the data — Vertex AI handles the rest.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Create an AutoML tabular dataset from a BigQuery table
dataset = aiplatform.TabularDataset.create(
    display_name="churn-dataset",
    bq_source="bq://project.dataset.table",
)
# Create and run an AutoML tabular training job
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-prediction-automl",
    optimization_prediction_type="classification",
    column_specs={
        "customer_id": "auto",
        "tenure": "numeric",
        "monthly_charges": "numeric",
        "total_charges": "numeric",
        # the target column ("churn") is not listed here;
        # it is passed to job.run() instead
    },
)

model = job.run(
    dataset=dataset,
    target_column="churn",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=1000,  # 1,000 milli node hours = 1 node hour
)
When to use AutoML:
- You have tabular, image, text, or video data
- You want a strong baseline quickly
- You don't need a specific model architecture
- Your team has limited ML expertise
Custom Training
Custom training gives you full control over the training code, environment, and hardware. You write the training script and package it as a Docker container.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Define a custom training job from a Python package staged in GCS.
# Dependencies (xgboost, pandas, ...) are declared in the package's setup.py.
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="custom-xgboost-training",
    python_package_gcs_uri="gs://your-bucket/training/trainer.tar.gz",
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-0:latest",
    # a serving container is needed for job.run() to return a deployable Model
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest",
)

model = job.run(
    dataset=dataset,
    model_display_name="xgboost-churn-model",
    args=[
        "--n_estimators=500",
        "--max_depth=6",
        "--learning_rate=0.01",
    ],
    replica_count=1,
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
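The `trainer.task` module is whatever entry point you package; a minimal sketch of such a module (the argument names mirror the `args` above, but the defaults and structure are illustrative) might look like this. Vertex AI runs it as `python -m trainer.task`, passes the job's `args` on the command line, and exports `AIP_MODEL_DIR` as the location to write the trained model:

```python
# Hypothetical skeleton of trainer/task.py inside trainer.tar.gz.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Vertex AI sets AIP_MODEL_DIR to the GCS path for model artifacts
    model_dir = os.environ.get("AIP_MODEL_DIR", "/tmp/model")
    # ... load training data, fit the model using args, save to model_dir ...
    return args, model_dir

if __name__ == "__main__":
    main()
```

Saving the artifact under `AIP_MODEL_DIR` is what lets the serving container configured on the job find and load the model.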
When to use Custom Training:
- You need a specific model architecture
- You have custom preprocessing logic
- You want to use frameworks beyond TensorFlow/PyTorch
- You need distributed training
Vertex AI Endpoints for Serving
Once you have a trained model, you deploy it to a Vertex AI Endpoint for online prediction.
Deploying a Model
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Get the model by full resource name (a numeric model ID also works)
model = aiplatform.Model("projects/your-project/locations/us-central1/models/12345")

# Create an endpoint
endpoint = aiplatform.Endpoint.create(
    display_name="churn-endpoint",
    location="us-central1",
)

# Deploy the model to the endpoint
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="churn-v1",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
    traffic_percentage=100,
)
Getting Predictions
# Online prediction
response = endpoint.predict(
    instances=[
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
        {"tenure": 48, "monthly_charges": 55.0, "total_charges": 2640.0},
    ]
)

# The shape of each prediction depends on the model; here we assume a
# binary classifier that returns per-class probabilities [p_stay, p_churn]
for prediction in response.predictions:
    print(f"Churn probability: {prediction[1]:.4f}")

# Batch prediction (for large datasets)
batch_job = model.batch_predict(
    job_display_name="churn-batch-predict",
    gcs_source="gs://your-bucket/input/data.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    predictions_format="jsonl",
    machine_type="n1-standard-2",
)
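A batch prediction job in `jsonl` format writes result files to the destination prefix, with one JSON object per line pairing each input instance with its prediction. A small sketch of parsing one such line (the field values are illustrative):

```python
import json

# One output line from a jsonl batch prediction job: each line carries
# the input instance alongside the model's prediction for it.
line = '{"instance": {"tenure": 12, "monthly_charges": 85.0}, "prediction": [0.91, 0.09]}'

record = json.loads(line)
churn_prob = record["prediction"][1]  # assumes [p_stay, p_churn] output
print(f"tenure={record['instance']['tenure']} churn={churn_prob:.2f}")
```

In practice you would iterate over every `prediction.results-*` file under the output prefix and parse each line this way.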
Managing Endpoints with gcloud
# List endpoints
gcloud ai endpoints list --region=us-central1
# Get endpoint details
gcloud ai endpoints describe ENDPOINT_ID --region=us-central1
# Predict using gcloud
gcloud ai endpoints predict ENDPOINT_ID \
--region=us-central1 \
--json-request=predict_request.json
# Undeploy a model
gcloud ai endpoints undeploy-model ENDPOINT_ID \
--deployed-model-id=DEPLOYED_MODEL_ID \
--region=us-central1
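The `predict_request.json` file passed to `gcloud ai endpoints predict` is a JSON object with an `instances` array, one entry per input. A minimal sketch of generating it (feature names match the earlier churn examples):

```python
import json

# Build the request body that `gcloud ai endpoints predict` reads from
# --json-request: {"instances": [...]}
request = {
    "instances": [
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
    ]
}

with open("predict_request.json", "w") as f:
    json.dump(request, f, indent=2)
```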
Pricing and Free Tier
Understanding GCP pricing is critical to avoid surprise bills.
| Resource | Free Tier | Price (us-central1) |
|---|---|---|
| Workbench (n1-standard-4) | None | ~$0.19/hr |
| AutoML Tabular | 1 node-hour free | $3.78/node-hour |
| Custom Training (n1-standard-4) | None | $0.19/hr |
| Custom Training (T4 GPU) | None | $0.19/hr + ~$0.35/hr per GPU |
| Online Prediction (n1-standard-2) | None | $0.095/hr |
| Batch Prediction | None | Same as training |
| Model Storage | 10 GB free | $0.05/GB/month |
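Because an endpoint bills per replica-hour around the clock, small hourly rates add up. A back-of-envelope estimate using the n1-standard-2 online prediction rate from the table above:

```python
# Monthly cost of an always-on endpoint at a given replica count.
HOURS_PER_MONTH = 730          # average hours in a month
RATE_N1_STANDARD_2 = 0.095     # USD/hr, us-central1 (from the table above)

def monthly_endpoint_cost(replicas: int, rate: float = RATE_N1_STANDARD_2) -> float:
    return replicas * rate * HOURS_PER_MONTH

print(round(monthly_endpoint_cost(1), 2))  # one replica, 24/7
print(round(monthly_endpoint_cost(3), 2))  # scaled out to max_replica_count=3
```

Even the smallest serving machine costs roughly $70/month per replica if left deployed, which is why the tips below stress undeploying idle endpoints.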
- Always set idle shutdown on Workbench instances
- For low-traffic models, prefer batch prediction or undeploy between uses; a deployed online endpoint bills for at least one replica (min_replica_count must be >= 1) around the clock
- Set budget alerts in GCP Billing
- Use preemptible/spot VMs for training — up to 80% cheaper
- Delete endpoints you're not actively using
Useful gcloud Commands
# List all training jobs
gcloud ai custom-jobs list --region=us-central1
# Check a training job's status
gcloud ai custom-jobs describe JOB_ID --region=us-central1
# List models
gcloud ai models list --region=us-central1
# Upload a custom model from GCS
gcloud ai models upload \
--region=us-central1 \
--display-name="my-custom-model" \
--container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest \
--artifact-uri=gs://your-bucket/models/sklearn/
# Cancel a running job
gcloud ai custom-jobs cancel JOB_ID --region=us-central1
# Delete a model
gcloud ai models delete MODEL_ID --region=us-central1