Vertex AI
Vertex AI is Google Cloud's unified ML platform. It provides managed infrastructure for every stage of the ML lifecycle — from interactive notebooks to automated training to online serving. Instead of managing your own GPU servers, you let Vertex AI handle the infrastructure while you focus on your models.
Vertex AI Workbench
Vertex AI Workbench provides managed Jupyter notebooks that are deeply integrated with GCP services. Unlike a local Jupyter instance, Workbench notebooks come with pre-installed ML libraries, automatic auth to GCP services, and the ability to scale to GPU/TPU instances.
Creating a Workbench Instance
# Create a Workbench instance (location is a zone, e.g. us-central1-a)
gcloud workbench instances create my-notebook \
  --location=us-central1-a

# Create an instance with a GPU for training
gcloud workbench instances create gpu-notebook \
  --location=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator-type=NVIDIA_TESLA_T4 \
  --accelerator-core-count=1
Why Workbench over Local Notebooks?
| Feature | Local Jupyter | Vertex AI Workbench |
|---|---|---|
| GPU access | Your hardware | On-demand GPU/TPU |
| GCP auth | Manual key management | Automatic IAM |
| Pre-installed libs | Manual setup | ML stack pre-loaded |
| Collaboration | Share files manually | Shared instances |
| Idle shutdown | None (wastes $$) | Configurable idle shutdown |
| Data access | Manual GCS mount | Native GCS/BigQuery access |
AutoML vs Custom Training
Vertex AI offers two training paths: AutoML (no code) and Custom Training (full control).
AutoML
AutoML automatically selects features, tunes hyperparameters, and trains an optimized model. You provide the data — Vertex AI handles the rest.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Create an AutoML tabular dataset from a BigQuery table
dataset = aiplatform.TabularDataset.create(
    display_name="churn-dataset",
    bq_source="bq://project.dataset.table",
)
# Create and run an AutoML tabular training job
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-prediction-automl",
    optimization_prediction_type="classification",
    column_specs={
        "customer_id": "auto",
        "tenure": "numeric",
        "monthly_charges": "numeric",
        "total_charges": "numeric",
        # the target column ("churn") is not listed here;
        # it is passed to job.run() instead
    },
)

model = job.run(
    dataset=dataset,
    target_column="churn",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=1000,  # 1,000 milli node hours = 1 node hour
)
When to use AutoML:
- You have tabular, image, text, or video data
- You want a strong baseline quickly
- You don't need a specific model architecture
- Your team has limited ML expertise
Custom Training
Custom training gives you full control over the training code, environment, and hardware. You write the training script and package it as a Docker container.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Define a custom training job from a Python package staged in GCS.
# Dependencies (xgboost, pandas, ...) are declared in the package's setup.py.
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="custom-xgboost-training",
    python_package_gcs_uri="gs://your-bucket/training/trainer.tar.gz",
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-0:latest",
    # a serving container is needed for job.run() to return a deployable Model
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest",
)

model = job.run(
    dataset=dataset,
    model_display_name="xgboost-churn-model",
    args=[
        "--n_estimators=500",
        "--max_depth=6",
        "--learning_rate=0.01",
    ],
    replica_count=1,
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
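The `trainer.task` module is whatever entry point you package; a minimal sketch of such a module (the argument names mirror the `args` above, but the defaults and structure are illustrative) might look like this. Vertex AI runs it as `python -m trainer.task`, passes the job's `args` on the command line, and exports `AIP_MODEL_DIR` as the location to write the trained model:

```python
# Hypothetical skeleton of trainer/task.py inside trainer.tar.gz.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Vertex AI sets AIP_MODEL_DIR to the GCS path for model artifacts
    model_dir = os.environ.get("AIP_MODEL_DIR", "/tmp/model")
    # ... load training data, fit the model using args, save to model_dir ...
    return args, model_dir

if __name__ == "__main__":
    main()
```

Saving the artifact under `AIP_MODEL_DIR` is what lets the serving container configured on the job find and load the model.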
When to use Custom Training:
- You need a specific model architecture
- You have custom preprocessing logic
- You want to use frameworks beyond TensorFlow/PyTorch
- You need distributed training
Vertex AI Endpoints for Serving
Once you have a trained model, you deploy it to a Vertex AI Endpoint for online prediction.
Deploying a Model
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Get the model by full resource name (a numeric model ID also works)
model = aiplatform.Model("projects/your-project/locations/us-central1/models/12345")

# Create an endpoint
endpoint = aiplatform.Endpoint.create(
    display_name="churn-endpoint",
    location="us-central1",
)

# Deploy the model to the endpoint
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="churn-v1",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
    traffic_percentage=100,
)
Getting Predictions
# Online prediction
response = endpoint.predict(
    instances=[
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
        {"tenure": 48, "monthly_charges": 55.0, "total_charges": 2640.0},
    ]
)

# The shape of each prediction depends on the model; here we assume a
# binary classifier that returns per-class probabilities [p_stay, p_churn]
for prediction in response.predictions:
    print(f"Churn probability: {prediction[1]:.4f}")

# Batch prediction (for large datasets)
batch_job = model.batch_predict(
    job_display_name="churn-batch-predict",
    gcs_source="gs://your-bucket/input/data.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    predictions_format="jsonl",
    machine_type="n1-standard-2",
)
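A batch prediction job in `jsonl` format writes result files to the destination prefix, with one JSON object per line pairing each input instance with its prediction. A small sketch of parsing one such line (the field values are illustrative):

```python
import json

# One output line from a jsonl batch prediction job: each line carries
# the input instance alongside the model's prediction for it.
line = '{"instance": {"tenure": 12, "monthly_charges": 85.0}, "prediction": [0.91, 0.09]}'

record = json.loads(line)
churn_prob = record["prediction"][1]  # assumes [p_stay, p_churn] output
print(f"tenure={record['instance']['tenure']} churn={churn_prob:.2f}")
```

In practice you would iterate over every `prediction.results-*` file under the output prefix and parse each line this way.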
Managing Endpoints with gcloud
# List endpoints
gcloud ai endpoints list --region=us-central1
# Get endpoint details
gcloud ai endpoints describe ENDPOINT_ID --region=us-central1
# Predict using gcloud
gcloud ai endpoints predict ENDPOINT_ID \
--region=us-central1 \
--json-request=predict_request.json
# Undeploy a model
gcloud ai endpoints undeploy-model ENDPOINT_ID \
--deployed-model-id=DEPLOYED_MODEL_ID \
--region=us-central1
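The `predict_request.json` file passed to `gcloud ai endpoints predict` is a JSON object with an `instances` array, one entry per input. A minimal sketch of generating it (feature names match the earlier churn examples):

```python
import json

# Build the request body that `gcloud ai endpoints predict` reads from
# --json-request: {"instances": [...]}
request = {
    "instances": [
        {"tenure": 12, "monthly_charges": 85.0, "total_charges": 1020.0},
    ]
}

with open("predict_request.json", "w") as f:
    json.dump(request, f, indent=2)
```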
Pricing and Free Tier
Understanding GCP pricing is critical to avoid surprise bills.
| Resource | Free Tier | Price (us-central1) |
|---|---|---|
| Workbench (n1-standard-4) | None | ~$0.19/hr |
| AutoML Tabular | 1 node-hour free | $3.78/node-hour |
| Custom Training (n1-standard-4) | None | $0.19/hr |
| Custom Training (T4 GPU) | None | $0.19/hr + ~$0.35/hr per GPU |
| Online Prediction (n1-standard-2) | None | $0.095/hr |
| Batch Prediction | None | Same as training |
| Model Storage | 10 GB free | $0.05/GB/month |
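Because an endpoint bills per replica-hour around the clock, small hourly rates add up. A back-of-envelope estimate using the n1-standard-2 online prediction rate from the table above:

```python
# Monthly cost of an always-on endpoint at a given replica count.
HOURS_PER_MONTH = 730          # average hours in a month
RATE_N1_STANDARD_2 = 0.095     # USD/hr, us-central1 (from the table above)

def monthly_endpoint_cost(replicas: int, rate: float = RATE_N1_STANDARD_2) -> float:
    return replicas * rate * HOURS_PER_MONTH

print(round(monthly_endpoint_cost(1), 2))  # one replica, 24/7
print(round(monthly_endpoint_cost(3), 2))  # scaled out to max_replica_count=3
```

Even the smallest serving machine costs roughly $70/month per replica if left deployed, which is why the tips below stress undeploying idle endpoints.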
- Always set idle shutdown on Workbench instances
- For low-traffic models, prefer batch prediction or undeploy between uses; a deployed online endpoint bills for at least one replica (min_replica_count must be >= 1) around the clock
- Set budget alerts in GCP Billing
- Use preemptible/spot VMs for training — up to 80% cheaper
- Delete endpoints you're not actively using
Useful gcloud Commands
# List all training jobs
gcloud ai custom-jobs list --region=us-central1
# Check a training job's status
gcloud ai custom-jobs describe JOB_ID --region=us-central1
# List models
gcloud ai models list --region=us-central1
# Upload a custom model from GCS
gcloud ai models upload \
--region=us-central1 \
--display-name="my-custom-model" \
--container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest \
--artifact-uri=gs://your-bucket/models/sklearn/
# Cancel a running job
gcloud ai custom-jobs cancel JOB_ID --region=us-central1
# Delete a model
gcloud ai models delete MODEL_ID --region=us-central1