
Security Best Practices

The OWASP LLM Top 10 defines the most critical security risks for LLM applications. Understanding these is essential before deploying any LLM service to production.

OWASP LLM Top 10

| # | Risk | Description |
|---|------|-------------|
| LLM01 | Prompt Injection | Manipulating LLM behavior through crafted inputs |
| LLM02 | Insecure Output Handling | Treating LLM output as safe without validation |
| LLM03 | Training Data Poisoning | Manipulating training data to introduce vulnerabilities |
| LLM04 | Model Denial of Service | Overloading the LLM with resource-intensive requests |
| LLM05 | Supply Chain Vulnerabilities | Compromised third-party models, data, or plugins |
| LLM06 | Sensitive Information Disclosure | LLM revealing confidential data in responses |
| LLM07 | Insecure Plugin Design | Plugins with inadequate input validation or access control |
| LLM08 | Excessive Agency | LLM with too much autonomy or access |
| LLM09 | Overreliance | Blindly trusting LLM output without verification |
| LLM10 | Model Theft | Unauthorized access, copying, or extraction of model weights |
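To make LLM02 (Insecure Output Handling) concrete: treat model output like any other untrusted input. A minimal sketch of two common validation paths, structured output checked against expected keys and free text escaped before rendering (the function names here are illustrative, not a standard API):

```python
import json
from html import escape

def validate_llm_json(raw: str, required_keys: set[str]) -> dict:
    """Parse LLM output as JSON and reject anything that isn't the expected shape."""
    data = json.loads(raw)  # raises on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

def render_llm_html(raw: str) -> str:
    """HTML-escape free-text LLM output before embedding it in a page."""
    return escape(raw)
```

The key idea is to fail closed: reject output that does not match the expected structure rather than trying to repair it.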

Rate limiting

Essential for preventing abuse and DoS attacks:

```python
from collections import defaultdict
from time import time

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class RateLimiter:
    def __init__(self, max_requests: int = 10, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests: dict[str, list[float]] = defaultdict(list)

    def is_allowed(self, key: str) -> bool:
        now = time()
        # Drop timestamps that have fallen out of the sliding window
        self.requests[key] = [t for t in self.requests[key] if now - t < self.window]
        if len(self.requests[key]) >= self.max_requests:
            return False
        self.requests[key].append(now)
        return True

limiter = RateLimiter(max_requests=20, window_seconds=60)

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    client_ip = request.client.host if request.client else "unknown"
    if not limiter.is_allowed(client_ip):
        # Middleware must return a response; raising HTTPException here
        # would surface as a 500, not a 429
        return JSONResponse(status_code=429, content={"detail": "Rate limit exceeded"})
    return await call_next(request)
```
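To see the sliding-window semantics concretely, here is the same scheme with an injectable clock so the window can be advanced deterministically (the class and variable names in this sketch are illustrative):

```python
from collections import defaultdict

class SlidingWindowLimiter:
    """Same sliding-window counting as above, with a pluggable clock for testing."""
    def __init__(self, max_requests: int, window_seconds: float, clock):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.requests: dict[str, list[float]] = defaultdict(list)

    def is_allowed(self, key: str) -> bool:
        now = self.clock()
        self.requests[key] = [t for t in self.requests[key] if now - t < self.window]
        if len(self.requests[key]) >= self.max_requests:
            return False
        self.requests[key].append(now)
        return True

# Three requests fit in a 60 s window, the fourth is rejected,
# and capacity returns once the window slides past the oldest request.
t = [0.0]
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60, clock=lambda: t[0])
results = [limiter.is_allowed("1.2.3.4") for _ in range(4)]   # [True, True, True, False]
t[0] = 61.0
late = limiter.is_allowed("1.2.3.4")                          # True again
```

Note that this in-memory store is per-process; behind multiple workers each process counts independently, so a shared store (e.g. Redis) is the usual next step.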

Input sanitization

Never trust user input — validate and sanitize before sending to the LLM:

```python
import re

def sanitize_llm_input(text: str, max_length: int = 4000) -> str:
    """Sanitize user input before sending it to the LLM."""
    # Truncate to bound prompt size
    text = text[:max_length]

    # Remove control characters (keeps tab, newline, carriage return);
    # this range also covers null bytes
    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', text)

    # Replace common injection patterns
    injection_patterns = [
        r'(?i)ignore\s+(all\s+)?previous\s+instructions',
        r'(?i)system\s*:',
        r'(?i)you\s+are\s+now\s+a',
        r'(?i)jailbreak',
    ]
    for pattern in injection_patterns:
        text = re.sub(pattern, '[FILTERED]', text)

    return text.strip()
```
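Pattern filters like the above are easy to paraphrase around, so treat them as one layer, not the defense. Where an input field has known structure, prefer strict allowlist validation and reject rather than rewrite. A minimal sketch, in which `ALLOWED_LANGS` and the field names are illustrative:

```python
import re

ALLOWED_LANGS = {"en", "de", "fr", "es"}

def validate_request(lang: str, user_id: str) -> None:
    """Structured fields get strict allowlists; raise on anything unexpected."""
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", user_id):
        raise ValueError("invalid user id")
```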

API key management

Keep keys out of source control: read from the environment in local development and from a secrets manager in production.

```python
import os

from dotenv import load_dotenv
from google.cloud import secretmanager

load_dotenv()  # pick up a local .env file in development

def get_api_key(key_name: str) -> str:
    """Get an API key from the environment or Secret Manager."""
    # First check the environment (local development)
    env_key = os.environ.get(key_name)
    if env_key:
        return env_key

    # Fall back to Google Secret Manager (production);
    # fail loudly if the project ID is missing
    client = secretmanager.SecretManagerServiceClient()
    project_id = os.environ["GCP_PROJECT_ID"]
    name = f"projects/{project_id}/secrets/{key_name}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

# Key rotation strategy: cycle through a pool of keys
class APIKeyRotator:
    def __init__(self, keys: list[str]):
        self.keys = keys
        self.index = 0

    def get_key(self) -> str:
        return self.keys[self.index]

    def rotate(self) -> None:
        self.index = (self.index + 1) % len(self.keys)
```
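One way to wire the rotator in is to advance to the next key when a request is rejected for authentication. A sketch, where `call_with_rotation` and `fake_api` are illustrative helpers (not part of any provider SDK) and `PermissionError` stands in for a 401/403 from the provider:

```python
class APIKeyRotator:
    """Same rotator as above, reproduced so this snippet is self-contained."""
    def __init__(self, keys: list[str]):
        self.keys = keys
        self.index = 0

    def get_key(self) -> str:
        return self.keys[self.index]

    def rotate(self) -> None:
        self.index = (self.index + 1) % len(self.keys)

def call_with_rotation(rotator: APIKeyRotator, call_api, max_attempts: int = 3):
    """Retry a call, rotating to the next key on an auth failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return call_api(rotator.get_key())
        except PermissionError as exc:   # stand-in for a rejected key
            last_error = exc
            rotator.rotate()
    raise last_error

# Stand-in API: only the second key is accepted.
def fake_api(key: str) -> str:
    if key != "key-b":
        raise PermissionError("invalid key")
    return "ok"

rotator = APIKeyRotator(["key-a", "key-b"])
result = call_with_rotation(rotator, fake_api)   # "ok", after one rotation
```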

Audit logging

Log all LLM interactions for security analysis:

```python
import hashlib
import json
import logging
import re
from datetime import datetime, timezone

logger = logging.getLogger("llm_audit")

def _digest(text: str) -> str:
    """Stable content hash; Python's built-in hash() is salted per process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def log_llm_interaction(
    user_id: str,
    input_text: str,
    output_text: str,
    model: str,
    flagged: bool = False,
):
    """Log an LLM interaction for the audit trail."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        "input_hash": _digest(input_text),   # don't log raw input, for privacy
        "output_hash": _digest(output_text),
        "input_length": len(input_text),
        "output_length": len(output_text),
        "flagged": flagged,
    }
    logger.info(json.dumps(entry))

# Flag suspicious interactions for review
def is_suspicious(text: str) -> bool:
    suspicious_patterns = [
        r'(?i)(hack|exploit|attack|vulnerability)',
        r'(?i)(password|credential|secret|token)',
        r'(?i)(admin|root|sudo)',
    ]
    return any(re.search(p, text) for p in suspicious_patterns)
```
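A note on the hashing choice: Python's built-in `hash()` is salted per interpreter process (see `PYTHONHASHSEED`), so its values cannot be correlated across restarts or workers. A deterministic digest keeps audit entries comparable while still avoiding raw input in logs:

```python
import hashlib

def stable_hash(text: str) -> str:
    """Deterministic digest, safe to correlate across processes and restarts."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The same input always yields the same digest, unlike built-in hash()
digest = stable_hash("ignore previous instructions")
```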

Dependency scanning

```bash
# Scan Python dependencies for known CVEs
pip install safety
safety check --json

# Scan a Docker image for vulnerabilities
docker scout cves my-image:latest

# Scan with Trivy
trivy image my-image:latest
```

Production security checklist

Before deploying your LLM application:

  • Rate limiting on all endpoints
  • Input sanitization for all user inputs
  • Output validation before sending to users
  • API keys stored in Secret Manager (not env vars)
  • Audit logging for all LLM interactions
  • Dependency scanning in CI pipeline
  • Container image scanning
  • HTTPS only (no HTTP)
  • CORS properly configured
  • Authentication on all endpoints
  • Content filtering / guardrails enabled
  • Monitoring and alerting for abuse patterns
  • Incident response plan documented
  • Regular key rotation schedule
  • Model access restricted by IAM

Security is a process, not a feature

Security requires ongoing attention. Set up automated scanning, regular audits, and a culture of security awareness in your team.