LLM Agents
An LLM agent is a system where a language model acts as a reasoning engine that decides what actions to take in order to accomplish a goal. Unlike a simple chatbot that responds to a single prompt, an agent operates in a loop — observing its environment, thinking about what to do next, executing actions, and learning from the results.
The Agent Loop
At its core, every agent follows a similar cycle:
```
┌─────────────────────────────────┐
│             Observe             │
│   (Read input / environment)    │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│              Think              │
│  (LLM reasons about next step)  │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│               Act               │
│  (Execute tool / take action)   │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│             Reflect             │
│  (Evaluate result, adapt plan)  │
└──────────────┬──────────────────┘
               │
               └──────► Loop back to Observe
```
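The cycle above can be sketched as a small control loop. In this sketch the `observe`, `think`, `act`, and `reflect` callables are placeholder stand-ins (in a real agent, `think` is an LLM call), not part of any particular framework:

```python
from typing import Callable


def agent_loop(
    observe: Callable[[], str],
    think: Callable[[str], str],      # in a real agent, an LLM call
    act: Callable[[str], str],
    reflect: Callable[[str], bool],   # returns True when the goal is met
    max_steps: int = 10,
) -> str:
    """Generic observe -> think -> act -> reflect cycle."""
    result = ""
    for _ in range(max_steps):
        observation = observe()
        decision = think(observation)
        result = act(decision)
        if reflect(result):           # goal reached: stop looping
            break
    return result


# Toy run: "think" just echoes, "reflect" stops when the result contains "done"
inbox = iter(["step 1", "step 2 done"])
out = agent_loop(
    observe=lambda: next(inbox),
    think=lambda obs: f"handle {obs}",
    act=lambda plan: plan.replace("handle ", "did "),
    reflect=lambda res: "done" in res,
)
print(out)  # -> did step 2 done
```

Every pattern in this article (ReAct, Plan-and-Execute, Reflexion) is a variation on where the "think" and "reflect" steps sit in this loop.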
ReAct (Reasoning + Acting)
The ReAct pattern, introduced by Yao et al. (2022), interleaves reasoning traces with action steps. The LLM generates a thought, takes an action, receives an observation, and repeats.
```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a helpful assistant that can use tools.

Available tools:
- search(query: str) -> str: Search the web for information
- calculator(expression: str) -> float: Evaluate a math expression
- get_weather(city: str) -> str: Get current weather for a city

Respond in the following format:

Thought: <your reasoning>
Action: <tool_name>(<arguments>)

OR

Thought: <your reasoning>
Answer: <final answer>
"""


def run_react_agent(question: str, max_iterations: int = 5) -> str:
    """Run a simple ReAct agent loop."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0,
        )
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print(f"--- Iteration {i + 1} ---")
        print(reply)

        # Check if the agent has a final answer
        if "Answer:" in reply:
            return reply.split("Answer:")[-1].strip()

        # Parse and execute the action
        if "Action:" in reply:
            action_line = next(
                line for line in reply.split("\n") if line.startswith("Action:")
            )
            action_str = action_line.removeprefix("Action:").strip()

            # Parse tool call (simplified)
            func_name, args_str = action_str.split("(", 1)
            args_str = args_str.rstrip(")").strip("\"'")

            # Execute the tool and feed the observation back to the model
            observation = execute_tool(func_name.strip(), args_str)
            messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Agent exceeded maximum iterations."


def execute_tool(func_name: str, args_str: str) -> str:
    """Execute a tool and return the observation."""
    tools = {
        "search": lambda q: f"Search results for: {q}",
        "calculator": lambda e: str(eval(e)),  # In production, use a safe evaluator
        "get_weather": lambda c: f"Weather in {c}: 22°C, sunny",
    }
    func = tools.get(func_name)
    if func:
        return func(args_str)
    return f"Unknown tool: {func_name}"


# Example usage
result = run_react_agent("What is the temperature in Tokyo multiplied by 2?")
```
ReAct differs from plain Chain-of-Thought (CoT) prompting: CoT only generates reasoning traces, while ReAct interleaves reasoning with actual tool calls and their results. This grounds the agent in real data rather than in its own unchecked reasoning.
Plan-and-Execute
The Plan-and-Execute pattern separates the agent into two components:
- Planner: Generates a high-level plan (list of steps)
- Executor: Carries out each step sequentially, potentially calling tools
This approach suits complex, multi-step tasks that need a clear strategy up front.
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class PlanStep(BaseModel):
    step_number: int
    description: str
    tool: str | None = None
    expected_output: str


class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]


def create_plan(task: str) -> Plan:
    """Use the LLM to create a structured plan."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """You are a planning agent.
Given a task, break it down into concrete steps.
For each step, specify which tool to use (if any) and what the expected output is.
Available tools: search, calculator, file_read, file_write, code_execute.""",
            },
            {"role": "user", "content": task},
        ],
        response_format=Plan,
    )
    return response.choices[0].message.parsed


def execute_plan(plan: Plan) -> dict:
    """Execute each step of the plan."""
    results = {}
    for step in plan.steps:
        print(f"Executing Step {step.step_number}: {step.description}")
        if step.tool:
            result = execute_tool(step.tool, step.description)
            results[step.step_number] = result
            print(f"  Result: {result}")
        else:
            results[step.step_number] = "No tool needed - reasoning step"
    return results


# Example
plan = create_plan("Research the population of France and Germany, then calculate the ratio.")
print(f"Plan: {plan.goal}")
for step in plan.steps:
    print(f"  {step.step_number}. {step.description} [tool: {step.tool}]")
```
Plans are not set in stone. If execution reveals unexpected results, the agent should be able to re-plan. Production systems often re-invoke the planner after each step or when a step fails.
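One way to make plans adaptive is a re-plan wrapper around the executor. This sketch stubs out the planner and the tool runner so the control flow is visible; the `make_plan` and `run_step` callables are hypothetical placeholders, not part of the code above:

```python
from typing import Callable


def execute_with_replanning(
    task: str,
    make_plan: Callable[[str, list[str]], list[str]],  # task + notes -> steps
    run_step: Callable[[str], tuple[bool, str]],       # step -> (ok, result)
    max_replans: int = 2,
) -> list[str]:
    """Run a plan step by step; on failure, ask the planner for a new plan."""
    results: list[str] = []
    notes: list[str] = []  # failure feedback carried into re-planning
    for _ in range(max_replans + 1):
        plan = make_plan(task, notes)
        for step in plan:
            ok, result = run_step(step)
            if not ok:
                notes.append(f"step failed: {step} ({result})")
                break              # abandon this plan and re-plan
            results.append(result)
        else:
            return results         # every step succeeded
    return results


# Toy planner: given failure feedback, routes around the broken tool
def toy_planner(task: str, notes: list[str]) -> list[str]:
    return ["fetch", "compute"] if not notes else ["fetch_cached", "compute"]


def toy_runner(step: str) -> tuple[bool, str]:
    if step == "fetch":
        return (False, "network down")
    return (True, f"{step}: ok")


print(execute_with_replanning("demo", toy_planner, toy_runner))
# -> ['fetch_cached: ok', 'compute: ok']
```

In a production system, `make_plan` would wrap `create_plan` with the accumulated notes appended to the task prompt, and `run_step` would wrap `execute_tool`.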
Reflexion
Reflexion (Shinn et al., 2023) adds a self-evaluation layer. After generating a response, the agent critiques its own output, identifies issues, and tries again with the feedback incorporated.
```python
def reflexion_agent(
    question: str,
    max_attempts: int = 3,
    evaluator_model: str = "gpt-4o",
) -> str:
    """Agent that self-reflects and improves its answers."""
    attempt_history = []

    for attempt in range(max_attempts):
        # Generate an answer with the full history of prior attempts
        context = f"Question: {question}\n\n"
        if attempt_history:
            context += "Previous attempts and reflections:\n"
            for h in attempt_history:
                context += f"- Attempt: {h['answer']}\n"
                context += f"  Critique: {h['critique']}\n\n"
        context += "Please provide your best answer:"

        answer = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": context}],
            temperature=0.3,
        ).choices[0].message.content

        # Self-critique
        critique = client.chat.completions.create(
            model=evaluator_model,
            messages=[
                {
                    "role": "system",
                    "content": """You are an expert evaluator.
Critique the given answer. Identify:
1. Factual errors
2. Missing information
3. Logical inconsistencies
4. Areas for improvement
If the answer is satisfactory, say 'VERIFIED'.""",
                },
                {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
            ],
        ).choices[0].message.content

        attempt_history.append({"answer": answer, "critique": critique})

        if "VERIFIED" in critique:
            print(f"Answer verified on attempt {attempt + 1}")
            return answer
        print(f"Attempt {attempt + 1} critique: {critique[:100]}...")

    return attempt_history[-1]["answer"]
```
Comparing Agent Patterns
| Pattern | Best For | Strengths | Weaknesses |
|---|---|---|---|
| ReAct | Simple tool-calling tasks | Grounded in real data, interleaved reasoning | Can get stuck in loops, slow for complex plans |
| Plan-and-Execute | Complex multi-step tasks | Clear strategy, parallelizable steps | Inflexible if plan becomes stale |
| Reflexion | Tasks requiring high accuracy | Self-improving, catches errors | Expensive (multiple LLM calls), slower |
Start with ReAct for simple tasks. Use Plan-and-Execute when you can decompose the problem into independent steps. Add Reflexion when accuracy is critical and you can afford the extra compute.