
LLM Agents

An LLM agent is a system in which a language model acts as a reasoning engine that decides what actions to take to accomplish a goal. Unlike a simple chatbot that responds to a single prompt, an agent operates in a loop: it observes its environment, decides what to do next, executes an action, and reflects on the result to adapt its plan.

The Agent Loop

At its core, every agent follows a similar cycle:

code
┌─────────────────────────────────┐
│             Observe             │
│   (Read input / environment)    │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│              Think              │
│  (LLM reasons about next step)  │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│               Act               │
│  (Execute tool / take action)   │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│             Reflect             │
│  (Evaluate result, adapt plan)  │
└──────────────┬──────────────────┘
               │
               └──────► Loop back to Observe
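
The cycle above can be sketched as a small driver function. This is a minimal illustration rather than a reference implementation: `run_agent_loop` and its four callables (`observe`, `think`, `act`, `is_done`) are hypothetical names introduced here, and in a real agent `think` would wrap an LLM call.

```python
from typing import Callable


def run_agent_loop(
    observe: Callable[[], str],
    think: Callable[[str, list[str]], str],
    act: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_steps: int = 10,
) -> list[str]:
    """Run observe -> think -> act -> reflect until done or out of steps."""
    notes: list[str] = []  # reflections carried into later iterations
    for _ in range(max_steps):
        observation = observe()              # Observe: read the environment
        action = think(observation, notes)   # Think: choose the next action
        result = act(action)                 # Act: execute it
        notes.append(f"{action} -> {result}")  # Reflect: record the outcome
        if is_done(result):
            break
    return notes


# Toy environment (an assumption for illustration): a counter that must reach 3.
state = {"count": 0}


def act(action: str) -> str:
    if action == "increment":
        state["count"] += 1
    return str(state["count"])


notes = run_agent_loop(
    observe=lambda: str(state["count"]),
    think=lambda observation, notes: "increment",
    act=act,
    is_done=lambda result: result == "3",
)
print(notes)
```

The `max_steps` cap matters in practice: without it, an agent whose stop condition never fires would loop indefinitely.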

ReAct (Reasoning + Acting)

The ReAct pattern, introduced by Yao et al. (2022), interleaves reasoning traces with action steps. The LLM generates a thought, then an action, observes the result of that action, and repeats.

python
import openai

SYSTEM_PROMPT = """You are a helpful assistant that can use tools.

Available tools:
- search(query: str) -> str: Search the web for information
- calculator(expression: str) -> float: Evaluate a math expression
- get_weather(city: str) -> str: Get current weather for a city

Respond in the following format:
Thought: <your reasoning>
Action: <tool_name>(<arguments>)
OR
Thought: <your reasoning>
Answer: <final answer>
"""


def run_react_agent(question: str, max_iterations: int = 5) -> str:
    """Run a simple ReAct agent loop."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for i in range(max_iterations):
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0,
        )
        reply = response.choices[0].message.content or ""
        messages.append({"role": "assistant", "content": reply})
        print(f"--- Iteration {i + 1} ---")
        print(reply)

        # Check if the agent has a final answer
        if "Answer:" in reply:
            return reply.split("Answer:")[-1].strip()

        # Parse and execute the action (simplified parsing)
        action_lines = [
            line for line in reply.split("\n") if line.startswith("Action:")
        ]
        if action_lines and "(" in action_lines[0]:
            action_str = action_lines[0].replace("Action:", "", 1).strip()
            func_name, args_str = action_str.split("(", 1)
            args_str = args_str.strip()
            if args_str.endswith(")"):
                args_str = args_str[:-1]  # drop only the closing parenthesis
            args_str = args_str.strip().strip("\"'")  # drop surrounding quotes

            # Execute tool and feed the result back to the model
            observation = execute_tool(func_name.strip(), args_str)
        else:
            # Neither a final answer nor a well-formed action
            observation = "Invalid format. Reply with an Action or an Answer."
        messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Agent exceeded maximum iterations."


def execute_tool(func_name: str, args_str: str) -> str:
    """Execute a tool and return the observation."""
    tools = {
        "search": lambda q: f"Search results for: {q}",
        # eval() is unsafe on model output. In production, use a safe evaluator
        "calculator": lambda e: str(eval(e)),
        "get_weather": lambda c: f"Weather in {c}: 22°C, sunny",
    }
    func = tools.get(func_name)
    if func is None:
        return f"Unknown tool: {func_name}"
    try:
        return func(args_str)
    except Exception as exc:  # report tool failures to the model instead of crashing
        return f"Tool error: {exc}"


# Example usage
result = run_react_agent("What is the temperature in Tokyo multiplied by 2?")
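
The comment on the `calculator` tool recommends a safe evaluator in production. One possible approach, sketched here under assumptions, is to parse the expression with Python's standard `ast` module and allow only numeric literals and a small set of arithmetic operators. The name `safe_calculate` and the choice of supported operators are illustrative and not part of the original example; exponentiation is deliberately left out because very large powers can exhaust CPU and memory.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            # Accept int/float literals only (bool is a subclass of int)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_calculate("22 * 2"))
```

To try it in the agent, the `calculator` entry in `tools` could become `lambda e: str(safe_calculate(e))`. Function calls, attribute access, and names all raise `ValueError` instead of being executed.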
ReAct vs Chain-of-Thought

ReAct differs from plain Chain-of-Thought (CoT) prompting. CoT generates only reasoning traces, while ReAct interleaves reasoning with actual tool calls and their results. This grounds the agent in real data rather than in its own reasoning alone.
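
The difference shows up in the shape of the conversation sent to the model. The sketch below builds both message lists without calling any API; the helper names `build_cot_messages` and `build_react_messages` are hypothetical, and the "Let's think step by step" suffix is just one common way to elicit CoT.

```python
def build_cot_messages(question: str) -> list[dict]:
    """Chain-of-Thought: a single prompt; the model reasons without tools."""
    return [{"role": "user", "content": f"{question}\nLet's think step by step."}]


def build_react_messages(
    question: str, steps: list[tuple[str, str, str]]
) -> list[dict]:
    """ReAct: each (thought, action, observation) step adds real tool output."""
    messages = [{"role": "user", "content": question}]
    for thought, action, observation in steps:
        messages.append(
            {"role": "assistant", "content": f"Thought: {thought}\nAction: {action}"}
        )
        # The observation comes from running the tool, not from the model.
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return messages


question = "What is the temperature in Tokyo multiplied by 2?"
cot = build_cot_messages(question)
react = build_react_messages(
    question,
    [
        (
            "I need the current temperature.",
            'get_weather("Tokyo")',
            "Weather in Tokyo: 22°C, sunny",
        )
    ],
)
print(len(cot), len(react))
```

In the CoT case the model must already know the temperature in Tokyo; in the ReAct case the value arrives as an observation before the model does the arithmetic.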

Plan-and-Execute

The Plan-and-Execute pattern separates the agent into two components:

  1. Planner: Generates a high-level plan (list of steps)
  2. Executor: Carries out each step sequentially, potentially calling tools

This approach is better for complex, multi-step tasks where you need a clear strategy upfront.

python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class PlanStep(BaseModel):
    step_number: int
    description: str
    tool: str | None = None
    expected_output: str


class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]


def create_plan(task: str) -> Plan:
    """Use the LLM to create a structured plan."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a planning agent.
Given a task, break it down into concrete steps.
For each step, specify which tool to use (if any) and what the expected output is.
Available tools: search, calculator, file_read, file_write, code_execute."""},
            {"role": "user", "content": task},
        ],
        response_format=Plan,
    )
    plan = response.choices[0].message.parsed
    if plan is None:  # e.g. the model refused or the output could not be parsed
        raise ValueError("Planner did not return a valid plan.")
    return plan


def execute_plan(plan: Plan) -> dict[int, str]:
    """Execute each step of the plan."""
    results = {}
    for step in plan.steps:
        print(f"Executing Step {step.step_number}: {step.description}")
        if step.tool:
            # Reuses execute_tool from the ReAct example, which only knows
            # search, calculator, and get_weather. The step description
            # stands in for the tool argument (simplified).
            result = execute_tool(step.tool, step.description)
            results[step.step_number] = result
            print(f"  Result: {result}")
        else:
            results[step.step_number] = "No tool needed - reasoning step"
    return results


# Example
plan = create_plan("Research the population of France and Germany, then calculate the ratio.")
print(f"Plan: {plan.goal}")
for step in plan.steps:
    print(f"  {step.step_number}. {step.description} [tool: {step.tool}]")

Plan Re-evaluation

Plans are not set in stone. If execution reveals unexpected results, the agent should be able to re-plan. Production systems often re-invoke the planner after each step or when a step fails.
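
A re-planning loop of the "when a step fails" kind might look like the sketch below. It is a simplified illustration with no LLM calls: `run_with_replanning`, `make_plan`, `run_step`, and `step_failed` are hypothetical names, plans are plain lists of strings, and in a real system `make_plan` would call the planner with the task and the results gathered so far.

```python
from typing import Callable


def run_with_replanning(
    task: str,
    make_plan: Callable[[str, list[str]], list[str]],
    run_step: Callable[[str], str],
    step_failed: Callable[[str], bool],
    max_replans: int = 2,
) -> list[str]:
    """Execute a plan step by step, asking for a new plan when a step fails."""
    completed: list[str] = []  # results of the steps that succeeded
    plan = list(make_plan(task, completed))
    replans = 0
    while plan:
        step = plan.pop(0)
        result = run_step(step)
        if step_failed(result):
            if replans >= max_replans:
                raise RuntimeError(f"Step failed after {replans} re-plans: {step}")
            replans += 1
            # Re-invoke the planner with what has been achieved so far;
            # the new plan replaces the remaining steps of the old one.
            plan = list(make_plan(task, completed))
            continue
        completed.append(result)
    return completed


# Toy planner (an assumption for illustration): the first plan uses a source
# that fails, the second falls back to another one.
plan_calls = []


def toy_planner(task: str, completed: list[str]) -> list[str]:
    plan_calls.append(len(completed))
    return ["fetch:api", "sum"] if len(plan_calls) == 1 else ["fetch:cache", "sum"]


def toy_step(step: str) -> str:
    return "ERROR" if step == "fetch:api" else f"ok:{step}"


results = run_with_replanning(
    "demo task", toy_planner, toy_step, step_failed=lambda r: r == "ERROR"
)
print(results)
```

The `max_replans` limit keeps a persistently failing step from triggering an endless planner loop. Re-planning after every step, the other option mentioned above, would amount to calling `make_plan` unconditionally at the end of each iteration.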

Reflexion

Reflexion (Shinn et al., 2023) adds a self-evaluation layer. After generating a response, the agent critiques its own output, identifies issues, and tries again with the feedback incorporated.

python
def reflexion_agent(
    question: str,
    max_attempts: int = 3,
    evaluator_model: str = "gpt-4o",
) -> str:
    """Agent that self-reflects and improves its answers."""
    # Uses the `client` defined in the Plan-and-Execute example
    attempt_history = []

    for attempt in range(max_attempts):
        # Generate answer with full history
        context = f"Question: {question}\n\n"
        if attempt_history:
            context += "Previous attempts and reflections:\n"
            for h in attempt_history:
                context += f"- Attempt: {h['answer']}\n"
                context += f"  Critique: {h['critique']}\n\n"

        context += "Please provide your best answer:"

        answer = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": context}],
            temperature=0.3,
        ).choices[0].message.content

        # Self-critique
        critique = client.chat.completions.create(
            model=evaluator_model,
            messages=[
                {"role": "system", "content": """You are an expert evaluator.
Critique the given answer. Identify:
1. Factual errors
2. Missing information
3. Logical inconsistencies
4. Areas for improvement
If the answer is satisfactory, reply with only the word 'VERIFIED'."""},
                {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
            ],
        ).choices[0].message.content or ""

        attempt_history.append({"answer": answer, "critique": critique})

        # Match the verdict at the start so that "NOT VERIFIED" does not count
        if critique.strip().upper().startswith("VERIFIED"):
            print(f"Answer verified on attempt {attempt + 1}")
            return answer

        print(f"Attempt {attempt + 1} critique: {critique[:100]}...")

    # No attempt was verified; fall back to the most recent answer
    return attempt_history[-1]["answer"]

Comparing Agent Patterns

| Pattern | Best For | Strengths | Weaknesses |
|---|---|---|---|
| ReAct | Simple tool-calling tasks | Grounded in real data, interleaved reasoning | Can get stuck in loops, slow for complex plans |
| Plan-and-Execute | Complex multi-step tasks | Clear strategy, parallelizable steps | Inflexible if plan becomes stale |
| Reflexion | Tasks requiring high accuracy | Self-improving, catches errors | Expensive (multiple LLM calls), slower |

Choosing a Pattern

Start with ReAct for simple tasks. Use Plan-and-Execute when you can decompose the problem into independent steps. Add Reflexion when accuracy is critical and you can afford the extra compute.
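
As a rough illustration, the guidance above can be written down as a small decision helper. The function name `choose_patterns` and its three boolean inputs are hypothetical simplifications; real selection would also weigh latency, cost, and how reliable the available tools are.

```python
def choose_patterns(
    decomposable: bool,
    accuracy_critical: bool,
    can_afford_extra_calls: bool,
) -> list[str]:
    """Return a base pattern plus Reflexion when it is worth the cost."""
    # Plan-and-Execute suits problems that split into independent steps;
    # otherwise start with ReAct.
    patterns = ["Plan-and-Execute" if decomposable else "ReAct"]
    # Reflexion is layered on top only when accuracy matters and the
    # extra LLM calls are affordable.
    if accuracy_critical and can_afford_extra_calls:
        patterns.append("Reflexion")
    return patterns


print(choose_patterns(decomposable=False, accuracy_critical=False, can_afford_extra_calls=True))
print(choose_patterns(decomposable=True, accuracy_critical=True, can_afford_extra_calls=True))
```

Returning a list reflects that Reflexion is described as an addition to a base pattern, not a replacement for one.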