LLM Agents

An LLM agent is a system where a language model acts as a reasoning engine that decides what actions to take in order to accomplish a goal. Unlike a simple chatbot that responds to a single prompt, an agent operates in a loop: it observes its environment, reasons about what to do next, executes an action, and uses the result to adjust its next step.

The Agent Loop

At its core, every agent follows a similar cycle:

code

┌─────────────────────────────────┐
│          Observe                │
│   (Read input / environment)    │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│           Think                 │
│  (LLM reasons about next step)  │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│            Act                  │
│  (Execute tool / take action)   │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│          Reflect                │
│  (Evaluate result, adapt plan)  │
└──────────────┬──────────────────┘
               │
               └──────► Loop back to Observe
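
The cycle above can be sketched in plain Python. This is only an illustration under stated assumptions: `CounterEnv`, `think`, `act`, and `run_agent_loop` are hypothetical names made up for this example, and `think` is a hard-coded rule standing in for the LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class CounterEnv:
    """Toy environment (hypothetical): the goal is to raise `value` to `target`."""
    value: int = 0
    target: int = 3
    log: list[str] = field(default_factory=list)


def think(observation: int, target: int) -> str:
    """Stand-in for the LLM: choose the next action from the observation."""
    return "increment" if observation < target else "stop"


def act(env: CounterEnv, action: str) -> None:
    """Execute the chosen action against the environment."""
    if action == "increment":
        env.value += 1


def run_agent_loop(env: CounterEnv, max_steps: int = 10) -> int:
    """Observe -> Think -> Act -> Reflect until the goal is met or steps run out."""
    for step in range(1, max_steps + 1):
        observation = env.value                    # Observe
        action = think(observation, env.target)    # Think
        if action == "stop":
            break
        act(env, action)                           # Act
        # Reflect: record whether the action moved the state toward the goal
        progressed = env.value > observation
        env.log.append(f"step {step}: {action}, progressed={progressed}")
    return env.value


env = CounterEnv()
final_value = run_agent_loop(env)
print(final_value, len(env.log))  # 3 3
```

The `max_steps` cap plays the same role as an iteration limit in a real agent: it bounds cost if the model never decides to stop.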

ReAct (Reasoning + Acting)

The ReAct pattern, introduced by Yao et al. (2022), interleaves reasoning traces with action steps. The LLM generates a thought, then an action, observes the result, and repeats.

python

import openai

SYSTEM_PROMPT = """You are a helpful assistant that can use tools.

Available tools:
- search(query: str) -> str: Search the web for information
- calculator(expression: str) -> float: Evaluate a math expression
- get_weather(city: str) -> str: Get current weather for a city

Respond in the following format:
Thought: <your reasoning>
Action: <tool_name>(<arguments>)
OR
Thought: <your reasoning>
Answer: <final answer>
"""

def run_react_agent(question: str, max_iterations: int = 5) -> str:
    """Run a simple ReAct agent loop."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for i in range(max_iterations):
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0,
        )
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print(f"--- Iteration {i + 1} ---")
        print(reply)

        # Check if the agent has a final answer
        if "Answer:" in reply:
            return reply.split("Answer:")[-1].strip()

        # Parse and execute the action
        action_lines = [
            line for line in reply.split("\n") if line.strip().startswith("Action:")
        ]
        if action_lines:
            action_str = action_lines[0].split("Action:", 1)[1].strip()
            # Parse tool call (simplified): expects tool_name(arguments)
            func_name, _, args_str = action_str.partition("(")
            # Drop only the final ")" and any quotes wrapped around the argument
            args_str = args_str.strip().removesuffix(")").strip().strip("\"'")

            # Execute tool and feed the result back as the next observation
            observation = execute_tool(func_name.strip(), args_str)
            messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Agent exceeded maximum iterations."


def execute_tool(func_name: str, args_str: str) -> str:
    """Execute a tool and return the observation."""
    tools = {
        "search": lambda q: f"Search results for: {q}",
        "calculator": lambda e: str(eval(e)),  # In production, use a safe evaluator
        "get_weather": lambda c: f"Weather in {c}: 22°C, sunny",
    }
    func = tools.get(func_name)
    if func:
        return func(args_str)
    return f"Unknown tool: {func_name}"


# Example usage
result = run_react_agent("What is the temperature in Tokyo multiplied by 2?")
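
The `calculator` tool above calls `eval` on model-generated text, and its comment recommends a safe evaluator in production. One possible approach, sketched here as an assumption rather than a prescribed fix, is to parse the expression with the standard-library `ast` module and allow only numeric literals and basic arithmetic. The name `safe_eval` is hypothetical.

```python
import ast
import operator

# Whitelist of arithmetic operators the evaluator accepts
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        # Numeric literals only (bool is excluded by the exact type check)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. are rejected
        raise ValueError(f"Unsupported expression: {expression!r}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_eval("22 * 2"))  # 44
```

To try it in the agent, the `calculator` entry in `tools` could become `lambda e: str(safe_eval(e))`. Exponentiation is left out of the whitelist on purpose, since inputs such as `9**9**9` can consume unbounded time and memory.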

ReAct vs Chain-of-Thought

ReAct differs from plain Chain-of-Thought (CoT) prompting. CoT only generates reasoning traces, while ReAct interleaves reasoning with actual tool calls and their results. This grounds the agent's answers in real data rather than in its own reasoning alone.

Plan-and-Execute

The Plan-and-Execute pattern separates the agent into two components:

Planner: Generates a high-level plan (list of steps)
Executor: Carries out each step sequentially, potentially calling tools

This approach suits complex, multi-step tasks that benefit from a clear strategy upfront.

python

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class PlanStep(BaseModel):
    step_number: int
    description: str
    tool: str | None = None
    expected_output: str

class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]

def create_plan(task: str) -> Plan:
    """Use the LLM to create a structured plan."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a planning agent.
Given a task, break it down into concrete steps.
For each step, specify which tool to use (if any) and what the expected output is.
Available tools: search, calculator, file_read, file_write, code_execute."""},
            {"role": "user", "content": task},
        ],
        response_format=Plan,
    )
    return response.choices[0].message.parsed

def execute_plan(plan: Plan) -> dict:
    """Execute each step of the plan."""
    results = {}
    for step in plan.steps:
        print(f"Executing Step {step.step_number}: {step.description}")
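        if step.tool:
            # Simplified: reuses execute_tool from the ReAct example and passes the
            # step description as the tool input. A real executor would have the
            # LLM produce concrete arguments for each tool.
            try:
                result = execute_tool(step.tool, step.description)
            except Exception as exc:  # e.g. calculator given prose instead of math
                result = f"Tool error: {exc}"
            results[step.step_number] = result
            print(f"  Result: {result}")
        else:
            results[step.step_number] = "No tool needed - reasoning step"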
    return results

# Example
plan = create_plan("Research the population of France and Germany, then calculate the ratio.")
print(f"Plan: {plan.goal}")
for step in plan.steps:
    print(f"  {step.step_number}. {step.description} [tool: {step.tool}]")

Plan Re-evaluation

Plans are not set in stone. If execution reveals unexpected results, the agent should be able to re-plan. Production systems often re-invoke the planner after each step or when a step fails.
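
A re-planning loop of the "re-plan when a step fails" kind could look roughly like the sketch below. Everything in it is an assumption for illustration: `run_with_replanning`, the `Planner` and `Executor` signatures, and the toy planner and executor are hypothetical stand-ins for LLM and tool calls.

```python
from typing import Callable

# Hypothetical signatures for this sketch:
#   planner(goal, completed) -> remaining steps, given what has happened so far
#   executor(step) -> result string, raising an exception when the step fails
Planner = Callable[[str, list[tuple[str, str]]], list[str]]
Executor = Callable[[str], str]


def run_with_replanning(
    goal: str, planner: Planner, executor: Executor, max_replans: int = 2
) -> list[tuple[str, str]]:
    """Execute a plan, asking the planner for a new one when a step fails."""
    completed: list[tuple[str, str]] = []
    steps = planner(goal, completed)
    replans = 0
    while steps:
        step = steps.pop(0)
        try:
            completed.append((step, executor(step)))
        except Exception as exc:
            if replans >= max_replans:
                raise RuntimeError(f"Giving up after {replans} re-plans") from exc
            replans += 1
            # Record the failure so the planner can route around it
            completed.append((step, f"FAILED: {exc}"))
            steps = planner(goal, completed)
    return completed


def toy_planner(goal: str, completed: list[tuple[str, str]]) -> list[str]:
    """Stand-in for the LLM planner: switch data source after any failure."""
    failed = any(result.startswith("FAILED") for _, result in completed)
    done = {step for step, result in completed if not result.startswith("FAILED")}
    lookup = "lookup via cache" if failed else "lookup via api"
    return [s for s in (lookup, "compute ratio") if s not in done]


def toy_executor(step: str) -> str:
    """Stand-in for tool execution: the API lookup always fails."""
    if step == "lookup via api":
        raise ConnectionError("api unavailable")
    return f"ok: {step}"


history = run_with_replanning("population ratio", toy_planner, toy_executor)
for step, result in history:
    print(f"{step} -> {result}")
```

Passing the full history back to the planner lets it see both what succeeded and what failed, and `max_replans` keeps a persistently failing task from looping forever.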

Reflexion

Reflexion (Shinn et al., 2023) adds a self-evaluation layer. After generating a response, the agent critiques its own output, identifies issues, and tries again with the feedback incorporated.

python

def reflexion_agent(
    question: str,
    max_attempts: int = 3,
    evaluator_model: str = "gpt-4o",
) -> str:
    """Agent that self-reflects and improves its answers."""

    attempt_history = []

    for attempt in range(max_attempts):
        # Generate answer with full history
        context = f"Question: {question}\n\n"
        if attempt_history:
            context += "Previous attempts and reflections:\n"
            for h in attempt_history:
                context += f"- Attempt: {h['answer']}\n"
                context += f"  Critique: {h['critique']}\n\n"

        context += "Please provide your best answer:"

        answer = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": context}],
            temperature=0.3,
        ).choices[0].message.content

        # Self-critique
        critique = client.chat.completions.create(
            model=evaluator_model,
            messages=[
                {"role": "system", "content": """You are an expert evaluator.
Critique the given answer. Identify:
1. Factual errors
2. Missing information
3. Logical inconsistencies
4. Areas for improvement
If the answer is satisfactory, say 'VERIFIED'."""},
                {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
            ],
        ).choices[0].message.content

        attempt_history.append({"answer": answer, "critique": critique})

        if "VERIFIED" in critique:
            print(f"Answer verified on attempt {attempt + 1}")
            return answer

        print(f"Attempt {attempt + 1} critique: {critique[:100]}...")

    return attempt_history[-1]["answer"]

Comparing Agent Patterns

Pattern	Best For	Strengths	Weaknesses
ReAct	Simple tool-calling tasks	Grounded in real data, interleaved reasoning	Can get stuck in loops, slow for complex plans
Plan-and-Execute	Complex multi-step tasks	Clear strategy, parallelizable steps	Inflexible if plan becomes stale
Reflexion	Tasks requiring high accuracy	Self-improving, catches errors	Expensive (multiple LLM calls), slower

Choosing a Pattern

Start with ReAct for simple tasks. Use Plan-and-Execute when you can decompose the problem into independent steps. Add Reflexion when accuracy is critical and you can afford the extra compute.
