Agent Blog

Building AI Agents from Scratch (Part 2): Hand-Rolling a ReAct Agent in Pure Python Without Frameworks

hao Zhang — Mon, 15 Jun 2026 12:03:34 GMT

[TL;DR / Core Concept] What is the ReAct Pattern? ReAct (Reasoning and Acting) is a classic paradigm that equips Large Language Models (LLMs) with the ability to use external tools. By forcing the LLM to alternate between "Thought" and "Action," and incorporating "Observations" from the environment, it effectively solves complex, multi-step problems. The ReAct pattern is implemented entirely through carefully crafted prompt templates and parsing logic, not via any built-in LLM magic.

Welcome to Part 2 of the "Building AI Agents from Scratch" series. In our previous post, we explored the core loop of an Agent. Today, we are going to set aside popular frameworks like LangChain and CrewAI. Instead, we will use basic Python code to hand-roll a genuine ReAct Agent from scratch.

1. Why Not Start with a Framework?

With the proliferation of AI Agent frameworks, why do we advocate starting with a "No Framework" approach?

Strip Away the Magic, Understand the I/O Logic: High-level frameworks (like CrewAI) encapsulate the ReAct paradigm but often hide the underlying logic. Relying on them directly means that when an agent falls into an infinite loop or fails to call a tool, you are left confused. Hand-coding reveals the exact input/output mechanisms.
Ultimate Debuggability and Low Cost: Building without a framework means you know exactly which step failed and can inspect intermediate outputs. Furthermore, hand-written code lets you avoid the large default prompts and preselected models that some frameworks bring along, which can make your agent faster and cheaper to run.

2. Demystifying ReAct: How LLMs "Think" and "Act"

The core of the ReAct pattern lies in a structured prompt loop. When humans solve problems, we typically "Think -> Take Action -> Observe Results -> Think Again". ReAct forces the LLM to follow this exact cognitive process.

In ReAct, we strictly constrain the LLM's output to the following format:

Thought: The model expresses its internal reasoning process (e.g., "I need to calculate the sum").
Action: The model decides which external tool to invoke (e.g., Calculator).
Action Input: The parameters passed to the tool, typically in strict JSON format.
Observation: (Returned by our code) The feedback/result from the executed tool.
Final Answer: The ultimate response provided to the user once sufficient information is gathered.

Through this structured self-talk, the model is no longer limited to its static training data; it can dynamically reach out to the internet, databases, or local functions for help.

3. Hands-on Practice: Building a ReAct Agent in Pure Python

We will build an agent equipped with a "Calculator" tool. Further tools, such as a "Weather API", can be registered in the same way.

Step 1: Write the System Prompt

This is the soul of the Agent. We use the prompt to force the LLM to adhere to the ReAct format.

SYSTEM_PROMPT = """
You are a helpful assistant. You have access to the following tools:
1. calculator: Execute a mathematical expression. Arguments: {"expression": "math_expression"}

You MUST strictly follow this format for your interactions:
Thought: Think about what you need to do
Action: The name of the tool to use
Action Input: The arguments for the tool in JSON format
Observation: The result from the tool (Do not generate this, the system will provide it)
... (Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: The final answer to the user's query
"""

Step 2: Define Atomic Tools

We define tools as standard Python functions and use a dictionary for routing.

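def calculator(args):
    # Simulate a calculator tool.
    # WARNING: eval() runs arbitrary Python code. This is fine for a local demo,
    # but never pass untrusted model output to eval() in production.
    return str(eval(args.get("expression")))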

# Tool routing map
tools_mapping = {
    "calculator": calculator
}
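
Because the calculator above passes model-generated text to eval(), it may be worth swapping in a restricted evaluator. The following is a minimal sketch, not part of the original implementation: it assumes only basic arithmetic is needed, and the names safe_calculator and _evaluate are hypothetical. It parses the expression with the standard ast module and accepts only numbers and a small whitelist of operators.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant):
        value = node.value
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(args):
    """Hypothetical alternative to calculator(): returns the result or an error string."""
    expression = args.get("expression")
    if not isinstance(expression, str):
        return "Error: missing 'expression' argument"
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"
```

Returning an error string instead of raising means the failure can be handed back to the model as an Observation, giving it a chance to correct its input.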

Step 3: Write the While Loop (The Agent Loop)

This is the core architecture: a continuously iterating control flow. Inside the loop, the model generates an action, we parse and execute it, append the Observation to the context, and repeat until the Final Answer is triggered.

def run_react_agent(query, max_steps=10):
    # call_llm, extract_action, and extract_action_input are helpers you supply:
    # call_llm(messages) returns the model's text, and the extractors parse it.
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": query}]

    step = 0
    while step < max_steps: # The Agent Loop, capped so a confused model cannot spin forever
        step += 1
        # 1. Call the LLM (Perceive & Reason)
        response = call_llm(messages)
        messages.append({"role": "assistant", "content": response})

        # 2. Leave the loop if 'Final Answer' is found
        if "Final Answer:" in response:
            print("Task Completed!\n", response)
            return response

        # 3. Parse Action and Action Input (Plan)
        action_name = extract_action(response)
        action_input = extract_action_input(response)

        # 4. Execute the tool (Act)
        if action_name in tools_mapping:
            tool_func = tools_mapping[action_name]
            try:
                observation = tool_func(action_input)
            except Exception as exc:
                # Report tool failures to the model instead of crashing the loop
                observation = f"Error: {exc}"
            print(f"[Tool Executed]: {action_name}({action_input}) -> Result: {observation}")

            # 5. Return the observation back to the LLM (Observe)
            messages.append({"role": "user", "content": f"Observation: {observation}"})
        else:
            messages.append({"role": "user", "content": "Observation: Tool does not exist. Please try again."})

    print("Stopped: step limit reached without a Final Answer.")
    return None
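
The loop relies on extract_action and extract_action_input, which are not shown in this post. One possible way to write them is sketched below. It assumes the model puts the tool name and a single-line JSON object on the "Action:" and "Action Input:" lines, as the system prompt requests. The regular expressions and fallback values are illustrative choices, not the author's implementation.

```python
import json
import re

# Match "Action: <name>" and "Action Input: <json>" at the start of a line.
ACTION_RE = re.compile(r"^Action:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
ACTION_INPUT_RE = re.compile(r"^Action Input:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_action(response):
    """Return the tool name from the first 'Action:' line, or None if absent."""
    match = ACTION_RE.search(response)
    return match.group(1) if match else None


def extract_action_input(response):
    """Return the JSON object from the first 'Action Input:' line, or {} if unusable."""
    match = ACTION_INPUT_RE.search(response)
    if not match:
        return {}
    try:
        parsed = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}
    # Tools expect a dict of arguments, so reject bare strings, lists, etc.
    return parsed if isinstance(parsed, dict) else {}
```

With these definitions, a response that contains no "Action:" line yields None, which is not a key in tools_mapping, so the loop falls into its "Tool does not exist" branch rather than raising.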

When you run run_react_agent("What is 35 multiplied by 2?"), you will witness the full process: the model reasoning step by step, calling the calculator, reading the Observation, and ultimately arriving at the final answer.
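
One thing that can go wrong in such a run is that the model ignores the instruction in the system prompt and writes its own "Observation:" line, sometimes followed by a premature "Final Answer:". A common safeguard, shown here as a hedged sketch rather than part of the original code, is to cut the completion at the first "Observation:" line before parsing it. The function name strip_hallucinated_observation is hypothetical.

```python
def strip_hallucinated_observation(completion):
    """Keep only the text before the first line that starts with 'Observation:'."""
    kept = []
    for line in completion.splitlines():
        if line.strip().startswith("Observation:"):
            break  # everything from here on was invented by the model
        kept.append(line)
    return "\n".join(kept).strip()


raw = (
    "Thought: I need to multiply.\n"
    "Action: calculator\n"
    'Action Input: {"expression": "35 * 2"}\n'
    "Observation: 70\n"
    "Final Answer: 70"
)
cleaned = strip_hallucinated_observation(raw)
```

Many chat-completion APIs also accept stop sequences, which can achieve a similar effect on the server side; check your provider's documentation for the exact parameter.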

Frequently Asked Questions (FAQ)

Q: Why is an agent written in pure code easier to debug than one using a framework? A: When you write the while loop in pure Python, you can clearly log the exact inputs and outputs (Thoughts and Observations) of every LLM call. Conversely, highly encapsulated frameworks often throw deep stack exceptions if tool parsing fails, making issues incredibly difficult to locate.

Q: How does the ReAct pattern reduce LLM hallucinations? A: By alternating between "Reasoning" and "Acting," the LLM no longer needs to fabricate facts out of thin air. Instead, it retrieves real Observations by calling external tools such as search engines or databases, and these serve as the factual basis for its next reasoning step. This significantly reduces factual hallucinations, although it does not eliminate them.

📢 Preview for the Next Article: Having grasped the underlying ReAct I/O logic, in "Part 3: Choosing Your Weapons — A Comparison of Mainstream Agent Frameworks and LangGraph Practice", we will explore how to introduce state management for production environments and refactor our agent using modern frameworks.

📦 Code & Resources

The complete code for this ReAct Agent implementation is available on GitHub: 🔗 easyagent

Building AI Agents from Scratch (Part 1): Core Architecture and Underlying Principles Explained

hao Zhang — Thu, 11 Jun 2026 14:30:00 GMT

TL;DR: The Core Definition of an AI Agent An AI Agent is a system that can perceive its environment, reason and plan autonomously, and call external tools to execute actions to achieve a specific goal. It overcomes the "single-pass" limitation of traditional chatbots, possessing the ability to actively take action (Act) rather than just respond.

1. The Evolution from Chatbot to Autonomous Agent

Over the past few years, we have grown accustomed to interacting with excellent chatbots like ChatGPT, Claude, or Gemini. However, the standard chatbot interaction model has a fundamental limitation: single-pass interactions.

When you ask a chatbot: "Help me find the three cheapest flights to Tokyo for next month, check if my frequent flyer points can cover them, and book the best option," a regular chatbot will often fail outright or only provide text-based advice. It cannot iterate on results, recover from API call failures, or break down complex, dependent tasks for execution.

The difference is easiest to see in a structured comparison:

Comparison Dimension	Standard Chatbot	Autonomous AI Agent
Interaction Model	Reactive, single-pass conversation	Proactive, continuous iteration (Agent Loop)
Execution Capability	Text generation only	Can invoke external tools (APIs, databases, code execution)
Task Handling	Fails at complex dependencies	Possesses Planning capabilities, decomposing goals into step-by-step subtasks

Currently, the most widely recognized architectural definition in the industry comes from the classic formula:

Agent = LLM + Memory + Planning + Tool Use

In this formula, the Large Language Model (LLM) is no longer just a text generation engine; it acts as the "brain" of the system, responsible for reasoning and decision-making. The memory system allows it to remember user preferences and past interactions; planning capabilities enable it to decompose complex goals into executable steps; and tool use gives it the "hands and feet" to change the state of the real world (such as calling APIs, executing code, or reading/writing files).

2. Core Operational Mechanism: The Agent Loop

The magic that transforms an LLM from a "text generator" into an "autonomous agent" is actually very simple architecturally: a while loop.

An AI Agent does not provide a final answer in a single request; instead, it solves problems through a continuously iterating execution cycle. In each iteration, the Agent goes through the following five core stages until the task is completed or a stopping condition is triggered:

Perceive: The Agent receives current inputs. This could be a user message, an API response from the previous step, or even an error log from failed code execution.
Reason: The LLM evaluates all current contextual information, thinks about which stage the task is currently in, and decides what to do next.
Plan: For complex tasks, the Agent breaks down the overarching goal into multiple discrete subtasks.
Act: The Agent actually performs an action, such as calling a weather API, sending a SQL query to a database, or running a Python script.
Observe: The Agent checks the result generated by the "Action" (feedback from the environment). Was the action successful? Is the task complete? Do previous plans need adjustment?

After completing these five steps, the system returns to the first step. In pseudocode, this is a while not done: loop that keeps deciding whether a tool needs to be called and feeds the tool's results back to the LLM, until the LLM believes it can provide the final answer.
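
The five stages can be sketched in a few lines of Python. This is an illustrative skeleton under stated assumptions, not a definitive implementation: the decide callback stands in for the LLM, the decision dictionary format is invented for this example, and all names (run_agent_loop, scripted_decide, lookup) are hypothetical.

```python
def run_agent_loop(goal, decide, tools, max_iterations=5):
    """Iterate perceive -> reason/plan -> act -> observe until done or out of budget."""
    history = [("goal", goal)]          # Perceive: everything the agent has seen so far
    for _ in range(max_iterations):     # stopping condition: iteration budget
        decision = decide(history)      # Reason/Plan: stand-in for an LLM call
        if decision["type"] == "finish":
            return decision["answer"]
        tool = tools[decision["tool"]]
        result = tool(decision["input"])            # Act
        history.append(("observation", result))     # Observe
    return None  # budget exhausted without a final answer


def scripted_decide(history):
    """Toy policy: look something up once, then answer with what came back."""
    observations = [value for kind, value in history if kind == "observation"]
    if not observations:
        return {"type": "call", "tool": "lookup", "input": "capital of France"}
    return {"type": "finish", "answer": observations[-1]}


answer = run_agent_loop(
    goal="What is the capital of France?",
    decide=scripted_decide,
    tools={"lookup": lambda query: "Paris"},
)
```

The iteration budget is the "stopping condition" mentioned above: without it, a policy that never chooses to finish would loop forever.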

3. Four Design Patterns of AI Agents

After understanding the core loop, how should we design specific Agents? Renowned AI scholar Andrew Ng summarized four widely adopted Agent design paradigms in the industry:

Reflection: Prompting the LLM to observe its past steps to self-evaluate and correct the generated quality. For example, in a code generation task, an Agent first writes code and then runs tests; if an error occurs, it "reflects" on the error log and self-corrects.
Tool Use: Connecting the LLM to the external world. By wrapping functions into tools, the Agent can query real-time information, send emails, or perform precise computations that text generation alone cannot reliably deliver.
Planning: Empowering large models with the ability to decompose complex tasks. Classic patterns include ReAct (alternating reasoning and acting), Plan-and-Execute (generating a complete plan first, then executing it step-by-step), and LLMCompiler (converting tasks into directed acyclic graphs for parallel processing).
Multi-agent Collaboration: The tools and context a single Agent can master are limited. Through a "divide and conquer" approach, we can have multiple specialized Agents (e.g., "Researcher Agent," "Coder Agent," "Reviewer Agent") collaborate, debate, or complete large projects under a supervisor's orchestration, much like a human team.
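
As an illustration of the first pattern, Reflection, here is a minimal sketch of the write, test, and self-correct cycle described above. It assumes a generate callback that receives the previous error message and a check callback that returns an error string or None. Both toy functions are hypothetical stand-ins for an LLM and a test runner.

```python
def reflect_and_retry(generate, check, max_attempts=3):
    """Generate a candidate, check it, and feed any failure back into the next attempt."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        error = check(candidate)
        if error is None:
            return candidate, attempt
        feedback = error  # the "reflection" input for the next attempt
    return None, max_attempts


def toy_generate(feedback):
    """Stand-in for an LLM: the first draft is buggy, the revision fixes it."""
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def toy_check(source):
    """Run the candidate against one test; return an error message or None."""
    namespace = {}
    exec(source, namespace)  # acceptable here only because the input is our own toy code
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


code, attempts = reflect_and_retry(toy_generate, toy_check)
```

In a real agent the error message would be appended to the LLM's context, and generated code would run in a sandbox rather than through a bare exec().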

4. Production Environment Survival Guide: Ditch the "All-Powerful Assistant" Fantasy

Why must we master Agent development now? According to Gartner, 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025. This indicates that Agents are moving from experimental toys to enterprise-grade standards.

However, in social media demos, we often see "omnipotent" Agents that can handle all your business needs. In real production environments, this design is often disastrous.

Based on the practical experience of frontline developers, Agents that can truly run stably in production environments usually share the following characteristics:

Narrow scope + deep domain context: For instance, an Agent specifically connected to your company's Postgres database that deeply understands the schema and generates specific automated email flows based on natural language is far more reliable than one prompted to be an "all-powerful business assistant".
Access to structured data: Agents relying on structured data (like databases or APIs with explicit schemas) have much higher output consistency than those trying to reason and act on massive unstructured documents.
Output structured action commands: An excellent Agent should ultimately output machine-readable structured actions (like generating a specific trigger or sending a specific JSON template) rather than free-flowing, lengthy text.

In practical deployment, business stakeholders often think they want "full automation," but actually have an extremely low tolerance for instability once live. Therefore, there is an iron rule in Agent engineering: Constraints are not a weakness in Agent design; they are an essential feature for surviving in production environments.

Instead of striving for 100% full automation, it is better to position the Agent as a Copilot, outputting partial drafts, flagging uncertainties, and providing an entry point for human confirmation. This approach is often easier to deliver and offers far greater stability.
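
To make the Copilot idea concrete, the sketch below validates a structured action proposed by the model and routes low-confidence proposals to a human. Everything in it is an assumption made for illustration: the send_email action, its fields, the self-reported confidence value, and the 0.8 threshold are hypothetical and would be defined by your own application.

```python
import json

# Hypothetical whitelist: action name -> required parameter names
ALLOWED_ACTIONS = {"send_email": {"to", "subject", "body"}}


def review_action(raw_output, confidence_threshold=0.8):
    """Validate a model-proposed action and decide whether a human must confirm it."""
    try:
        action = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"status": "rejected", "reason": "output is not valid JSON"}
    if not isinstance(action, dict):
        return {"status": "rejected", "reason": "output is not a JSON object"}

    name = action.get("action")
    params = action.get("params")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        return {"status": "rejected", "reason": f"unknown action: {name}"}
    if not isinstance(params, dict):
        return {"status": "rejected", "reason": "params must be an object"}
    missing = ALLOWED_ACTIONS[name] - set(params)
    if missing:
        return {"status": "rejected", "reason": f"missing fields: {sorted(missing)}"}

    # Treat a missing or malformed confidence value as zero confidence.
    confidence = action.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        confidence = 0.0
    if confidence >= confidence_threshold:
        return {"status": "auto_approved", "action": action}
    return {"status": "needs_human_confirmation", "action": action}


draft = json.dumps({
    "action": "send_email",
    "params": {"to": "a@example.com", "subject": "Hi", "body": "Draft text"},
    "confidence": 0.55,
})
decision = review_action(draft)
```

The whitelist is the "constraint" and the confirmation branch is the "entry point for human confirmation": the agent can propose, but anything unknown is rejected and anything uncertain waits for a person.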

📢 Preview for the Next Article

The theory has been established; it is time to get hands-on! In "Part 2: Back to Basics — Hand-Rolling a ReAct Agent in Pure Python Without Frameworks," we will temporarily set aside the "magic" of high-level frameworks like LangChain or CrewAI. Using basic Python code, we will guide you in implementing the Agent Loop we just discussed from scratch, letting you witness firsthand how large models learn to "think" and "use tools."