Agentic Architectures (ReAct + Tool Use)
ReAct, tool use, and the systems patterns that make LLM agents survive contact with production.
slug: 009-agentic-react
number: 9
title: "Agentic Architectures (ReAct + Tool Use)"
description: "ReAct, tool use, and the systems patterns that make LLM agents survive contact with production."
youtubeId: null
publishedAt: null
anchor:
  authors: "Shunyu Yao et al."
  year: 2023
  title: "ReAct: Synergizing Reasoning and Acting in Language Models"
  institution: "Princeton & Google Research"
  venue: "ICLR"
The pattern at a glance
Long-form article coming soon. The narration below is the spoken version of this episode — read it as a quick transcript while the written companion is in draft.
Transcript
A user asks your customer support agent: "What's my order status?"
The agent thinks. It calls the order lookup API. It thinks again. It calls the user profile API. It thinks. It calls the shipping carrier API. It thinks. It tries again. Forty-seven tool calls and twenty dollars in API costs later, it answers — wrongly.
This is what most agent failures look like. Not crashes. Not hallucinations. Loops, and waste.
This is a ReAct agent in production.
Large language models are good at two things that used to be separate: reasoning about a problem, and calling external tools to gather information.
Reasoning alone makes the model a smart-sounding answer machine. It hallucinates whatever it doesn't know.
Tool-calling alone is just a worse version of writing API client code.
The combination — let the model decide what to fetch next based on what it just learned — is what makes an agent. It's also what makes an agent expensive, slow, and unpredictable when done badly.
The pattern that combines them is called ReAct.
ReAct, short for Reason and Act, comes from a paper by Shunyu Yao and colleagues at Princeton and Google Research, published at ICLR in 2023. The paper showed that interleaving reasoning steps with action steps produced better results than either alone, and it gave a name to the prompt structure that nearly every modern agent framework uses underneath.
ReAct is a loop with three parts.
Thought: the model writes a sentence about what it's trying to figure out next. "I need to check the user's order history."
Action: the model emits a structured tool call. "lookup orders for user twelve thousand three hundred forty-five."
Observation: the system runs the tool and feeds the result back to the model. "Three orders found. Most recent: shipped, tracking ABC one two three."
The model then writes another Thought, based on the new Observation. Another Action. Another Observation.
The loop continues until the model decides it has enough information to give a final answer — at which point it outputs that answer instead of another tool call.
The whole conversation, including all Thoughts, Actions, and Observations, lives in a growing scratchpad that the model sees on every iteration.
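The loop above fits in a few lines of Python. This is a minimal sketch, not a production implementation: the "model" is a scripted stand-in (a real agent would call an LLM API here), and `lookup_orders` is a hypothetical tool that returns canned data.

```python
# Minimal ReAct loop. The model is a scripted stand-in so the control
# flow (Thought -> Action -> Observation -> Thought...) is visible.

def lookup_orders(user_id):
    # Hypothetical tool: returns canned data for illustration.
    return f"3 orders found for user {user_id}; most recent: shipped, tracking ABC123"

TOOLS = {"lookup_orders": lookup_orders}

def stub_model(scratchpad):
    # Stand-in for the LLM: first turn emits a tool call, then answers.
    if "Observation:" not in scratchpad:
        return {"thought": "I need to check the user's order history.",
                "action": ("lookup_orders", {"user_id": "12345"})}
    return {"thought": "I have the shipping status; I can answer.",
            "final_answer": "Your latest order has shipped (tracking ABC123)."}

def react(question, model, max_iters=5):
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iters):              # hard iteration cap: non-negotiable
        step = model(scratchpad)            # model sees the whole scratchpad
        scratchpad += f"Thought: {step['thought']}\n"
        if "final_answer" in step:          # model-decided termination
            return step["final_answer"], scratchpad
        name, args = step["action"]
        scratchpad += f"Action: {name}({args})\n"
        observation = TOOLS[name](**args)   # system runs the tool
        scratchpad += f"Observation: {observation}\n"
    return "Stopped: iteration cap reached.", scratchpad

answer, pad = react("What's my order status?", stub_model)
```

Note that the scratchpad grows on every turn and is passed back to the model whole; that single design choice drives most of the cost and context-window behavior discussed below.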
A ReAct agent is four components.
The model — usually a frontier large language model with tool-calling support.
The toolset — a list of functions the model is allowed to call, each with a name, a parameter schema, and a description.
The scratchpad — the running history of the conversation, including every Thought, Action, and Observation so far.
The termination logic — code that decides when to stop. Maximum iterations, maximum tokens, maximum cost, or a model-decided final answer.
That's it. The frameworks add abstractions, retry logic, prompt templates, and observability — but the core is this four-component loop.
Four traps every ReAct agent hits.
One: infinite loops. The model gets confused about whether it has answered the question. It keeps calling tools, hoping the next observation will help. A hard cap on iterations is non-negotiable.
Two: hallucinated tool calls. The model invents a tool that doesn't exist, or calls a real tool with invented parameters. Validate every call against the schema before executing. Reject and retry with an error observation.
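A validation gate might look like the following sketch. The schema format here is illustrative (a set of required and allowed parameter names); real frameworks typically use JSON Schema, but the shape of the check is the same.

```python
# Validate a model-emitted tool call against a registered schema before
# executing it. On failure, the returned string is fed back to the model
# as an error Observation so it can retry.

TOOL_SCHEMAS = {
    "lookup_orders": {"required": {"user_id"}, "allowed": {"user_id"}},
}

def validate_call(name, args):
    """Return an error observation string, or None if the call is valid."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:  # hallucinated tool
        return f"Error: no tool named '{name}'. Available: {sorted(TOOL_SCHEMAS)}"
    missing = schema["required"] - args.keys()
    unknown = args.keys() - schema["allowed"]
    if missing:  # invented or omitted parameters
        return f"Error: missing parameters {sorted(missing)} for '{name}'."
    if unknown:
        return f"Error: unknown parameters {sorted(unknown)} for '{name}'."
    return None  # valid: safe to execute
```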
Three: context window pollution. The scratchpad grows with every iteration. By turn ten, the original user question is buried under ten Observations of varying relevance. Use scratchpad compression — summarize old turns, keep only the most recent observations verbatim.
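One simple compression scheme: keep the last few turns verbatim and collapse everything older into one summary line. The summarizer below is a stand-in (plain truncation); in practice you would summarize old turns with a cheap model call.

```python
# Scratchpad compression sketch: keep the most recent turns verbatim,
# collapse older turns into a single summary entry.

def compress_scratchpad(turns, keep_last=3, summarize=lambda t: t[:60] + "..."):
    """turns: list of Thought/Action/Observation strings, oldest first."""
    if len(turns) <= keep_last:
        return turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    summary = "Earlier steps (summarized): " + " | ".join(summarize(t) for t in old)
    return [summary] + recent
```

The user's original question should be kept outside the compressed region entirely, so it can never be summarized away.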
Four: cost. Every iteration calls the model again with a longer prompt. A simple question that triggers ten iterations on a frontier model costs real money. Set per-request cost ceilings, log them, alert on outliers.
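A per-request cost ceiling can be a small tracker that the loop consults after every model call. The prices and token counts below are illustrative assumptions, not real model pricing.

```python
# Per-request cost ceiling sketch. Rates are illustrative placeholders.

class CostCeilingExceeded(Exception):
    pass

class CostTracker:
    def __init__(self, ceiling_usd, usd_per_1k_input=0.01, usd_per_1k_output=0.03):
        self.ceiling = ceiling_usd
        self.in_rate = usd_per_1k_input
        self.out_rate = usd_per_1k_output
        self.spent = 0.0

    def record(self, input_tokens, output_tokens):
        # Call once per model invocation; raises when the ceiling is crossed.
        self.spent += (input_tokens / 1000) * self.in_rate \
                    + (output_tokens / 1000) * self.out_rate
        if self.spent > self.ceiling:
            raise CostCeilingExceeded(
                f"spent ${self.spent:.4f}, ceiling ${self.ceiling:.2f}")

tracker = CostTracker(ceiling_usd=0.05)
tracker.record(2000, 500)   # one iteration: within budget
```

Catching `CostCeilingExceeded` in the loop gives a clean abort point, and logging `tracker.spent` per request is what makes outlier alerting possible.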
If the answer can be reached by deterministic code, use deterministic code. An agent is a model that decides what to do at runtime — that's the whole point, and also the whole cost.
A workflow that always queries the same three APIs in the same order is a workflow, not an agent. It's faster, cheaper, more reliable, and easier to debug.
Reach for an agent when the path through the data depends on what the data is. Otherwise, write the function.
ReAct is not magic. It's a four-component loop with one unusual property: the model decides what to do next based on what it just learned. That decision-making is what makes agents useful, and what makes them dangerous in production.
The frameworks add scaffolding. The pattern underneath is always ReAct.
Next episode: idempotency, and why almost every distributed system gets it wrong.