AI research, systems, and engineering notes.
The Production Agent Stack
A reliable agent is not just an LLM connected to tools. A production agent stack is a system of layered responsibilities. The runtime owns execution state and governs workflow progression. The planner proposes next steps, but proposals are not execution. Memory provides contextual recall without serving as the source of truth. Agent interoperability enables structured delegation, while tools expose external capabilities through standardized protocols such as MCP. Validation transforms probabilistic model outputs into structured, policy-constrained proposals that can safely enter the execution pipeline. Execution itself occurs inside isolated runtime environments where side effects can be controlled, audited, recovered, or rolled back.
- May 10, 2026
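The validation layer described above can be sketched as a gate that turns raw model output into a typed, policy-checked proposal before it reaches the execution pipeline. A minimal illustration — `ToolProposal` and the policy rules here are hypothetical, not from any specific framework:

```python
import json
from dataclasses import dataclass

# Hypothetical policy: which tools the agent may invoke, and which
# arguments each tool requires before a proposal may enter execution.
POLICY = {
    "read_file": {"required_args": {"path"}},
    "send_email": {"required_args": {"to", "subject", "body"}},
}

@dataclass(frozen=True)
class ToolProposal:
    """A validated, policy-constrained action proposal."""
    tool: str
    args: dict

def validate_proposal(raw_model_output: str) -> ToolProposal:
    """Turn probabilistic model output into a structured proposal,
    rejecting anything the policy does not explicitly allow."""
    data = json.loads(raw_model_output)   # malformed JSON -> error, not execution
    tool, args = data["tool"], data.get("args", {})
    if tool not in POLICY:
        raise PermissionError(f"tool not allowed by policy: {tool}")
    missing = POLICY[tool]["required_args"] - args.keys()
    if missing:
        raise ValueError(f"missing required args: {missing}")
    return ToolProposal(tool=tool, args=args)

# A valid proposal passes; an unknown tool is rejected before any side effect.
ok = validate_proposal('{"tool": "read_file", "args": {"path": "/tmp/a.txt"}}')
```

The key property is that execution code only ever sees `ToolProposal` instances, never raw model text.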
World Models Are Becoming the Simulation Substrate for Agents
Agent world models are emerging as an important simulation layer between reasoning and execution. Early LLM agents followed a fragile loop: prompt, think, call a tool, wait for the result. In production, that loop is expensive and risky because the agent often cannot predict whether an action will move the workflow forward, fail silently, or mutate state in an unsafe way. A world model acts as a surrogate environment. Given a current state and candidate action, it predicts likely next states, failure modes, and observations. This allows agents to rank possible actions before touching real systems.
#AI Agent #Production AI #System Design - May 2, 2026
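A world model in this sense is anything that predicts outcomes for (state, action) pairs before the agent touches real systems. A toy sketch, where the "model" is a hand-written table of predicted success probabilities — in practice this would be a learned model:

```python
# Toy world model: maps (state, action) to a predicted probability that
# the action advances the workflow. A real system would learn this mapping.
PREDICTED_SUCCESS = {
    ("ticket_open", "ask_clarifying_question"): 0.9,
    ("ticket_open", "close_ticket"): 0.2,
    ("ticket_open", "escalate"): 0.6,
}

def rank_actions(state: str, candidates: list[str]) -> list[str]:
    """Rank candidate actions by the world model's predicted success, so the
    agent tries safe, likely-useful actions against real systems first."""
    return sorted(
        candidates,
        key=lambda a: PREDICTED_SUCCESS.get((state, a), 0.0),
        reverse=True,
    )

ranked = rank_actions("ticket_open",
                      ["close_ticket", "escalate", "ask_clarifying_question"])
```

Unknown (state, action) pairs default to 0.0, which conservatively pushes unmodeled actions to the back of the queue.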
The Runtime Behind Production AI
A layered framework for scaling production AI systems begins with the SLA: latency, throughput, reliability, cost per resolved task, fallback behavior, and quality targets. Those requirements drive the architecture of the runtime — spanning the edge gateway, safety and governance, orchestration and routing, inference serving, compute scheduling, context and state management, model lifecycle operations, and observability.
#AI Agent #Production AI #AI Infra #System Design - Apr 21, 2026
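An SLA that drives architecture works best as explicit configuration rather than tribal knowledge. A minimal sketch — the field names are illustrative, not from the post:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSLA:
    """Explicit SLA targets that downstream runtime layers must honor."""
    p99_latency_ms: int
    max_cost_per_task_usd: float
    min_task_success_rate: float   # fraction of tasks resolved correctly
    fallback: str                  # behavior when targets cannot be met

def within_budget(sla: AgentSLA, observed_cost_usd: float) -> bool:
    """Gate used by the orchestrator before choosing an expensive path."""
    return observed_cost_usd <= sla.max_cost_per_task_usd

sla = AgentSLA(p99_latency_ms=2000, max_cost_per_task_usd=0.05,
               min_task_success_rate=0.95, fallback="hand_off_to_human")
```

Once the SLA is a first-class object, routing, serving, and fallback layers can all consult the same source of truth.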
Agent Observability Is Not Optional
Production agents are hard to operate because teams need to understand why they acted. Traditional observability tracks service health: latency, errors, throughput, CPU, and memory. Agent observability must go further. It must capture intent, workflow state, retrieved context, model and prompt versions, tool proposals, policy checks, approvals, state mutations, and final outcomes. Enterprise trust requires replay. A reliable agent system must be able to prove which decision, context bundle, and policy check allowed an autonomous action to happen.
#AI Agent #Production AI #System Design #AI Infra - Apr 6, 2026
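The replayable decision record described above can be sketched as an append-only log entry that captures everything that authorized an action. Field names are illustrative:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DecisionRecord:
    """One auditable agent decision: enough to replay it or prove why it acted."""
    step_id: str
    intent: str
    model_version: str
    prompt_version: str
    context_digest: str           # hash of the retrieved context bundle
    tool_proposal: str
    policy_checks_passed: list
    outcome: str

def digest(context_bundle: str) -> str:
    """Content-address the context so replays can verify the exact inputs."""
    return hashlib.sha256(context_bundle.encode()).hexdigest()

record = DecisionRecord(
    step_id="step-42", intent="refund_order",
    model_version="m-2026-04", prompt_version="p-17",
    context_digest=digest("order #123, paid, within refund window"),
    tool_proposal="issue_refund(order=123)",
    policy_checks_passed=["refund_limit", "idempotency"],
    outcome="committed",
)
log_line = json.dumps(asdict(record))   # append-only audit log entry
```

Hashing the context bundle rather than storing it inline keeps log entries small while still letting a replay prove it used the same inputs.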
State Is the Hard Part of Production Agents
As AI agents move from short-lived chat interactions to long-running autonomous systems, the hardest engineering problems are no longer about prompts or model quality. They are about state management, replay safety, memory hierarchy, checkpointing, and transactional execution. Production agents need a cache-aware, transactional runtime. Agent state should not be a probabilistic byproduct of a chat log; it should be a deterministic projection of validated events.
#AI Agent #AI Infra #Production AI - Mar 17, 2026
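"A deterministic projection of validated events" is essentially event sourcing: state is rebuilt by folding a validated event log through a pure reducer, so replay always yields the same state. A minimal sketch with made-up event kinds:

```python
# Event-sourced agent state: a pure reducer folds validated events into
# state, so the same log always replays to the same state (replay safety).
def reduce_state(state: dict, event: dict) -> dict:
    kind = event["kind"]
    if kind == "task_started":
        return {**state, "status": "running", "task": event["task"]}
    if kind == "tool_succeeded":
        return {**state, "results": state.get("results", []) + [event["result"]]}
    if kind == "task_completed":
        return {**state, "status": "done"}
    raise ValueError(f"unvalidated event kind: {kind}")

def replay(events: list[dict]) -> dict:
    """Rebuild state from scratch; doubles as checkpoint recovery."""
    state = {"status": "idle"}
    for e in events:
        state = reduce_state(state, e)
    return state

log = [
    {"kind": "task_started", "task": "summarize_report"},
    {"kind": "tool_succeeded", "result": "report fetched"},
    {"kind": "task_completed"},
]
state = replay(log)   # deterministic: replaying the log gives identical state
```

Because the reducer is pure, checkpointing reduces to persisting the event log, and recovery reduces to replay.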
Production Agents Run on an Autonomy Spectrum
Production agents should not be designed around the fantasy of full autonomy. In real environments, agents face brittle interfaces, evolving user preferences, security gates, ambiguous state, and irreversible actions. The goal is not to remove humans entirely, but to build systems that know when autonomy is safe and when control should be reduced. A reliable agent is not one that never needs help. It is one that knows when to slow down, ask for confirmation, or hand control back.
#AI Agent #System Design #Production AI - Mar 10, 2026
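The "know when to slow down" behavior can be made concrete as an explicit autonomy gate in front of every action: reversible, high-confidence actions proceed; irreversible or low-confidence ones hand control back. The action categories and threshold here are illustrative:

```python
# Hypothetical autonomy gate: per action, decide whether to proceed,
# ask for confirmation, or hand control back to a human.
IRREVERSIBLE = {"delete_record", "send_payment", "send_external_email"}

def autonomy_decision(action: str, confidence: float) -> str:
    """Reduce autonomy for irreversible or low-confidence actions."""
    if action in IRREVERSIBLE:
        return "require_human_approval"   # never auto-commit irreversible actions
    if confidence < 0.6:
        return "ask_for_confirmation"     # ambiguous state: slow down
    return "proceed"

decision = autonomy_decision("send_payment", 0.99)   # high confidence, still gated
```

Note that confidence never overrides reversibility: a 99%-confident payment still requires approval, which is the point of designing for a spectrum rather than full autonomy.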
Agent Reliability Lives in the Runtime
In production, agent behavior is shaped by the runtime around the model: which tools are visible, when retrieval happens, how retries are handled, what state is persisted, and who is allowed to commit mutations. Reliable agents require more than better prompts or stronger models. They need runtime architecture. Framework defaults, tool visibility, retry policies, and context assembly rules can change behavior even when the underlying model stays the same.
#AI Agent #System Design #AI Infra #Production AI - Feb 20, 2026
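One concrete way the runtime shapes behavior: the same model behind two different retry policies shows different observed reliability. A sketch of a runtime-owned retry policy — names and parameters are illustrative:

```python
import time

def call_with_retries(fn, max_attempts=3, backoff_s=0.0):
    """Runtime-owned retry policy: the model never decides this itself."""
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            time.sleep(backoff_s * attempt)   # linear backoff between attempts
    raise last_exc

# A flaky tool that fails twice, then succeeds: with max_attempts=3 the
# runtime masks the transient failures without any model involvement.
calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = call_with_retries(flaky_tool, max_attempts=3)
```

Changing `max_attempts` from 3 to 1 would surface the transient failure to the agent, altering its behavior with the model untouched.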
Design Agents Around Workflows, Not Chat Turns
Chat is a useful interface, but it becomes a weak system design primitive once agents are expected to complete real work. A reliable agent should advance a process, not merely generate text. That requires routing simple requests to deterministic paths, using retrieval when grounding is needed, reserving reasoning for ambiguous tasks, and separating planning from execution. For repeatable workflows, LLMs can generate structured plans while deterministic engines handle tool calls, retries, and state transitions. Production agents should be designed around explicit, inspectable, and evaluable workflow state—not reconstructed from chat history every time.
#AI Agent #Production AI #System Design - Feb 9, 2026
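Separating planning from execution can look like this: the LLM emits a structured plan once, and a deterministic engine owns tool calls, retries, and state transitions. A sketch — the plan format and tools are made up:

```python
# The LLM's job ends at producing a structured plan; a deterministic
# engine executes it with explicit, inspectable workflow state.
plan = [  # e.g. parsed from an LLM's JSON output
    {"step": "fetch_invoice", "args": {"id": 7}},
    {"step": "compute_total", "args": {}},
]

TOOLS = {
    "fetch_invoice": lambda state, id: {**state, "invoice": {"id": id, "items": [10, 5]}},
    "compute_total": lambda state: {**state, "total": sum(state["invoice"]["items"])},
}

def run_plan(plan, state=None, max_retries=2):
    """Deterministic engine: tool dispatch, retries, and state transitions."""
    state = state or {"completed": []}
    for step in plan:
        for attempt in range(max_retries + 1):
            try:
                state = TOOLS[step["step"]](state, **step["args"])
                state["completed"] = state["completed"] + [step["step"]]
                break
            except Exception:
                if attempt == max_retries:
                    raise
    return state

final = run_plan(plan)
```

The workflow state (`final`) is explicit and evaluable at every step, rather than being reconstructed from chat history.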
Routing Before Reasoning
Production agents should not send every request to the most expensive reasoning path. As reasoning models become more capable, they also introduce new production risks: higher latency, unpredictable cost, KV-cache pressure, and unnecessary “overthinking” for simple requests. Before invoking deep inference, tool use, or multi-step planning, a production agent should first decide which path is actually needed. Production agents are control systems. The real engineering value is not only in the model, but in the controller that decides when to reason, when to execute, and when to ask for human approval.
#AI Agent #Production AI #System Design - Dec 16, 2025
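The controller idea can be sketched as a cheap router that classifies each request before any expensive inference is invoked. The string heuristics below are illustrative stand-ins for what would usually be a small classifier:

```python
# A cheap router in front of the reasoning model: most requests never
# need deep inference, tool use, or multi-step planning.
def route(request: str) -> str:
    text = request.lower()
    if text in {"hi", "hello", "thanks"}:
        return "canned_response"     # deterministic path, no model call
    if "according to" in text or "docs" in text:
        return "retrieval"           # needs grounding, not deep reasoning
    if any(w in text for w in ("approve", "delete", "refund")):
        return "human_approval"      # risky: ask before acting
    if len(text.split()) > 30:
        return "deep_reasoning"      # ambiguous or complex: pay for reasoning
    return "fast_model"

routes = [route("hi"),
          route("summarize the docs on billing"),
          route("refund order 9")]
```

The ordering of checks encodes policy: safety gates fire before the cost/latency decision, so a risky request can never be fast-pathed.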
Search Is Becoming Agent Infrastructure
Search is no longer just a user-facing answer interface. In production agent systems, it is becoming the context acquisition layer of the agent runtime. Traditional search returned ranked documents and left the user to interpret results. Early RAG systems followed a similar pattern: retrieve evidence, inject it into the prompt, and generate a response. But agents use search differently. They invoke search as an internal workflow step to clarify intent, retrieve evidence, choose tools, verify state, inspect logs, and recover from failures.
#AI Agent #AI Infra #Search #Production AI - Nov 19, 2025
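"Search as an internal workflow step" means the agent issues retrieval calls mid-task to verify state and recover from failures, not just once at the start. A toy sketch — the `search` function is a stand-in for any real retrieval backend:

```python
# Stand-in retrieval backend: in production this would be a search service.
CORPUS = {
    "deploy logs": "deploy 142 failed: missing env var DB_URL",
    "runbook db": "if DB_URL is missing, set it from the secrets manager",
}

def search(query: str) -> str:
    """Toy keyword search: return the first doc whose key contains all tokens."""
    for key, doc in CORPUS.items():
        if all(tok in key for tok in query.split()):
            return doc
    return ""

def recover_from_failure() -> list[str]:
    """The agent uses search as internal steps: inspect logs, then find the fix."""
    steps = []
    error = search("deploy logs")   # step 1: verify actual state from logs
    steps.append(error)
    fix = search("runbook db")      # step 2: retrieve recovery evidence
    steps.append(fix)
    return steps

trace = recover_from_failure()
```

The difference from classic RAG is that retrieval happens inside the control loop, conditioned on intermediate workflow state, rather than once before generation.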
Demystifying Agentic Search Engines
Agentic search engines such as Google AI Mode, Perplexity, Bing Copilot, and ChatGPT Search no longer mean "type keywords, get ten blue links." They are AI search experiences capable of understanding tasks, planning queries, calling tools, and synthesizing results into a conversational response with inline citations, minimizing user effort. In this post, I walk through the stack from bottom to top: how it crawls and indexes pages, how it retrieves and ranks information, and how recent features like RAG and agentic search build on these foundations.
#System Design #RAG #Retrieval #LLM - Nov 1, 2025
Modern Recommendation System Infrastructure
This post introduces the comprehensive, end-to-end pipeline that drives modern recommendation systems, walking through the full machine learning workflow: from raw data preparation and feature engineering to model training, deployment, real-time inference, and system monitoring.
#System Design #RecSys #Recommendation #ML system