Tag: Applied AI

Apr 21, 2026
Agent Observability Is Not Optional
Production agents are hard to operate because teams must be able to explain why an agent acted, not just what it did. Traditional observability tracks service health: latency, errors, throughput, CPU, and memory. Agent observability must go further. It must capture intent, workflow state, retrieved context, model and prompt versions, tool proposals, policy checks, approvals, state mutations, and final outcomes. Enterprise trust requires replay. A reliable agent system must be able to prove which decision, context bundle, and policy check allowed an autonomous action to happen.
#AI Agent #Production AI #Applied AI #System Design

Apr 6, 2026
State Is the Hard Part of Production Agents
As AI agents move from short-lived chat interactions to long-running autonomous systems, the hardest engineering problems are no longer about prompts or model quality. They are about state management, replay safety, memory hierarchy, checkpointing, and transactional execution. Production agents need a cache-aware, transactional runtime. Agent state should not be a probabilistic byproduct of a chat log; it should be a deterministic projection of validated events.
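"A deterministic projection of validated events" is essentially event sourcing. Here is a minimal sketch under assumed event names: state is a fold over a validated event log, so replaying the same log always yields the same state, and chat noise never mutates it.

```python
# State is a fold over validated events, not a byproduct of a chat log.
def validate(event: dict) -> bool:
    return event.get("type") in {"step_started", "step_done", "tool_result"}

def apply_event(state: dict, event: dict) -> dict:
    state = dict(state)  # return a new projection; never mutate in place
    if event["type"] == "step_started":
        state["current_step"] = event["step"]
    elif event["type"] == "step_done":
        state["completed"] = state.get("completed", []) + [event["step"]]
    elif event["type"] == "tool_result":
        state["results"] = {**state.get("results", {}),
                            event["tool"]: event["value"]}
    return state

def project(events: list[dict]) -> dict:
    state: dict = {}
    for e in events:
        if validate(e):          # only validated events reach the state
            state = apply_event(state, e)
    return state

log = [
    {"type": "step_started", "step": "fetch"},
    {"type": "tool_result", "tool": "search", "value": 3},
    {"type": "chat_noise"},      # rejected: chat text is not state
    {"type": "step_done", "step": "fetch"},
]
```

Checkpointing then falls out for free: persist the log (or a snapshot plus a log suffix) and replay to recover.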
#AI Agent #AI Infra #Production AI #Applied AI

Mar 17, 2026
Production Agents Run on an Autonomy Spectrum
Production agents should not be designed around the fantasy of full autonomy. In real environments, agents face brittle interfaces, evolving user preferences, security gates, ambiguous state, and irreversible actions. The goal is not to remove humans entirely, but to build systems that know when autonomy is safe and when control should be reduced. A reliable agent is not one that never needs help. It is one that knows when to slow down, ask for confirmation, or hand control back.
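The autonomy spectrum can be made concrete as a gate the runtime consults before every action. The signals and thresholds below are illustrative assumptions, not a prescribed policy:

```python
# Sketch of an autonomy gate: the agent escalates instead of acting
# when an action is irreversible, low-confidence, or security-gated.
from enum import Enum

class Mode(Enum):
    EXECUTE = "execute"
    CONFIRM = "ask_for_confirmation"
    HANDOFF = "hand_control_back"

def autonomy_gate(irreversible: bool, confidence: float,
                  security_gated: bool) -> Mode:
    if security_gated:
        return Mode.HANDOFF      # never cross a security gate alone
    if irreversible or confidence < 0.8:
        return Mode.CONFIRM      # slow down and ask first
    return Mode.EXECUTE
```

The point is that "knowing when to ask" is explicit control logic, not a hope embedded in a prompt.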
#AI Agent #System Design #Production AI #Applied AI

Mar 10, 2026
Agent Reliability Lives in the Runtime
Agent reliability is a systems problem, not just a model-quality problem. Frameworks are not neutral wrappers; their orchestration rules, error formats, and tool-calling conventions can change how the same model behaves. Similarly, LLMs should generate proposals, but deterministic runtimes should own validation and committed state changes.
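The proposal/commit split can be sketched in a few lines. The tool names and validation rules here are assumptions for illustration; the pattern is that the model only emits a proposal, and a deterministic runtime decides whether it touches state.

```python
# LLMs propose; the deterministic runtime validates and commits.
ALLOWED_TOOLS = {"lookup": {"key"}, "update": {"key", "value"}}

def validate_proposal(proposal: dict) -> tuple[bool, str]:
    tool = proposal.get("tool")
    if tool not in ALLOWED_TOOLS:
        return False, f"unknown tool: {tool!r}"
    missing = ALLOWED_TOOLS[tool] - set(proposal.get("args", {}))
    if missing:
        return False, f"missing args: {sorted(missing)}"
    return True, "ok"

def commit(state: dict, proposal: dict) -> dict:
    ok, reason = validate_proposal(proposal)
    if not ok:
        return {**state, "last_error": reason}   # rejected: state unchanged
    if proposal["tool"] == "update":
        return {**state, proposal["args"]["key"]: proposal["args"]["value"]}
    return state
```

A malformed or hallucinated tool call is rejected and logged; it can never become committed state.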
#AI Agent #System Design #AI Infra #Production AI

Feb 20, 2026
Design Agents Around Workflows, Not Chat Turns
Chat is a useful interface, but it becomes a weak system design primitive once agents are expected to complete real work. A reliable agent should advance a process, not merely generate text. That requires routing simple requests to deterministic paths, using retrieval when grounding is needed, reserving reasoning for ambiguous tasks, and separating planning from execution. For repeatable workflows, LLMs can generate structured plans while deterministic engines handle tool calls, retries, and state transitions. Production agents should be designed around explicit, inspectable, and evaluable workflow state—not reconstructed from chat history every time.
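"Explicit, inspectable workflow state" can be as simple as a structured plan the runtime advances step by step, instead of re-deriving progress from chat history. A minimal sketch, with assumed step names:

```python
# The plan is data; the runtime advances it deterministically.
PLAN = ["validate_input", "retrieve_docs", "draft_answer", "final_check"]

def advance(state: dict, step_result: str) -> dict:
    """Deterministically move the workflow forward one step."""
    i = state["step_index"]
    history = state["history"] + [(PLAN[i], step_result)]
    return {"step_index": i + 1,
            "history": history,
            "done": i + 1 == len(PLAN)}

state = {"step_index": 0, "history": [], "done": False}
for result in ["ok", "3 docs", "draft v1", "pass"]:
    state = advance(state, result)
```

Because the state is structured, it can be inspected mid-run, persisted, and evaluated step by step, none of which a raw chat transcript supports well.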
#AI Agent #Production AI #Applied AI #System Design

Feb 9, 2026
Routing Before Reasoning
Production agents should not send every request to the most expensive reasoning path. As reasoning models become more capable, they also introduce new production risks: higher latency, unpredictable cost, KV-cache pressure, and unnecessary “overthinking” for simple requests. Before invoking deep inference, tool use, or multi-step planning, a production agent should first decide which path is actually needed. Production agents are control systems. The real engineering value is not only in the model, but in the controller that decides when to reason, when to execute, and when to ask for human approval.
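The controller idea can be sketched as a pre-inference router: cheap checks pick a path before any expensive reasoning call is made. Path names and signals are illustrative assumptions:

```python
# Route first, reason later: the expensive path is the last resort.
def route(request: str, requires_grounding: bool, ambiguous: bool) -> str:
    if request.strip().lower() in {"status", "help", "cancel"}:
        return "deterministic"     # no model call at all
    if requires_grounding:
        return "retrieval"         # fetch evidence before generating
    if ambiguous:
        return "deep_reasoning"    # pay for reasoning only when needed
    return "single_pass_llm"
```

In practice the grounding/ambiguity signals might come from a lightweight classifier, but the control decision itself stays deterministic and auditable.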
#AI Agent #Applied AI #Production AI #System Design

Jan 25, 2026
The Production Agent Stack
The model is only one component in a larger production agent stack. A production agent needs an orchestrator that owns workflow state, persists progress, handles retries, and recovers from failures. It also needs specialized workers rather than one monolithic “general assistant.” Retrievers, validators, executors, and planners should have clear responsibilities and permission boundaries. Production agents are built from control, state, tools, and evaluation—not prompts alone.
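The "clear responsibilities and permission boundaries" point can be made concrete as a capability table the orchestrator enforces, rather than trusting each worker's prompt. Worker and capability names are assumptions:

```python
# Permission boundaries enforced by the orchestrator, not by prompts.
PERMISSIONS = {
    "retriever": {"search_index"},
    "validator": {"read_state"},
    "executor":  {"read_state", "call_tool"},
    "planner":   {"read_state", "search_index"},
}

def authorize(worker: str, capability: str) -> bool:
    """Deny by default: unknown workers get no capabilities."""
    return capability in PERMISSIONS.get(worker, set())
```

A retriever that suddenly proposes a tool call is simply refused, which is a far stronger guarantee than an instruction in its system prompt.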
#AI Agent #Production AI #Applied AI #System Design

Dec 16, 2025
Search Is Becoming Agent Infrastructure
Search is no longer just a user-facing answer interface. In production agent systems, it is becoming the context acquisition layer of the agent runtime. Traditional search returned ranked documents and left the user to interpret results. Early RAG systems followed a similar pattern: retrieve evidence, inject it into the prompt, and generate a response. But agents use search differently. They invoke search as an internal workflow step to clarify intent, retrieve evidence, choose tools, verify state, inspect logs, and recover from failures.
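The shift is visible in who consumes the results: the agent calls search as a workflow step and reads the hits itself. A toy sketch with an assumed in-memory index and naive term matching:

```python
# Search as an internal workflow step: the agent queries, then consumes
# the hits itself instead of presenting ranked links to a user.
INDEX = {
    "deploy-log": "deploy 42 failed: missing env var DB_URL",
    "runbook":    "if deploy fails on a missing env var, rerun with --env",
}

def search(query: str) -> list[str]:
    """Toy retrieval: doc ids whose text shares a term with the query."""
    terms = set(query.lower().split())
    return [doc for doc, text in INDEX.items()
            if terms & set(text.lower().split())]

def recover_from_failure(error: str) -> list[str]:
    # Failure recovery as a search call: inspect logs, find the runbook.
    return search(error)
```

The same primitive serves intent clarification, tool choice, and state verification; only the query source changes.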
#AI Agent #AI Infra #Search #Production AI

Nov 19, 2025
Demystifying Agentic Search Engines
With agentic search engines, such as Google AI Mode, Perplexity, Bing Copilot, and ChatGPT Search, search no longer means “type keywords, get ten blue links.” These are AI search experiences capable of understanding tasks, planning queries, calling tools, and synthesizing results into a conversational response with inline citations, minimizing user effort. In this post, I’ll walk through the stack from bottom to top: how it crawls and indexes pages, how it retrieves and ranks information, and how recent features like RAG and agentic search build on these foundations.
#System Design #RAG #Retrieval #LLM