Category
System Design
- Nov 1, 2025
Modern Recommendation System Infrastructure
Building Modern Recommendation Systems introduces a comprehensive, end-to-end pipeline that drives intelligent recommendations. The post walks through the full machine learning workflow — from raw data preparation and feature engineering to model training, deployment, real-time inference, and system monitoring.
#System Design #RecSys #Recommendation #ML system
- Oct 24, 2025
Design a Modern Recommendation System
This post explores the full RecSys architecture, emphasizing the core models that drive each stage of the RecSys pipeline — from Retrieval for large-scale candidate generation, to Pre-ranking for efficient filtering, Ranking for fine-grained relevance modeling, and Re-ranking for balancing diversity and control.
#System Design #RecSys #Recommendation #ML system
- Jun 29, 2025
The ML Factory: Building Production ML Systems
Building production ML systems is far more than selecting a model. Success requires thinking in terms of a full lifecycle: defining precise functional and non-functional requirements, designing robust data pipelines, splitting logic between models and rules, versioning and deploying models, prompts, and embeddings as coherent units, and continuously monitoring system performance and product impact.
#ML #AI #System Design #ML system
- Jun 9, 2026
Production Agents Need Workflow Graphs
The production abstraction for AI is not a “smarter agent loop.” It is a stateful execution graph: a system organized around explicit state transitions, dependency-aware scheduling, versioned plans, deterministic control boundaries, and carefully managed side effects. Intelligence alone is insufficient. Production reliability emerges from how execution, authority, recovery, and state mutation are structured across the graph.
#AI Agent #Production AI #System Design
- May 26, 2026
Building Auditable LLM Workflows for Medical Coding
Medical coding is a high-stakes extraction and verification problem, not a simple text generation task. Asking an LLM to read a long clinical note and directly output ICD codes risks hallucinated mappings, missed comorbidities, and results that are difficult for human coders to audit. A reliable medical coding system may benefit from an LLM-assisted workflow: extract clinical evidence, retrieve candidate codes, verify mappings, validate against the taxonomy, and route uncertainty to human review. The model should not be expected to memorize every code. Its job is to help produce auditable evidence inside a controlled workflow.
#Applied AI #NLP #System Design
- May 2, 2026
The Runtime Behind Production AI
A layered framework for scaling production AI systems begins with the SLA: latency, throughput, reliability, cost per resolved task, fallback behavior, and quality targets. Those requirements drive the architecture of the runtime — spanning the edge gateway, safety and governance, orchestration and routing, inference serving, compute scheduling, context and state management, model lifecycle operations, and observability.
#AI Agent #Production AI #AI Infra #System Design
- Apr 6, 2026
State Is the Hard Part of Production Agents
As AI agents move from short-lived chat interactions to long-running autonomous systems, the hardest engineering problems are no longer about prompts or model quality. They are about state management, replay safety, memory hierarchy, checkpointing, and transactional execution. Production agents need a cache-aware, transactional runtime. Agent state should not be a probabilistic byproduct of a chat log; it should be a deterministic projection of validated events.
#AI Agent #AI Infra #Production AI
- Mar 2, 2026
Automating the Prompt Production Line
In production LLM systems, a prompt is no longer just a string written by a human. It is a deployable artifact. This post explains how automated prompt optimization actually works: build eval sets, collect optimization signals, generate candidates, and evaluate changes in stages. Prompts become versioned, testable artifacts with eval gates, canary rollouts, observability, and rollback.
#LLMOps #AI Infra #LLM #Production AI
- Feb 20, 2026
Design Agents Around Workflows, Not Chat Turns
Chat is a useful interface, but it becomes a weak system design primitive once agents are expected to complete real work. A reliable agent should advance a process, not merely generate text. That requires routing simple requests to deterministic paths, using retrieval when grounding is needed, reserving reasoning for ambiguous tasks, and separating planning from execution. For repeatable workflows, LLMs can generate structured plans while deterministic engines handle tool calls, retries, and state transitions. Production agents should be designed around explicit, inspectable, and evaluable workflow state rather than state reconstructed from chat history on every turn.
#AI Agent #Production AI #System Design
- Feb 9, 2026
Routing Before Reasoning
Production agents should not send every request to the most expensive reasoning path. As reasoning models become more capable, they also introduce new production risks: higher latency, unpredictable cost, KV-cache pressure, and unnecessary “overthinking” for simple requests. Before invoking deep inference, tool use, or multi-step planning, a production agent should first decide which path is actually needed. Production agents are control systems. The real engineering value is not only in the model, but in the controller that decides when to reason, when to execute, and when to ask for human approval.
#AI Agent #Production AI #System Design
- Jan 25, 2026
The Production Agent Stack
A reliable agent is not just an LLM connected to tools. A production agent stack is a system of layered responsibilities. The runtime owns execution state and governs workflow progression. The planner proposes next steps, but proposals are not execution. Memory provides contextual recall without serving as the source of truth. Agent interoperability enables structured delegation, while tools expose external capabilities through standardized protocols such as MCP. Validation transforms probabilistic model outputs into structured, policy-constrained proposals that can safely enter the execution pipeline. Execution itself occurs inside isolated runtime environments where side effects can be controlled, audited, recovered, or rolled back.
#AI Agent #Production AI #System Design
- Jan 16, 2026
Building a Simple Agent from Scratch
This post walks through the implementation of a minimal invoice-processing agent. The agent parses an invoice, verifies it against a ledger, requests approval when needed, and writes the final entry only after validation. The core pattern is simple: state constrains actions, the planner proposes one, validation gates it, tools return observations, the reducer updates state, and the runtime decides whether to stop. Before adopting complex orchestration frameworks, build this loop first.
#AI Agent #Applied AI #System Design
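
The loop summarized in the last entry (state constrains actions, the planner proposes one, validation gates it, tools return observations, the reducer updates state, the runtime decides whether to stop) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the post's actual implementation: every name here (`State`, `plan`, `validate`, `run_tool`, `reduce_state`, `run`, the `LEDGER` contents, the approval threshold) is hypothetical, the planner is a rule-based stand-in for an LLM, and approval is auto-granted where a real system would wait for a human.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical data: a ledger of expected amounts and an approval threshold.
LEDGER = {"INV-1": 1200.0, "INV-2": 300.0}
APPROVAL_THRESHOLD = 1000.0


@dataclass(frozen=True)
class State:
    raw_invoice: str                 # e.g. "INV-1,1200.0"
    invoice: Optional[dict] = None   # parsed fields, set by the "parse" tool
    verified: bool = False
    rejected: bool = False           # ledger mismatch
    approved: bool = False
    written: bool = False


def needs_approval(s: State) -> bool:
    return s.invoice["amount"] > APPROVAL_THRESHOLD


# State constrains actions: each action has a precondition on the state.
PRECONDITIONS = {
    "parse": lambda s: s.invoice is None,
    "verify": lambda s: s.invoice is not None and not s.verified and not s.rejected,
    "request_approval": lambda s: s.verified and needs_approval(s) and not s.approved,
    "write_entry": lambda s: s.verified and not s.written
    and (s.approved or not needs_approval(s)),
}


def plan(s: State) -> str:
    """Planner stand-in (an LLM in a real agent): proposes one action."""
    if s.invoice is None:
        return "parse"
    if not s.verified:
        return "verify"
    if needs_approval(s) and not s.approved:
        return "request_approval"
    return "write_entry"


def validate(s: State, action: str) -> bool:
    """Validation gate: a proposal runs only if its precondition holds."""
    check = PRECONDITIONS.get(action)
    return check is not None and check(s)


def run_tool(action: str, s: State) -> dict:
    """Tools return observations; they never mutate state directly."""
    if action == "parse":
        invoice_id, amount = s.raw_invoice.split(",")
        return {"invoice": {"id": invoice_id, "amount": float(amount)}}
    if action == "verify":
        ok = LEDGER.get(s.invoice["id"]) == s.invoice["amount"]
        return {"verified": ok, "rejected": not ok}
    if action == "request_approval":
        return {"approved": True}  # assumption: auto-approve in place of a human
    if action == "write_entry":
        return {"written": True}
    raise ValueError(f"unknown action: {action}")


def reduce_state(s: State, observation: dict) -> State:
    """Reducer: the only place where state changes."""
    return replace(s, **observation)


def run(s: State, max_steps: int = 10):
    """Runtime: owns the loop and decides when to stop."""
    trace = []
    for _ in range(max_steps):
        if s.written or s.rejected:
            break
        action = plan(s)
        if not validate(s, action):
            raise ValueError(f"rejected proposal: {action}")
        s = reduce_state(s, run_tool(action, s))
        trace.append(action)
    return s, trace
```

Calling `run(State(raw_invoice="INV-1,1200.0"))` returns the final state together with the ordered list of executed actions. Because tools only return observations and the reducer is the single writer, every state change in this sketch can be traced back to one validated action.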