Archive
Browse past posts by year and month.
2026-05
- May 26, 2026
Building Auditable LLM Workflows for Medical Coding
Medical coding is a high-stakes extraction and verification problem, not a simple text generation task. Asking an LLM to read a long clinical note and directly output ICD codes risks hallucinated mappings, missed comorbidities, and results that are difficult for human coders to audit. A reliable medical coding system should be built as an LLM-assisted workflow: extract clinical evidence, retrieve candidate codes, verify mappings, validate against the taxonomy, and route uncertainty to human review. The model should not be expected to memorize every code. Its job is to help produce auditable evidence inside a controlled workflow.
#Production AI#Applied AI#NLP#System Design
- May 10, 2026
World Models are becoming the simulation substrate for Agents
Agent world models are emerging as an important simulation layer between reasoning and execution. Early LLM agents followed a fragile loop: prompt, think, call a tool, wait for the result. In production, that loop is expensive and risky because the agent often cannot predict whether an action will move the workflow forward, fail silently, or mutate state in an unsafe way. A world model acts as a surrogate environment. Given a current state and candidate action, it predicts likely next states, failure modes, and observations. This allows agents to rank possible actions before touching real systems.
#AI Agent#Production AI#System Design
- May 2, 2026
The Runtime Behind Production AI
A layered framework for scaling production AI systems begins with the SLA: latency, throughput, reliability, cost per resolved task, fallback behavior, and quality targets. Those requirements drive the architecture of the runtime — spanning the edge gateway, safety and governance, orchestration and routing, inference serving, compute scheduling, context and state management, model lifecycle operations, and observability.
#AI Agent#Production AI#AI Infra#System Design
2026-04
- Apr 21, 2026
Agent Observability Is Not Optional
Production agents are hard to operate because teams need to understand why they acted. Traditional observability tracks service health: latency, errors, throughput, CPU, and memory. Agent observability must go further. It must capture intent, workflow state, retrieved context, model and prompt versions, tool proposals, policy checks, approvals, state mutations, and final outcomes. Enterprise trust requires replay. A reliable agent system must be able to prove which decision, context bundle, and policy check allowed an autonomous action to happen.
#AI Agent#Production AI#System Design#AI Infra
- Apr 6, 2026
State Is the Hard Part of Production Agents
As AI agents move from short-lived chat interactions to long-running autonomous systems, the hardest engineering problems are no longer about prompts or model quality. They are about state management, replay safety, memory hierarchy, checkpointing, and transactional execution. Production agents need a cache-aware, transactional runtime. Agent state should not be a probabilistic byproduct of a chat log; it should be a deterministic projection of validated events.
#AI Agent#AI Infra#Production AI
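The idea in the summary above, that agent state should be a deterministic projection of validated events, can be illustrated with a minimal sketch. The event kinds, the `validate` rule, and the state shape below are hypothetical and chosen only for illustration; they are not taken from the post itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    payload: dict


# Hypothetical set of event kinds the runtime accepts.
ALLOWED_KINDS = {"task_started", "tool_result", "task_completed"}


def validate(event: Event) -> bool:
    """Accept only events whose kind is known to the runtime."""
    return event.kind in ALLOWED_KINDS


def project(events: list[Event]) -> dict:
    """Rebuild state by folding over validated events, in order."""
    state = {"status": "idle", "results": []}
    for event in events:
        if not validate(event):
            continue  # unvalidated events never influence state
        if event.kind == "task_started":
            state["status"] = "running"
        elif event.kind == "tool_result":
            state["results"].append(event.payload["value"])
        elif event.kind == "task_completed":
            state["status"] = "done"
    return state


log = [
    Event("task_started", {}),
    Event("tool_result", {"value": 42}),
    Event("chat_noise", {"text": "hm"}),  # ignored: not a validated kind
    Event("task_completed", {}),
]
state = project(log)
```

Replaying the same log always yields the same state, which is the property that makes checkpointing and replay safe in this sketch.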
2026-03
- Mar 17, 2026
Production Agents Run on an Autonomy Spectrum
Production agents should not be designed around the fantasy of full autonomy. In real environments, agents face brittle interfaces, evolving user preferences, security gates, ambiguous state, and irreversible actions. The goal is not to remove humans entirely, but to build systems that know when autonomy is safe and when control should be reduced. A reliable agent is not one that never needs help. It is one that knows when to slow down, ask for confirmation, or hand control back.
#AI Agent#System Design#Production AI
- Mar 10, 2026
Agent Reliability Lives in the Runtime
In production, agent behavior is shaped by the runtime around the model: which tools are visible, when retrieval happens, how retries are handled, what state is persisted, and who is allowed to commit mutations. Reliable agents require more than better prompts or stronger models. They need runtime architecture. Framework defaults, tool visibility, retry policies, and context assembly rules can change behavior even when the underlying model stays the same.
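#AI Agent#System Design#AI Infra#Production AI
- Mar 4, 2026
From Traditional Summarization to Semantic Synthesis
In the LLM era, summarization is no longer just "making long text short." It has evolved into information-density management within context engineering: compressing the KV cache at runtime, pruning low-value context at the protocol layer, and performing hierarchical, structured, and trajectory summarization at the application layer. Traditional summarization reduces volume; semantic synthesis restructures information, turning text into high-density semantic assets that are retrievable, verifiable, and executable.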
#AI#NLP#LLM#RAG
2026-02
- Feb 20, 2026
Design Agents Around Workflows, Not Chat Turns
Chat is a useful interface, but it becomes a weak system design primitive once agents are expected to complete real work. A reliable agent should advance a process, not merely generate text. That requires routing simple requests to deterministic paths, using retrieval when grounding is needed, reserving reasoning for ambiguous tasks, and separating planning from execution. For repeatable workflows, LLMs can generate structured plans while deterministic engines handle tool calls, retries, and state transitions. Production agents should be designed around explicit, inspectable, and evaluable workflow state—not reconstructed from chat history every time.
#AI Agent#Production AI#System Design
- Feb 9, 2026
Routing Before Reasoning
Production agents should not send every request to the most expensive reasoning path. As reasoning models become more capable, they also introduce new production risks: higher latency, unpredictable cost, KV-cache pressure, and unnecessary “overthinking” for simple requests. Before invoking deep inference, tool use, or multi-step planning, a production agent should first decide which path is actually needed. Production agents are control systems. The real engineering value is not only in the model, but in the controller that decides when to reason, when to execute, and when to ask for human approval.
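#AI Agent#Production AI#System Design
- Feb 7, 2026
[Leetcode 204] Count Primes
Given an integer n, return the number of prime numbers less than n.
#Math#Array
- Feb 7, 2026
[Leetcode 206] Reverse Linked List
#Linked List#Two Pointers#Recursion
- Feb 7, 2026
[Leetcode 236] Lowest Common Ancestor of a Binary Tree
Given a binary tree, find the lowest common ancestor of two given nodes in the tree.
#Binary Tree#DFS#BFS
- Feb 7, 2026
[Leetcode 179] Largest Number
#String#Sorting#Greedy
- Feb 7, 2026
[Leetcode 239] Sliding Window Maximum
A sliding window of size k moves from the far left of the array to the far right; return the maximum value in the sliding window.
#Sliding Window#Heap#Monotonic Queue
- Feb 7, 2026
[Leetcode 208] Implement Trie
Implement the Trie class: initialization, string insertion, search, and prefix search.
#Design#Data Structure#Tries#Hash
- Feb 7, 2026
[Leetcode 84] Largest Rectangle in Histogram
The area of the largest rectangle that can be outlined in the histogram.
#Monotonic Stack
- Feb 7, 2026
[Leetcode 121] Best Time to Buy and Sell Stock
#Greedy#Sliding Window
- Feb 7, 2026
[Leetcode 126] Word Ladder II
#BFS#DFS#Backtracking#Graph
- Feb 7, 2026
[Leetcode 200] Number of Islands
#DFS#BFS#Union Find
- Feb 7, 2026
[Leetcode 79] Word Search
Determine whether word exists in the letter grid.
#Backtracking
- Feb 7, 2026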
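[Leetcode 105] Construct Binary Tree from Preorder and Inorder Traversal
#Divide and Conquer#Recursion
- Feb 7, 2026
[Leetcode 122] Best Time to Buy and Sell Stock II
#Dynamic Programming#Greedy
- Feb 7, 2026
[Leetcode 88] Merge Sorted Array
#Two Pointers
- Feb 7, 2026
[Leetcode 138] Copy List with Random Pointer
#Linked List
- Feb 7, 2026
[Leetcode 143] Reorder List
#Linked List#Two Pointers
- Feb 7, 2026
[Leetcode 171] Excel Sheet Column Number
#String
- Feb 7, 2026
[Leetcode 62] Unique Paths
How many unique paths can a robot at the top-left corner of the grid take to reach the bottom-right corner?
#Dynamic Programming
- Feb 7, 2026
[Leetcode 92] Reverse Linked List II
#Linked List
- Feb 7, 2026
[Leetcode 128] Longest Consecutive Sequence
#Union Find#Hash
- Feb 7, 2026
[Leetcode 83] Remove Duplicates from Sorted List
Remove all duplicate elements from a sorted linked list so that each value appears only once.
#Two Pointers#Recursion
- Feb 7, 2026
[Leetcode 116] Populating Next Right Pointers in Each Node
#Binary Tree#BFS
- Feb 7, 2026
[Leetcode 123] Best Time to Buy and Sell Stock III
#Dynamic Programming
- Feb 7, 2026
[Leetcode 101] Symmetric Tree
#Binary Tree#DFS#BFS
- Feb 7, 2026
[Leetcode 160] Intersection of Two Linked Lists
#Linked List#Two Pointers
- Feb 7, 2026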
Breadth-First Search: Level-Order Exploration
#Tree#Graph#BFS#Dijkstra
- Feb 7, 2026
Depth-First Search: Exploring Deep Before Wide
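#DFS#Tree#BinaryTree#Graph
- Feb 7, 2026
[Leetcode 11] Container With Most Water
Find two lines in the array height that, together with the x-axis, form a container that holds the most water.
#Two Pointers
- Feb 7, 2026
[Leetcode 25] Reverse Nodes in k-Group
Reverse the nodes of the list k at a time and return the modified list.
#Two Pointers#Linked List
- Feb 7, 2026
[Leetcode 4] Median of Two Sorted Arrays
Find and return the median of the two ascending sorted arrays.
#Binary Search
- Feb 7, 2026
[Leetcode 5] Longest Palindromic Substring
Find the longest palindromic substring in the string s.
#Two Pointers#Dynamic Programming
- Feb 7, 2026
[Leetcode 32] Longest Valid Parentheses
Find the length of the longest valid (well-formed and contiguous) parentheses substring.
#String
- Feb 7, 2026
[Leetcode 3] Longest Substring Without Repeating Characters
Find the length of the longest substring that contains no repeating characters.
#Two Pointers#String#Sliding Window
- Feb 7, 2026
[Leetcode 17] Letter Combinations of a Phone Number
All letter combinations that the string can represent.
#Backtracking#BFS#DFS
- Feb 7, 2026
[Leetcode 34] Find First and Last Position of Element in Sorted Array
Find the starting and ending positions of a given target value in a sorted array.
#Binary Search#BinarySearch
- Feb 7, 2026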
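[Leetcode 14] Longest Common Prefix
Find the longest common prefix in an array of strings.
#String
- Feb 7, 2026
[Leetcode 18] 4Sum
Return the unique quadruplets [nums[a], nums[b], nums[c], nums[d]] that satisfy the conditions.
#Two Pointers#Sorting
- Feb 7, 2026
[Leetcode 23] Merge k Sorted Lists
Merge all the ascending linked lists into one ascending linked list.
#Linked List#Divide and Conquer
- Feb 7, 2026
[Leetcode 19] Remove Nth Node From End of List
Remove the nth node from the end of the linked list.
#Two Pointers#Linked List
- Feb 7, 2026
[Leetcode 21] Merge Two Sorted Lists
Merge two ascending linked lists into a new ascending linked list and return it.
#Linked List
- Feb 7, 2026
[Leetcode 44] Wildcard Matching
Match a string (s) against a pattern (p), with support for '?' and '*'.
#String#Dynamic Programming#Two Pointers#Greedy
- Feb 7, 2026
[Leetcode 20] Valid Parentheses
Determine whether a string of brackets is valid.
#String#Stack
- Feb 7, 2026
[Leetcode 42] Trapping Rain Water
Given n bars, how much rainwater can be trapped after it rains?
#Two Pointers#Dynamic Programming#Monotonic Stack
- Feb 7, 2026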
Solving Problems with the Two-Pointers Technique
The two-pointer technique is essential for optimizing operations on arrays, strings, and lists, often reducing time complexity from O(n²) to O(n). Common patterns include opposing pointers, sliding windows, fast–slow pointers, and dual-input pointers—each suited to different problem types such as finding pairs, subarrays, or merging sorted lists.
#TwoPointers
- Feb 7, 2026
Implementing Efficient Prefix Search with Tries
Prefix search is a fundamental operation in computer science, typically implemented using a Trie (prefix tree). A Trie is a dynamic data structure for storing a collection of strings, supporting efficient insertion, lookup, and enumeration operations. Tries and their variants provide a powerful and efficient way to manage and query large volumes of string data.
#Data Structure#Tree#Recursion#Tries
- Feb 7, 2026
Shrinking the Search Space with Binary Search
Binary search is an efficient searching technique based on the divide-and-conquer principle. By repeatedly narrowing the search space, it guarantees a worst-case time complexity of O(log n). It is well-suited for sorted data, monotonic arrays, and optimization problems where the goal is to find the best value. Common use cases include exact matching, boundary and insertion point searches, finding the closest element, and performing binary search on the answer space.
#BinarySearch
- Feb 7, 2026
Topological Sorting Explained: Sorting Dependency Chains
#Topological Sort#Graph
- Feb 7, 2026
Understanding Recursion: Functions That Call Themselves
Recursion is a core computational concept in which a function solves a problem by calling itself on smaller instances of that problem. Recursion is key to many algorithms: DFS (Depth-First Search) is often implemented recursively, Dynamic Programming is fundamentally recursion with caching (memoization), and Divide & Conquer uses recursion to split problems into independent subproblems.
#Recursion#DFS#Dynamic Programming#Divide & Conquer
- Feb 7, 2026
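[Leetcode 148] Sort List
Given the head of a linked list, sort it in ascending order and return the sorted list.
#Linked List#Sorting#Divide and Conquer
- Feb 7, 2026
[Leetcode 46] Permutations
Given an array nums of distinct integers, return all possible permutations.
#Backtracking
- Feb 7, 2026
[Leetcode 124] Binary Tree Maximum Path Sum
#Binary Tree#DFS
- Feb 7, 2026
[Leetcode 146] LRU Cache
#Hash#Linked List#Data Structure#Design
- Feb 7, 2026
[Leetcode 240] Search a 2D Matrix II
Search for a target value in a sorted m x n matrix.
#Search#Matrix#Binary Search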
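For the sorted-matrix search in the last entry above, one common approach is a "staircase" walk from the top-right corner. The sketch below assumes every row is sorted left to right and every column top to bottom, as in the LeetCode 240 statement. The entry is tagged with binary search; this is a different technique and not necessarily the one the linked post uses.

```python
def search_matrix(matrix, target):
    """Return True if target occurs in a row- and column-sorted matrix."""
    if not matrix or not matrix[0]:
        return False
    row, col = 0, len(matrix[0]) - 1  # start at the top-right corner
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == target:
            return True
        if value > target:
            col -= 1  # everything below in this column is even larger
        else:
            row += 1  # everything to the left in this row is even smaller
    return False


grid = [
    [1, 4, 7],
    [2, 5, 8],
    [3, 6, 9],
]
found = search_matrix(grid, 5)
missing = search_matrix(grid, 10)
```

Each step discards one row or one column, so the walk takes at most m + n steps.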
2026-01
- Jan 25, 2026
The Production Agent Stack
A reliable agent is not just an LLM connected to tools. A production agent stack is a system of layered responsibilities. The runtime owns execution state and governs workflow progression. The planner proposes next steps, but proposals are not execution. Memory provides contextual recall without serving as the source of truth. Agent interoperability enables structured delegation, while tools expose external capabilities through standardized protocols such as MCP. Validation transforms probabilistic model outputs into structured, policy-constrained proposals that can safely enter the execution pipeline. Execution itself occurs inside isolated runtime environments where side effects can be controlled, audited, recovered, or rolled back.
#AI Agent#Production AI#System Design
- Jan 16, 2026
Building a Simple Agent from Scratch
This post walks through the implementation of a minimal invoice-processing agent. The agent parses an invoice, verifies it against a ledger, requests approval when needed, and writes the final entry only after validation. The core pattern is simple: state constrains actions, the planner proposes one, validation gates it, tools return observations, the reducer updates state, and the runtime decides whether to stop. Before adopting complex orchestration frameworks, build this loop first.
#AI Agent#Applied AI
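The loop described in the summary above (state constrains actions, the planner proposes, validation gates, tools return observations, the reducer updates state, the runtime decides when to stop) can be sketched roughly as follows. Every name, the approval limit, the ledger contents, and the stubbed tools are hypothetical stand-ins, not the post's actual implementation; the planner in particular is a placeholder where an LLM call would go.

```python
APPROVAL_LIMIT = 1000      # hypothetical threshold requiring approval
LEDGER = {"INV-1": 1200}   # hypothetical ledger: invoice id -> amount


def allowed_actions(state):
    """State constrains which actions may be proposed next."""
    if "invoice" not in state:
        return ["parse"]
    if "verified" not in state:
        return ["verify"]
    if not state["verified"]:
        return []  # ledger mismatch: nothing further is allowed
    if state["invoice"]["amount"] > APPROVAL_LIMIT and not state.get("approved"):
        return ["request_approval"]
    if "written" not in state:
        return ["write_entry"]
    return []


def plan(state, actions):
    """Placeholder planner: an LLM would choose among the allowed actions."""
    return actions[0]


def validate(action, actions):
    """Gate: only proposals inside the allowed set may execute."""
    return action in actions


def run_tool(action, state):
    """Stubbed tools that return observations instead of mutating state."""
    if action == "parse":
        return {"invoice": {"id": "INV-1", "amount": 1200}}
    if action == "verify":
        invoice = state["invoice"]
        return {"verified": LEDGER.get(invoice["id"]) == invoice["amount"]}
    if action == "request_approval":
        return {"approved": True}  # stub: approval always granted
    if action == "write_entry":
        return {"written": True}
    raise ValueError(f"unknown action: {action}")


def reduce_state(state, observation):
    """Reducer: fold an observation into a new state value."""
    return {**state, **observation}


def run(max_steps=10):
    """Runtime: drive the loop and decide when to stop."""
    state, trace = {}, []
    for _ in range(max_steps):
        actions = allowed_actions(state)
        if not actions:
            break
        action = plan(state, actions)
        if not validate(action, actions):
            break
        state = reduce_state(state, run_tool(action, state))
        trace.append(action)
    return state, trace


final_state, trace = run()
```

In this sketch the ledger write can only be proposed after verification and, for amounts over the limit, after approval, because the allowed-action set is derived from state rather than from the planner's output.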
2025-12
2025-11
- Nov 19, 2025
Demystifying Agentic Search Engines
With agentic search engines such as Google AI Mode, Perplexity, Bing Copilot, and ChatGPT Search, search no longer means “type keywords, get ten blue links.” These AI search experiences understand tasks, plan queries, call tools, synthesize results, and deliver a conversational response with inline citations, minimizing user effort. In this post, I’ll walk through the stack from bottom to top: how it crawls and indexes pages, how it retrieves and ranks information, and how recent features like RAG and agentic search build on these foundations.
#System Design#RAG#Retrieval#LLM
- Nov 1, 2025
Modern Recommendation System Infrastructure
Building Modern Recommendation Systems introduces a comprehensive, end-to-end pipeline that drives intelligent recommendations. The post walks through the full machine learning workflow — from raw data preparation and feature engineering to model training, deployment, real-time inference, and system monitoring.
#System Design#RecSys#Recommendation#ML system