#agent-tools
12 posts tagged with "agent-tools".
-
Long-Horizon Agents: When Tasks Take Hours
• 11 min read
Six-hour agent runs are now real. The harness (checkpoints, durable state, recovery) matters more than the model. A field guide to the long-running pattern.
-
Code with Claude 2026: Five Things That Actually Matter
• 9 min read
Anthropic shipped a lot on May 6: Managed Agents updates, Dreaming, Outcomes, Multi-agent Orchestration, and a SpaceX partnership. Filtered for signal, it comes down to five things that change how you build.
-
Agent Observability in 2026: Tracing, Replay, and Why OTel Won
• 9 min read
Langfuse got acquired by ClickHouse. Helicone hit maintenance mode. OpenTelemetry standardized LLM tracing. The observability stack for agents reshuffled in three months. Here's what it looks like now.
-
Agent Evals in 2026: Beyond LLM-as-Judge
• 10 min read
Vibes-based scoring is finally dying. Trajectory eval, rubric eval, golden replay, and the test pyramid that production agent teams actually run.
-
Sandbox Execution: Code Interpreters Grew Up
• 11 min read
Firecracker microVMs, gVisor containers, persistent workspaces, and the $24M Series A nobody quite expected. The sandbox layer beneath every serious agent, and how to pick the right one.
-
Cost-Optimized Agent Architectures: Cutting Spend 10x Without Losing Quality
• 9 min read
Caching, routing, distillation, and per-task model selection. The four moves that take a $0.40/task agent to $0.04/task without anyone noticing the difference.
-
Web Research Agents: The State of the Art, March 2026
• 10 min read
Operator died, Browser Use became the default substrate, Manus shipped at scale, and the gap between demo and reliable production narrowed considerably. A field report.
-
Deep Agents: Planner / Executor / Critic Becomes the Default
• 10 min read
The three-role pattern that powered Manus, then LangChain Deep Agents, then half the production agents shipping in early 2026. Why it works, when it doesn't, and how to actually build one.
-
Tool Selection at Scale: When Your Agent Has 200 Tools
• 9 min read
Past ~30 tools, agent reliability falls off a cliff. Past ~100, it's chaos. Here's the actual engineering that production teams ship to stay sane: RAG-over-tools, semantic routing, dynamic loading, and namespacing.
-
Sub-Agents Are the New Microservices
• 9 min read
The orchestrator-worker pattern that took over agent design in late 2025 is the same pattern that took over backend design in 2014. The wins are real. So are the failure modes.
-
Choosing an Agent Framework in 2026: A Decision Tree
• 9 min read
Six serious frameworks, four orchestration styles, and one tired question I keep getting asked. Here's the decision tree I actually use.
-
MCP Just Crossed the Inflection Point
• 7 min read
Fourteen months in, the Model Context Protocol stopped being a curiosity and started being plumbing. Here's what changed over the holidays: registries, governance, and the first scaling pains.