Bharat Bhavnasi San Francisco, CA, USA

#agent-tools

12 posts tagged with "agent-tools".

Long-Horizon Agents: When Tasks Take Hours

May 21, 2026 • 11 min read

Six-hour agent runs are now real. The harness (checkpoints, durable state, recovery) matters more than the model. A field guide to the long-running pattern.

Code with Claude 2026: Five Things That Actually Matter

May 7, 2026 • 9 min read

Anthropic shipped a lot on May 6: Managed Agents updates, Dreaming, Outcomes, Multi-agent Orchestration, and a SpaceX partnership. Filtered for signal, five things change how you build.

Agent Observability in 2026: Tracing, Replay, and Why OTel Won

May 1, 2026 • 9 min read

Langfuse got acquired by ClickHouse. Helicone hit maintenance mode. OpenTelemetry standardized LLM tracing. The observability stack for agents reshuffled in three months. Here's what it looks like now.

Agent Evals in 2026: Beyond LLM-as-Judge

Apr 24, 2026 • 10 min read

Vibes-based scoring is finally dying. Trajectory eval, rubric eval, golden replay, and the test pyramid that production agent teams actually run.

Sandbox Execution: Code Interpreters Grew Up

Apr 10, 2026 • 11 min read

Firecracker microVMs, gVisor containers, persistent workspaces, and the $24M Series A nobody quite expected. The sandbox layer beneath every serious agent, and how to pick the right one.

Cost-Optimized Agent Architectures: Cutting Spend 10x Without Losing Quality

Mar 26, 2026 • 9 min read

Caching, routing, distillation, and per-task model selection. The four moves that take a $0.40/task agent to $0.04/task without anyone noticing the difference.

Web Research Agents: The State of the Art, March 2026

Mar 19, 2026 • 10 min read

Operator died, Browser Use became the default substrate, Manus shipped at scale, and the gap between demo and reliable production narrowed considerably. A field report.

Deep Agents: Planner / Executor / Critic Becomes the Default

Mar 12, 2026 • 10 min read

The three-role pattern that powered Manus, then LangChain Deep Agents, then half the production agents shipping in early 2026. Why it works, when it doesn't, and how to actually build one.

Tool Selection at Scale: When Your Agent Has 200 Tools

Feb 12, 2026 • 9 min read

Past ~30 tools, agent reliability falls off a cliff. Past ~100, it's chaos. Here's the actual engineering (RAG-over-tools, semantic routing, dynamic loading, and namespacing) that production teams ship to stay sane.

Sub-Agents Are the New Microservices

Feb 5, 2026 • 9 min read

The orchestrator-worker pattern that took over agent design in late 2025 is the same pattern that took over backend design in 2014. The wins are real. So are the failure modes.

Choosing an Agent Framework in 2026: A Decision Tree

Jan 22, 2026 • 9 min read

Six serious frameworks, four orchestration styles, and one tired question I keep getting asked. Here's the decision tree I actually use.

MCP Just Crossed the Inflection Point

Jan 15, 2026 • 7 min read

Fourteen months in, the Model Context Protocol stopped being a curiosity and started being plumbing. Here's what changed over the holidays — registries, governance, and the first scaling pains.