#agents
23 posts tagged with "agents".
-
The Agent Trust Stack Just Got Built: Three Weeks in May 2026
• 6 min read
Skill cards, self-hosted sandboxes, MCP tunnels, computer-use verifiers, and a Five Eyes warning all landed in twenty-one days. The boring perimeter around capable agents finally has shape.
-
The Browse-Click-Compare Web Is Ending. Here's What Replaces It.
• 10 min read
Twenty minutes of tabs vs. five minutes of prompt. The traditional web wasn't designed for humans; it was designed for mice. The agent-native web is quietly dismantling the parts that never made sense.
-
Long-Horizon Agents: When Tasks Take Hours
• 11 min read
Six-hour agent runs are now real. The harness (checkpoints, durable state, recovery) matters more than the model. A field guide to the long-running pattern.
-
Skills, Connectors, Subagents: Anthropic's 3-Layer Agent Template
• 10 min read
Anthropic just shipped 10 financial services agent templates. The interesting part isn't the templates; it's the three-layer architecture quietly becoming the standard for enterprise agents.
-
Code with Claude 2026: Five Things That Actually Matter
• 9 min read
Anthropic shipped a lot on May 6: Managed Agents updates, Dreaming, Outcomes, Multi-agent Orchestration, and a SpaceX partnership. The signal-to-noise filtered down to five things that change how you build.
-
Agent Observability in 2026: Tracing, Replay, and Why OTel Won
• 9 min read
Langfuse got acquired by ClickHouse. Helicone hit maintenance mode. OpenTelemetry standardized LLM tracing. The observability stack for agents reshuffled in three months. Here's what it looks like now.
-
Agent Evals in 2026: Beyond LLM-as-Judge
• 10 min read
Vibes-based scoring is finally dying. Trajectory eval, rubric eval, golden replay, and the test pyramid that production agent teams actually run.
-
Cascaded vs Fused Voice Agents: A Builder's Perspective on Architecture Choices
• 16 min read
Deep dive into voice agent architectures. Why cascaded models give you control and fused models trade complexity for naturalness. What we're learning from shipping production agents at scale.
-
Sandbox Execution: Code Interpreters Grew Up
• 11 min read
Firecracker microVMs, gVisor containers, persistent workspaces, and the $24M Series A nobody quite expected. The sandbox layer beneath every serious agent, and how to pick the right one.
-
How to Make Voice Agents Sound Human: A Practical Guide to Realistic Speech Prompting
• 9 min read
Why your cascaded voice agent sounds robotic, and how to fix it with concrete examples, SSML pause patterns, emotion tags, and personality-as-behavior prompting techniques.
-
Cost-Optimized Agent Architectures: Cutting Spend 10x Without Losing Quality
• 9 min read
Caching, routing, distillation, and per-task model selection. The four moves that take a $0.40/task agent to $0.04/task without anyone noticing the difference.
-
Web Research Agents: The State of the Art, March 2026
• 10 min read
Operator died, Browser Use became the default substrate, Manus shipped at scale, and the gap between demo and reliable production narrowed considerably. A field report.
-
Deep Agents: Planner / Executor / Critic Becomes the Default
• 10 min read
The three-role pattern that powered Manus, then LangChain Deep Agents, then half the production agents shipping in early 2026. Why it works, when it doesn't, and how to actually build one.
-
Context Engineering: The Discipline That Makes AI Agents Actually Work
(updated) • 16 min read
A deep dive into context engineering: the techniques that separate toy demos from production AI agents. Covers compaction, offloading, isolation, caching, and prioritization with real examples from Manus, Claude Code, and Devin.
-
Training a Virtual Company: A Deep Dive into Multi-Agent Reinforcement Learning with OpenEnv & Unsloth
• 29 min read
How exploring LLM fine-tuning led to building a Gymnasium-compatible RL environment where 7 LLM-powered agents run a company (trained with GRPO + LoRA on Qwen 2.5 14B), and what we learned about reward design, emergent collaboration, and the future of agentic AI.
-
MCP Has a Tools Problem — And Code Mode Might Fix It
• 7 min read
AI agents are drowning in tools. The more APIs you connect via MCP, the worse your agent performs. Here's why, and what Code Mode changes.
-
The AI App Paradox: Why We're Drowning in Tools but Starving for Experience
• 2 min read
We've been so obsessed with what AI can do that we forgot about how it feels to use it. The AI experience layer is the next frontier, not the model and not the capabilities.
-
Tool Selection at Scale: When Your Agent Has 200 Tools
• 9 min read
Past ~30 tools, agent reliability falls off a cliff. Past ~100, it's chaos. Here's the actual engineering (RAG-over-tools, semantic routing, dynamic loading, and namespacing) that production teams ship to stay sane.
-
Sub-Agents Are the New Microservices
• 9 min read
The orchestrator-worker pattern that took over agent design in late 2025 is the same pattern that took over backend design in 2014. The wins are real. So are the failure modes.
-
I Tested Every Major Open-Source AI Agent SDK So You Don't Have To
• 2 min read
A comprehensive hands-on comparison of seven open-source AI agent frameworks. Which one should you actually use?
-
Choosing an Agent Framework in 2026: A Decision Tree
• 9 min read
Six serious frameworks, four orchestration styles, and one tired question I keep getting asked. Here's the decision tree I actually use.
-
MCP Just Crossed the Inflection Point
• 7 min read
Fourteen months in, the Model Context Protocol stopped being a curiosity and started being plumbing. Here's what changed over the holidays: registries, governance, and the first scaling pains.
-
JARVIS: Building an Agentic AI System for IoT Control
• 2 min read
Open-sourcing my childhood dream: an AI agent that understands context, makes decisions, and controls connected devices just like JARVIS.