Bharat Bhavnasi San Francisco, CA, USA

Blog

All my articles and thoughts.

What Survives Compaction Is the Real Context Window

Jul 1, 2026 • 6 min read

June's research reframes context management: the discard step is now where both agent quality and safety quietly leak.
The Cheapest Agent Upgrade Is a Stop Condition

Jun 30, 2026 • 5 min read

Mid-2026 data keeps pointing the same way: bounding an agent's loop beats unleashing it. Turn limits and budgets buy more than a bigger model.
Ten Agents, Three Merges: June's Tooling Fixed Fan-Out, Not Review

Jun 29, 2026 • 5 min read

This month's agent tools made spawning parallel coding agents trivial. The constraint moved to the merge decision—and that doesn't parallelize.
Agent Security Moved to the Action Layer

Jun 28, 2026 • 6 min read

Runtime authorization — intercepting tool calls before they execute — is becoming the real security boundary for agents, and a standard is forming fast.
Computer-Use Agents Crossed Human Parity. They Still Click Too Much.

Jun 27, 2026 • 6 min read

Frontier models now beat the human baseline on OSWorld-Verified — but the benchmark just got rebuilt, and the architecture quietly shifted off pixels.
Your Agent Catches Everyone's Mistakes But Its Own

Jun 26, 2026 • 5 min read

New research says self-correction fails because of the role label on the claim, not the claim's content. The fix is structural, and cheaper than you think.
Your Agent's Benchmark Score Is an Experiment, Not a Fact

Jun 25, 2026 • 6 min read

Recent work shows a single agent leaderboard number is wrong three independent ways: it's noisy, it's overfit, and the judge measuring it is unreliable.
Agents Are Learning the Memory Policy You Used to Hand-Code

Jun 21, 2026 • 5 min read

A June 2026 wave moves the store/evict/retrieve decision from heuristics to a trained policy, and pushes consolidation into an offline sleep phase.
Injection Stopped Being a Single-Turn Problem

Jun 20, 2026 • 5 min read

Once agents got long-term memory, a one-time prompt injection could survive across sessions. Mid-2026 research shows both the attack and the defense moving up the stack.
The Agent Stopped Waiting to Be Asked

Jun 19, 2026 • 6 min read

June 2026 mainstreamed always-on agents that listen to event streams instead of prompts — and that one change breaks the trigger, trust, and latency models all at once.