Bharat Bhavnasi San Francisco, CA, USA

#rl

5 posts tagged with "rl".

Agents Are Learning the Memory Policy You Used to Hand-Code

Jun 21, 2026 • 5 min read

A June 2026 wave of research moves the store/evict/retrieve decision from heuristics to a trained policy and pushes consolidation into an offline sleep phase.

The Environment Became the Curriculum: Agent RL's Synthesis Turn

Jun 9, 2026 • 6 min read

Agent RL's bottleneck moved from data to reward to the environment itself. The newest research tries to take humans out of environment-building entirely.

Verification Is Becoming the Agent's Substrate

May 30, 2026 • 5 min read

The agents scaling fastest in mid-2026 share one trait: their output lands in a column a machine can check. The verifier, not the model, is the moat.

Where the Reward Goes: Agent RL's Reward-Design Split

May 29, 2026 • 5 min read

Recent papers disagree on whether to reward agents per turn or only at the end, and the answer reveals where RL for agents is actually headed.

Training a Virtual Company: A Deep Dive into Multi-Agent Reinforcement Learning with OpenEnv & Unsloth

Mar 7, 2026 • 29 min read

How exploring LLM fine-tuning led to building a Gymnasium-compatible RL environment where 7 LLM-powered agents run a company (trained with GRPO + LoRA on Qwen 2.5 14B), and what we learned about reward design, emergent collaboration, and the future of agentic AI.