Bharat Bhavnasi San Francisco, CA, USA

#evaluation

1 post tagged with "evaluation".

Evaluating Agents: From Unit Tests to LLM-as-Judge Pipelines

Mar 30, 2026 • 8 min read

You can't ship agents you can't measure. The 2026 eval stack covers task-level scoring, trajectory grading, LLM-as-judge with calibration, and the regression gates that catch silent quality drops.