WHAT A CODING AGENT IS
A coding agent is a language model wrapped in a loop: it reads a task, edits files, runs tests, reads errors, and tries again. The model is the engine; the scaffolding around it — tool use, file I/O, terminal access — is what turns chat into work.
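The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: `run_agent`, `CheckResult`, `fake_model`, and `demo_tests` are hypothetical names, and the "model" here is a stub that returns a buggy function first and a fixed one once it sees an error message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    errors: str = ""


def run_agent(
    task: str,
    propose_edit: Callable[[str, str, str], str],  # stand-in for a model call
    run_tests: Callable[[str], CheckResult],
    code: str = "",
    max_iters: int = 5,
) -> Optional[str]:
    """Edit, test, read errors, retry. Returns passing code or None."""
    errors = ""
    for _ in range(max_iters):
        code = propose_edit(task, code, errors)  # "edits files"
        result = run_tests(code)                 # "runs tests"
        if result.passed:
            return code
        errors = result.errors                   # "reads errors, tries again"
    return None


def demo_tests(code: str) -> CheckResult:
    """Toy test suite: the code must define add(a, b) that sums its inputs."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # a real agent would run this in a sandbox
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return CheckResult(False, f"{type(exc).__name__}: {exc}")
    return CheckResult(True)


def fake_model(task: str, code: str, errors: str) -> str:
    """Stub model: wrong on the first attempt, right after seeing an error."""
    if not errors:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    print(run_agent("implement add", fake_model, demo_tests))
```

Everything outside the model call is ordinary scaffolding, and the `max_iters` cap is what keeps a stuck agent from looping forever.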
WHY RL ON CODE WORKS
Code is one of the few domains with a built-in oracle: the compiler and the test suite report almost immediately whether an output builds and passes. The signal is only as good as the tests, but it lets reinforcement learning generate effectively unlimited training signal without human labelers. Go had the same property, an automatically checkable win or loss, and that is what let AlphaGo improve through self-play.
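As a rough sketch of how a test suite becomes a training signal, the function below scores a candidate solution by the fraction of test cases it passes. The name `reward_from_tests` and the (arguments, expected) case format are assumptions made for illustration. Real pipelines run candidates in a sandbox and may use a different reward shape.

```python
from typing import Any, Sequence, Tuple

Case = Tuple[tuple, Any]  # (positional arguments, expected return value)


def reward_from_tests(source: str, func_name: str, cases: Sequence[Case]) -> float:
    """Return the fraction of cases the candidate passes, from 0.0 to 1.0."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox; illustration only
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load or define the function earns nothing
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case counts as a failure for that case
    return passed / len(cases) if cases else 0.0


if __name__ == "__main__":
    cases = [((3,), 3), ((-3,), 3), ((0,), 0)]
    good = "def f(x):\n    return x if x >= 0 else -x\n"
    partial = "def f(x):\n    return x\n"
    broken = "def f(x) return x\n"
    for candidate in (good, partial, broken):
        print(reward_from_tests(candidate, "f", cases))
```

No human grades any of these candidates. The reward comes entirely from executing the code, which is why the signal scales.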
THE BENCHMARK TRAP
SWE-bench, the dominant coding benchmark, draws from real GitHub issues with known fixes. Labs train against it and scores climb, but the benchmark's distribution sits far from production work, with its long-lived codebases, ambiguous specs, flaky tests, and legacy dependencies. Vendor numbers describe a sanitized slice of the job.
THE LEAPFROG DYNAMIC
Frontier labs, including Anthropic, OpenAI, Google, and now several Chinese labs, release on overlapping cycles. Because training runs take months while inference improvements ship weekly, the leaderboard reorders constantly without any single model being decisively ahead.
THE ABANDONMENT PATTERN
Software built on a rapidly moving model substrate inherits the model's flaws and obsolescence. A 2025-era agent wrapper that worked around specific failure modes becomes dead weight when the next model fixes them natively. The graveyard of LLM startups is mostly thin wrappers outrun by the underlying model.
THE COST CURVE
Token prices for frontier coding models fell roughly 10× per year from 2023 to 2026. At that rate, an agent run that cost $20 in early 2024 costs about 20 cents two years later. This is the quiet structural force behind agents becoming usable: capability per dollar crossed a line where running an agent overnight became cheaper than a developer's coffee.
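The arithmetic behind that claim is a simple geometric decline. The sketch below applies the roughly 10× per year figure to the $20 example. `projected_cost` is a hypothetical helper, and the constant rate is the article's approximation rather than a measured price series.

```python
def projected_cost(initial_cost: float, yearly_drop: float, years: float) -> float:
    """Cost after `years` if the price falls by a factor of `yearly_drop` each year."""
    return initial_cost / (yearly_drop ** years)


if __name__ == "__main__":
    # $20 run in early 2024, assuming a steady 10x drop per year
    for years in (0, 1, 2):
        print(f"after {years} year(s): ${projected_cost(20.0, 10.0, years):.2f}")
    # after 0 year(s): $20.00
    # after 1 year(s): $2.00
    # after 2 year(s): $0.20
```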