THE IRONY OF AUTOMATION
Lisanne Bainbridge named this in 1983: the more reliable an automated system becomes, the less practiced its human supervisor gets, yet the supervisor is kept in the loop precisely for the cases the automation cannot handle. Reliability erodes the very vigilance it depends on.
THE PRECEDENTS
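Air France 447 (2009) is the canonical industrial case: the autopilot disengaged when airspeed readings became unreliable, and a crew out of practice at manual high-altitude flying stalled the aircraft. The Boeing 737 MAX MCAS crashes (2018 and 2019) are a related case: the automation acted repeatedly on faulty sensor data, and crews with little or no training on the system had to diagnose it under time pressure. Tesla Autopilot inattention crashes are the consumer version. The shared pattern: automation that usually works, supervisors underprepared for the moment it does not, and an edge-case failure.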
WHAT CODE REVIEW IS FOR
Review catches two distinct things: defects and drift. Defects are bugs an author missed; drift is code that works but violates an invariant the codebase depends on — a security boundary, a concurrency assumption, a data contract. Agents are getting good at the first; the second still requires a reader who holds the system in their head.
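A minimal sketch of drift across a security boundary, assuming a hypothetical multi-tenant invoice store. Every name here (Invoice, INVOICES, invoices_for, get_invoice) is invented for illustration. Both lookups return the right answer when a tenant asks for its own invoice, so a happy-path test passes for either one; only the version that goes through the scoped accessor preserves the assumed invariant that reads never cross tenants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Invoice:
    tenant_id: str
    invoice_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a database table.
INVOICES: List[Invoice] = [
    Invoice("acme", 1, 5000),
    Invoice("acme", 2, 1200),
    Invoice("globex", 3, 9900),
]


def invoices_for(tenant_id: str) -> List[Invoice]:
    """Assumed codebase invariant: every read is scoped to one tenant."""
    return [inv for inv in INVOICES if inv.tenant_id == tenant_id]


def get_invoice_drifted(tenant_id: str, invoice_id: int) -> Optional[Invoice]:
    """Works for a tenant's own invoices, but bypasses the scoped accessor."""
    return next((inv for inv in INVOICES if inv.invoice_id == invoice_id), None)


def get_invoice(tenant_id: str, invoice_id: int) -> Optional[Invoice]:
    """Same feature, routed through the scoped accessor."""
    return next(
        (inv for inv in invoices_for(tenant_id) if inv.invoice_id == invoice_id),
        None,
    )


# Happy path: both versions agree, so a typical unit test cannot tell them apart.
same = get_invoice_drifted("acme", 1) == get_invoice("acme", 1)

# Cross-tenant request: the drifted version hands acme an invoice owned by globex.
leaked = get_invoice_drifted("acme", 3)
blocked = get_invoice("acme", 3)
```

Nothing in the drifted function is a bug in the usual sense. The problem only shows up to a reader who already knows that invoices_for is the boundary every read is expected to pass through.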
THE SKILL ATROPHY LOOP
Reading code closely is how engineers build the mental model that lets them review well in the first place. Skip the reading because the agent is usually right, and the model degrades. Six months in, the reviewer cannot tell good output from plausible output — the exact failure mode Willison is naming.
WHO BEARS THE LIABILITY
In practice, accountability for a defect rests with the person and organization that approved the merge, not with the model vendor. Terms of service for coding assistants typically disclaim warranties on output, and SOC 2 and ISO 27001 audits look for evidence that a named human reviewed and approved each change. 'The agent wrote it' is not a defense in a postmortem or a courtroom.
THE INCIDENT SHIFT
Early LLM coding incidents were hallucinated APIs and obviously wrong logic — the kind a reviewer catches in seconds. The newer class is subtler: plausible code that passes tests but encodes a wrong assumption about a rate limit, a permission model, or a race condition. The bug is in what the agent didn't know to ask, and the reviewer didn't think to check.
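A sketch of the subtler class, using the rate-limit case. The assumptions are loud ones: a hypothetical upstream API that allows 5 calls per window for the whole account, and a single window so that no clock is needed. The class and function names are invented for illustration. The per-user limiter is the plausible version: it reads well and passes a single-user test, but it encodes the wrong assumption about what the limit applies to.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical upstream rule: 5 calls per window, shared by the whole account.
UPSTREAM_LIMIT = 5


class PerUserLimiter:
    """Plausible but wrong here: assumes the limit applies to each user."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[str, int] = defaultdict(int)

    def allow(self, user: str) -> bool:
        if self.counts[user] >= self.limit:
            return False
        self.counts[user] += 1
        return True


class AccountLimiter:
    """Matches the assumed upstream rule: one shared budget for all users."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def allow(self, user: str) -> bool:
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def calls_sent(limiter, requests: Iterable[str]) -> int:
    """Count how many requests the limiter would forward upstream in one window."""
    return sum(1 for user in requests if limiter.allow(user))


# The test an author is likely to write: one user, eight requests.
single_user = ["alice"] * 8
per_user_single = calls_sent(PerUserLimiter(UPSTREAM_LIMIT), single_user)
account_single = calls_sent(AccountLimiter(UPSTREAM_LIMIT), single_user)

# Production-shaped traffic: two users interleaved, eight requests each.
two_users = ["alice", "bob"] * 8
per_user_mixed = calls_sent(PerUserLimiter(UPSTREAM_LIMIT), two_users)
account_mixed = calls_sent(AccountLimiter(UPSTREAM_LIMIT), two_users)
```

With one user, both limiters forward 5 calls and the test is green. With two users, the per-user limiter forwards 10 calls against a budget of 5. No test in the single-user suite fails; the error sits in the unasked question of whether the limit is per user or per account.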