The leading coding benchmark changed leaders five times in a single month as all the major labs competed for the top spot.

Reinforcement learning on code drove the shift, and by November coding agents had become daily-driver tools for the first time.

Production reliability still lags vendor benchmark results, and many projects built on early agents had been abandoned by February.

Sources: Hacker News