75% of AI Coding Agents Introduce Regressions During Long-Term Maintenance
SWE-CI tested 18 models across 100 real repositories spanning 233 days of history.
MKWritesHere · 14 min read
Every few weeks, someone posts a viral demo of an AI agent building a full-stack app from a single prompt, and the comments flood in like clockwork — “Developers are done,” “Junior engineers are cooked,” and the ever-present “This changes everything.” I’ve watched these demos, and honestly they’re impressive. A feature ships in five minutes that would’ve eaten my entire afternoon.
But I maintain a codebase every single day — same repo, same boss, same tangle of decisions that past-me made at 11pm three months ago. So when I read the SWE-CI paper in March 2026, it didn’t feel like a revelation. It felt like someone finally put a number on something I’d been living for years.
The question the paper asks is deceptively simple: what actually happens when you don’t drop an AI agent into a fresh problem, but make it live inside a codebase for 233 days?
The answer isn’t pretty.