75% of AI Coding Agents Introduce Regressions During Long-Term Maintenance
SWE-CI tested 18 models across 100 real repositories spanning 233 days of history.
MKWritesHere · 14 min read
Every few weeks, someone posts a viral demo of an AI agent building a full-stack app from a single prompt, and the comments flood in like clockwork — “Developers are done,” “Junior engineers are cooked,” and the ever-present “This changes everything.” I’ve watched these demos, and honestly they’re impressive. A feature ships in five minutes that would’ve eaten my entire afternoon.
But I maintain a codebase every single day — same repo, same boss, same tangle of decisions that past-me made at 11pm three months ago. So when I read the SWE-CI paper in March 2026, it didn’t feel like a revelation. It felt like someone finally put a number on something I’d been living for years.
The question the paper asks is deceptively simple: what actually happens when you don’t drop an AI agent into a fresh problem, but make it live inside a codebase for 233 days?
The answer isn’t pretty.