Claude Slows Down Expert Developers: The Study Challenging the AI Productivity Hype
While the world rushes to integrate AI into every workflow, a new study offers a much-needed pause. The key finding: when experienced developers use AI coding assistants like Claude 3.5 Sonnet, they often complete their tasks more slowly than when working solo.
The study comes from METR (Model Evaluation and Threat Research), a nonprofit that gained traction in 2024 and 2025 for its rigorous evaluations of AI systems in real-world settings. But this time, they didn’t test the models on isolated prompts or controlled tasks. They did something more telling: they observed how real developers handled real codebases — with and without AI.
The METR team recruited 16 experienced open-source developers, each working in large public GitHub repositories they had contributed to for years. Real issues from those repositories were randomly assigned to either allow or disallow AI assistance; in the AI-allowed condition, developers mostly used Cursor Pro with Claude 3.5/3.7 Sonnet. The tasks were simple to state and hard to do: fix real bugs and build real features. No toy examples. No artificial constraints. Just the kind of messy, unpredictable work software engineers face every day.
Result: The developers using Claude took 19% longer on average to complete their tasks.
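That headline number is a ratio of completion times across conditions. As a rough illustration of how such a figure can be computed (the per-task durations below are invented for the example and are not METR's data), a slowdown estimate from paired task times might look like this:

```python
# Hypothetical illustration: estimating an average slowdown factor from
# paired task completion times. The numbers are invented, NOT METR's data.
from statistics import geometric_mean

# (minutes with AI allowed, minutes with AI disallowed) for comparable tasks
task_times = [(95, 80), (130, 110), (60, 50), (145, 120)]

# A ratio above 1.0 means the AI-allowed condition was slower.
ratios = [with_ai / without_ai for with_ai, without_ai in task_times]

# Geometric mean is the natural average for multiplicative ratios.
slowdown = geometric_mean(ratios) - 1.0

print(f"Estimated slowdown: {slowdown:.0%}")  # → Estimated slowdown: 19%
```

A geometric mean is used here because per-task speed effects are multiplicative; averaging raw ratios arithmetically would overweight the slowest tasks.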
This may seem counterintuitive. Aren’t AI tools supposed to speed us up? METR’s analysis uncovered several reasons why that’s not always the case: context mismatch, where Claude couldn’t fully grasp the architecture or conventions of the codebase; over-suggestion, where it proposed technically valid but irrelevant paths; and cognitive friction, where developers spent extra time evaluating whether the AI’s help was actually helpful.
The paradox is striking: AI tries to help, but the cost of verifying its output negates the gain.
Beyond performance, METR also looked at how developers felt. Some appreciated Claude’s ability to explain its reasoning step by step. Others quickly began ignoring it after a few off-target suggestions. A smaller group described something deeper: a sense that their own focus was being diluted. Merely having the assistant present altered their thinking patterns.
This isn’t just a technical issue. It’s cognitive. AI doesn’t simply suggest solutions — it influences how we frame problems. And for experienced professionals, that influence can be disruptive rather than productive.
The Claude/METR study doesn’t mean AI isn’t valuable. But it does challenge the idea that AI = productivity, universally. It reminds us that helpfulness is context-dependent, and that tools designed for novices can misfire in expert hands. In the rush to roll out AI across every role and department, we must ask harder questions. Not just “what can this do?” but “for whom is this actually useful?”
Sometimes, the best support for a high-performing developer isn’t more suggestions — it’s the silence to think clearly.
This study isn’t just about Claude. It’s about rethinking the way we define effectiveness and expertise. If AI changes how we work, it also changes how we measure good work. And that shift is already happening.
Because intelligence is no longer just in the model — it’s in the choices we make about how, when, and why we use it.
Source: metr.org