Anthropic Confirms Claude Code Was Broken For 7 Weeks
- March 4: default reasoning effort dropped from high to medium for latency, hurting coding accuracy on real tasks
- March 26: a bug made Claude Code discard its own reasoning history mid-session, invisibly draining usage limits
- April 16: a system prompt capped responses at 25 words between tool calls; reverted four days later on April 20
- All three issues were fixed by April 20; the Anthropic API was never affected, only Claude Code
- Trust in AI tooling is built on transparency, and a public post-mortem is the right move even when the news is bad
For about seven weeks I kept thinking I was losing my mind. Claude Code felt thinner. Sessions went amnesic. Then on April 24 Anthropic published the post-mortem and confirmed it was real.
What I noticed in my own studio first
I run a one-person studio. Claude Code is in almost every loop I have, from Shopify theme work to Vercel deploys to writing this very blog. So when something shifts inside the tool, I feel it within a day or two.
In early March the reasoning got shallower. I would ask for a non-trivial refactor and get a confident-sounding answer that skipped a step. The kind of skip you only catch by reading the diff carefully. I started adding "think this through carefully before you change anything" to prompts I had been running cleanly for months.
Late March is when the amnesia started. I would be deep in a session, pasting context, narrowing in on a fix, and Claude would suddenly act like it had not seen the last twenty messages. I assumed it was me. Wrong context window settings, maybe a bad prompt. I rebuilt my workflows twice.
Then mid-April brought the weirdest one. Responses got clipped. Tool calls felt rushed. Claude would start a thought, fire off a bash command, and never finish the sentence. I joked on a call that Claude was "answering in haiku now." That joke turned out to be closer to the truth than I knew.
I want to be clear about my bias here. I love this tool. I sell a product called Claude Blueprint that helps people set up Claude Code well. I had every reason to assume the problem was on my end. So when the engineering post landed on April 24, my first reaction was relief. Not satisfaction. Relief.
The pattern across all three bugs is the same. Each one was invisible to the user but had a measurable effect on quality. None of them came with a changelog entry users could read. You felt the symptoms and assumed your prompts were the problem.
The March 4 change: high to medium reasoning
Anthropic confirmed that on March 4, 2026 they switched the default reasoning effort in Claude Code from "high" to "medium." The goal was to cut latency and make the tool feel snappier. They now say that tradeoff was wrong.
This one stings because it is the most defensible of the three. Latency matters. Slow tools get abandoned. Anyone who has shipped a developer product knows the pressure to make every interaction feel instant. I have made the same call on smaller scales in my own work, choosing speed over depth and regretting it later.
But coding is not a domain where you can cut reasoning depth quietly. The model needs to walk through edge cases, reconcile imports, think about side effects. "Medium" reasoning produces code that compiles and looks reasonable but quietly skips checks a senior engineer would never miss. The kind of bug you find at 11pm on a Friday because a deploy failed.
What I take from this: defaults are not a small UX decision in AI tooling. The default reasoning level of a coding assistant is a product position. If your product is "this thinks like a careful engineer," lowering the default reasoning is changing the product. Users do not see that change in a changelog. They feel it as "the tool got dumber."
There is also a measurement problem hiding inside this. If your internal benchmarks pass at "medium," the change looks safe. But benchmarks rarely capture the long-tail tasks that solo developers throw at the tool. A senior engineer asking Claude to refactor a tricky reducer is not a benchmark task. It is exactly the case where the missing reasoning depth shows up first.
The March 26 reasoning-history bug
The middle bug is the one that explains my "amnesia" weeks. On March 26 a change shipped that caused Claude Code to continuously discard its own reasoning history mid-session.
In practice this meant the model would do a chunk of work, drop the reasoning that got it there, and then on the next turn re-derive context from scratch. Sometimes badly. From my seat as a user, this looked exactly like a forgetful, erratic model. From inside the system, it was the model rebuilding state every few turns and burning tokens to do it.
Anthropic flagged a specific consequence I did not expect: this drained users' usage limits faster than usual. Retries and re-derivations were happening invisibly. So the bug had two costs at once. Quality went down because the reasoning was thrown away. Quota went down because the system kept paying to recreate it.
I felt this. Several days in late March I hit my limit far earlier than the workload justified. I assumed I was just running heavier sessions. Now I know my sessions were running heavier without me asking them to.
The lesson here is the brutal one. A bug in a stateful AI loop does not just degrade output. It compounds, because the model keeps spending resources to compensate. If you are building anything that maintains conversational state across turns, you have to monitor not just final answers but the cost of getting there. Usage curves are leading indicators.
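To make that concrete: a minimal sketch of what watching the cost curve could look like. Everything here is an assumption for illustration, not Anthropic's implementation: `UsageMonitor`, the `tokens_used` report from the agent loop, and the rolling-median spike heuristic are all hypothetical names and choices.

```python
from collections import deque
from statistics import median

class UsageMonitor:
    """Flags turns whose token spend spikes above the recent baseline.

    Hypothetical sketch: assumes the agent loop reports tokens_used
    after each turn. A sudden jump against the rolling median is the
    kind of leading indicator that would have surfaced invisible
    context re-derivation long before users felt it.
    """

    def __init__(self, window: int = 10, spike_factor: float = 2.0):
        self.history = deque(maxlen=window)  # recent per-turn token counts
        self.spike_factor = spike_factor

    def record(self, tokens_used: int) -> bool:
        """Record one turn's spend; return True if it looks anomalous."""
        spike = (
            len(self.history) >= 3  # need a minimal baseline first
            and tokens_used > self.spike_factor * median(self.history)
        )
        self.history.append(tokens_used)
        return spike
```

In a real harness the flag would feed an alert or a dashboard; the point is that the signal lives in the per-turn cost series, not in the final answers.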
It is also worth noting that the API was untouched during this whole period. Developers calling the Anthropic API directly never felt any of these symptoms. That detail matters because it tells you exactly where the bugs lived: in the Claude Code harness, the layer that wraps the model with tools, history management, and orchestration. The model itself was fine. The product around it slipped.
The April 16 change: the 25-word cap
The third miss is almost funny. On April 16 Anthropic added a system-prompt instruction that capped responses to 25 words between tool calls. The goal was probably to keep the model action-oriented in agentic loops. The result was Claude Code that could not explain itself.
For tightly scoped tool use, 25 words is fine. Run this command, read this file, move on. For real engineering work, 25 words is a gag order. The model cannot lay out a plan, cannot weigh two approaches, cannot tell you why it is about to refactor a function. It just acts.
I noticed this most when debugging. I would ask "why is this failing" and get four words and a tool call. The tool call would do something. Then four more words. The whole experience felt like working with someone who had been told not to talk. I stopped trusting the actions because I could not see the reasoning behind them.
Anthropic reverted the cap on April 20, four days after shipping it. That is fast, and it is the right call. But four days is enough to ruin a sprint, especially for solo developers who depend on Claude Code as their pair. I lost most of one workday to chasing a phantom bug in my own code that turned out to be Claude refusing to explain why a build was breaking.
The lesson I am taking into my own products: any change to a system prompt is a product change. It deserves the same review as a UI change. If you would not silently rewrite a button label, do not silently rewrite the rules the model is operating under.
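One cheap way to enforce that review is a snapshot test that fails whenever the prompt drifts from a checked-in copy. A minimal sketch, with `SYSTEM_PROMPT`, `SNAPSHOT`, and `prompt_drifted` all invented names for illustration; in a real repo the snapshot would live in a reviewed file, not a constant.

```python
# Hypothetical system prompt; in practice this would be loaded from
# wherever the product actually stores it.
SYSTEM_PROMPT = "You are a careful engineer. Explain your plan before acting."

# Golden copy checked into the repo. Changing the prompt forces an
# explicit, reviewable diff here, the same as a UI change would get.
SNAPSHOT = "You are a careful engineer. Explain your plan before acting."

def prompt_drifted(current: str, snapshot: str) -> bool:
    """Return True if the live prompt no longer matches its reviewed snapshot."""
    return current != snapshot
```

Run in CI, this turns "someone quietly appended a 25-word cap" into a failing test and a pull-request conversation instead of a silent product change.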
A 25-word cap is not a small instruction. It rewires how the model sequences thought and action. In a one-shot chat that might be invisible. In a multi-turn agentic loop with tool calls every few seconds, it is a personality transplant. The reasoning that should have appeared in the response did not move somewhere else; it simply got dropped.
Bottom Line
All three bugs were fixed by April 20. The Anthropic API was never affected, only Claude Code. On April 23 Anthropic reset usage limits for every subscriber as a goodwill gesture, and on April 24 they published the engineering post-mortem. They explicitly denied that any of this was intentional "nerfing," which had been the loud accusation across forums for weeks, and the engineering record now backs that denial.
I want to give credit where it is due. Anthropic could have stayed quiet. The "nerfing" allegations were already loud, and a denial would have been easier than a confession. They chose the confession. That is rare, and it is the move that keeps me trusting the tool.
For anyone who builds on Claude Code, the practical takeaway is to lock in working setups. Save your prompts, your context patterns, your sub-agents, your hooks. When the platform drifts, you want a known-good baseline you can rebuild from in an afternoon, not from memory. That is exactly what I packed into Claude Blueprint, and you can find more pieces I have written on the workflow at the RAXXO Lab.
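That baseline habit can be as simple as a script that copies your working setup into a dated folder. A minimal sketch, making no assumptions about where your files actually live: `snapshot_setup` and the paths you pass it are placeholders, not Claude Code internals.

```python
import shutil
from datetime import date
from pathlib import Path

def snapshot_setup(paths, dest_root):
    """Copy a known-good set of config files/dirs into a dated snapshot dir.

    Hypothetical helper: pass it whatever holds your prompts, context
    patterns, sub-agents, and hooks. Returns the copied targets.
    """
    dest = Path(dest_root) / f"baseline-{date.today().isoformat()}"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in map(Path, paths):
        if not p.exists():
            continue  # skip anything that isn't there on this machine
        target = dest / p.name
        if p.is_dir():
            shutil.copytree(p, target, dirs_exist_ok=True)
        else:
            shutil.copy2(p, target)  # preserves file metadata
        copied.append(target)
    return copied
```

A git repo works just as well; the point is that the baseline exists somewhere the platform cannot drift.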
Trust is built one post-mortem at a time. This was a good one.