Tech · 4h ago

AI developer @teortaxesTex says Claude Sonnet 5.0 underperforms Sonnet 4.6 on non-target benchmarks, sparking 'benchnerfing' debates

Story Overview

Anthropic rolled out Claude Sonnet 5 as its most capable agentic model yet, touting stronger reasoning and tool use than Sonnet 4.6 at a lower price point, but independent checks quickly surfaced regressions on benchmarks that fall outside the usual optimization targets.

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

The one bench nobody wants to hill-climb

banteg@banteg

welcome to benchnerfing era, sonnet 5 weaker than sonnet 4.6

11:18 AM · Jun 30, 2026 · 271 Views

Open Question

Performance gaps invite fresh scrutiny

A vocal AI developer highlighted weaker scores versus the previous Sonnet on non-targeted tests, prompting others to ask whether the new model even clears the bar set by GLM-5.2.

Developer Impact

Version jump fuels naming doubts

Some commentators argue the incremental gains do not justify the 5.0 label and might better suit a point release like 4.8, keeping the conversation focused on what actually moved forward.

Sentiment

Users in the replies dismiss Anthropic's Claude Sonnet 5 release as a nerfed downgrade that trails models such as Opus 4.8, and they attribute it to marketing and greed rather than genuine improvement.

Positive: 0.0%

Negative: 100.0%

Based on 9 comments with sentiment.

Posts from X

Most Activity

Views 7.1K · Bookmarks 7 · Likes 160 · Retweets 8 · Replies 15

Lisan al Gaib@scaling01

I knew it was going to be an insane nothingburger, because there's currently a soft ban on frontier capabilities

but I genuinely don't understand why they didn't call it Sonnet 4.8 or Sonnet 4.9, because this artificially nerfed piece of shit is not worthy of the 5.0 naming

3h · 7.1K views · 160 likes

Lisan al Gaib@scaling01

Claude 5 has so far been the worst launch by Anthropic

Fable 5 isn't available and Sonnet 5 was nerfed to death

2h · 1.5K views

Lisan al Gaib@scaling01

like does it even beat GLM-5.2?

2h · 1.7K views

Lisan al Gaib@scaling01

but we all know that Anthropic's .5 iterations are the real generational jumps. so it's fine

Sonnet 3.5 >>> Sonnet 3
Sonnet 4.5 >>> Sonnet 4
Sonnet 5.5 >>> Sonnet 5

(in terms of relative improvements to the previous models, in raw scores they are obviously better)

1h · 2.3K views

Lisan al Gaib@scaling01

turns out it does (barely) at like 3x the price

2h · 1.2K views

Chubby♨️@kimmonismus

it gets even worse:

Chubby♨️@kimmonismus

tl;dr: Sonnet 5 is cheaper per token, but more expensive per solved problem – and still lags behind Opus 4.8 in overall intelligence.

Thats honestly disappointing and not a good release.

57m
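
The "cheaper per token, but more expensive per solved problem" point in the post above is a matter of arithmetic: a lower price per token can be outweighed by a model that spends more tokens per attempt or solves fewer problems per attempt. A minimal Python sketch of that calculation follows. Every number in it is made up for illustration (no prices, token counts, or solve rates come from Anthropic or from the thread), and `cost_per_solved` is a hypothetical helper, not anyone's published methodology.

```python
def cost_per_solved(price_per_mtok: float,
                    tokens_per_attempt: float,
                    solve_rate: float) -> float:
    """Expected dollars spent per solved problem.

    price_per_mtok     -- price in dollars per million tokens
    tokens_per_attempt -- average tokens consumed by one attempt
    solve_rate         -- fraction of attempts that solve the problem (0..1]
    """
    if not 0 < solve_rate <= 1:
        raise ValueError("solve_rate must be in (0, 1]")
    cost_per_attempt = price_per_mtok * tokens_per_attempt / 1_000_000
    # On average 1 / solve_rate attempts are needed per solved problem.
    return cost_per_attempt / solve_rate


# Hypothetical model A: cheaper per token, but verbose and less accurate.
model_a = cost_per_solved(price_per_mtok=2.0,
                          tokens_per_attempt=400_000,
                          solve_rate=0.5)

# Hypothetical model B: pricier per token, but terse and more accurate.
model_b = cost_per_solved(price_per_mtok=3.0,
                          tokens_per_attempt=150_000,
                          solve_rate=0.6)

print(f"A: ${model_a:.2f} per solved problem")
print(f"B: ${model_b:.2f} per solved problem")
```

With these illustrative inputs, model A costs more per solved problem despite its lower per-token price, which is the shape of the complaint in the post. Whether that holds for any real model depends on measured token usage and solve rates, which this sketch does not supply.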

Lincoln 🇿🇦@Presidentlin

@0xVita @scaling01 One of these days, I need to watch these movies.

3h

Jacob Centner@JacobCentner

@scaling01 feels like haiku-5 instead, I think they should have gone with that

roughly matching sonnet-4.6 perf on medium for half the cost is cool, lot of enterprises are going to love that

3h

wetbrain@0xVita

@scaling01 Thank god the mahdi is here to tell us the truth we @Presidentlin were worried you got hit by a bus when you didn’t break the news first

3h

Juniper@JuniperViews

@kimmonismus It's truly a horrible release. What's the point?

56m

💺@patience_cave

@scaling01 i think you mean best launch?

2h

Andrew Rivers@itsandrewrivers

@scaling01 We have entered the age of artificially-limited frontier model regression.

We can thank the gov for that.

US is no longer going to be a safe haven for rapid AI innovation, unfortunately.

3h

Irving@ieqr_

@scaling01 I agree, it was an enormous letdown. It doesn't reach Opus capabilities and it's in the opposite Pareto frontier quadrant, the worst quadrant in token efficiency at least for the benchmarks

2h

Luigi Pagani@Luigi1549898

@scaling01 This is probably haiku size, Opus 4.6 was the original Sonnet 5. They are just greedy

3h

AYLI@yanyan9_A

@teortaxesTex Liang Wenfeng if you can hear me obliterate the evil west V4.1 PRO MAX.finalversion to put an end to this charrade

3h

Habanero@singularityHSN

@scaling01 then people would ask for sonnet 5… best they just pump out this filler model now and focus their attention on the big guns again, who cares about anything other than the SOTA

3h