DApp Store | Web3 Hub for events and games

Trending topics

Ridiculous that OpenAI claimed 74.9% on SWE-Bench just to prove they were above Opus 4.1’s 74.5%, and got there by running it on 477 problems instead of the full 500. Their system card only says 74%, too.

Source:

And yes, I know they’ve always reported on the 477 denominator, but that’s NOT “SWE-Bench Verified”. It’s an entirely different metric, “OpenAI’s subset of SWE-Bench Verified”, and that number can’t be compared with a score on the full 500.
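
To put numbers on the denominator point, here is a rough sketch of how far a 477-problem score could move if it were restated over all 500. It assumes the 74.9% is a mean pass rate over the 477 problems and that the 23 omitted problems could land anywhere between all failed and all passed. The function name `full_set_bounds` is made up for illustration; it is not from any benchmark tooling.

```python
def full_set_bounds(rate_on_subset: float, subset_size: int, full_size: int) -> tuple[float, float]:
    """Return the (lowest, highest) pass rate possible on the full set.

    Assumption: rate_on_subset is a mean pass rate over subset_size problems,
    so the implied solved count may be fractional (e.g. averaged over runs).
    """
    solved = rate_on_subset * subset_size   # solved problems implied by the subset score
    missing = full_size - subset_size       # problems left out of the subset
    low = solved / full_size                # every omitted problem counted as a failure
    high = (solved + missing) / full_size   # every omitted problem counted as a pass
    return low, high


# Figures from the post: 74.9% on 477 problems, full benchmark of 500.
low, high = full_set_bounds(0.749, 477, 500)
print(f"{low:.1%} to {high:.1%}")
```

Under those assumptions the full-set score lands somewhere between roughly 71.5% and 76.1%, a range that sits on both sides of 74.5%, so the subset number alone can't settle which model is ahead.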
