GPT-5 system card capability evals reactions thread. First observation: ~no improvement on any of the coding evals other than SWE-bench
Very cool new benchmark
Interesting that the model knew not just that it was in an eval, but the exact task and organization running it