We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").
9,48K