Watching GPT-5’s big router keep failing for me has convinced me that “LLMs as judges” and using synthetic data to train better SOTA LLMs are just wishful thinking.