Now that GPT-5 is out, how do you think I did? Were my predictions for the model correct?
Matt Shumer
15.4.2024
My predictions for GPT-5 capabilities:

From least to most interesting:
- significantly longer context length + far greater ability to use it effectively (i.e. ability to reason across needles within haystack tests)
- much more multimodal (both in terms of # of modalities and how ‘deep’ each one goes)
- multimodal outputs, though I’d guess some modalities will be disabled at launch (safety etc. etc.)
- imagine talking directly to GPT-5, and it talks back, without using Whisper or Voice Engine
- Q* reasoning breakthrough
- Two modes: reasoning + normal. Q* may take significant inference time/cost, so unless there’s an efficiency breakthrough, they may also offer a normal-response mode like we see today
- Similarly, GPT-5 may have a more advanced form of adaptive compute/Q* usage… the harder the query, the more power it puts behind it to provide a great solution
- 10x better agentic capabilities… simple/constrained agents will be mostly solved, and we will get much closer to real-world, generalist agents
- ability to backtrack: beyond reflection, GPT-5 will be able to recognize mistakes as it answers, and correct course
- insane levels of coherence across long-term data… we’ll start to think less about using separate systems to enable memory and more about just embedding all memories in the prompt… this will also push agents forward
- with these last three points, you’ll be able to leave GPT-5 alone and let it do complex tasks for you, and trust that it actually gets them right without needing to check its work
- trained on an OOM more data than previous models, much of which is collected from ChatGPT, cleaned, improved, cast to other modalities, etc.
- post-trained on far *better* data than current leading models
- we’ll start to see glimpses of capabilities far beyond what we talk about today; for example, it’ll have closer-to-usable abilities to do scientific research

What did I miss? What do you think? Reply and let me know.