Now that we're a few years in, it seems the AI safety concerns were empirically valid (e.g. reward hacking in coding models), but OpenAI's 'deploy, iterate, and learn' approach has worked fine so far. The models get better, some things break, & you build guardrails as you go...
Where this breaks: 1) another actor deploys powerful models in consequential settings without these guardrails in place, or 2) the models start improving too fast for our OODA loop to keep up (e.g. a singularity scenario).