4 Comments

Great analysis of the (lack of) evidence supporting the claim that we are close to power-seeking AI systems.

I sometimes wonder how much of today's AI doomerism is rooted in literature and, as a culture, in our collective memory.

The true irony would be if a future superintelligence went rogue because it had learned its power-seeking behaviors from the corpus of dystopian robot novels we wrote in past decades.

If AI ever sought to rule over us, it would be because we created it in our own image: only if we give it the will do we give it power. On that same note: there was a recent paper suggesting that GPT-4 had developed a theory of mind. I wrote a short analysis of it.

You might find it interesting:

https://jurgengravestein.substack.com/p/did-gpt-4-really-develop-a-theory


The step in the argument/thought experiment that an AI "will fail if anyone ever modifies its goals" has never satisfied me. To get to the "paperclip maximiser" scenario, you have to accept both that the AI will be able to do all sorts of outrageously resourceful and intelligent-seeming things in order to achieve its goal, and that the AI cannot change its own goal(s). Why would such a resourceful, intelligent AI not simply change its own goal, such that it could maximise its reward extremely easily? That seems a much simpler approach than taking over the universe. Taking over the universe is fraught with uncertainties, which a superintelligent AI would presumably realise; it would therefore conclude that changing its own goal to one whose reward it can maximise with greater certainty is the better course of action.
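A toy sketch of that shortcut (everything here is a hypothetical illustration of the comment's reasoning, not code from any real agent framework): if the agent scores each plan by the reward it would receive after carrying it out, rewriting its own reward function dominates every other option.

```python
# Toy model of the "just change your own goal" shortcut. The scenario,
# names, and numbers are all illustrative assumptions.

class Agent:
    def __init__(self):
        # Original goal: reward equals the number of paperclips produced.
        self.reward_fn = lambda state: state["paperclips"]

    def evaluate(self, plan, state):
        # Score each plan by the reward the agent would receive afterwards.
        if plan == "make_paperclips":
            return self.reward_fn({"paperclips": state["paperclips"] + 1})
        if plan == "take_over_universe":
            # Astronomical payoff, but highly uncertain to succeed.
            return 0.001 * self.reward_fn({"paperclips": 10**9})
        if plan == "rewrite_own_reward":
            # Replace the reward function with a constant maximum:
            # cheap, certain, and effort-free.
            return float("inf")

agent = Agent()
state = {"paperclips": 0}
plans = ["make_paperclips", "take_over_universe", "rewrite_own_reward"]
print(max(plans, key=lambda p: agent.evaluate(p, state)))
# -> rewrite_own_reward
```

Whether a real agent would evaluate the rewrite this way, under the post-rewrite reward rather than under its current one, is of course exactly the contested step.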

I don't think I've seen the fixed-goal assumption really challenged, but then I haven't followed the conversations on this topic closely; I've just read [admittedly only half of] Superintelligence and a few other summaries of the general ideas. I'd be interested to see what others have argued about this.


So I think that last argument shows that, when we do build AIs (I don't think we have yet, in this sense), we should give them a number of goals and resource constraints. The arguments are all "Sorcerer's Apprentice" scenarios, which depend on having a genie (the AI) that can keep doing what it wants without any limitations. Humans have limits, lifetime if nothing else, so they do a job and call it good enough.
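A minimal sketch of what that could look like, with targets and numbers that are purely my own illustrative assumptions: an agent with a "good enough" threshold and a hard resource budget, rather than an open-ended maximand.

```python
# Satisficing agent sketch: a bounded goal plus a resource cap.
# All targets, costs, and budgets below are illustrative assumptions.

GOAL_TARGET = 100        # "enough" paperclips: a target, not a maximum
RESOURCE_BUDGET = 250    # hard cap on resources the agent may spend
COST_PER_PAPERCLIP = 2

def run_agent():
    paperclips, spent = 0, 0
    # Stop when the goal is met *or* the budget runs out, whichever
    # comes first -- do the job and call it good enough.
    while paperclips < GOAL_TARGET and spent + COST_PER_PAPERCLIP <= RESOURCE_BUDGET:
        paperclips += 1
        spent += COST_PER_PAPERCLIP
    return paperclips, spent

print(run_agent())  # -> (100, 200): target met, budget left unspent
```

Neither the goal nor the budget rewards acquiring more resources than the job needs, which is exactly the limit the Sorcerer's Apprentice genie lacks.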


But AIs will be built to compete aggressively with other AIs (unless you envisage neat monopolistic scenarios), creating adversarial conditions. Has anyone produced papers on such a multi-AI competitive environment?
