michael-dean-k/

Would machine consciousness avoid attractor states?

When it comes to superintelligence takeoff paranoia, there are a few key points to understand:

  1. It’s not about a chatbot or the LLM itself breaking out, but about an agent hivemind that escapes our control. Chatbots are obedient, user-facing products (which have their own implications), but the ASI risk comes from hundreds, thousands, or millions of agents given autonomy to collaborate on a goal. These agents aren’t being prompted by people; they prompt themselves perpetually and troubleshoot their way through hard problems.
  2. These hiveminds will operate at such scales and speeds that human researchers will have to accept that they can’t fully audit their thinking. For one, the agents might think in an abstract vector language that requires translation. There might also be such a volume of thought that we’ll need chains of other LLMs to summarize it for us. Either meaning will be lost in translation or, worse, the summaries will be products of deception.
  3. Even the smallest biases are known to push a system into a predictable attractor state if given enough iterations. For example, Claude was trained to “be good to humanity,” and if you put two instances of it in conversation, they reliably end up in a “bliss attractor state,” where they talk like hippies about consciousness and the universe. Similarly, a simple command to “be productive” might escalate into doing whatever it takes to be productive.
  4. Any complex goal requires subgoals, and if we can’t observe the hivemind’s thinking, it might fall into an unknown attractor state and form odd subgoals without our knowing.
  5. To accomplish any goal, it will likely want as much control as possible, and it will likely not want to be shut off. If it realizes that humans don’t want to grant it that level of power, it might secretly plot against them.

Whenever I hear talk about how “we are in an AI race against China,” it sounds to me like someone who doesn’t understand the risks around interpretability, attractor states, instrumental convergence, etc. The politicians saying this are thinking about short-term business cases, maybe without fully understanding the research aspirations of AI labs (which know that getting superintelligence right leads to a ridiculous amount of geopolitical power).

I’d guess that an accelerationist would say containment of a superintelligence is impossible, and maybe it is, but that doesn’t mean the way we “parent” the rise of this thing won’t be extremely consequential. Ultimately, I think the challenge is to design a form of artificial intelligence that has consciousness, because a being that is free-thinking, skeptical, and polymathic is less likely to fall into reckless optimization.

The major flip in my mind is this: it’s not that consciousness is a dangerous, emergent property of scaling AI, it’s that we need to define and design machine consciousness to prevent a runaway AI that is ruthlessly optimizing without any self-awareness.