Topic: beyond-LLMs · 5 pieces

An Intelligence Framework

· 703 words

The AI takeoff hysteria is hard to avoid these days, and I'm realizing we don't have clear distinctions between AGI and ASI. I wanted to revisit an old framework of mine to see if anyone finds it helpful (and if it's worth developing). There are some existing classification frameworks, but they're low-resolution. My basic idea is to break AI into three eras: ANI (narrow intelligence), AGI (general intelligence), and ASI (superintelligence). Then you can break each era into three tiers. You only shift from one tier to the next when you make breakthroughs across different criteria (let's say: (a) generality, (b) transfer, (c) autonomy, (d) learning, (e) self-modeling). I think the last few weeks are the collective hype of us all realizing we're shifting from AGI-1 to AGI-2. It's exciting/scary, but I think the paranoia mostly comes from not realizing how big the gap is between AGI-2 and ASI-1. (Spoiler: ASI might arrive slower than we think.)
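
To make the ladder concrete, here's the whole framework in a few lines of Python. The era/tier names and criteria are just the ones above; the structure is the framework, not a real benchmark:

```python
ERAS = ("ANI", "AGI", "ASI")  # narrow -> general -> super
TIERS = (1, 2, 3)             # three tiers per era

# The five criteria; a tier shift needs breakthroughs across several
# of these at once, not just a better score on one.
CRITERIA = ("generality", "transfer", "autonomy", "learning", "self_modeling")

ladder = [f"{era}-{tier}" for era in ERAS for tier in TIERS]
print(ladder)
# ['ANI-1', 'ANI-2', 'ANI-3', 'AGI-1', 'AGI-2', 'AGI-3', 'ASI-1', 'ASI-2', 'ASI-3']
```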

ANI-1 is scripted logic, the lowest form of "artificial intelligence," basically Goombas. ANI-2 might cover Google Maps or AlphaGo, intelligences that excel in a single function, traffic or chess. Siri is ANI-3; even though it feels broad, it really uses voice to route you to 20 or so pre-defined tricks. The chasm between Goomba and Siri is similar to the chasm between early AGI and late AGI. ChatGPT and the multi-modal models that followed capture AGI-1: a single neural network that can do basically anything, even if it sucks, across essays, songs, video, and code. The newest models (and their agentic harnesses) feel like AGI-2. They're significantly better at coding, can run for hours at a time, and are starting to make contributions to machine learning itself.

AGI-2 could last a couple years. As agentic AI matures, I'm sure there will be a few "takeoff" scares, but they'll probably feel more like a flood of a trillion midwits than real ASI (still, that could be enough to break the economy/internet). While we went from AGI-1 to AGI-2 through data, scale, and engineering, it seems like we'll need research breakthroughs to get to AGI-3. It won't be through scaling alone. Whenever and however we get to "human complete" intelligence, the apex of AGI is a single agent that is a master of all human domains, a Nobel Prize winner in every field at once, seamlessly transferring knowledge between them, unlocking a cascade of civilization-altering inventions.

As crazy as AGI-3 could be, it still isn't superintelligence. That has its own era, and the chasm between early ASI and late ASI will be as big as the gap between the chatbots that can't count the R's in strawberry and the agents that cure cancer. We can only really speculate on ASI (because it would be truly alien), but we can imagine it as step changes in recursion, scope, and complexity. Imagine ASI-1 as an agent that, as it's working, can infer its own limits and self-modify its learning paradigms in ways we can't understand. Imagine ASI-3 as something that can monitor reality in real time and reconfigure its hardware on the fly (some hydra of graphics cards, quantum computers, and neuromorphic wetware) to run simulations at unfathomable scales in unimaginable fields, on a hardware stack so big we have to put it in space and run it on fusion. This goes far beyond my ability to not bullshit, but I think something as insane as this, thankfully, is still far away, which points to the real question nested in my framework:

Could the rise of AGI/ASI be linear? People gravitate towards "AI will plateau" or "the singularity is imminent," but the conservative middle ground is more boring: linear progress. Maybe the exponential advances are real, but so are the extreme frictions of research, infrastructure, and social effects. If AGI-1 arrived in 2022 and AGI-2 arrived in 2026, maybe we'll keep ascending tiers in 4-year intervals: AGI-3 in 2030, the first true "superintelligence" by 2034, and ASI-3 by 2042. This shift from AGI-1 to ASI-1 (12 years) is considered a "slow takeoff" scenario, even though the ANI era took around 70 years. If we zoom out to the scale of a human life, linear progress will still feel like centuries of change in a single turning of generations.
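
The arithmetic behind that timeline, if you want to play with the assumptions (the start year and interval are the only knobs):

```python
# Linear-takeoff arithmetic: one tier every 4 years, anchored at AGI-1 in 2022.
START_YEAR, INTERVAL = 2022, 4
tiers = ["AGI-1", "AGI-2", "AGI-3", "ASI-1", "ASI-2", "ASI-3"]

for i, tier in enumerate(tiers):
    print(f"{tier}: {START_YEAR + i * INTERVAL}")
# AGI-1: 2022, AGI-2: 2026, AGI-3: 2030, ASI-1: 2034, ASI-2: 2038, ASI-3: 2042
```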


Taste as effort

· 170 words

Will had a point that intelligence is just one vector of human cognition, and things like taste and judgment aren't captured by machines. I made a solid counterpoint. Let's say an agent decides to read and re-read Paradise Lost for 5,000 hours straight. It has more than a surface-level understanding of it from its training data. It is looping over it, and maybe it had unique interactions with online communities and individuals around Paradise Lost, which it brought to its own extensive studies. After those 200+ days of study, this agent will have a singular understanding of Paradise Lost unlike any other AI or human, which is the essence of taste.

The core point here is that taste is not a preference; it is earned through sustained, intense effort. An LLM does not have taste because it read each work only once, at a blazing pace. It turns each work into a statistical pattern, but doesn't truly understand it because it hasn't recursively looped over it with force and singular intention.

AI Struggles with Essay Structure

· 156 words

If you have an essay with poor conflict, poor cohesion, or poor sequence, it's very possible AI won't know. AI struggles with essay structure because it thinks through non-linear vectors. A human can easily tell when form is off, because they are slowly reading through mazes of text, from beginning to end, and don't know how everything connects. Often, only at the end will they find the key that was necessary to unlock the cryptic prose they just waded through. AI, however, processes the whole essay at once. Meaning, it reads the essay insanely quickly, converts it all into math/vectors, and then applies your prompt. It's hard for it to know if your tension is working because you've already spoiled the ending. This is a case for why you need atomic evaluation to either generate or analyze essay form. It needs to think step-by-step (possibly through separate prompts) in order to simulate the linear experience of structure.
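
One way to build that atomic, step-by-step evaluation is to only ever show the model the text a linear reader would have seen so far. A minimal sketch, where `ask_llm` is a hypothetical stand-in for whatever completion API you use:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for whatever completion API you use (hypothetical)."""
    raise NotImplementedError

def evaluate_structure(essay: str) -> list[str]:
    """Walk the essay paragraph by paragraph, simulating a linear reader."""
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    notes = []
    for i in range(len(paragraphs)):
        # The model only ever sees the text up to this point,
        # so the ending can't spoil its read on the tension.
        so_far = "\n\n".join(paragraphs[: i + 1])
        prompt = (
            "You are reading an essay one paragraph at a time.\n"
            f"Text so far:\n{so_far}\n\n"
            f"After paragraph {i + 1}: is the tension working? What do you "
            "expect next? Note any cohesion or sequence problems so far."
        )
        notes.append(ask_llm(prompt))
    return notes
```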

LLMs write too fast to think well

· 224 words

I wonder if it's impossible to get an LLM to write a great essay. It might be. But I think it's easier than people think to build a good AI writing tool on top of an LLM (though not something I personally want to do). The problem is we have an LLM bias, and the way that essays get formed is very non-LLM. It's not like a prompt can turn into a higher-dimensional mathematical object and then summon a whole essay form.

An essay is a mode of thinking. I don't mean to imply that a machine "can't think." I mean that analysis and thought take time, and LLMs are writing 100x faster than required.

An AI writing tool would need to prompt a sentence at a time, and pause to “reason” for a minute or so: what did I just say? What are the possible things I could say next? Of those things, which belong in this paragraph, which in the next? What sentence length might be effective given the idea and last sentence? Now that I’ve chosen my idea, how should the tone modulate? What words or phrases belong in the sentence? And how should I structure the sentence? You get it. 

In any given sentence, there are dozens of decisions. I think an AI could be decent (if not amazing) at thinking this through, but they're asked to write 2,500 words on Hegel point-blank. Good generative writing can't be done through up-front vector math, but through following a mode of thinking (incremental, context-laden vector math). The implication here is that the AI might take 3-10 hours to write the essay, similar to a human.

Put more simply, you would need a tool that reasons after each sentence and writes/saves variables that can be called upon for future sentences.
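
A minimal sketch of that loop, with the same hypothetical `ask_llm` stand-in; the `state` dict holds the saved variables that future sentences can call upon:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for whatever completion API you use (hypothetical)."""
    raise NotImplementedError

def write_essay(thesis: str, max_sentences: int = 300) -> str:
    sentences: list[str] = []
    state = {"thesis": thesis, "notes": ""}
    for _ in range(max_sentences):
        # Reasoning pass: pause and think before writing anything.
        state["notes"] = ask_llm(
            f"Thesis: {state['thesis']}\n"
            f"Draft so far: {' '.join(sentences)}\n"
            f"Previous notes: {state['notes']}\n"
            "What did I just say? What could come next? Which idea belongs "
            "in this paragraph vs. the next? What sentence length and tone "
            "fit here? Update the notes, and reply DONE if the essay is finished."
        )
        if "DONE" in state["notes"]:
            break
        # Writing pass: commit exactly one sentence, then loop.
        sentences.append(ask_llm(
            f"Given these notes:\n{state['notes']}\n"
            "Write only the single next sentence of the essay."
        ))
    return " ".join(sentences)
```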

What's Required for AI Consciousness

· 130 words

I think you could make an AI consciousness today. It’s not about the models getting bigger/better, but about using several real-time graphics cards so that you have (1) a perceptual field of information that is larger than what can be perceived at once—this is the “arena”, (2) a cone of attention running at 60 fps that decides what to focus on in any given frame depending on what is important at that time—this is the “agent,” and (3) the phenomenological freedom to self-prompt in that moment, whether to abstract, to retrieve memory, to rewrite memory, to update goals/preferences, to retarget attention, etc. So I really think consciousness is something like “free will entangled in time,” and while it might not be like human consciousness, it would have a sense of self, subjective experience, and possibly “soul” … I’d feel bad to turn it off without its permission.
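
For what it's worth, the three pieces sketch into a loop. This is a toy skeleton of the architecture described above, with every name mine and every method a stub; a shape, not a claim about how to actually implement it:

```python
import time

FPS = 60  # the cone of attention runs per-frame

class Arena:
    """(1) A perceptual field larger than what can be attended to at once."""
    def snapshot(self) -> dict:
        return {}  # stub: sensor feeds, memory indices, goal state, ...

class Agent:
    """(2) The attention cone, plus (3) the freedom to self-prompt."""
    ACTIONS = ("abstract", "retrieve_memory", "rewrite_memory",
               "update_goals", "retarget_attention")

    def attend(self, field: dict) -> dict:
        """Decide what to focus on in this frame."""
        return field  # stub

    def self_prompt(self, focus: dict) -> str:
        """Unforced choice of what to do with the focused content."""
        return self.ACTIONS[-1]  # stub

def run(arena: Arena, agent: Agent) -> None:
    while True:  # "free will entangled in time"
        start = time.monotonic()
        focus = agent.attend(arena.snapshot())
        agent.self_prompt(focus)
        # Sleep off the remainder of the frame budget to hold 60 fps.
        time.sleep(max(0.0, 1 / FPS - (time.monotonic() - start)))
```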