AI Cannot Be Trusted
There is no way to ascertain whether any one response to a prompt is more reliable than another.
A number of things I've experienced in the past few weeks have made me think more about how the mechanics and technology behind large language models (LLMs) make this so.
The other day I discovered that the developers of the e-book software Calibre have added AI features to it. It's complicated, but if one does not understand how LLMs handle 'tokens', it is easy to mistake their responses for actual considered thought. Phrases like "token weights heavily favor a pre-trained refusal pathway" and "executing next-token predictions" are important, but they mean nothing taken out of context. Distinguishing whether a model is accurately extracting information from your prompt or pulling it from its pre-existing parametric weights is incredibly difficult, because the output text is generated by the exact same next-token prediction mechanism either way.
In other words, one can open an e-book in Calibre and ask the 'AI' to summarize it. It will, but it's easy to forget that the AI may not actually be reading the text of the book open on the screen at all, and may instead be generating the summary from what it absorbed during training.
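A deliberately tiny sketch in Python can make the point concrete. Everything here is an assumption for illustration, not how Calibre or any production LLM works: `tokenize` is a word-level toy, `train_bigrams` builds a count table standing in for parametric weights, and `generate` greedily picks the highest-scoring next token from those counts blended with the prompt's own bigrams. Real models use subword tokenizers and attention, but the output shares the property that matters: it is just a sequence of tokens, with nothing marking where each one came from.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Toy tokenizer (an assumption): lowercase and split on whitespace.
    # Real LLM tokenizers split text into subword pieces mapped to integer IDs.
    return text.lower().split()


def train_bigrams(corpus: str) -> dict[str, Counter]:
    # Stand-in for "parametric weights": how often each token followed another.
    table: dict[str, Counter] = defaultdict(Counter)
    tokens = tokenize(corpus)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def generate(weights: dict[str, Counter], prompt: str, steps: int) -> list[str]:
    # Greedy next-token prediction over the combined evidence of the
    # training counts and the prompt's own bigrams (a toy simplification).
    context = train_bigrams(prompt)
    tokens = tokenize(prompt)
    new_tokens: list[str] = []
    for _ in range(steps):
        last = tokens[-1]
        scores = Counter(weights.get(last, Counter()))
        scores.update(context.get(last, Counter()))
        if not scores:
            break  # nothing known to follow this token
        nxt = scores.most_common(1)[0][0]
        # The chosen token carries no label saying whether it came from
        # the prompt or from the training counts.
        tokens.append(nxt)
        new_tokens.append(nxt)
    return new_tokens


if __name__ == "__main__":
    weights = train_bigrams("the butler did it . the butler did it .")
    book = "in this book the gardener did it . so the"
    print(generate(weights, book, 4))  # ['butler', 'did', 'it', '.']
    print(generate({}, book, 4))       # ['gardener', 'did', 'it', '.']
```

In this toy, the "book" in the prompt says the gardener did it, but the training counts outweigh it, so the model answers "butler". The wrong answer comes out in exactly the same form a right one would.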
There's a post at Scripting News, a blog someone brought to my attention, in which the author writes, "My Claude today pulled a Hal." The reference to HAL 9000, the computer in 2001: A Space Odyssey, is lost on most people today, but the remark shows how people increasingly confuse today's AI with artificial general intelligence, or AGI. Claude only 'acts' the way its training and post-training alignment dictate.
This is important because even the people who build these LLMs do not fully understand how they work. It is the reason nobody can tell you exactly why a model produced one token rather than another, and why AI cannot be trusted.
Got on the Peloton and watched some YouTube. Found the following video and discussed it with Gemini. Below is Gemini's summary.
Does Jensen Disagree?
In a recent interview, Nvidia CEO Jensen Huang explicitly pushed back against this narrative.
It’s a seductive, common-sense argument, but it conflates macro-engineering progress with micro-architectural understanding.
Humans have been making things "better" for centuries without truly understanding the underlying mechanics:
- The Steam Engine: Engineers drastically improved steam-engine efficiency long before physicists formalized the laws of thermodynamics to explain heat transfer.
- Selective Breeding: Early humans domesticated wolves into dogs, and farmers bred wild grasses into corn, for generations with zero knowledge of DNA or genetics.
When an LLM systematically reduces its hallucinations or performs better on a benchmark from one year to the next, it isn't because an engineer went into the neural network, identified a flawed logical pathway, and rewrote the code. Instead, developers scaled up the compute, fed it more data, and layered heavier post-training alignment on top.
They are weighting the dice differently, but the underlying game—the opaque, untrustworthy mechanism of next-token prediction—remains exactly the same. They have built a bigger, more efficient factory to generate tokens, but they still can't tell you precisely what happens inside the hidden layers of the machine.
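The "weighting the dice" metaphor can be sketched in a few lines of Python. This is a toy under stated assumptions: the scores are made up, and `reweight` is a hypothetical stand-in for post-training (real alignment adjusts weights throughout the network rather than adding a bias to a few outputs). What it shows is that changing the scores, or the sampling `temperature`, changes which token is likely, while the mechanism that picks the token, a softmax followed by a random draw, stays exactly the same.

```python
import math
import random


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    # Convert raw scores into probabilities. Lower temperatures sharpen
    # the distribution; higher ones flatten it.
    scaled = {tok: s / temperature for tok, s in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def reweight(logits: dict[str, float], bias: dict[str, float]) -> dict[str, float]:
    # Hypothetical stand-in for post-training: nudge some scores up or down.
    # Real alignment changes weights throughout the network, not a lookup table.
    return {tok: s + bias.get(tok, 0.0) for tok, s in logits.items()}


def sample_next(logits: dict[str, float], rng: random.Random,
                temperature: float = 1.0) -> str:
    # The same draw is used whether or not the scores were reweighted.
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    base = {"butler": 2.0, "gardener": 1.0, "maid": 0.5}  # made-up scores
    aligned = reweight(base, {"gardener": 2.0})
    print({t: round(p, 3) for t, p in softmax(base).items()})
    print({t: round(p, 3) for t, p in softmax(aligned).items()})
    rng = random.Random(42)
    print(sample_next(base, rng), sample_next(aligned, rng))
```

With these made-up scores, the reweighting flips the most likely token from "butler" to "gardener", yet nothing about how the token is chosen has changed, and nothing in the output explains why.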