It generally doesn’t. It apologizes, then outputs either very nearly the same thing as before or something else that’s wrong in a brand-new way. Have you used GPT before? This is a common problem; it’s part of why you cannot trust anything it outputs unless you already know enough about the topic to judge its accuracy.
Hallucinations are different from in-context learning. I’ve seen enough impressive examples of the latter that the burden is on you to provide evidence that it generally doesn’t work. There are a bunch of papers on this topic; surely at least one would support your thesis?
And did you really just go “nuh uh, it’s actually in binary”?
No, that is literally how knowledge is stored inside of neural networks. Plenty of papers have shown that the learning process is largely one of compression: the patterns in the training data get distilled into a much smaller set of weights. This means that LLMs actually do have concepts of things (which, again, has been shown independently, e.g. with Othello-GPT). These concepts are stored as relationships between large numbers of parameters - that’s how NNs work.
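To make the “concepts as relationships between numbers” point concrete, here is a toy sketch of the linear-probing idea behind the Othello-GPT result. It is purely illustrative - synthetic “hidden states” and made-up dimensions, not the actual experiment - but it shows the mechanism: if a concept is linearly encoded in a model’s activations, a simple linear probe can read it back out.

```python
# Toy illustration (NOT the Othello-GPT code): a concept that is linearly
# encoded in hidden states can be recovered by a linear probe.
import numpy as np

rng = np.random.default_rng(0)

d = 64    # hidden-state dimensionality (hypothetical)
n = 2000  # number of sampled hidden states

# Pretend the network encodes a binary concept along some fixed direction,
# mixed in with unrelated activation noise.
concept_direction = rng.normal(size=d)
concept = rng.integers(0, 2, size=n)              # latent "concept" labels
hidden = rng.normal(size=(n, d)) + np.outer(concept, concept_direction)

# Fit a linear probe (least squares) on half the data, evaluate on the rest.
X_train, X_test = hidden[:1000], hidden[1000:]
y_train, y_test = concept[:1000], concept[1000:]
w, *_ = np.linalg.lstsq(X_train, y_train - 0.5, rcond=None)
pred = (X_test @ w > 0).astype(int)
print("probe accuracy:", (pred == y_test).mean())  # ~1.0 when linearly encoded
```

The real papers do this with activations from a trained transformer and board-state labels; the point here is just that “relationships between large amounts of numbers” is a perfectly workable substrate for storing a concept.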
I also fully understand how the tokenization process works and what the mentioned “symbols” are. Please explain what this has to do with anything. The model sees text in specific chunks as an optimisation, what does this change?
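For anyone else following along, this is roughly what those chunks look like in practice (a minimal sketch assuming the tiktoken package is installed; the exact token boundaries depend on which encoding you pick):

```python
# Show how a BPE tokenizer splits text into subword chunks.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Tokenization splits text into subword chunks."
token_ids = enc.encode(text)
print(token_ids)                              # list of integer token ids
print([enc.decode([t]) for t in token_ids])   # the text of each chunk
```

The model never sees raw characters, only these ids - which is exactly the “optimisation” being referred to, and it doesn’t change anything about where the knowledge lives.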
I’m a big boy who has already implemented his own LLMs from the ground up, so feel free to skip any simplifications and tell me exactly, in detail, what you mean.