<p>The one question left is: how much do humans hallucinate, and what share of the training data out there can be classified as hallucination? It is easiest to answer from the opposite direction: by identifying which data is definitely not hallucinated. The only texts that meet this criterion are the ones that are not interpretations, i.e. not written to explain something after the fact. But almost all text ever written was written after the thing it describes had already happened. The only exceptions are texts that are self-contained and do not refer to material reality at all: mathematical problems with their solutions, some source code, and perhaps the most abstract strains of philosophy. An even smaller subset of that small subset, problems with objectively verifiable solutions, is what the o1-o3 model series is trained on. It is also not coincidental that all previous instances of AI systems outperforming humans happened in self-contained problems with known rules, such as games.</p>
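<p>To make "objectively verifiable" concrete, here is a minimal sketch of the kind of check such a problem admits: a candidate solution is either correct or it is not, with no room for after-the-fact interpretation. The function names and the exact-match rule are illustrative assumptions, not any lab's actual training pipeline.</p>
<pre><code># A problem is "objectively verifiable" if a mechanical check can decide,
# without interpretation, whether a candidate solution is correct.
# Hypothetical helper names; not taken from any real training setup.

def verify_answer(candidate: str, reference: str) -> bool:
    """Return True iff the candidate matches the reference answer."""
    try:
        # Numeric answers: compare as numbers so "4" and "4.0" both pass.
        return abs(float(candidate) - float(reference)) < 1e-9
    except ValueError:
        # Non-numeric answers: fall back to exact string comparison.
        return candidate.strip() == reference.strip()

def reward(candidate: str, reference: str) -> int:
    """Binary reward: 1 for a verified solution, 0 otherwise."""
    return int(verify_answer(candidate, reference))

# Usage:
assert reward("4", "4.0") == 1
assert reward("five", "4") == 0
</code></pre>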