<p>The one question left is: how much do humans hallucinate, and what share of the training data out there can be classified as hallucination? It is easiest to answer from the opposite direction: by identifying which data is definitely not hallucinated. The only texts that meet this criterion are the ones that are not interpretations, i.e. not written to explain something after the fact. But almost all text ever written was written after the thing it describes had already happened. The only exceptions are texts that are self-contained and do not refer to material reality at all: mathematical problems with their solutions, some source code, and perhaps the most abstract strains of philosophy. An even smaller subset of that small subset, problems with objectively verifiable solutions, is what the o1-o3 model series is trained on. It is also not coincidental that all previous instances of AI systems outperforming humans happened in self-contained problems with known rules, such as games.</p>
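<p>To make "objectively verifiable" concrete, here is a minimal sketch of the kind of check such a problem admits: a candidate solution is either correct or it is not, with no room for after-the-fact interpretation. The function names and the exact-match rule are illustrative assumptions, not any lab's actual training pipeline.</p>
<pre><code># A problem is "objectively verifiable" if a mechanical check can decide,
# without interpretation, whether a candidate solution is correct.
# Hypothetical helper names; not taken from any real training setup.

def verify_answer(candidate: str, reference: str) -> bool:
    """Return True iff the candidate matches the reference answer."""
    try:
        # Numeric answers: compare as numbers so "4" and "4.0" both pass.
        return abs(float(candidate) - float(reference)) < 1e-9
    except ValueError:
        # Non-numeric answers: fall back to exact string comparison.
        return candidate.strip() == reference.strip()

def reward(candidate: str, reference: str) -> int:
    """Binary reward: 1 for a verified solution, 0 otherwise."""
    return int(verify_answer(candidate, reference))

# Usage:
assert reward("4", "4.0") == 1
assert reward("five", "4") == 0
</code></pre>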