|Tupper’s Self-Referential Formula Debunked
||[Jun. 22nd, 2011|01:37 pm]
I have a habit of tweeting things that necessitate long explanatory blog posts. A few days ago, I wrote of the “amazing” Tupper’s self-referential formula,
The formula in question is
which, when plotted over a 106x17 rectangle starting at (0, n) where n is a certain 543-digit number, results in:
i.e., it plots “itself”.
In this post I will explain:
This formula is not self referential, any more than a program that reads its source code from disk and prints it is a quine.If we want to construct a formula that prints itself, the first thing to realize is that there’s no possible way that the mathematical content of the equation has enough “entropy” to encode the information content of the image, so a true self-referential formula is impossible. I will return to this point at the end, but let’s take it as a given for now.
The value of n encodes the bitmap of the image, and the formula acts as a print statement. It turns pixels on or off according to the binary representation of n.
The formula is deceptively impressive; I will show you how to construct a much simpler one that does the same thing.
Even though the claim of self-referentiality is bogus, the formula does something kinda cool.
So let’s instead “cheat” and encode the image elsewhere: say in the value of the co-ordinates at which the formula prints itself. We could pick either x or y, but let’s arbitrarily pick the y co-ordinate. How exactly do we encode the bitmap? Let’s pick the simplest possible encoding: the bits of n, which is the starting value of the rectangle’s y co-ordinate, will represent individual pixels.
Now all that the formula needs to do is: for each value of (x,y), output a specific bit of n. A slight catch is that the formula doesn’t have access to n, but it has access to y which is approximately equal to n everywhere within the rectangle (remember that n is a several hundred bit number). So if we ignore the right-most few bits of y, the rest of the number should be the same as n.
How do we output a given bit of n (or y) at the corresponding pixel? The first step is to turn the pixel coordinates into an array index. If h is the height of the image (we don’t know what the width and height are going to be yet), we can turn pixels into consecutive array indices like this:
hx + y
That would work nicely if the image started at the origin, but since it doesn’t, the array indices from the above formula won’t start at zero. We can fix this by instead using
hx + y%h
This ensures that we only look at the lower-order bits of y. We’ll need to make sure later that n is divisible by the height h, so that y%h has an initial value of 0. This will be easy since we’re not using the lower-order bits of n to encode the image, so we’ll free to play with those bits to make it divisible by h. Note that y has two functions: it acts as a bitmapped array, but the last few bits, together with x, act as an index into this array.
Now, given an array index hx + y%h, we can extract the corresponding bit of y (treated as an array) as follows:
y >> (hx + y%h) & 1
That is, first we right-shift y by the given number of pixels, then look at the last bit (standard C operators and precedence).
That’s it! That’s our formula. Happily, it fits neatly in one line, which means that we can substitute 10 in place of h h if we’re going to use a 10-point font.
One little gotcha: this prints all the bits of y including the right-most, but we need to avoid printing the few right-most bits since we don’t control them. We could modify the formula to add some extra right shift, or we could declare that we’re going to start the rectangle at x=1 :-) This means the minimum value of the index is going to be 10, which lets us ignore the last 10 bits. Behold, the formula:
prints itself at the rectangle bounded by 1 <= x <= 100, n <= y < n+10 where n is
I calculated n by encoding the image in binary, then left-shifting by 10 bits, and adjusting the last few bits so that it is divisible by 10.
This is precisely what Tupper’s formula does as well, except (1) it assumes real-valued x and y instead of integer valued, necessitating floor signs, (2) it uses the ‘mod’ function instead of the % operator, (3) it uses negative exponentials of 2 instead of right shift; (4) because of the exponentials, the height of the image is bigger: 17. All of these make the formula look more profound, especially the appearance of ‘17’. I too, at first sight thought the numerical properties of 17 had something to do with it before I realized it was simply the height of the image!
If the formula I’ve showed you, as well as Tupper’s, aren’t self-referential but merely print statements, shouldn’t they be able to print any input and not just themselves?
If there’s anything actually interesting about these two formulas it’s that graph of their plots over the integer plane contain all possible bitmaps (of height 10 and 17, respectively) if you look in the right area. For example, to get
from my formula, simply look at (1,1393927028945687951208608908854274144862220) whereas
is at n=2839549863835611243770369885760240519880614661852209842047097422873116957197820823170688580239030514282972465171635651597931422679050
is at n=10972249630976314209007504532033134949682086639198171874178386489416721212125763915435428634919871166306043061044372149862503608116825035242203771489961723271194035787338485934044510302195667783070693718545221897410050191506550799292424771369239643895773826749855239799677873092167849216646372880455565320
Deep, isn’t it? Just kidding. It’s kind of like the infinite monkey theorem—yes, the graphs of these formulas contain the works of Shakespeare as well—except you also know exactly where in the graph to find them.
Let me close by explaining why it’s impossible to have a truly self-referential formula—in other words, a formula that prints itself starting at (0,0) is impossible. Such a formula would have to be truly self-referential to be able to print itself because it would have no external memory to rely on.
Consider the symbol ‘10’ in the formula I presented: it takes up over a hundred pixels—a hundred bits. However, it only encodes a 4-bit integer! (10 is 1010 in binary.) In other words, there is a huge discrepancy between the information content of the image and the entropy of the mathematical content of the formula. Even if you consider all the nonnumerical symbols that could have been encoded in 100 pixels, the discrepancy persists, and will only (almost) go away if you use something like a 2-bit font, at which size the text is unreadable. You might hope that the formula could somehow contain a ‘compressed’ encoding of itself, but mathematical operations aren’t good enough to do compression.
The only way to get around this limitation is to somehow augment the formula with “memory”. The easiest way to do it is to generalize from “formula” to “source code’ in (say) C, which means you can freely assign values to variables. Such self-printing code is nothing but a quine, albeit in a particularly nasty language, and I suspect it’s doable.
But a minimal way of endowing a formula with memory, and one that at least pays lip service to the spirit of the word “formula”, is to allow an assignment statement in addition to the usual mathematical operations, but nothing more. I don’t see an immediate reason why a self-referential formula of this type can’t exist, but at the very least, finding one would be an absolutely Herculean task. [Edit. Turns out there's a simple way to do it if you allow assignment; see comments.]