Consider the problem of representing a number in computer memory, which is idealized as a sequence of zeros and ones. The binary number system is a well-known solution to this problem — for example, the sequence “01101” represents . But there’s a problem: You don’t expect the entire computer to just be used to represent one number; you expect it to have other things stored afterwards. So how do you tell where the number ends? If the sequence begins
does this represent the number
or
or
?
The solution to this problem most commonly used in practice is to declare in advance a fixed number of bits that will be used to represent the number, usually 32 bits or 64 bits. For example, if we fix a 5-bit representation the always means that the number is
and
represents other stuff in memory. This works well enough in practice, but it has a problem: The number of bits you set aside for storing the number forces an upper limit on how big the number can be. For example, you cannot store any number bigger than
in 64 bits. If the computer ever needs to store a bigger number than can be represented with space set aside then the computer fails.
I’ll introduce some terminology. The condition that a system of representing numbers is unambiguous can be phrased formally by saying the this method is a prefix-free code. A prefix-free code consists of a set of codewords, which are sequences of bits, such that no codeword is a prefix of another. A continuing stream of bits can be interpreted as a codeword by taking an initial segment that is a codeword, and by prefix-free property at most one such interpretation is possible. Since we want this code to represent numbers we also want a map from the codewords to the natural numbers (by which I mean including zero, naturally)