In 2015 MIRI proposed quantilizers as a decision theory criterion that an AI may use that allows it strive towards a goal while not optimizing too hard at the goal, so that if a superintelligent were a quantilizer there’s a smaller chance it will do something catastrophic like kill all humans and turn all matter on Earth into paperclips because it’s trying really hard to maximize the productivity of a paperclip factory. For a real number , a p-quantilizer is an agent which, given a goal and a space of possible strategies, picks randomly out of the top proportion , i.e., randomly among any strategy that is better than a proportion of all strategies. In fact, a quantilizer is for all practical purposes equivalent to a a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specifies by a string of bits whose length must be less than . Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.
Roughly speaking, my claim is that for a natural number , up to a constant additive factor, a perfect optimizer that is limited to strategies taking at most bits performs as well in any goal as a -quantilizer. More precisely, to explain what I mean by “up to a constant additive factor”, there is a reasonably-sized constant such that an optimizer with bits of outputs performs as well as a -quantilizer, and a -quantilizer performs as well as an optimizer with bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly with strategies like “Walk to a computer and type this program, and attach to the computer the accessories necessary to run the strategy specified by the program.” To really make the result true up to a constant factor and not a logarithmic factor or anything like that you need to careful about encoding using stuff like prefix-free encodings but I don’t think that’s really important and will gloss over it.
Quantilizers are at least as good as bounded-output optimizers: If a particular strategy is optimal among all strategies of length bits, then since there are at most such strategies this one optimizing strategy already makes up a proportion of all strategies with bits, so a -quantilizer over strategies with bits must use exactly this strategy or an equally-performing strategy. If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that, a proportion of all strategies will have length , so a proportion will be identical to this -bit optimal strategy, so a -quantilizer will perform at least as well as this strategy.
Bounded-output optimizers are at least as good as quantilizers: Using a deterministic pseudo-random number generator, it is possible to precisely specify a sequence of strategies that are fairly sampled from the space of all strategies. A proportion of these strategies are also in the top of performance among all strategies. So the smallest such that is among the top in performance is statistically certain to have . It takes bits to specify the , plus an additional bits to specify the pseudo-random sequence and the command to perform strategy given . So an optimizer with bits of output can specify the strategy which is -quantilizing, so the actual optimizing strategy is at least as good.