Civilization Shaped by Fundamental Physical Postselection

I delayed writing about this for a long time because I was worried it might be an infohazard. I don’t think it is, and if it is true then it is valuable for people to know about it, so I’m publishing it.

Imagine there were a physically real effect of postselection. That is, there is a simple bare probability distribution over possible initial conditions for the universe, and the universe we inhabit is not sampled from this bare distribution but rather from the subset of initial conditions that satisfy some criterion about the future, or from a distribution weighted by some function of the future. In other words, as a matter of direct physical law, the history of the universe is optimized to satisfy some criterion, or to maximize some function, of its future.

One possible history that optimizes this criterion is one in which intelligent life evolves and deliberately takes action to optimize it. This might be the optimal history if a long time period that is only weakly constrained by postselection, during which life can evolve, temporally precedes a highly constrained region of spacetime whose required conformation is far more easily achieved by intelligent design than by any abiotic mechanism. Such an intelligent species would become fully devoted to, and competent at, satisfying this optimization criterion, except for those components that physics does not force to be maximally satisfied or that are more easily satisfied by luck than by deliberate design.

The civilization of this species will develop in the most likely way possible that reaches this end state. For the most part it looks just like history that is not postselected. At some point the civilization commits to the value of satisfying the postselection criteria. This could start with a small number of people proposing that something like this is possible, then the idea gets refined as the underlying physics is better understood and the moral aspects thought out, and then it gets popular or influential to the point of complete fixation. The few people that first propose the idea are in the unusual situation where they are also the first to anticipate that it will become influential for a rather different reason than what ordinarily makes ideas influential. This weird state of affairs is what was making me worried about infohazards.

However, all things considered, I have decided that even in the unlikely case where something like this is true, it is preferable for me to share this idea in the open than to try to keep it private. I am generally suspicious of “here’s an idea that shouldn’t be spread,” so I need a strong countervailing reason in this particular case. In the likely event that nothing like this idea is true, there is no anomalous effect that could provide such a reason. If the idea is true, then an individual choice not to spread it is unlikely to be effective. Finally, I don’t think a block-at-all-points-of-intervention strategy against such a future is what we should be pursuing; while the philosophical issues here are subtle, trying to prevent the future discussed above seems more like it would make us not exist than give anything good to us. I considered whether there is some variant of this theory where it would be harmful for us to know about it, but while such a variant is conceivable, I don’t see a plausible worldview where it is more likely than a variant where knowing about it benefits us. I still thought I was relying too much on my own reasoning about subtle philosophical issues that may affect many other people, so I consulted with a few other people before publishing, and nobody objected. I then procrastinated far longer than was reasonable to make a definitive decision, and now here we are.

I know of two plausible physical models for postselection. Both are simple and natural modifications of our current understanding of the laws of physics. The first is the two-state formalism for quantum mechanics: in addition to an initial state |\psi_i \rangle of a quantum system, a final state |\psi_f \rangle is specified as well. For any observable O, considered in the Heisenberg picture, its actual expectation value is \langle \psi_f| O |\psi_i \rangle. This doesn’t actually work as stated, because these “expectation values” pick up phase terms from every history that contributes to them and won’t be real numbers, so there has to be a more nuanced interpretation of them, but I think this problem is solvable. The second model is making the dynamics slightly non-unitary. In a path-integral formulation this can be done by adding a small imaginary component i \epsilon P to the Lagrangian. This adds a factor e^{- \epsilon \int P} to the path integral, which can be thought of as a weighting factor that favors some histories over others, specifically favoring histories where P is minimized throughout all of spacetime. For small \epsilon this won’t perceptibly change the probabilities of small-scale events but will direct the large-scale trajectory of the universe to minimize P.
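To spell out the second model a little (a sketch only: \phi stands for a generic field, and the choice of the functional P is left open, as above), adding i \epsilon P to the Lagrangian reweights each history in the path integral:

Z = \int \mathcal{D} \phi \, e^{i \int (\mathcal{L} + i \epsilon P)} = \int \mathcal{D} \phi \, e^{i \int \mathcal{L}} \, e^{- \epsilon \int P}

Every history is thus multiplied by e^{- \epsilon \int P}, exponentially suppressing histories in which the spacetime integral of P is large, which is exactly the weighting factor described above.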

Postselection provides a potential solution to the cosmological constant problem, in that a low cosmological constant allows long-lasting stable structures that are more conducive to evolution, and that might more than make up for the improbability of fine-tuning a low cosmological constant. This is similar to Steven Weinberg’s anthropic explanation for the cosmological constant, but is different in two ways:

  1. Postselection may require intelligently designed devices to fill all of space, which requires all of space to be in the future lightcone of places where life evolves. This is a stricter criterion than merely requiring that intelligent life exists at all, and may give smaller estimates for the cosmological constant, closer to the observed value.
  2. Postselection is a purely physical mechanism and doesn’t depend on the controversial philosophy behind the use of the anthropic principle. I, in particular, don’t buy the use of the anthropic principle in Weinberg’s explanation, but I can imagine other people objecting to some philosophical stance behind postselection instead, so this point is more subjective.

I think this is the strongest reason to think physical postselection is actually true and not just an intriguing possibility; in fact, in my personal opinion this theory is the best explanation for the low cosmological constant that I am aware of.

It may be worth thinking about applying a postselection theory to explain the other unnatural physical constant, the Higgs mass. Current measurements of its value place the universe just at the tipping point of metastability, where it’s still not known whether the universe can undergo vacuum decay. This seems “teleological” in a way potentially amenable to an explanation involving postselection with some future vacuum decay.

Quantilizer ≡ Optimizer with a Bounded Amount of Output

In 2015 MIRI proposed quantilizers as a decision-theoretic criterion an AI might use that allows it to strive towards a goal while not optimizing too hard at it, so that if a superintelligence were a quantilizer there would be a smaller chance of it doing something catastrophic like killing all humans and turning all matter on Earth into paperclips because it’s trying really hard to maximize the productivity of a paperclip factory. For a real number 0 \leq p \leq 1, a p-quantilizer is an agent which, given a goal and a space of possible strategies, picks randomly out of the top proportion p, i.e., randomly among the strategies that are better than a proportion (1-p) of all strategies. In fact, a quantilizer is for all practical purposes equivalent to a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specified by a string of bits whose length must be less than \log_2 (1/p). Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output, and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.
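To fix intuitions, here is a minimal toy sketch of both objects (not MIRI’s formalism: `strategies`, `utility`, and `length_in_bits` are hypothetical stand-ins for the strategy space, the goal, and the encoding discussed below):

```python
import math
import random


def quantilize(strategies, utility, p, rng=random):
    """Toy p-quantilizer: pick uniformly at random among the top fraction p
    of a finite strategy list, ranked by the utility function."""
    ranked = sorted(strategies, key=utility, reverse=True)
    k = max(1, math.ceil(p * len(ranked)))  # size of the top-p slice
    return rng.choice(ranked[:k])


def bounded_output_optimizer(strategies, utility, n, length_in_bits):
    """Toy bounded-output optimizer: the best strategy among those whose
    description fits in at most n bits (at most 2^(n+1) of them exist).
    Assumes at least one strategy is that short."""
    feasible = [s for s in strategies if length_in_bits(s) <= n]
    return max(feasible, key=utility)
```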

Roughly speaking, my claim is that for a natural number n, up to a constant additive factor, a perfect optimizer that is limited to strategies taking at most n bits performs as well on any goal as a 2^{-n}-quantilizer. More precisely, to explain what I mean by “up to a constant additive factor”: there is a reasonably-sized constant c such that an optimizer with n+c bits of output performs as well as a 2^{-n}-quantilizer, and a 2^{-(n+c)}-quantilizer performs as well as an optimizer with n bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly with strategies like “Walk to a computer and type this program, and attach to the computer the accessories necessary to run the strategy specified by the program.” To really make the result true up to a constant factor, and not a logarithmic factor or anything like that, you need to be careful about encodings, using stuff like prefix-free codes, but I don’t think that’s really important and will gloss over it.

Quantilizers are at least as good as bounded-output optimizers: If a particular strategy is optimal among all strategies of length \leq n bits, then since there are at most 2^{n+1} such strategies, this one optimizing strategy already makes up a proportion \geq 2^{-(n+1)} of all strategies with \leq n bits, so a 2^{-(n+1)}-quantilizer over strategies with \leq n bits must use exactly this strategy or an equally well-performing one. If we assume the strategies have a measure biased towards short strategies, or use a prefix-free encoding, or that a subset of the strategies implicitly behaves like that, then a proportion \geq 2^{-(c-1)} of all strategies will have length \leq n, so the optimal \leq n-bit strategy makes up a proportion \geq 2^{-(n+1)} \cdot 2^{-(c-1)} = 2^{-(n+c)} of all strategies, and a 2^{-(n+c)}-quantilizer will perform at least as well as it.

Bounded-output optimizers are at least as good as quantilizers: Using a deterministic pseudo-random number generator, it is possible to precisely specify a sequence s_0, s_1, s_2, \dots of strategies that are fairly sampled from the space of all strategies. A proportion 2^{-n} of these strategies are also in the top 2^{-n} of performance among all strategies. So the smallest m such that s_m is among the top 2^{-n} in performance is statistically almost certain to satisfy m \leq C 2^n. It takes n + O (1) bits to specify m, plus an additional O (1) bits to specify the pseudo-random sequence and the command to perform strategy s_m given m. So an optimizer with n + O (1) bits of output can specify the strategy s_m, which is 2^{-n}-quantilizing, so the actual optimizing strategy is at least as good.
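Here is a sketch of that construction under the same toy assumptions as above; the fixed PRNG seed plays the role of the O(1) bits describing the pseudo-random sequence, and the index m carries the remaining n + O(1) bits:

```python
import random


def nth_pseudorandom_strategy(m, strategies, seed=0):
    """Deterministically enumerate s_0, s_1, ... by repeatedly sampling the
    strategy space with a fixed PRNG seed, and return s_m.  Writing down m
    costs about log2(m) bits; the seed and this procedure are the O(1) part."""
    rng = random.Random(seed)
    s = None
    for _ in range(m + 1):
        s = rng.choice(strategies)
    return s


def quantilizing_index(strategies, utility, n, seed=0):
    """Smallest m such that s_m lies in the top 2^-n fraction of strategies
    by utility.  Statistically m is of order 2^n, so it fits in n + O(1) bits."""
    k = max(1, len(strategies) // 2 ** n)  # size of the top 2^-n slice
    threshold = sorted((utility(s) for s in strategies), reverse=True)[k - 1]
    rng = random.Random(seed)
    m = 0
    while utility(rng.choice(strategies)) < threshold:
        m += 1
    return m
```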

Decoherence is Dephasement, Not Disjointness

According to the many-worlds interpretation of quantum mechanics, when a measurement is performed a superposition between two microscopic states evolves into a macroscopic superposition between a universe where one outcome was observed and a universe where the other outcome was observed. This leads to the popular picture of a tree-like structure for the evolution of the wavefunction, which is thought to be concentrated on many “branches” in different parts of phase space, branches that constantly split during quantum indeterminacies but never rejoin once they’re macroscopically distinct. This picture is incorrect: Although the size of phase space is exponentially large (in the number of particles), the number of branches also grows exponentially and would quickly overtake it, densely occupying the physically plausible portion of phase space. It is statistically inevitable that macroscopically distinct branches would routinely reconverge. The reason we don’t observe quantum interference between macroscopically distinct states is not that such states never converge to the same outcome state. Instead, when an outcome state can be reached by many macroscopically distinct pathways, the relative phases of these pathways are effectively random, so statistically the final amplitude corresponds to a probability that is approximately the sum of the probabilities of the distinct pathways, as though there were no interference.

If you’re wondering why you haven’t heard of this form of macroscopic interference, well, I haven’t either. I’m not basing this on any article or folklore of professional physicists, but on my own attempt to reason from the underlying physical laws. Tell me if someone has published this before; I haven’t checked.

Model 1: Isolated equilibrium system

The simplest model system is an equilibrium gas perfectly isolated from its environment. The semiclassical description of it is a collection of particles with positions and velocities, occasionally interacting. For a quantum mechanical state, let N be the number of branches it has, measured as the number of semiclassical states on which the wavefunction has a significant weight (if you’re really nitpicky, make that a volume of phase space and divide by the appropriate power of \hbar). Every time two particles scatter, the outgoing scattering state is spread out over multiple trajectories, so one branch turns into multiple branches. The rate at which scattering events occur is proportional to the volume, and this splitting occurs in every pre-existing branch, so we have approximately

\frac {d N} {d t} \propto V N
\frac {d \log N} {d t} \propto V

On the other hand, the size of the relevant portion of phase space is \exp (S), where the entropy S is proportional to the volume

S \propto V

It follows that there is a fixed time t_0 after which the branch count N must overtake the phase space volume \exp (S), and this time t_0 is independent of the volume. After this time there will be interference between macroscopically distinct trajectories. This t_0 is roughly the mean free time between collisions.
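Spelling that step out, with C_1 and C_2 as unspecified proportionality constants and ignoring the initial branch count:

\log N (t) \approx C_1 V t
S \approx C_2 V
\log N (t) > S \quad \text{when} \quad t > t_0 = \frac {C_2} {C_1}

so t_0 depends only on the constants (which are set by intensive quantities like the density), not on the volume.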

Model 2: Equilibrium system interacting through surface.

The previous model is unrealistic because it assumes the system is perfectly isolated. In reality, every system is entangled with the rest of the universe, and the phase space of the entire universe is much larger than what a single small thermal system within it can fill by branching. Can this entanglement give space for the branches to remain apart? No. For one thing, while this one system is branching, so is the rest of the universe. In the previous model we can take our system to be the whole universe, and the time to phase space saturation is independent of volume, so it is still small. Secondly, even with pessimistic assumptions about the amount of branching in the rest of the universe, as long as the system only interacts with the rest of the universe through their common boundary, surface-area-to-volume considerations still favor the saturation of phase space by branches. If the system only interacts with the rest of the universe on its boundary (including radiation at the boundary that may have originated deep inside the system), then at least we can say that if two starting states are exactly the same on the boundary at all times, then the state of the rest of the universe will be the same too. Therefore a conservative model of the rest of the universe is that it creates and stores a complete copy of the state of the system at its boundary, according to some basis. Then the entropy of the phase space is not constant, but increases as the size of this ledger increases. Since the ledger only records the information on the surface, this increase is proportional to the surface area:

S \approx S_0 + C_0 A t

On the other hand, as discussed before the logarithm of the number of branches increases with volume:

\log N (t) \approx C_1 V t

If \frac {V} {A} > \frac {C_0} {C_1}, which is true for any big enough system, then eventually branch production will overtake the phase space available for the branches to remain distinct, even including the additional phase space from the interaction of the system with its environment.
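For completeness, assuming these linear growth rates continue to hold, the crossover happens at

C_1 V t > S_0 + C_0 A t \quad \text{when} \quad t > t_* = \frac {S_0} {C_1 V - C_0 A}

which is finite precisely when \frac {V} {A} > \frac {C_0} {C_1}.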

Disjointness through macroscopic records

There’s another way both of the above models are oversimplified, which is that I imagined the system in thermal equilibrium and ignored the structure of the state space dynamics. By merely counting the number of branchings I can show that some branches converge with other branches, but it’s still possible that two particular branches will remain permanently separate. This is exactly what happens when a quantum event leaves a permanent physical record. It’s unlikely that the two states where the record differs will lead to the same final state; that’s exactly what a physical record means. So actual measurements in physicists’ experiments are likely to lead to permanent branch splits (even if the records are erased they are likely to leave some physical imprint). On the other hand, for equilibrium systems any two starting states can reach the same ending state after roughly the ergodic mixing time, so permanently separating branches is not feasible there.

That is, statistically there will often be macroscopic differences that merge together at a later time, but many of the really significant macroscopic differences do split the wavefunction into disjoint branches that never rejoin. It’s hard to say exactly where the line between these cases lies, and it’s impossible to test experimentally.

Anthropics Explained (by Rejecting the Copernican Principle)

Many paradoxes in philosophy, like the Simulation Argument, the Boltzmann brain argument, and the Doomsday argument, rest on what’s called the Copernican Principle. This is the idea that among all of the observers that exist in the universe you are equally likely to be any one of them. There is little reason to think this is true, and these paradoxes give us a strong reason to reject the Copernican principle. Instead, the question of what observer you are should be approached like any other scientific question: be open to many theories and judge them by their parsimony and their fit with your observational data.

This is easiest to understand in the “anthropic” puzzles of eternal inflation theories. In these theories the process of inflation creates an infinite number of “bubble universes” which may have different effective laws of physics (although the rules for how these different effective laws come about, as well as what happens in high-energy experiments within these bubble universes, should be determined by shared universal laws of physics). Already the mere possibility of there being an infinite number of observers kills the Copernican principle as a necessary truth, since when this occurs it is impossible to weight each observer with an equal probability. This is a practical as well as a theoretical problem. There are many proposals for how to sample bubble universes within the full universe/multiverse, and they give different predictions for what a “typical” bubble universe is like. I think such sampling procedures for where our observable universe fits in the full universe should be considered a component of any specific eternal inflation theory. This means that if a single inflation mechanism has multiple substantially different ways a universe like ours might be picked out, these different ways should be considered different theories. It also means that a simple inflation mechanism which requires a complex sampling procedure to describe how our currently observed universe arises should be penalized as more complex.

Your knowledge of where you are in the universe is something philosophers of science call an auxiliary hypothesis. No scientific theory makes predictions entirely on its own; it must be augmented with auxiliary hypotheses that relate the theory to things we can directly observe. For example, if you test Maxwell’s equations by measuring the force between a current-carrying wire and a magnet, the test depends on the auxiliary hypotheses that your battery generates a voltage, your wire conducts electricity, and your magnet generates a permanent magnetic field. An unexpected result can be explained by changing either the theory or the auxiliary hypotheses. The famous examples from astronomy: When Uranus’s orbit did not match predictions, this led to the discovery of Neptune, whose gravity was affecting Uranus, so Newtonian gravity still worked but required a new auxiliary hypothesis. When Mercury’s orbit diverged from predictions, a planet Vulcan was theorized to affect it, but Mercury’s orbit was eventually explained by general relativity, a new theory. People usually don’t explicitly include auxiliary hypotheses when assessing the elegance or complexity of a theory, but it is necessary to do so when comparing explanations that involve changing the auxiliary hypotheses. Perhaps some fine-tuned invisible Vulcan is what’s really affecting Mercury, but general relativity is the simpler theory.

Every theory about cosmology or the nature of reality implicitly includes our place in the universe as an auxiliary assumption. After all, it is tested through observations we make. You expect to see a galaxy only when you’re actually looking through a telescope. In particular, if you are actually in a VR or simulated environment, then what you see through a virtual telescope won’t be determined by cosmology but by the will of the creator of your environment. That means that when we consider the hypothesis that everything we’ve ever experienced is simulated, we have absolutely no direct evidence about the cosmology and laws of physics of the underlying “real” universe. These theories must be re-examined from scratch, and for the underlying universe to have the same laws and cosmology is possible but highly speculative. That is not how the simulation argument is usually presented. Instead, it’s thought that simply by taking our current cosmological model seriously, a false egalitarianism forces us to take far more seriously the possibility that our cosmological model exists inside a simulation which in turn exists in the exact same cosmological model, in spite of the latter model being obviously more contrived. That’s because the simulation argument treats the auxiliary hypothesis of self-location as a dogma rather than as an aspect of a scientific model.

I only covered two examples in depth, but I hope you get the idea. Again, every paradox that works using the Copernican principle can be inverted to produce an argument against the Copernican principle:

  • It looks like the future population will be a lot larger than the current population, so there will be far more observers in the distant future than all observers that ever lived up to the present day. Therefore living in the present, we are not typical observers.
  • It looks like our descendants will make many simulations of us that are indistinguishable to an observer from the real thing. These simulated observers will vastly outnumber us, but unlike us, they won’t be able to make simulations in turn. Therefore we are not typical observers.
  • If the universe ends in heat death, then afterwards thermal or quantum fluctuations can create conscious observers. This is incredibly rare, but if the heat death is eternal such observers will eventually outnumber those that live in a non-equilibrium universe. Most such observers only experience the bare minimum of conscious experience that counts as such, rather than the rich lives we live. Therefore we are not typical observers.
  • In an infinite inflationary universe the very notion of a “typical observer” doesn’t make sense. Therefore we aren’t typical observers.

Strictly speaking, nothing here produces an actual contradiction, but it’s all evidence. The Copernican principle forces us to take many contrived positions about the past and future of our universe, when we could just not do that. Reject the Copernican principle!

Relativistic Colonization

The main inspiration for my ideas here has been Stuart Armstrong and Anders Sandberg’s paper “Eternity in six hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox”. Its main point is to argue that intergalactic colonization by an intelligent civilization is highly feasible on a cosmic timescale, and to discuss the implications of this for the Fermi paradox. In doing so, it also discusses a particular method by which intergalactic colonization can occur: it argues that a single starting solar system has enough materials and energy to directly send a probe to every reachable galaxy, and that in turn each probe can self-replicate and send probes to every star in that galaxy. While thinking through this scenario, I decided that there’s a more efficient and more plausible method by which intergalactic colonization can occur. This does not substantially affect Armstrong and Sandberg’s main points about the Fermi paradox. While re-examining the paper I found it responded to Robin Hanson’s paper “Burning the Cosmic Commons: Evolutionary Strategies for Interstellar Colonization”, which in many respects is closer to my picture of intergalactic colonization, and by all rights should have been an inspiration for me.

Armstrong and Sandberg were careful in justifying their assumptions about the technological capability of an intelligent species, trying to make their results robust to conservative technological assumptions. My approach is more optimistic — roughly speaking, anything that appears to be physically possible I assume to be technologically possible given enough research time. A good follow-up to my proposal would be to figure out its exact technological requirements and their feasibility.

In Armstrong and Sandberg’s strategy, a single probe is created at the starting solar system and sent directly to a target galaxy. It spends most of its journey — a vast amount of time — completely inert. This is wasteful. Instead, the probe should spend that time gathering resources from the surrounding space while remaining in motion. It should use those resources to self-replicate while in transit rather than when it reaches a target star and galaxy. That way, it will be able to settle an entire galaxy at once rather than make a time-consuming second hop to colonize the galaxy. There is then no reason for a probe’s target to be exactly one galaxy; instead, a single probe targets a cone-shaped region with the starting solar system at its apex.

Even if this method is more efficient, does that matter? If both strategies are more than capable of colonizing the future lightcone, isn’t it just a matter of which method the civilization chooses, rather than which one is “better”? No, it is not, because the second stage for the inert probe really adds a serious delay. Imagine people implemented Armstrong and Sandberg’s proposal first and launched a probe at 99% lightspeed to every reachable galaxy. Then, it takes ten thousand years until someone successfully launches a self-replicating-in-transit probe at the same speed towards a particular galaxy. For comparison, Armstrong and Sandberg’s most pessimistic estimate is that it will take eleven thousand years to launch every probe, and a more representative estimate is in the paper’s title, six hours[1]. Then the inert probe arrives at the galaxy twenty thousand years earlier, and has time to create and send secondary probes to at best a twenty thousand light year radius. Meanwhile, the self-replicating-in-transit probe arrives at the entire galaxy at once. If the galaxy is as large as the Milky Way, one hundred thousand light years across, then the active probe gets to most of the galaxy first. The fifty thousand years it takes the inert probe to colonize the rest of the galaxy is tiny from a cosmological perspective, which is why it was ignored in Armstrong and Sandberg, but huge from a human/historical perspective. Since we are comparing two approaches that can both be initiated by the same technological civilization as part of its historical development, it is the historical timescales that are relevant for comparing which approach wins out.

An active probe in transit won’t just gather resources and self-replicate; it can host an entire society. It can host intelligent entities, whether AIs or real or simulated humans, that are thinking and planning for their future. In particular, they can make scientific and technological advances that improve the probe, whether by decreasing its mass or energy requirements or by increasing its speed. This gives another reason to expect active-in-transit probes to be more successful than inert probes: if in situ research can accelerate an early-launched probe from 99% lightspeed to 99.9% lightspeed, then that speed difference will really add up over millions or billions of light years, beating inert probes launched earlier. It will also beat probes launched later at 99.9% lightspeed from the starting solar system, due to its head start.

To make reasonable guesses about the behavior of these intelligent entities and their society, we should think about their incentives. Of all these entities, all in motion, whichever moves fastest will have first access to galaxies, stars, interstellar gas, and other natural resources. These resources hold energy in a low-entropy form, and both the energy and the low entropy are useful for performing computations and for self-replicating. However, these resources are (relatively) stationary, and usable energy from the perspective of our fast-moving society must also carry a lot of momentum. So the matter in this space must be accelerated to the speed of our society to be usable, with the remaining matter accelerated to an equal and opposite momentum. This opposing momentum will make it more difficult for any later entity to make use of these resources, especially if it is also trying to move very fast. Moreover, due to the transformation law for energy-momentum, the faster you go the more energy in stationary form is necessary to obtain the same amount of usable energy from your moving perspective. So the faster the first movers are going, the more they’ll need to propel naturally-occurring matter in the opposite direction and make it difficult to use. So there’s a huge first-mover advantage.
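As a back-of-the-envelope version of that last point (ignoring how the acceleration is actually achieved; real propulsion would only add overhead): to incorporate stationary matter of rest mass m into a society moving at speed v, it must be given kinetic energy (\gamma - 1) m c^2 as measured in the stationary frame, so the stationary-frame energy spent per unit of usable co-moving rest energy is

\frac {E_{\text{invested}}} {E_{\text{usable}}} = \frac {m c^2 + (\gamma - 1) m c^2} {m c^2} = \gamma = \frac {1} {\sqrt {1 - v^2 / c^2}}

which diverges as v \to c, compounding the first-mover advantage.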

This is absolutely terrible.

Really. Very. Bad.

It’s bad because it means a large proportion of the resources in the future lightcone, perhaps most of them, will be burnt for nothing. Specifically, burnt as rocket fuel to make a huge number of rockets which, due to time dilation, exist only long enough for their inhabitants to figure out how to make the rocket go even faster. I’m not sure what will be left behind after such a process: an absolute vacuum, a maximum-entropy heat bath, or perhaps still some sort of usable energy. Either way, it will be a terrible loss. This is what I believe will happen if intergalactic colonization is guided by the incentives of individual colonizers.

[1] Both estimates from Table 5 of the paper.

Is There Slack in Evolution?

In a recent blog post, Studies on Slack, Scott Alexander discusses how slack (particularly the rationalist conception of it) interacts with competition in various situations, and models it in terms of two-layered competitive systems. Many of his examples use evolution as the competitive system, and his very first example claims that in certain circumstances a useful adaptation is more likely to occur if there is less evolutionary pressure:

Imagine a distant planet full of eyeless animals. Evolving eyes is hard: they need to evolve Eye Part 1, then Eye Part 2, then Eye Part 3, in that order. Each of these requires a separate series of rare mutations. Here on Earth, scientists believe each of these mutations must have had its own benefits – in the land of the blind, the man with only Eye Part 1 is king. But on this hypothetical alien planet, there is no such luck. You need all three Eye Parts or they’re useless. Worse, each Eye Part is metabolically costly; the animal needs to eat 1% more food per Eye Part it has. An animal with a full eye would be much more fit than anything else around, but an animal with only one or two Eye Parts will be at a small disadvantage. So these animals will only evolve eyes in conditions of relatively weak evolutionary pressure. In a world of intense and perfect competition, where the fittest animal always survives to reproduce and the least fit always dies, the animal with Eye Part 1 will always die – it’s less fit than its fully-eyeless peers. The weaker the competition, and the more randomness dominates over survival-of-the-fittest, the more likely an animal with Eye Part 1 can survive and reproduce long enough to eventually produce a descendant with Eye Part 2, and so on. There are lots of ways to decrease evolutionary pressure. Maybe natural disasters often decimate the population, dozens of generations are spent recolonizing empty land, and during this period there’s more than enough for everyone and nobody has to compete. Maybe there are frequent whalefalls, and any animal nearby has hit the evolutionary jackpot and will have thousands of descendants. Maybe the population is isolated in little islands and mountain valleys, and one gene or another can reach fixation in a population totally by chance. It doesn’t matter exactly how it happens, it matters that evolutionary pressure is low.

Is this situation realistic?

The first thing that struck me is that Scott presupposed that each of these mutations occurs separately. This would make sense in ordinary evolution, where each mutation provides a benefit and reaches fixation. But if each mutation is harmful, why bother going through normal competition that the organism is going to lose? Why not just have all three Eye Parts arise by chance at once? It may seem absurd that an organism would spontaneously form a fully-fledged eye, but actually all that indicates is that the entire pathway envisioned here is absurd: as creationists are so eager to point out, something as complex as a compound eye cannot just arise by chance. It requires evolution, which requires intermediate stages that are beneficial to the organism, exactly what has been posited not to occur here. You can imagine we’re not actually talking about evolving eyes, but rather about a much simpler adaptation that still goes against the flow of the fitness landscape. If you do, it becomes easier to imagine the entire adaptation occurring as a single mutation all at once.

But actually, Scott is right: an adaptation like this would develop in stages rather than all at once. Let’s calculate this with some made-up numbers. I’ll suppose that each one of these Eye Parts has a 10^{-5} probability of occurring by chance. Multiply that over all three parts and you get a probability of 10^{-15} of all three mutations occurring at once. A mutation like that is only likely to occur in a species once 10^{15} organisms of that species have been born.

Let me emphasize the last sentence: the chance that a mutation will occur in a population depends on the number of individuals born in that population, or in other words the total number of individuals that have ever existed in the population. That is the product of the population size and the number of generations the population has existed. In particular, larger populations acquire beneficial mutations faster. This will be important later.

Now, let’s compare this to developing eyes in stages. Suppose Eye Part 1 only reduces fitness by 1%. Then on average an individual with Eye Part 1 has 1% fewer offspring, so on average 0.99 of its offspring will also carry Eye Part 1. Adding up the geometric series, if an individual acquired Eye Part 1 through a spontaneous mutation it is likely to have around 100 descendants with Eye Part 1. The chance that one of them will acquire Eye Part 2 is 10^2 \cdot 10^{-5} = 10^{-3}. If an individual with both Eye Parts 1 and 2 loses another 1% in fitness, then it will have about 50 descendants with both Eye Parts 1 and 2, assuming perfect genetic linkage. Then there’s a 50 \cdot 10^{-5} = 5 \times 10^{-4} chance that one of these descendants will acquire Eye Part 3. Multiplying these together, including the probability of the initial Eye Part 1 mutation, there is a 10^{-5} \cdot 10^{-3} \cdot (5 \times 10^{-4}) = 5 \times 10^{-12} chance per individual of eyes evolving. This is still very small, but significantly larger than the 10^{-15} odds of all mutations happening at once.

In contrast, imagine if each mutation is beneficial and immediately reaches fixation. It takes 10^{5} individuals until one acquires Eye Part 1, and immediately everyone has it. Afterwards it takes 10^{5} individuals each to evolve Eye Parts 2 and 3, a total of 3 \times 10^{5}. Since it takes time for traits to reach fixation, and even beneficial traits don’t always fix, it is expected to take a somewhat larger number of individuals until eyes evolve, but this is still much less than the numbers required in the above two scenarios where the eye parts are maladaptive.
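Here are the three scenarios above written out as a quick sanity check, using the same made-up numbers (the 100 and 50 are the geometric sums 1/(1 - 0.99) and 1/(1 - 0.98)):

```python
# Made-up numbers from the text: each Eye Part arises with probability 1e-5
# per individual born, and each Part costs about 1% of fitness.
p_part = 1e-5

# Scenario A: all three Eye Parts appear in one individual at once.
p_all_at_once = p_part ** 3                        # 1e-15 per individual born

# Scenario B: the parts appear in stages within a maladapted lineage.
descendants_with_part1 = 1 / 0.01                  # geometric series, ratio 0.99 -> 100
descendants_with_parts12 = 1 / 0.02                # geometric series, ratio 0.98 -> 50
p_staged = (p_part                                 # Eye Part 1 appears
            * descendants_with_part1 * p_part      # Eye Part 2 appears among its carriers
            * descendants_with_parts12 * p_part)   # Eye Part 3 appears among theirs
                                                   # = 5e-12 per individual born

# Scenario C: each part is beneficial and immediately fixes.
births_needed = 3 / p_part                         # about 3e5 births in total

print(f"{p_all_at_once:.0e}  {p_staged:.0e}  {births_needed:.0e}")
```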

Okay, this is the model for evolving eyes in ordinary circumstances. But what if you add slack? Let’s look at Scott’s first scenario: a disaster wipes out most of the population and leaves behind a resource-rich environment for the survivors, so even individuals with below-average fitness can reproduce above replacement and not have their lineage die out. First of all, I want to question the notion that there is less “evolutionary pressure” here, whatever that means. Less fit individuals can propagate their genes when they otherwise wouldn’t have, but fitter individuals still propagate their genes even more. If by chance a high proportion of the population right after the disaster had some fitness-lowering gene, then by the time the population rebounded the gene would be much less frequent, because the fitter individuals without the gene repopulate faster than the less fit individuals with it. So it’s a matter of perspective: are we looking at the absolute number of offspring and descendants an individual will have, or are we interested in the proportion of a trait relative to the whole population? If the former, then the population boom meaningfully reduces the evolutionary pressure; if the latter, then it doesn’t affect it at all.

But we’re not asking a broad qualitative question that could depend on the perspective we take; we’re asking a concrete question: will this disaster-and-repopulation make evolving eyes more likely? Let’s think. Normally an individual with Eye Part 1 has around 100 descendants like it, each of which can develop Eye Part 2. During the time of plenty it can have many more than that. Sounds good?

Not so much, once you compare this with what every other individual is doing. If this individual has 100 relatedness-weighted offspring, so does every other individual during the time of plenty. With sexual reproduction this can be tricky to think about: the first individual with Eye Part 1 has more than 100 descendants, but only some retain Eye Part 1, and other individuals also typically have more than 100 descendants; since each descendant has many ancestors, it’s hard to count how many descendants each should get credit for. It’s easier to count genes: each descendant gene comes from exactly one ancestral gene. If the spontaneously formed Eye Part 1 gene spreads to 100 descendants, then in that time every other gene should also spread to 100 descendants, or a bit more since the Eye Part 1 gene reduces fitness. Since we need individuals to store all of these genes — what else are individuals good for? — the population size increases by roughly a factor of one hundred.

Now remember what I said earlier about the effect of population size on evolution? If the population increased by a factor of one hundred, that means that at the start — right after the disaster occurred — the population was one hundred times smaller than the ordinary stable population. That means the mutation for Eye Part 1 was one hundred times less likely to occur in the first place! This nullifies any advantage that seemed to have been gained from the plentiful conditions increasing the chance that Eye Part 2 develops. Overall, a disaster that decreases the population does not increase the speed at which the eye evolves.

Lightspeed delays lead to multiple technological singularities

By Yudkowsky’s classification, I’m assuming the Accelerating Change Singularity: as technology gets better, the characteristic timescale at which technological progress is made becomes shorter, so that the time until progress reaches physical limits is short from the perspective of our current timescale. At a short enough timescale the lightspeed limit becomes important: when information cannot traverse the diameter of civilization in the time remaining until the singularity, further progress must be made independently in different regions. The subjective time from then on may still be large, and without communication the different regions can develop different interests and, after their singularities, compete. As the characteristic timescale becomes shorter, the independent regions split further.

You Must be a Coward to be Afraid of Losing Your Mind

I recently read two authors making a similar kind of argument, which I disagree with.

Jacob Falkovich responds to a comment on his blog, about the insights gained from taking drugs:

Do you believe your own epistemology to be so fragile, your disbelief in woo so contingent, that they will crumble in the face of a few hours of altered consciousness?

A recent essay by Paul Graham, about having children:

On the other hand, what kind of wimpy ambition do you have if it won’t survive having kids? Do you have so little to spare?

Both excerpts accuse the reader of a lack of confidence in her principles if she is afraid a transformative experience will change them: if you are afraid drugs will make you believe in woo, you must not have much confidence in your current disbelief in woo; if you are afraid parenting will make you lose your ambition, you must not be very ambitious.

I don’t like the way these two excerpts shame being afraid of an external factor changing the way we think. My mind is fragile and I shouldn’t be afraid to admit it. A bullet can shut it down completely, and many things can affect it in subtler ways I don’t understand.

For comparison, I don’t think either author would argue this way about more frequent, short-term events that change our motivation and reasoning. If someone worries that he’s unmotivated when he works at home, you don’t tell him that it won’t be a problem as long as he cares enough about his work; you tell him that he shows he cares about his work by setting up an environment where he can be motivated.