
# Author Archives: itaibn

# Hypothetical

Suppose an alien came down to earth and described an operation it can perform on you. This operation will make you a lot stronger and tougher physically. It will also make you a lot smarter and more conscientious, and can even make you a more moral person. Inexplicably, if you get this operation people will take you more seriously and will be more likely to believe anything you say. And all of these effects are huge.

There’s one catch: Anyone who undergoes this operation becomes completely obsessed with cleaning pottery jugs. Seriously, it doesn’t matter how boring you consider this now, it’ll be what you spend all of your time on. If you have anything else you care about you’ll still remember that, but it’ll take second place to cleaning pottery jugs.

Now, you’re expecting the way this hypothetical will continue: you’ll be asked what you think about this operation, and whether you want the alien to perform it on you or not. Yeah right! As if the alien gives a fuck what you think! In fact it has already done the operation. Luckily it takes a while to take effect, so you still get to be your old self for a few years. Good luck!

# A Theory Concerning the Foundation of the United States

*[Epistemic status: Crackpot]*

This post is about some observations I’ve been developing for quite a while. I was inspired to post about them by a recent chapter of the new web serial *Unsong*, which touches upon similar themes. Some details have only been worked out now as I write this up.

The founding legend of Rome, as recounted in Virgil’s Aeneid, was that the Romans were descendants of Trojans. Troy was itself a great empire, and when it was sacked by the Greeks, the Trojan Aeneas was ordered by the gods to found a new Troy in Italy. So he wandered the sea for many years until he finally reached the region of Rome. Now, there’s a gap of a few hundred years between the accepted dates of the fall of Troy and the founding of Rome; Aeneas did not found Rome right away. Rather, he founded the city of Alba Longa, which his descendants ruled until Romulus and Remus, who founded Rome.

It is certainly not a new idea to compare the United States with Rome. Both are powerful empires that center their self-identity on their democratic institutions. However, I believe the connection between the two is much deeper than that…

People commonly place the fall of the Roman Empire during the fifth century CE. This is not quite right. See, the Western Roman Empire fell in 476, but the Eastern Empire, now known as the Byzantine Empire, persisted a long time after that. In fact, the Eastern Roman Empire only fell in the year 1453.

In 1492, merely 39 years later, Christopher Columbus discovered the continent of America. Most historians believe that he was from the Republic of Genoa, which like Rome was a republic in Italy. However, an even stronger connection can potentially be made. Some people believe that Christopher Columbus was of Byzantine origin, and may even have been related to Byzantine nobility. This would be especially significant if it were possible to trace a lineage from Aeneas himself.

Before Christopher Columbus, Leif Erikson independently discovered and explored North America, and the Norse eventually named the region Vinland, due to the grapevines that grew there. The name Oenotria appears in some ancient sources, including three times in the Aeneid, to refer to southern Italy. The name comes from Greek οἶνος “wine”, since the area was rich in vineyards. In other words, the Norse name for America and the ancient Greek name for southern Italy both mean “land of wine”.

The parallel between Christopher Columbus’s journey and the journeys of Aeneas and Odysseus is obvious. Notice, too, that just as Aeneas does not immediately found Rome, but rather founds Alba Longa, which centuries later produces Rome, so too Christopher Columbus is only responsible for exploring the continent where centuries later George Washington founds the republican empire.

From the strength of these analogies I can only conclude one thing: that the same events have occurred twice, and Columbus was divinely inspired to explore America and found a new Rome, like Aeneas was commanded to found a new Troy.

Although originally I only thought to make Graeco-Roman connections, inspired by *Unsong* it’s worth looking a bit into the Judaeo-Christian relationships. The obvious analogy is Moses, who wandered the desert forty years seeking to found the new nation of Israel. One interesting contrast is that although both Odysseus and Aeneas reach their intended destination at the end, Christopher Columbus sought to reach India but never arrived there, like how Moses never set foot in the land of Israel. A different interpretation for the fact that Christopher Columbus failed to reach India is that although India was where he *desired* to reach, the divinely-fated *target* for his journey was America, like how Aeneas *wanted* to stay in Carthage with Dido but was *fated* to found a nation in Italy.

# An Idea for Improving Hashlife

*(This was written with the priority of making sure my thoughts don’t just stay in my head forever over explaining anything *well*. Expect some parts to be cryptic or badly phrased.)*

Hashlife is currently the best algorithm for simulating large structured patterns in Conway’s Game of Life and other cellular automata for long periods of time. It is described here. Basically, it is a memoized recursive algorithm for computing the evolution of a block.

Memoization means that whenever the algorithm encounters a block that it has seen before, it can instantly reuse the answer it previously computed. This is what gives the algorithm its power. On the other hand, the algorithm can only detect this if the two configurations are aligned exactly the same on the blocks into which it divides the grid. In other words, it doesn’t take full advantage of translational symmetry; it only takes advantage of translations by a multiple of the block size[0]. Due to the way Hashlife calculates many time-steps of a pattern at once, there is a similar alignment problem in time.
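To make the role of memoization concrete, here is a toy sketch of the idea (my own illustration, not the actual Hashlife code): the one-step evolution of a 4×4 block’s 2×2 center is computed once per distinct block content and afterwards served from a cache.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def center_step(block):
    """One Life step for the 2x2 center of a 4x4 block.

    block: 16-tuple of 0/1 cells in row-major order.  Because the argument
    is hashable, lru_cache memoizes the result: any 4x4 block with the same
    contents is evaluated only once, which is the core trick of Hashlife.
    """
    def cell(x, y):
        return block[4 * y + x]

    out = []
    for y in (1, 2):
        for x in (1, 2):
            neighbors = sum(cell(x + dx, y + dy)
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                            if (dx, dy) != (0, 0))
            out.append(1 if neighbors == 3 or (cell(x, y) and neighbors == 2)
                       else 0)
    return tuple(out)

# A 2x2 "block" still life in the center survives unchanged:
still_life = tuple(1 if i in (5, 6, 9, 10) else 0 for i in range(16))
assert center_step(still_life) == (1, 1, 1, 1)
```

The real algorithm applies this idea recursively to 2^n × 2^n blocks built as quadtrees of shared sub-blocks, but the cache-by-content mechanism is the same.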

For example, the Caterpillar is a humongous spaceship that moves forward 17 tiles every 45 steps. It has a lot of repeating components, but they all move at this speed, so they are rarely in the same alignment. Here Hashlife runs really slowly.

So I’ve been thinking about how to make a better version of Hashlife which doesn’t have these constraints. The problem, then, is to recognize a pattern that was seen previously with a different alignment. The first idea I eventually came up with is to use what I call a *translation-invariant hash*: if you take this hash of two blocks of tiles that almost completely overlap, the function should return the same or a similar value. Clearly this is not a good hash function from the conventional point of view, but it is very useful here: if you make a hash table based on a translation-invariant hash, then a lookup for a block B could also return a block B′ which contains a translation of B. This means you can find that the same pattern was calculated already even if it is out of alignment.

Here is a simple example of a translation-invariant hash: Let H be an ordinary hash function on 8×8 blocks. For some large block B, one can define H′(B) to be the sum of H(X) for every 8×8 block X that is contained in B. Then a translated block will only differ in the terms of the hashes on the boundary, which on a large block will be a minority. By truncating the last digits of this sum you get a hash that’s completely identical for most small translations.
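As a sanity check of this idea, here is a toy implementation (my own; crc32 stands in for the ordinary hash H, and all names are my invention). When the pattern sits well inside the block, translating it only permutes which window holds which content, so the sum comes out exactly identical; the differing boundary terms, and hence the truncation, only matter for patterns near the block’s edge.

```python
import zlib

def translation_invariant_hash(grid, k=8):
    """H'(B): the sum of an ordinary hash H over every k x k window of grid."""
    h, w = len(grid), len(grid[0])
    total = 0
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            window = bytes(grid[y + dy][x + dx]
                           for dy in range(k) for dx in range(k))
            total += zlib.crc32(window)  # crc32 stands in for H
    return total

def place(pattern, x0, y0, size=32):
    """Put a list of live cells onto an otherwise empty size x size grid."""
    grid = [[0] * size for _ in range(size)]
    for x, y in pattern:
        grid[y0 + y][x0 + x] = 1
    return grid

glider = [(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)]
a = translation_invariant_hash(place(glider, 8, 8))
b = translation_invariant_hash(place(glider, 13, 10))  # same pattern, shifted
assert a == b
```

Unrelated patterns will almost always give different sums, but not provably so, which is exactly the false-collision problem discussed below: a match from the hash table still has to be verified.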

Now, one problem that can come up is: now that we found two blocks that are approximately translates, how do we tell by how much one is a translate of the other? In this case there is an easy method. Alongside the function H′, one can also calculate two other functions H′_x, H′_y, such that H′_x(B) (respectively, H′_y(B)) is the sum of x·H(X) (resp. y·H(X)) where X is an 8×8 block contained in B whose northwest corner has coordinates (x, y). (Here 8 is an arbitrarily chosen number, in this case because it’s a small power of 2.) Then if B and B′ satisfy H′(B) = H′(B′) and they really are close translates, the position of B′ relative to B would be approximately ((H′_x(B′) − H′_x(B))/H′(B), (H′_y(B′) − H′_y(B))/H′(B)).

The data structure for a block B will then store H′(B) along with these “integral hashes” H′_x(B), H′_y(B).
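Here is a toy version of the offset-recovery idea (my own sketch; the formulas are my reconstruction, since the original symbols did not survive formatting). Skipping empty windows, the x- and y-weighted hash sums behave like a centroid of “hash mass”, so the difference of the weighted sums divided by the total recovers the translation exactly in the interior case.

```python
import zlib

def integral_hashes(grid, k=8):
    """Return (S, Sx, Sy): plain, x-weighted, and y-weighted sums of the
    window hash over all nonempty k x k windows.  Empty windows are skipped
    so they don't drag the 'centroid' around (my simplification)."""
    h, w = len(grid), len(grid[0])
    S = Sx = Sy = 0
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            window = bytes(grid[y + dy][x + dx]
                           for dy in range(k) for dx in range(k))
            if any(window):
                hv = zlib.crc32(window)
                S, Sx, Sy = S + hv, Sx + x * hv, Sy + y * hv
    return S, Sx, Sy

def estimate_offset(g1, g2, k=8):
    """If g2 contains the same pattern as g1 shifted by (dx, dy), recover it."""
    S1, Sx1, Sy1 = integral_hashes(g1, k)
    S2, Sx2, Sy2 = integral_hashes(g2, k)
    assert S1 == S2  # the blocks should contain the same pattern
    return (Sx2 - Sx1) // S1, (Sy2 - Sy1) // S1

def place(pattern, x0, y0, size=32):
    grid = [[0] * size for _ in range(size)]
    for x, y in pattern:
        grid[y0 + y][x0 + x] = 1
    return grid

glider = [(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)]
assert estimate_offset(place(glider, 8, 8), place(glider, 13, 10)) == (5, 2)
```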

I will not discuss how to take advantage of the overlapping blocks found this way to speed up the computation of the cellular automaton.

This in itself may already be an improvement (I haven’t written any code so I can’t benchmark this), but it has some weaknesses. The problem is that it is way too loose. It produces a collision for two overlapping blocks, but it also produces a hash collision in loads of other situations. For instance, it produces an almost identical value for the empty block and an almost empty block containing only a small object. These are closer to each other than most combinations of overlapping blocks, which are the things that are supposed to collide. Worse, if there are two small objects on an otherwise empty block which are far away from each other, then H′ returns an exactly identical hash regardless of where the objects are relative to each other. If you want any algorithm based on this hash function to work, it is necessary to check a block found by the hash table to verify it actually overlaps. This adds to the computation time.

The problem is that the hash function is too local: it only depends on the properties of a random 8×8 region in a block. Perhaps a better idea would be to use larger subregions, for instance, summing the hashes of all the large subregions of the block. However, this would take too long to compute (asymptotically around the same as it would take to calculate the evolution of the pattern directly). Instead, it would be better to look at the hashes of only some of the subregions, which are determined in a translation-invariant way. Here is my second idea: define an n-focal point as follows:

- Every point is 0-focal.
- The *n-hash* of an n-focal point is the (ordinary) hash of the 2^n × 2^n rectangle which has that point as its southwest corner. This rectangle is called the *region* associated with that point.
- An (n+1)-focal point is an n-focal point whose n-hash is greater than the n-hash of all n-focal points up to 2^n tiles south and up to 2^n tiles east of it.

Then considering only the n-hashes of n-focal points is translation-invariant and feasible to compute.
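A minimal sketch of the focal-point idea (my own simplification: a single level, and I compare against all nearby nonempty windows rather than only those to the south and east). A point is “focal” if its window hash beats every other nonempty window within radius r; since the window hashes move rigidly with the pattern, the focal points do too.

```python
import zlib

def window_hashes(grid, k=8):
    """Hash of every nonempty k x k window, keyed by top-left corner."""
    h, w = len(grid), len(grid[0])
    hashes = {}
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            window = bytes(grid[y + dy][x + dx]
                           for dy in range(k) for dx in range(k))
            if any(window):
                hashes[(x, y)] = zlib.crc32(window)
    return hashes

def focal_points(grid, k=8, r=4):
    """Points whose window hash is a strict local maximum within radius r."""
    hashes = window_hashes(grid, k)
    return {p for p, hv in hashes.items()
            if all(hv > h2 for q, h2 in hashes.items()
                   if q != p and abs(q[0] - p[0]) <= r and abs(q[1] - p[1]) <= r)}

def place(pattern, x0, y0, size=32):
    grid = [[0] * size for _ in range(size)]
    for x, y in pattern:
        grid[y0 + y][x0 + x] = 1
    return grid

glider = [(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)]
f1 = focal_points(place(glider, 8, 8))
f2 = focal_points(place(glider, 13, 10))
# The focal points of the shifted pattern are the shifted focal points:
assert f1 and {(x + 5, y + 2) for x, y in f1} == f2
```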

However, once we have these n-focal points there’s something even better we can do. Remember that the goal of the whole translation-invariant hash was to recognize a pattern we’ve already encountered even when it’s translated. These n-focal points and their corresponding hashes do the job even better: the same region will have the same n-focal points no matter how it is translated, and no coarse-graining from any averaging process is necessary. So it is a good idea to make a hash table caching all the regions associated with n-focal points to recognize translates, and not use the original averaging idea with translation-invariant hashes at all. However, I only came up with that simplification while I was writing this up, so I decided to still include the original idea. I know that makes this description pretty messy.

All this doesn’t mention time. The original Hashlife also has a feature where it evaluates a block many generations at once. This causes temporal alignment problems similar to the spatial alignment problems I’ve already discussed. I expect pretty much the same solutions to work here. These are really just general ideas for recognizing patterns in n-dimensional space and should still work when time is added as a coordinate.

[0] Actually, the translation only needs to be a multiple of half the block size, due to how Hashlife calculates the areas between the blocks.

# This blog now has a title

**Note:** Whoops, I made this a “page” rather than an ordinary post. Changed it.

I remember the early days of this blog. A lot of things have changed since then. Back then, it was just me writing things and nobody was reading it. Right now there probably still isn’t anybody reading this (not even you). I guess some things never change.

Indeed, this blog started from nothing and almost quadrupled in size since. And in this great growth story, today is an important landmark. That is because today, **this blog has a title**. It also now has a tagline.

# Gauge Theory

One very important aspect of modern physics is something called gauge theory. Gauge theory explains how all four fundamental forces — gravity, electromagnetism, and the weak and strong forces — work. However, it’s hard to find a nontechnical exposition of the principles of this theory. Here is the best attempt I’ve seen, although I still find it unsatisfying. Instead, I’m taking this absence as an opportunity to give my own explanation of gauge theory.[1]

Let me start with the strong force, which is the most straightforward example. This is the force between the quarks which makes them combine into protons and neutrons. Now, quarks have a property called “color”. A quark can have one of three colors, which physicists call red, blue, and green. However, these “colors” have nothing to do with the colors that you see. Each of these colors behaves the same way, and it’s only possible to tell what color a quark is by comparing it with other quarks. A proton or neutron is made out of three quarks, one of each color, and similarly every particle which can be directly observed has a symmetrical combination of colors. So although we have names for these different colors, we can’t tell which is which.

This problem is a lot worse than it seems. You see, it’s actually *fundamentally impossible* to tell what color a quark is in an absolute sense. Let me explain: suppose some physicists at MIT decided to set a standard for quark colors. They start by isolating a quark[2] and making sure that it never changes color; they declare this to be the standard red quark. Comparing with this quark, they have a standard for when their own quarks are red. Now, the physicists at Harvard want to use this standard, so the physicists at MIT take another quark with the same color as their first quark, and send it to Harvard. You’d think that the quark at Harvard and the quark at MIT would have the same color. Just to verify this, two days later the physicists at MIT take another quark with the same color as the one they’re storing and send it to Harvard. Well, lo and behold, this second quark has a different color than the first quark. No matter how hard they try to prevent the quarks from changing color, the colors will still not be consistent.

Here’s the explanation: Color is not actually one universal concept. Every point in space and time has its own concept of quark color, and MIT September 5th color is something different from MIT September 7th color, which is different from Harvard September 7th color. It’s only possible to directly compare quark colors when they are in the same place and time. There has to be *some* relationship between these different concepts of color. After all, a quark which stays still between 7:00PM and 8:00PM must start with a color in the 7:00PM sense and end up with a color in the 8:00PM sense. Indeed, there’s something called a connection which links up these different sorts of color. The connection is the crucial component of this entire theory. For any journey a particle can take from one place to another, going at certain speeds in certain places in the middle, the connection describes a way to gradually transform a color for the beginning of the journey into a color for the end of the journey. When time passes for a quark and it needs to have another type of color, it always gets that color according to the way it went and the connection. So there is a correspondence between these color concepts, but it depends on the path you take from one to the other. That’s why the quarks in my imaginary experiment ended up with different colors; they took different paths.
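The path dependence just described can be illustrated with a toy stand-in: let “transporting a color” mean applying a rotation, one per leg of the journey. Because rotations don’t commute, two different paths between the same endpoints transform the starting vector differently, which is exactly the inconsistency the connection encodes. (The particular rotations here are my own arbitrary choice, purely illustrative.)

```python
import math

def rot_x(t):
    """Rotation by angle t about the x-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(t):
    """Rotation by angle t about the z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

v = [0.0, 0.0, 1.0]  # the "color" we carry along the journey
# Path A: the x-leg first, then the z-leg.
a = apply(rot_z(math.pi / 2), apply(rot_x(math.pi / 2), v))
# Path B: same endpoints, legs taken in the other order.
b = apply(rot_x(math.pi / 2), apply(rot_z(math.pi / 2), v))
# The two transported "colors" disagree: transport is path-dependent.
assert max(abs(x - y) for x, y in zip(a, b)) > 0.5
```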

Now, one of you might ask when we are going to get to the strong force. Well, hypothetical reader, what I described *is* the strong force. The attraction of quarks to each other, their formation into protons and neutrons and other particles, everything about the strong force comes as an indirect result of this connection. To give an example, you may have heard of gluons. The conventional explanation for the strong force is that it is generated by these gluon particles. What are gluons? Well, the connection only has an interesting role when multiple paths have inconsistent color changes. This inconsistency is called *curvature*. A gluon is a small area of space where this inconsistency is focused. The connection can always be described as a combination of many gluons, so in a sense, the strong force really is generated by gluons. A similar thing is true for all the other forces, so photons, gravitons, and the W and Z particles are all generated by the curvature of different connections.

The connection gives a theory for how quark color works, but how can it be a force? How can it influence how a quark moves, rather than just what color it is? To answer that, I’ll move on to the second force I intend to discuss, the electromagnetic force.

The electromagnetic force is similar to the strong force. Here too, there is a connection, and here it influences every charged particle. One important thing is different, and it might seem a bit ridiculous: there’s only one color! So how can this connection do anything? Well, this whole time I have neglected the laws of quantum mechanics. However, they are important to everything that happens at this scale.

Let me summarize the rules of quantum mechanics. You would expect a particle to always be in a certain place at any time, and a quark to always have one specific color. Instead, particles are usually in something called a superposition. You can think of being in a superposition as the particle having different chances of being in any particular place. However, the superposition is more than that: it also gives something called a phase.

My description here of quantum mechanics is rather crude, but it will do for now. For a better description, I recommend this introduction, or for a longer explanation this series of blog posts or the book *QED: The Strange Theory of Light and Matter* by Richard Feynman.

Anyway, the crucial fact is that when a particle can be in many different positions, it is possible to compare the phases of these different possibilities. It is also possible to compare the phase between different times. Now, have you ever heard about how in quantum mechanics, a particle behaves like a wave? A wave is when the water is rising and falling in a regular pattern. In quantum mechanics, it is the phase changing in a particle which makes it act like a wave. Now, it turns out that how quickly the phase changes among different positions determines a particle’s momentum, and how quickly it changes over time determines its energy. Now, back to electromagnetism. You might have guessed by now what the connection does here: it influences phase. Well, just like in the case of colors, it doesn’t exactly change anything. More precisely, for charged particles phase is subjective and has different meanings in different places, and the connection gives a default for turning one kind of phase into another. This is just like the strong force.
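The rule that the spatial rate of phase change gives momentum and the temporal rate gives energy is the de Broglie relation; for a plane wave it reads:

```latex
% Plane wave: phase varies linearly in space (wavenumber k) and time (frequency \omega)
\psi(x, t) \propto e^{i(kx - \omega t)}
% de Broglie / Planck relations: momentum and energy are the phase rates
p = \hbar k, \qquad E = \hbar \omega
```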

Now it gets a bit subtle. When a particle is staying still in one spot, it will minimize energy, and so it will tend not to change its phase from the perspective of its own path. However, naturally it is not in one exact location, but in a superposition over a small region. As the phase tries to stay the same in each spot, the phases become more and more inconsistent between different places. This means the particle will accumulate momentum, which will make the particle move. Put a different way, the natural movement of phase acts as a sort of energy, which when it differs from place to place generates a force.

So we now know how the connection influences the particle, but where does the connection come from? How did it get to be in whatever state it is in? Well, there are internal forces constraining the connection. Remember that the curvature of the connection measures how inconsistent it is in a small area. Well, the connection is naturally constrained to minimize how much curvature it has. So in a vacuum, there is no curvature, and the connection doesn’t have any inconsistencies, except for small quantum fluctuations. However, when there is a charged particle, things are different. See, the particle has its own constraints, and it’s trying to move as smoothly as possible. How it moves is partially determined by the connection, and so when the particle is present, the connection naturally gives way. That is where the curvature of the connection comes from, and why there are forces between particles.

Now let’s get to gravity. With gravity, what the connection changes is very interesting. First of all, here’s a question: Which way is up? There should be a fairly obvious answer. And yet, what you think is up, if you travel far enough abroad, will turn out to be sideways or even down, compared with the real up. It can get even stranger: your watch tells you the time is 6:00 AM, while the locals insist that it’s 5:00 PM.

Of course, we all know that the Earth is round, and all of the strange consequences this implies, but that’s nothing profound. It’s just a strange convention. Although what you recognized as up is no longer called “up”, it’s still recognized as a direction. And it’s very easy for people from different locations to synchronize their watches. It’s still the same directions and times, just given different names.

At least, that’s how it seems. If you make very careful observations, you’ll find that directions, just like quark colors, are not consistent in different places. Similarly, there is no absolute time which is consistent everywhere. These effects are all very subtle, except for one.

So now let’s really get to gravity. Throw something upwards, and afterwards it will fall. What makes it change direction? Nothing. That’s because it *isn’t* changing direction. So why does it hit the ground? Because the ground is moving, upwards and upwards until they collide.

More precisely, the connection also influences what it means to move and what it means to stay still. That porcelain cup you threw is changing its path in the natural way with respect to the connection. The “move up” of one time is the same as the “stay still” of another, which is the same as the “move down” of a time still further in the future. Our notion of staying still comes from comparing things with the ground, which accelerates upwards compared to the natural flow.

So why is the ground moving upwards? Well, for the small porcelain cup, following the connection is easy. Where the cup moves is fairly self-consistent. But the ground is attached to the Earth, which is very big. And in spite of the curvature, the inconsistencies, all of the Earth has to move in unison. And it’s worse, since the curvature is generated by the Earth, and so it won’t ever move out of the way.

Now, finally, the weak force. Remember the rule with all the other forces: a particle has an attribute that has different meanings in different places, and so there can’t be a universal standard. Now let’s break these rules.

Back when people first studied the strong nuclear force, they found that it was very complicated. But in spite of the complexities, they did find one pattern: protons and neutrons behave in the same way. The strong force between protons and protons, between protons and neutrons, and between neutrons and neutrons is the same. Later on physicists discovered the pions. There are three types of pions: one positively charged, one negatively charged, and one neutral. They too all behave the same way with respect to the strong force, and all have approximately the same mass. Later more particles were discovered, and they always fit into arrays of similar particles with the same strong interactions and about the same mass.

The whole thing reminded physicists of something else they’ve already seen, called spin. Spin is a property that every particle has. For example, electrons have spin 1/2. This means that each electron can be in one of two states: spin up or spin down. These are also called spin 1/2 and spin -1/2 (note the distinction: while the spin of a particular electron may be -1/2, the spin of electrons in general is always 1/2. Although the same word is used, these are different concepts). As you can guess from the name, flipping a spin up electron upside down makes a spin down electron, so spin is related to orientation. Excuse me while I ignore the question of what happens when you flip an electron sideways.

Not all particles are spin 1/2. Spin 0 particles only have one spin state, spin 0, while spin 1 particles can be in spin state -1, 0, or 1. In general, a particle has a set of consecutive spin states, more of them the higher the spin is. This is similar to the pattern described earlier for similar particles with different charges. Based on the analogy, this property of particles is called isospin.
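The counting rule here can be stated compactly: a particle of spin s has the consecutive states

```latex
% A spin-s particle has 2s + 1 consecutive spin states:
s_z \in \{-s,\; -s + 1,\; \ldots,\; s - 1,\; s\}
% e.g. spin 1/2 gives two states, spin 1 gives three.
```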

So because of these similarities, it seems as though there’s some sort of virtual orientation[3] responsible for this isospin. Physicists considered the possibility that this virtual orientation has a connection to it, making yet another gauge theory. But for that to work, the different virtual orientations must be indistinguishable. And yet, the proton and neutron, with isospin up and down, are clearly different. The virtual orientation in which the proton is isospin up is special, contradicting the principle that no virtual orientation is special.

What’s happening is something called symmetry breaking. Although fundamentally all of these virtual orientations are the same, there is a sort of pointer at each place in the universe which designates one virtual orientation as special. These pointers are called the Higgs field. Each pointer tends to align with the nearby pointers. However, this is impossible when the connection for the virtual orientation is curved. So what happens is that any curvature of the connection is associated with movement of the Higgs field. This is called mixing. This mixture manifests as the Z and W particles. In addition, the Higgs field interacts with particles so that their behavior depends on how their virtual orientation aligns with it.

So there you have it. As I’ve just shown you, every fundamental force is based on the same idea: some aspect of a particle is relative in place and time, and the connection mediates this aspect over different places and times to make it seem consistent locally. This is the basis of gauge theory. However, for each force a different subtlety or tweak is added on. This makes each force behave in a unique way. But the underlying concept is always the same: gauge theory.

[1] While I was writing this article at my leisurely pace, Sean Carroll posted this article, which is pretty much the sort of thing I was looking for, and so my justification for writing this is out of date. Oh well, these sorts of things happen in a 50 year old field.

[2] Actually, even getting this far is impossible. There is a phenomenon called quark confinement, which is that quarks never exist on their own, only in color-neutral groups. I hope you trust me when I tell you that this subtlety isn’t important, because I’m going to completely ignore it from now on.

[3] I will continue talking about this virtual orientation, but I want to note that it’s not a technical term. I don’t know if there is a technical term for it.

# A Solution to problem 20 in Scott Aaronson’s “Projects aplenty”

**Added 2016-06-04:** Although I wrote this at the bottom, to be extra clear I’m writing this at the top: After further investigation, the method I described was found not to work.

Scott Aaronson has a post on his blog, “Projects aplenty”, where he proposes many research problems for students. One of them is:

20. Do there exist probability distributions D1, D2 over n-bit strings such that (D1×D1 + D2×D2)/2 (an equal mixture of two independent samples from D1 and two independent samples from D2) is efficiently samplable, even though D1 and D2 themselves are not efficiently samplable? (This is closely-related to a beautiful question posed by Daniel Roy, of whether the de Finetti Theorem has a “polynomial-time analogue.”)

In the comments, I proposed one solution to the problem. Scott Aaronson sent me an email pointing out that the conditions under which the solution works are so strong that the solution is effectively trivial. I have thought about the problem more and I believe I have found a much better solution.

To be specific, what I have is: Given data x, there are distributions D1 and D2 such that it is possible to efficiently approximately sample (D1×D1 + D2×D2)/2 given x. However, neither D1 nor D2 is efficiently approximately samplable, even given polynomially many bits of advice. Moreover, it is easy to generate an appropriate x, in the sense that there is a polynomial-time probabilistic algorithm which, given n, creates such an x satisfying these properties.

Before I present the solution, some definitions: If and are prime with and , then I define . When is an odd number, not necessarily prime, and , then is the Jacobi symbol of mod .
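For reference, the Jacobi symbol mentioned above can be computed with the standard binary algorithm based on quadratic reciprocity (this is my own sketch of a textbook routine, not code from this post):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:  # pull out factors of 2: (2/n) = -1 iff n = 3, 5 (mod 8)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a        # reciprocity: flip sign iff both = 3 (mod 4)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0  # 0 when gcd(a, n) > 1

assert jacobi(2, 3) == -1 and jacobi(5, 21) == 1 and jacobi(2, 7) == 1
```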

Now, to generate : first, pick primes , , and with approximately bits. Let and be primes with and . Set . Set to be a natural number with , and . Set to be a random th root of unity. Set to be a random of elements of subject to these properties:

- For all , and is a square.
- For any , we have .
- Similarly, for any , we have .
- For any , , where is chosen randomly from the binomial distribution .

Then .

Given and a permutation , define the value . Then for random , samples from a distribution , which consists of random with and and such that where is sampled from .

It is possible to split the distribution , where consists of sampling and post-selecting for squares, and consists of sampling and post-selecting for nonsquares. Then it is possible to sample as follows: Pick a random permutation and take . However, as far as I can tell, it is impossible to sample either or individually.

Note that must be large. Otherwise, there is a nonnegligible probability that , in which case both and can be sampled by taking for appropriate and large random .

One concern with this approach is that is not being sampled exactly. However, I believe what is being sampled has a negligible statistical distance from , at least for large enough . That said, I haven’t proven this.

**Added 2013-10-25:** I have now written an implementation of this distribution. Find it here.

**Added 2014-01-01:** Two problems were found in this method. First of all, this method relies on the fact that and have the same square-status. However, more is true. If is the decomposition of into cycles and given any with each equal to either or , then still and have the same square-status. Then picking a with many cycles and sampling randomly from is a way to sample from where if is a square and otherwise.

This alternate process has around possible outcomes while the process for sampling has possible outcomes.

This suggests a possible resolution of this problem, namely, fine-tuning so that the latter has enough entropy to sample randomly, while the former does not. To investigate whether this works, I analyzed more carefully how random is using a Fourier analysis. I discovered a second problem: my method does not perfectly sample , but creates a small correlation between the elements of this pair which decreases polynomially in rather than exponentially in .