This blog now has a title

Note: Woops, made this a “page” rather than an ordinary post. Changed it.

I remember the early days of this blog. A lot of things have changed since then. Back then, it was just me writing things and nobody was reading it. Right now there probably still isn’t anybody reading this (not even you). I guess some things never change.

Indeed, this blog started from nothing and has almost quadrupled in size since then. And in this great growth story, today is an important landmark. That is because today, this blog has a title. It also now has a tagline.

Gauge Theory

One very important aspect of modern physics is something called gauge theory.
Gauge theory explains how all four fundamental forces (gravitational,
electromagnetic, weak, and strong) work. However, it’s hard to find a
nontechnical exposition of the principles of this theory. Here is the best
attempt I’ve seen, although I still find it unsatisfying. Instead, I’m taking
this absence as an opportunity to give my own explanation of gauge theory. [1]

Let me start with the strong force, which is the most straightforward example.
This is the force between the quarks which makes them combine into protons and
neutrons. Now, quarks have a property called “color”. A quark can have one of
three colors, which physicists call red, blue, and green. However, these
“colors” have nothing to do with the colors that you see. Each of these colors
behaves the same way, and it’s only possible to tell what color a quark is by
comparing it with other quarks. A proton or neutron is made out of three quarks,
one of each color, and similarly every particle which can be directly observed
has a symmetrical combination of colors. So although we have names for these
different colors, we can’t tell which is which.

This problem is a lot worse than it seems. You see, it’s actually
fundamentally impossible to tell what color a quark is in an absolute
sense. Let me explain: suppose some physicists at MIT decided to set a standard
for quark colors. They start by isolating a quark[2] and making sure that it
never changes color; they declare this to be the standard red quark. By comparing
with this quark, they have a standard for when their own quarks are red. Now,
the physicists at Harvard want to use this standard, so the physicists at MIT
take another quark with the same color as their first quark and send it to
Harvard. You’d think that the quark at Harvard and the quark at MIT would
have the same color. Now, just to verify this, two days later the physicists at
MIT take another quark with the same color as the one they’re storing and send
it to Harvard. Well, lo and behold, this second quark has a different color
than the first one. No matter how hard they try to prevent the quarks from
changing color, the colors still come out inconsistent.

Here’s the explanation: Color is not actually one universal concept. Every point
in space and time has its own concept of quark color, and MIT September 5th
color is something different from MIT September 7th color, which is different
from Harvard September 7th color. It’s only possible to directly compare quark
colors when they are in the same place and time. There has to be some
relationship between these different concepts of color. After all, a quark which
stays still between 7:00PM and 8:00PM must start with a color in the 7:00PM
sense and end up having a color in the 8:00PM sense. Indeed, there’s
something called a connection which links up these
different sorts of color. The connection is the crucial component of this entire
theory. For any journey a particle can take from one place to another, moving at
certain speeds through certain places along the way, the connection describes a
way to gradually transform a color for the beginning of the journey into a color
for the end of the journey. When time passes for a quark and it needs to have
another type of color, it always gets that color according to the path it took
and the connection. So there is a correspondence between these color concepts,
but it depends on the path you take from one to the other. That’s why the quarks
in my imaginary experiment ended up with different colors; they took different
paths.
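
Here is a toy numerical illustration of this path dependence (my own sketch; the real strong-force connection acts by SU(3) matrices on quantum states, not by plane rotations, so treat this purely as an analogy):

    import numpy as np

    def rotation(theta):
        # A 2x2 rotation: a toy "color change" applied along one step of a path.
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    def transport(color, path_angles):
        # Carry a color vector along a path, applying one rotation per step.
        for theta in path_angles:
            color = rotation(theta) @ color
        return color

    start = np.array([1.0, 0.0])  # "red" in the starting point's convention
    path_a = [0.3, 0.1]           # one journey from MIT to Harvard
    path_b = [0.1, 0.5]           # a different journey between the same points
    print(transport(start, path_a))
    print(transport(start, path_b))  # the two answers disagree

The two journeys between the same endpoints disagree about what arrives as “red”; that path-dependent disagreement is exactly what the connection encodes.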

Now, one of you might ask when we are going to get to the strong force. Well,
hypothetical reader, what I described is the strong force. The attraction of
quarks to each other, their formation into protons and neutrons and other
particles: everything about the strong force comes as an indirect result of
this connection. To give an example, you may have heard of gluons. The
conventional explanation for the strong force is that it is generated by these
gluon particles. What are gluons? Well, the connection only has an interesting
role when multiple paths have inconsistent color changes. This inconsistency is
called curvature. A gluon is a small area of space where this
inconsistency is focused. The connection can always be described as a
combination of many gluons, so in a sense, the strong force really is generated
by gluons. A similar thing is true for all the other forces, so photons,
gravitons, and the W and Z particles are all generated by the curvature of
different connections.

The connection gives a theory for how quark color works, but how can it be a
force? How can it influence how a quark moves, rather than just what color it
is? To answer that, I’ll move on to the second force I intend to discuss, the
electromagnetic force.

The electromagnetic force is similar to the strong force. Here too, there is a
connection, and here it influences every charged particle. One important thing
is different, and it might seem a bit ridiculous: There’s only one color! So
how can this connection do anything? Well, this whole time I have neglected the
laws of quantum mechanics. However, they are important to everything that
happens at this scale.

Let me summarize the rules of quantum mechanics. You would expect a particle
to always be in a certain place at any time, and a quark to always have one
specific color. Instead, particles are usually in something called a
superposition. You can think of being in a superposition as saying that the
particle has different chances of being in any particular place. However, the
superposition is more than that: it also gives something called a phase.

My description here of quantum mechanics is rather crude, but it will do for
now. For a better description, I recommend this introduction, or for a
longer explanation this series of blog posts or the book QED: The Strange
Theory of Light and Matter by Richard Feynman.

Anyways, the crucial fact is that when a particle can be in many different
positions, it is possible to compare the phases of these different
possibilities. It is also possible to compare the phase between different times.
Now, have you ever heard about how in quantum mechanics, a particle behaves like
a wave? A wave is when something, like the height of water, rises and falls in a
regular pattern. In quantum mechanics, it is the phase of a particle that
changes, which makes it act like a wave. Now, it turns out that how quickly the
phase changes across different positions determines a particle’s momentum, and
how quickly it changes over time determines its energy. Now, back to
electromagnetism. You might have guessed by now what the connection does here:
it influences phase. Well, just like in the case of colors, it doesn’t exactly
change anything. More precisely, for charged particles phase is subjective and
has different meanings in different places, and the connection gives a default
for turning one kind of phase into another. This is just like the strong force.
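
For the record, these are the standard de Broglie and Planck relations (in conventional notation; they do not appear in the prose above):

p = \hbar k, \qquad E = \hbar \omega

where the wavenumber k is the rate of phase change per unit length and the angular frequency \omega is the rate of phase change per unit time.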

Now it gets a bit subtle. When a particle is staying still in one spot, it will
minimize energy, and so it will tend to not change its phase from the
perspective of its own path. However, it is naturally not in one exact location,
but in a superposition over a small region. As the phase tries to stay the same
along each spot’s own path, the phases become more and more inconsistent between
different places. This means the particle accumulates momentum, which makes the
particle move. Put differently, the natural movement of phase acts as a
sort of energy, which when it differs in different places generates a force.

So we now know how the connection influences the particle, but where does the
connection come from? How did it get to be in whatever state it is in? Well,
there are internal forces constraining the connection. Remember that the
curvature of the connection measures how inconsistent it is in a small area.
Well, the connection naturally is constrained so that it minimizes how much
curvature it has. So in a vacuum, there is no curvature, and the connection
doesn’t have any inconsistencies, except for small quantum fluctuations.
However, when there is a charged particle, things are different. See, the
particle has its own constraints, and it’s trying to move as smoothly as
possible. How it moves is partially determined by the connection, and so when
the particle is present, the connection naturally gives way. That is where the
curvature of the connection comes from, and why there are forces between
particles.

Now let’s get to gravity. With gravity, what the connection changes is very
interesting. First of all, here’s a question: which way is up? There should be a
fairly obvious answer. And yet, what you think is up, if you travel far enough
abroad, will turn out to be sideways or even down compared with the real up. It
can get even stranger: your watch tells you the time is 6:00 AM, while the
locals insist that it’s 5:00 PM.

Of course, we all know that the Earth is round, along with all of the strange
consequences this implies, but that’s nothing profound. It’s just a strange
convention. Although what you recognized as up is no longer called “up”, it’s
still recognized as a direction. And it’s very easy for people from different
locations to synchronize their watches. It’s still the same directions and
times, just given different names.

At least, that’s how it seems. If you make very careful observations, you’ll
find that directions, just like quark colors, are not consistent in different
places. Similarly, there is no absolute time which is consistent everywhere.
These effects are all very subtle, except for one.

So now let’s really get to gravity. Throw something upwards, and afterwards it
will fall. What makes it change direction? Nothing. That’s because it
isn’t changing direction. So why does it hit the ground? Because the
ground is moving, upwards and upwards until they collide.

More precisely, the connection also influences what it means to move and what
it means to stay still. That porcelain cup you threw is changing its path in the
natural way with respect to the connection. The “move up” of one time is the
same as the “stay still” of another, which is the same as the “move down” of a
time still further in the future. Our notion of staying still comes from
comparing things with the ground, which accelerates upwards compared to the
natural flow.

So why is the ground moving upwards? Well, for the small porcelain cup,
following the connection is easy. The connection is fairly self-consistent in
the small region where the cup moves. But the ground is attached to the Earth,
which is very big. And in spite
of the curvature, the inconsistencies, all of the Earth has to move in unison.
And it’s worse, since the curvature is generated by the Earth, and so it won’t
ever move out of the way.

Now, finally, the weak force. Remember the rule with all the other forces: that
a particle has an attribute that has different meanings in different places,
and so there can’t be a universal standard. Now let’s break these rules.

Back when people first studied the strong nuclear force, they found that it was
very complicated. But in spite of the complexities, they did find one pattern:
that protons and neutrons behave in the same way. The strong force between
protons and protons and between protons and neutrons and between neutrons and
neutrons is the same. Later on physicists discovered the pions. There are three
types of pions, one positively charged, one negatively charged, and one neutral.
They too all behaved the same way with respect to the strong force, and all had
approximately the same mass. Later more particles were discovered, and they
always fit into arrays of similar particles with the same strong interactions
and about the same mass.

The whole thing reminded physicists of something else they’d already seen,
called spin. Spin is a property that every particle has. For example, electrons
have spin 1/2. This means that each electron can be in one of two states: spin
up or spin down. These are also called spin 1/2 and spin -1/2 (note the
distinction: while the spin of a particular electron may be -1/2, the spin of
electrons in general is always 1/2. Although the same word is used, these are
different concepts). As you can guess from the name, flipping a spin up electron
upside down makes a spin down electron, so spin is related to orientation.
Excuse me while I ignore the question of what happens when you flip an electron
sideways.

Not all particles are spin 1/2. Spin 0 particles are particles which only have
one spin state, spin 0, while spin 1 particles can be in spin state -1, 0, or 1.
In general, a particle of spin s has a set of 2s + 1 consecutive spin states,
from -s up to s. This is similar to the pattern described earlier for similar particles
with different charges. Based on the analogy, this property of particles is
called isospin.
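
To spell out the analogy (these are the standard assignments):

(\text{spin } 1/2: \text{up}, \text{down}) \leftrightarrow (\text{isospin } 1/2: p, n), \qquad (\text{spin } 1: -1, 0, 1) \leftrightarrow (\text{isospin } 1: \pi^-, \pi^0, \pi^+)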

So because of these similarities, it seems as though there’s some sort of
virtual orientation[3] responsible for this isospin. Physicists considered the
possibility that this virtual orientation has a connection to it, making yet
another gauge theory. But to do that, the different virtual orientations must be
indistinguishable. And yet, the proton and neutron, with isospin up and down,
are clearly different. The virtual orientation with which the proton is isospin
up is special, contradicting the principle that no virtual orientation is
special.

What’s happening is something called symmetry breaking. Although fundamentally
all of these virtual orientations are the same, there is a sort of pointer in
each place in the universe which designates one virtual orientation as special.
These pointers are called the Higgs field. Each pointer tends to align with the
nearby pointers. However, this is impossible when the connection for the virtual
orientation is curved. So what happens is that any curvature of the connection
is associated with movement of the Higgs field. This is called mixing. The
mixture manifests
as the Z and W particles. In addition, the Higgs field interacts with particles
so that their behavior depends on how their virtual orientation aligns with it.

So there you have it. As I’ve just shown you, every fundamental force is based
on the same idea: Some aspect of a particle is relative in place and time, and
the connection mediates this aspect over different places and times to make it
seem consistent locally. This is the basis of gauge theory. However, for each
force a different subtlety or tweak is added on. This makes each force
behave in a unique way. But the underlying concept is always the same: gauge
theory.

[1] While I was writing this article at my leisurely pace, Sean Carroll posted
this article,
which is pretty much the sort of thing I was looking for, and
so my justification for writing this is out of date. Oh well, these sorts of
things happen in a 50-year-old field.

[2] Actually, even getting this far is impossible. There is a phenomenon called
quark confinement, which is that quarks never exist on their own, only in
color-neutral groups. I hope you trust me when I tell you that this subtlety
isn’t important, because I’m going to completely ignore it from now on.

[3] I will continue talking about this virtual orientation, but I want to note
that it’s not a technical term. I don’t know if there is a technical term for
it.

A Solution to Problem 20 in Scott Aaronson’s “Projects aplenty”

Scott Aaronson has a post on his blog, Projects aplenty, where he proposes many research problems for students. One of them is:

20. Do there exist probability distributions D_1, D_2 over n-bit strings such that (D_1^2+D_2^2)/2 (an equal mixture of two independent samples from D_1 and two independent samples from D_2) is efficiently samplable, even though D_1 and D_2 themselves are not efficiently samplable?  (This is closely-related to a beautiful question posed by Daniel Roy, of whether the de Finetti Theorem has a “polynomial-time analogue.”)

In the comments, I proposed one solution to the problem. Scott Aaronson sent me an email pointing out that the conditions under which the solution works are so strong that the solution is effectively trivial. I have thought about the problem more, and I believe I have found a much better solution.

To be specific, what I have is: Given data A, there are distributions D_0(A) and D_1(A) such that it is possible to efficiently approximately sample (D_0(A)^2 + D_1(A)^2)/2 given A. However, neither D_0(A) nor D_1(A) is efficiently approximately samplable, even given O (\log |A|) bits of advice. Moreover, it is easy to generate appropriate A, in the sense that there is a polynomial-time probabilistic algorithm which, given 1^\lambda, creates an A with |A| = \mathrm{poly} (\lambda) such that A satisfies these properties.

Before I present the solution, some definitions: If p and P are primes with p | P-1 and n \in \mathbb{Z} / P \mathbb{Z}, then I define \left( \frac {n} {P} \right)_p = n ^{(P-1)/p} \mod P. When N is an odd number, not necessarily prime, and n \in \mathbb{Z} / N \mathbb{Z}, then \left( \frac {n} {N} \right) is the Jacobi symbol of n mod N.
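
In code, the two symbols look like this (a minimal sketch; the exponent assumes the (P-1)/p reading above, and sympy supplies the Jacobi symbol):

    from sympy.ntheory import jacobi_symbol

    def power_residue_symbol(n, P, p):
        # (n/P)_p = n^((P-1)/p) mod P, a pth root of unity mod P when p | P-1.
        return pow(n, (P - 1) // p, P)

    def jacobi(n, N):
        # (n/N), the Jacobi symbol, defined for odd N.
        return jacobi_symbol(n, N)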

Now, to generate A: first, pick primes p, q_0, and q_1 with approximately \lambda bits each. Let P and Q be primes with p | P-1 and q_0 q_1 | Q-1. Set N = P Q. Set n to be a natural number with \log(N) \ll n, and m = \lfloor \sqrt{P} \rfloor. Set g \in \mathbb{Z}/ P\mathbb{Z} to be a random pth root of unity. Set (a_{ij}) to be a random n \times n matrix of elements of \mathbb{Z} / N\mathbb{Z} subject to these properties:

  • For all 0 \leq i,j < n, \left( \frac {a_{ij}} {N} \right) = 1 and a_{ij} a_{ji} is a square.
  • For any 0 \leq i_0,i_1,j < n, we have \left( \frac {a_{i_0 j}} {Q} \right) _{q_0} = \left( \frac {a_{i_1 j}} {Q} \right) _{q_0}.
  • Similarly, for any 0 \leq i,j_0,j_1 < n, we have \left( \frac {a_{i j_0}} {Q} \right) _{q_1} = \left( \frac {a_{i j_1}} {Q} \right) _{q_1}.
  • For any 0 \leq i,j < n, \left( \frac {a_{ij}} {P} \right)_p = g^k, where k is chosen independently for each entry from the binomial distribution B(m, 1/2).

Then A = (N, n, (a_{ij})).

Given A and a permutation \pi \in S_n, define the value f(\pi) = \prod _{0 \leq i < n} a_{i \pi(i)}. Then for random \pi, f(\pi) samples from a distribution D, which consists of random x \in \mathbb{Z} / N\mathbb{Z} with (x/Q)_{q_0} = \prod_{i<n} (a_{0i} / Q)_{q_0} and (x/Q)_{q_1} = \prod_{j<n} (a_{j0} / Q)_{q_1} and such that (x/P)_p = g^k where k is sampled from B(n m, 1/2).

It is possible to split the distribution D = (D_0 + D_1) /2, where D_0 consists of sampling D and post-selecting for squares, and D_1 consists of sampling D and post-selecting for nonsquares. Then it is possible to sample (D_0^2 + D_1^2)/2 as follows: Pick a random permutation \pi \in S_n and take (f(\pi), f(\pi^{-1})). However, as far as I can tell, it is impossible to sample either D_0 or D_1 individually.
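
As code, the pair sampler is simple (a minimal sketch in my own notation, assuming the matrix a of residues mod N has already been generated as specified above):

    import random

    def f(a, N, pi):
        # f(pi) = prod_i a[i][pi(i)] mod N.
        result = 1
        for i in range(len(a)):
            result = result * a[i][pi[i]] % N
        return result

    def sample_pair(a, N):
        # One sample from (D_0^2 + D_1^2)/2: evaluate f at a uniformly
        # random permutation and at its inverse.
        n = len(a)
        pi = list(range(n))
        random.shuffle(pi)
        pi_inv = [0] * n
        for i, j in enumerate(pi):
            pi_inv[j] = i
        return f(a, N, pi), f(a, N, pi_inv)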

Note that m must be large. Otherwise, there is a non-negligible probability that \left( \frac {a_{i_0 j_0} a_{i_1 j_1}} {P} \right)_p = \left( \frac {a_{i_0 j_1} a_{i_1 j_0}} {P} \right)_p, in which case both D_0 and D_1 can be sampled by taking f(\pi) (a_{i_0 j_0} a_{i_1 j_1} / a_{i_0 j_1} a_{i_1 j_0}) ^{2 k} for appropriate \pi and large random k.

One concern with this approach is that (D_0^2 + D_1^2)/2 is not being sampled exactly. However, I believe what is being sampled has a negligible statistical distance from (D_0^2 + D_1^2)/2, at least for large enough n, although I haven’t proven this.

Added 2013-10-25: I have now written an implementation of this distribution. Find it here.

Added 2014-01-01: Two problems were found in this method. First of all, this
method relies on the fact that f (\pi) and f (\pi ^{-1}) have the same
square-status. However, more is true. If \pi = c_0 \dots c_{r-1} is the
decomposition of \pi into cycles, then for any \pi' = c'_0 \dots c'_{r-1} with
each c'_i equal to either c_i or c_i ^{-1}, f (\pi) and f (\pi') still have the
same square-status. Then picking a \pi with many cycles and sampling randomly
from f (\pi') is a way to sample from D_i, where i = 0 if f (\pi) is a square
and i = 1 otherwise.

This alternate process has around C^n possible outcomes while the
process for sampling (D_0^2 + D_1^2)/2 has n! possible outcomes.
This suggests a possible resolution of this problem, namely, fine-tuning n so
that the latter has enough entropy to sample randomly while the former
does not. To investigate whether this works, I analyzed more carefully how
random f (\pi) is using Fourier analysis. I discovered a second
problem: my method does not perfectly sample (D_0^2 + D_1^2)/2, but
creates a small correlation between the elements of this pair which decreases
polynomially in n rather than exponentially in n.

A Comment on the Prime Number Theorem

(Note: I’m describing here a topic where I can’t personally follow the proofs. Think of this as a second-hand account.)

The prime number theorem states that asymptotically the number of primes less than N is approximately N / \log(N). It is fairly easy to prove that this is true up to a constant, that is, letting \pi(x) denote the number of primes less than x, that

\frac{1}{C} \frac {N} {\log(N)} < \pi(N) < C \frac {N} {\log(N)}

for some constant C. So the interesting thing is the constant factor.
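
A quick numerical check (my own sketch, using sympy’s primepi) shows the ratio \pi(N) \log(N) / N drifting slowly down toward 1:

    from math import log
    from sympy import primepi

    for N in [10**3, 10**5, 10**7]:
        exact = int(primepi(N))   # the number of primes below N
        approx = N / log(N)
        print(N, exact, round(exact / approx, 3))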

When you think about it, it’s rather strange that the constant factor is exactly one. Consider a similar case: counting the number of twin primes less than a given N. The value is conjectured to be approximately C N / \log(N)^2, where

C = 2 \prod_{p\text{ prime}, p \geq 3} \left( 1 - \frac{1} {(p-1)^2} \right)

This constant C can be generated as follows: We want to determine the probability that q and q+2 are both prime for q < N-2. The probability for q to be prime and the probability for q+2 to be prime are both \frac{1} {\log(N)}, so assuming they are independent, the probability of both being prime is \frac{1} {\log(N)^2}. However, these probabilities are not independent. For example, assuming q and q+2 are independent would lead to a probability of \frac{1}{4} of them both being odd, whereas the actual probability is \frac{1}{2}. Taking this into account, the probability estimate should be twice what it was before, namely \frac{2} {\log(N)^2}. Similarly, it is possible to calculate that the probability that a given prime p \geq 3 divides neither q nor q+2 differs by a factor of 1 - \frac{1}{(p-1)^2} from what it would be under the independence assumption. Multiplying all these factors out, you get the constant C.
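
Numerically, the product converges fast enough to check directly (a short sketch of my own):

    from sympy import primerange

    # Partial product for C = 2 * prod_{p prime, p >= 3} (1 - 1/(p-1)^2).
    C = 2.0
    for p in primerange(3, 10**6):
        C *= 1 - 1 / (p - 1) ** 2
    print(C)  # about 1.3203, the twin prime constant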

You would expect something similar to happen when counting primes: that there should be some constant factor that is a product over all primes, and that by default it would be a fairly random-looking number rather than exactly 1. The naive estimate \pi (N) \approx N \prod_p (1-1/p) = 0 fails. Still, it seems strange that the fact that every prime number past 2 is odd, which reduces the frequency of primes by half, and that they are all 1 or 2 mod 3, which reduces it by a factor of \frac{2}{3}, and so on, should all combine to a constant factor of exactly 1.

The explanation is this: The frequency of the primes is self-adjusting. For instance, if in the interval [1, N] there are fewer primes than there are supposed to be, that makes any number in [N, 2 N] more likely to be prime. Thus facts like the specific number 2 being prime make, in the long run, no difference in the density of the primes.

Proposed terminological change: Replace “Inductive Type” with “Free Type”

The elimination axioms for natural numbers, surreal numbers, lists, and trees can all properly be called induction. However, I don’t think the expression if b then x else y is an example of recursion, and so I don’t think 2 should be called an inductive type. Similarly, I don’t like how the term inductive type is used for dependent sums, unit and null types, and the various new types of homotopy type theory. What all these types have in common is that they have a set of constructors and an elimination axiom which essentially says that these constructors are the only way to obtain an element of the type. A better terminology would say that such a type is free over its set of constructors, and would call it a free type, as the sketch below illustrates for 2.
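
Here is a minimal Lean 4 sketch of what I mean, taking 2 (the Booleans) as the example; the names are my own:

    -- MyBool is freely generated by its two constructors.
    inductive MyBool where
      | true  : MyBool
      | false : MyBool

    -- The elimination rule: to map out of MyBool it suffices to handle each
    -- constructor, since the constructors are the only way to build a MyBool.
    -- Note that this is exactly "if b then x else y", yet involves no recursion.
    def MyBool.elim {α : Type} (b : MyBool) (x y : α) : α :=
      match b with
      | .true  => x
      | .false => y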

A Small Insight on Quantum Field Theory

I’ve been thinking about quantum field theory for a while now, trying to
understand it. Here is a small insight I recently had.

One way of thinking about the configuration space is through the field
perspective. There are observables at every point in space, and they obey the
Klein-Gordon equation with nonlinear perturbations. Taking the Fourier
transform, you get at every momentum a set of harmonic oscillators, and the
interactions are couplings between them. Quantizing this, each oscillator gets
an integer spectrum which corresponds to how many particles of a given type
there are at a given momentum. So now we get the particle perspective in momentum
space.
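
For a free scalar field, this is the standard computation (a sketch in conventional notation, suppressing normalization factors): writing \phi(x, t) = \int \tilde{\phi}(k, t) e^{i k \cdot x} \, d^3 k, the Klein-Gordon equation \ddot{\phi} - \nabla^2 \phi + m^2 \phi = 0 becomes, mode by mode,

\ddot{\tilde{\phi}}(k, t) + (k^2 + m^2) \tilde{\phi}(k, t) = 0

so each momentum k carries a harmonic oscillator of frequency \omega_k = \sqrt{k^2 + m^2}, and quantizing each oscillator gives the integer occupation numbers just described.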

Another perspective, which is usually used in less rigorous descriptions of QFT,
is the particle presentation in position space. For example, this is implicit
in the “cloud of virtual particles” intuition. It seems to work, but nobody uses
it in calculations. Presumably it can be described precisely by taking the
Fourier transform of the momentum-space particles to get position-space
particles. Like
the field presentation, it seems to be manifestly local.

My small insight is this: there is a direct way to describe this perspective in
terms of the field perspective. It is this: a field at a given point over time
behaves somewhat like a harmonic oscillator. Take the basis derived from the
fixed energy states of these harmonic oscillators to get the position-space
particle presentation.
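
To make the analogy concrete (my own rough sketch): at a fixed point, ignoring the spatial gradient term, the Klein-Gordon equation reduces to

\ddot{\phi} + m^2 \phi = 0

a harmonic oscillator whose spring constant is the squared mass. This is also what lies behind the first consequence below: when m = 0, the restoring force vanishes.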

Some consequences:

  • A massless field doesn’t behave like a harmonic oscillator at a fixed position because there is no spring constant. This explains a claim I heard that position
    space doesn’t work for massless particles. However, extrapolating the creation
    operator as the spring constant goes to zero, it looks like something similar can
    be made to work when thinking of particle count as an index for a
    Taylor-expansion-like decomposition of the wavefunction into polynomials.
  • This position space perspective is not Lorentz-invariant. This partially
    explains the claim I heard that position space also doesn’t work for massive
    particles in the relativistic theory. It also explains why nobody uses this
    perspective seriously.

Why Scientists Don’t Write Poetry

A pillow’s softness, from how it behaves:
To forces it reacts yet it conforms.
The patient repetition of a wave,
Transforming an atom from form to form.
How everything existing had begun,
Matter from matter moving fast away.
How everything is hidden in the sum
Of amplitudes of all the different ways.
Yet all the things I do till now express
Are mere fragments of a deeper core.
How would I love to faithfully address
The full wonders with all of their galore.
Yet when I try those feelings to compose
My poetry is heightened into prose.