Special Relativity Notes

Table of Contents

Preface

These are my notes on special relativity. They have been taken from various sources. References are cited in the section to which they are pertinent.

As on my other pages, clicking on a link in the table of contents will bring the reader to the indicated section. Links associated with the title of individual sections will bring the reader back to the table of contents.

Additional explanatory information can be viewed by clicking on buttons, often labeled “here.” When clicked, a dropdown box with the information will appear. To hide the information again, click on the button that opened the box.

I. Introduction

In the seventeenth century, Galileo came up with a thought experiment that has come to be known as Galileo’s ship. Paraphrased, his idea goes something like this:

Lock yourself up below deck in a docked ship with no contact with the outside world. Allow water to drip from a faucet into a jar. Each drop falls straight down. Throw a ball across the width of the ship to a friend. It goes straight across the room. Now let the ship set sail. The water drops still fall straight down and the ball still goes across to your friend. In short, you can’t tell if the ship is moving or not.

From this, Galileo deduced that the laws of physics should be the same in any frame of reference that moves with a constant velocity (including velocity equal to zero, i.e. at rest). Such a constant velocity reference frame became known as an inertial frame of reference. When applied to Newtonian mechanics, the logical extensions of this idea became known as Galilean relativity.

Figure 1.1

We’re all familiar with the concepts of Galilean relativity from everyday experience. It works like this: Imagine car A driving down the road at a speed of 30 mph rightward relative to Observer O on the side of the road. Car B is traveling alongside car A at a speed of 50 mph rightward relative to Observer O (figure 1.1a). To the driver of car A, car B will appear to be moving at a speed of 50 mph – 30 mph = 20 mph past him.

If car A is driving at 30 mph rightward relative to Observer O and car B is driving leftward at 50 mph relative to O (figure 1.1b), then the driver of car A will see the speed of car B as -50 mph – 30 mph = -80 mph (i.e., 80 mph in the direction opposite to his own).

Figure 1.2

Furthermore, suppose a man is pulling a wagon at 2 mph rightward on a sidewalk with respect to Observer O on the grass next to the sidewalk. A girl in the wagon throws a ball in the direction of motion at 5 mph (figure 1.2a). Observer O will see the ball traveling rightward at 2 mph + 5 mph = 7 mph.

Finally (figure 1.2b), consider the same setup as the last example, except this time, the girl throws the ball backward at 1 mph in her frame of reference (leftward according to O). Observer O will see the ball as traveling rightward at 1 mph.
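The four everyday examples above all use the same rule: velocities measured along one line simply add or subtract. Here is a minimal Python sketch of that rule. The function names are my own, chosen for illustration, and rightward is taken as positive.

```python
def velocity_seen_by(observer_velocity, object_velocity):
    """Galilean rule: velocity of an object relative to a moving observer.

    Both inputs are measured in the same frame (here, Observer O's),
    with rightward taken as positive.
    """
    return object_velocity - observer_velocity


def velocity_in_ground_frame(carrier_velocity, velocity_relative_to_carrier):
    """Galilean rule: add the carrier's velocity to a velocity measured on it."""
    return carrier_velocity + velocity_relative_to_carrier


# Figure 1.1a: car A at 30 mph, car B at 50 mph, both rightward.
print(velocity_seen_by(30, 50))          # 20
# Figure 1.1b: car B moves leftward at 50 mph.
print(velocity_seen_by(30, -50))         # -80
# Figure 1.2a: wagon at 2 mph, ball thrown forward at 5 mph.
print(velocity_in_ground_frame(2, 5))    # 7
# Figure 1.2b: ball thrown backward at 1 mph.
print(velocity_in_ground_frame(2, -1))   # 1
```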

This is all straightforward enough.

Now, fast-forward to the early twentieth century. By that time, Maxwell had developed a theory of electromagnetism that was regarded as being on the same solid footing as Newtonian mechanics and Newtonian gravity. At the turn of the century, a then unknown physicist working as a patent clerk in Switzerland, Albert Einstein, began to contemplate these issues. He recognized that problems arise when one attempts to apply Galilean relativity to electromagnetism.

Diagram showing problem posed by EM for Galilean relativity
Figure 1.3

Although Einstein didn’t mention the specific problem I’m about to describe in his original paper, figure 1.3 depicts the type of difficulty that Einstein identified.

In figure 1.3a, we have reference frame S in which an observer (call her Observer S) sees a wire with fixed protons and mobile electrons. The electrons move to the right at velocity \vec{v}, creating a current which, by convention, flows to the left (in the direction positive charge would move). We fire in a test electron which our observer describes as moving to the right at velocity \vec{v}, the same velocity as the electrons in the wire.

In reference frame S´, our observer (call him Observer S´) takes the viewpoint of the test electron, which Observer S would say is traveling to the right at the same velocity as the mobile electrons in the wire. Observer S´ considers himself to be at rest and the electrons in the wire to be at rest. However, he sees the protons in the wire, which observer S considers stationary, as moving with velocity v to the left.

So what does the test electron experience from the viewpoint of Observer S? Well, the positive charges from protons in the wire are seen as balancing the electrical charge associated with the wire’s electrons. The wire, then, is seen as having zero net charge. Therefore, Observer S sees no electric field. However, the electrons in the wire are moving. Such moving charge creates a magnetic field which gets weaker the farther away from the wire one sits.

Right hand rule
Figure 1.4

The direction of the magnetic field follows the right hand rule, depicted in figure 1.4b: point your thumb in the direction of movement of positive charge and curl your fingers. The direction in which your fingers point indicates the direction of the magnetic field. Figure 1.4a is a schematic of the direction of current flow and its associated magnetic field. It follows the convention that current flows in the direction of positive charge*.

*Only one “ring” of constant field strength is shown. In reality, there are an infinite number of such rings of differing field strength, the strength of the field decreasing in inverse proportion to the ring’s distance from the wire.

In figure 1.4a, the leftward-pointing arrow indicates the direction of movement of positive charge. In our example, negatively-charged electrons move to the right so, by convention, current flows to the left. The interrupted blue circle indicates the direction of the magnetic field. Again, per convention, the circle with the dot in the middle of it means that the magnetic field is coming out of the screen toward us while the circle with the x in it means that the magnetic field is pointing into the screen, away from us.

I won’t derive the formulas from classical electromagnetism I use in this discussion; perhaps I’ll create a page showing these derivations in the future. Instead, for now, I’ll just state them and use them.

To start, it’s known that the magnitude of the magnetic field created by a current in a long, straight wire is given as:

B=\displaystyle\frac{\mu_0}{2\pi } \left( \frac{I}{r}\right) \quad \text{eq (1.1)} where

\displaystyle B is the magnitude of the magnetic field created by the current
\displaystyle\mu_0 is the permeability of free space
\displaystyle I is the electrical current in the wire
\displaystyle r is the distance from the wire

We’ll change units to make subsequent math a little easier. Recognizing that

\mu_0 = \displaystyle\frac{1}{\epsilon_0 c^2} \quad \text{eq (1.2)}     where

\displaystyle \epsilon_0 is the permittivity of free space
\displaystyle c is the speed of light

we write:

B=\displaystyle\frac{1}{2\pi \epsilon_0 c^2 } \left( \frac{I}{r}\right) \quad \text{eq (1.3)}

The force produced on a charged particle (in our case, the test electron) by an electric or magnetic field is given by the Lorentz force law:

\displaystyle \vec{F}=q(\vec{E} + \vec{v}\times \vec{B}) \quad \text{eq (1.4)} where

\vec{F} is the force on the charged particle
\displaystyle q is the electric charge of the test particle
\vec{E} is the electric field affecting the test particle
\vec{v} is the velocity of the test particle
\vec{B} is the magnetic field affecting the test particle
\times means take the cross product between two vectors, in this case \vec{v} and \vec{B}, in that order

Because the wire is electrically neutral, there is no electrical force on the test electron. Therefore, in our case, the Lorentz Force Law reduces to:

\vec{F}=q(\vec{E} + \vec{v}\times \vec{B})=q(0 + \vec{v}\times \vec{B}) = q(\vec{v}\times \vec{B}) \quad \text{eq (1.5)}

Let’s take a moment and talk about the cross product. The cross product is an operation which takes in two 3-dimensional vectors and results in a new vector. The magnitude of that new vector is obtained by multiplying the magnitudes of the two vectors by the sine of the angle between them. In our case, \vec{v} and \vec{B} are perpendicular, so the magnitude is simply the product of the two magnitudes. Its direction can be obtained by a variety of methods. One is the right hand rule we’ve already discussed. Applied in the cross product operation, the fingers of the right hand are curled from the first vector in the cross product expression (in this case \vec{v}) toward the second vector in the expression (in this case \vec{B}). The direction in which the thumb is pointing is the direction of the resulting vector. As it applies to our magnetic force problem, the vector \vec{v} refers to the velocity of positive charge. In our case, a negatively charged electron is moving to the right. This is considered the same as positive charge moving to the left. The right hand in figure 1.5 is upside down with the palm facing us, indicating that, in our case of a magnetic field acting on a rightward moving test electron, the force on the test electron is downward, toward the wire.

RH rule applied to test electron in magnetic field
Figure 1.5
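The right hand rule can be cross-checked by computing the cross product directly. The sketch below is only an illustration under assumed axes (x to the right, y up the screen, z out of the screen toward us), with the test electron assumed to sit above the wire. Neither choice is stated in the figures, so treat both as assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


# Assumed axes: x to the right, y up the screen, z out of the screen.
# Assumed geometry: the test electron sits above the wire (positive y).
current_direction = (-1, 0, 0)   # conventional current flows leftward
radial_direction = (0, 1, 0)     # from the wire up to the test electron

# For a long straight wire, the field direction at a point is
# (current direction) x (direction from the wire to that point).
b_direction = cross(current_direction, radial_direction)
print(b_direction)               # (0, 0, -1): into the screen

electron_charge_sign = -1
velocity_direction = (1, 0, 0)   # test electron moves rightward

# Direction of q (v x B); only signs matter here, so unit vectors suffice.
v_cross_b = cross(velocity_direction, b_direction)
force_direction = tuple(electron_charge_sign * component for component in v_cross_b)
print(force_direction)           # (0, -1, 0): downward, toward the wire
```

Using the actual electron velocity together with the negative charge gives the same answer as treating the electron as positive charge moving leftward.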

The outcome of this interaction is that the test electron is deflected along a curved path toward the wire. The magnitude of the force on the test electron toward the wire is:

\lvert\vec{F}\rvert =\displaystyle q\lvert\vec{v}\rvert \lvert\vec{B}\rvert=\left(  \frac{1}{2\pi \epsilon_0 c^2} \left( \frac{I}{r}\right) \right)qv \quad \text{eq (1.6)}

where the vertical lines on each side of a vector symbol indicate the magnitude of that vector.

Now let’s turn to the reference frame of Observer S´. He is traveling to the right with the test electron which, itself, is traveling with the same rightward velocity, \vec{v}, as the electrons in the wire. Again, because the wire is electrically neutral, there is no electric field, so the Lorentz force equation again reduces to:

\lvert\vec{F}\rvert =\displaystyle q\lvert\vec{v}\rvert \lvert\vec{B}\rvert=\left(  \frac{1}{2\pi \epsilon_0 c^2} \left( \frac{I}{r}\right) \right)qv \quad \text{eq (1.7)}

But in this case, Observer S´ sees the test electron as being at rest (i.e., v=0). Therefore, according to Observer S´, there is no force on the test electron. (Contrast this with the magnetic force on the test electron Observer S sees.) This troubled Einstein, who believed – as Galileo did – that the laws of physics should be the same in any frame of reference.
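To make the disagreement between the two observers concrete, here is a small numerical sketch of eq (1.3) and eq (1.6). The current, distance and electron speed are illustrative values I have assumed; they do not come from the source. The function names are likewise my own.

```python
import math

EPSILON_0 = 8.8541878128e-12       # permittivity of free space, F/m
C = 299_792_458.0                  # speed of light, m/s
ELECTRON_CHARGE = 1.602176634e-19  # magnitude of the electron charge, C


def wire_field(current, distance):
    """Magnitude of B a distance r from a long straight wire, eq (1.3)."""
    return current / (2 * math.pi * EPSILON_0 * C**2 * distance)


def magnetic_force(charge, speed, current, distance):
    """Magnitude of the magnetic force q v B on the test charge, eq (1.6)."""
    return charge * speed * wire_field(current, distance)


# Illustrative values (assumed, not from the text).
current = 1.0          # amperes
distance = 0.01        # metres from the wire
electron_speed = 1e-4  # m/s, speed of the test electron in frame S

# Frame S: the test electron moves, so there is a magnetic force.
force_in_S = magnetic_force(ELECTRON_CHARGE, electron_speed, current, distance)
# Frame S', applying the same formula naively: the test electron is at rest.
force_in_S_prime = magnetic_force(ELECTRON_CHARGE, 0.0, current, distance)

print(force_in_S)        # small but nonzero
print(force_in_S_prime)  # 0.0
```

The nonzero result in one frame and the zero result in the other is exactly the inconsistency described above.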

[Note that the above derivation is taken from Dr. Martin Smalley, https://www.youtube.com/watch?v=iUBiF2-1Tq4.]

Einstein also realized that manipulation of Maxwell’s equations predicts a constant value for the speed of light in a vacuum (approximately 3\times10^8 meters per second). To see where this comes from, click here.

Albert Michelson (the same Michelson who would later say that there was nothing left in physics to discover) subsequently confirmed this prediction by experiment. Furthermore, in 1887 Michelson and Edward Morley, in attempting to prove the existence of a medium through which light traveled called the ether, proved just the opposite: that the speed of light seemed to be the same no matter in what direction it was measured. (For details, click here.)

Given these facts, Einstein postulated that the speed of light was the same in all reference frames.

But there was a problem. If one applies Galilean relativity to objects traveling near the speed of light, things don’t work out so well. To see this, suppose Observer S is sitting still in outer space, in a frame of reference that we’ll call S. Observer S watches Observer S´, traveling from his left to his right (the + direction in Observer S’s frame of reference), in a rocket ship, at a constant velocity of 0.5 times the speed of light (= 0.5c, where c = the speed of light ≈ 3\times10^8 meters per second). Simultaneously, a light beam moves past Observer S, right to left. What we want to know is, at what velocity does Observer S´ say the beam of light is traveling?

According to Galilean relativity, Observer S´ should measure the velocity of the light beam as -1.5c (i.e., -c – 0.5c = -1.5c). But this contradicts the idea that the speed of light is constant (equal to c) in all inertial frames. Einstein realized that either Maxwell’s theory was wrong (the speed of light is not constant) or Galilean relativity was wrong. Einstein banked on the latter. The result was his 1905 landmark paper “On the Electrodynamics of Moving Bodies” which outlined his theory of special relativity. You can find a copy of it here:

https://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf

In it, he begins with two postulates:

  • The laws of physics are the same in all inertial frames of reference (i.e., frames of reference that move at constant velocity)
  • The speed of light is constant in all inertial frames of reference

Einstein recognized that, if these postulates were true, then Galilean relativity had to be revised. Namely, if time moved more slowly and units of length became shorter in frames of reference moving with respect to an observer, especially as the velocity approached the speed of light, then the invariant nature of the speed of light would be preserved. Such behavior of space and time would also solve the discordance between non-relativistic physics and electromagnetism. To see the counterintuitive specifics of Einstein’s theory, read on.

II. Loss of Simultaneity

Imagine an observer, Observer B, on the platform of a train station (figure 2.1). A train goes by Observer B traveling 30 mph to her right. In the middle of the train is an observer, Observer A. Observer A has two machines that fire tennis balls toward the front and back of the train at 50 mph.

Figure 2.1

Like any observer in their own frame of reference, Observer A thinks he’s at rest. He sees the tennis balls hit the left and right sides of the train simultaneously.

What about Observer B? In her frame of reference, the machine firing the tennis ball toward the left wall is moving at 30 mph to the right. The ball is fired at 50 mph to the left, so the net velocity of the ball from Observer B’s viewpoint is 20 mph to the left. However, the left wall of the train is moving toward the ball at 30 mph, so the “gap” between the ball and the wall is closing at 20 mph – (-30 mph) = 50 mph. At the same time, the rightward traveling ball’s velocity, according to Observer B, is 30 mph due to the motion of the train + 50 mph due to the machine = 80 mph. However, the righthand wall of the train is receding from the ball at 30 mph. Therefore, the gap between the rightward traveling ball and receding righthand wall of the train is closing at 80 mph – 30 mph = 50 mph, same as for the leftward traveling ball and rightward traveling left wall of the train. Thus, ignoring an inconsequential (at this velocity) correction due to special relativity, Observer B sees the balls hit the right and left walls of the train at the same time.
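The tennis ball bookkeeping can be checked with a few lines of Python. This is a sketch under Galilean assumptions only. The half-length of the train is a value I have made up for illustration, and rightward is taken as positive.

```python
import math


def time_to_wall(gap, ball_velocity, wall_velocity):
    """Time for a ball to reach a wall, with both velocities in Observer B's frame.

    `gap` is the initial signed separation (wall position minus ball position).
    The ball catches the wall when ball_velocity * t == gap + wall_velocity * t.
    """
    return gap / (ball_velocity - wall_velocity)


train_velocity = 30.0   # mph, rightward
launch_speed = 50.0     # mph, relative to the train
half_length = 0.01      # miles from the machines to each wall (assumed value)

# Leftward ball: velocity 30 - 50 = -20 mph; the left wall starts behind it.
time_left = time_to_wall(-half_length, train_velocity - launch_speed, train_velocity)
# Rightward ball: velocity 30 + 50 = 80 mph; the right wall starts ahead of it.
time_right = time_to_wall(half_length, train_velocity + launch_speed, train_velocity)

print(time_left, time_right)
print(math.isclose(time_left, time_right))   # True: the hits are simultaneous
```

Both gaps close at 50 mph, so the two times agree whatever half-length is chosen.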

Now consider what happens if we measure the time it takes light beams shined rightward and leftward from the middle of a similar train to hit the right and left walls (figure 2.2). Again, there are 2 observers, Observer A on the train and Observer B on the train station platform, and again, the train is moving rightward with respect to Observer B at velocity v. And per Einstein’s second postulate of special relativity, the speed of light, c, is the same in all inertial frames of reference.

Figure 2.2

What will Observer A see? Well, Observer A sees himself, the light guns, the light beams’ source and the walls as stationary. If the length of the train is \ell, then the time it takes for the leftward-directed light beam to hit the lefthand wall is

\displaystyle T_L = \frac{\ell}{2c} \quad \text{eq (2.1)} \quad \text{where}

T_L is the time it takes for the leftward light beam to hit the left wall
\ell is the length of the train
c is the speed of light

Likewise, the time it takes for the rightward-directed light beam to hit the righthand wall is also

\displaystyle T_R = \frac{\ell}{2c} \quad \text{eq (2.2)} \quad \text{where}

T_R is the time it takes for the rightward beam to hit the right wall.

That is, the light beams hit the left and right walls of the train simultaneously. Now what does Observer B observe?

First off, unlike the tennis balls, in which different observers see different velocities depending on the frame of reference, the speed of light is constant in all inertial frames. Using this fact, we can calculate the time it takes for the light beams to hit the left and right hand walls in Observer B’s frame. First, the left side.

We know that the rate at which the gap between the rightward-traveling lefthand wall and light beam closes is c-(-v)=c+v. Note that I’m not saying that the light beam is traveling at c+v. The light, as always, travels at c. What I’m saying is the “gap” between the left wall and the light beam is decreasing at c+v. Similarly, the gap between the rightward traveling light beam and receding righthand wall is closing at a rate of c-v. Thus,

\displaystyle T_L = \frac{\ell^{\prime}}{2(c+v)}\quad \text{and} \quad \displaystyle T_R = \frac{\ell^{\prime}}{2(c-v)} \quad \text{eq (2.3)}

Even though \ell and \ell^{\prime} may be different due to the relativistic effect of length contraction (discussed below), the effect is the same for both halves of the train. Therefore, it has no effect on the relative values of T_L and T_R. Because c+v is greater than c-v, T_L is smaller than T_R: Observer B sees the light hit the lefthand wall BEFORE it hits the righthand wall. Therefore, unlike the arrival of light at the left and right walls of the train in Observer A’s frame of reference – which is simultaneous – these events, in Observer B’s frame of reference, are NOT simultaneous. And if the train were moving in the opposite direction, Observer B would see the light beam hit the righthand wall BEFORE it hits the lefthand wall.

In short, the relative timing of events depends on the frame of reference in which it is being observed.
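Eq (2.3) is easy to evaluate numerically. The sketch below assumes a train length of 100 m in Observer B’s frame and a train speed of 0.5c. Both numbers are illustrative assumptions, and a negative speed is used to stand for a leftward-moving train.

```python
C = 299_792_458.0   # speed of light, m/s


def light_arrival_times(train_length, train_velocity):
    """Return (T_L, T_R) from eq (2.3), as measured by Observer B.

    `train_length` is the train's length in Observer B's frame;
    `train_velocity` is positive for a rightward-moving train.
    """
    t_left = train_length / (2 * (C + train_velocity))
    t_right = train_length / (2 * (C - train_velocity))
    return t_left, t_right


# Train moving rightward at 0.5c (assumed values).
t_left, t_right = light_arrival_times(100.0, 0.5 * C)
print(t_left < t_right)    # True: the left wall is hit first
print(t_right / t_left)    # equals (c + v) / (c - v), whatever the length

# Train moving leftward at 0.5c: the order of the two events reverses.
t_left_rev, t_right_rev = light_arrival_times(100.0, -0.5 * C)
print(t_right_rev < t_left_rev)   # True: the right wall is hit first
```

The ratio of the two times depends only on v, which is why length contraction does not affect the conclusion.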

This discussion is patterned after David Morin at https://scholar.harvard.edu/files/david-morin/files/relativity_chap_1.pdf

III. Time Dilation

Figure 3.1

In special relativity, Observer A will perceive a clock moving with an object (Object B) that travels at constant velocity relative to Observer A as ticking slower than a clock Observer A is holding. This is called time dilation.

Figure 3.1a shows a clock of sorts in which a beam of light is sent straight upward, hits a mirror L units away, then reflects back, striking a detector at the site from which it originated. The total distance that the light beam travels during this trip is 2L. Observer A standing at rest relative to the light source measures the elapsed time for this round trip as \Delta t. The speed of the light beam is the speed of light, c. Now \text{velocity} = \displaystyle \frac{\Delta \, \text{distance}}{\Delta \, \text{time}}. Therefore, \Delta \, \text{time} = \displaystyle \frac{\Delta \, \text{distance}}{\text{velocity}}.

Applying this to figure 3.1a, we can see that

\displaystyle \Delta \, t = \frac{2L}{c} \quad \text{eq (3.1)}

In figure 3.1b, Observer A remains at point A. The light source at A emits a light beam toward B then starts moving to the right at velocity v. A mirror sitting at point B, a distance L above the line along which the light source moves (and halfway between points A and C), reflects the beam. The moving detector (which is adjacent to the light source) detects the light beam that it previously sent, at point C. Observer A measures the time it takes the detector plate to move from A to C as \Delta t^{\prime}. The distance the detector travels from A to C is given by v \cdot \Delta t^{\prime}. The light travels a distance D from A to the mirror at B then another distance D from the mirror to the detector at C. The total distance for the trip is 2D. Thus, Observer A measures the time the light beam takes on its journey from A to B to C as

\displaystyle \Delta \, t^{\prime} = \frac{2D}{c} \quad \text{eq (3.2)}

Our ultimate task is to relate \Delta \, t and \Delta \, t^{\prime}. To do this, we note that lines connecting A, B and E (the point on AC directly below B) create a right triangle. We see from figure 3.1b that

\displaystyle \lvert AE \rvert = \frac12 v \cdot \Delta \, t^{\prime} \quad \text{eq (3.3)}

From the Pythagorean theorem

\displaystyle D^2=(\frac12 v \cdot \Delta \, t^{\prime})^2 +L^2  \quad \text{eq (3.4)} \,\,\,\,\,\, \textbf{which implies that}

\displaystyle D=\sqrt{(\frac12 v \cdot \Delta \, t^{\prime})^2 +L^2}=\sqrt{\frac14 v^2 \cdot (\Delta \, t^{\prime})^2 + L^2} \quad \text{eq (3.5)}

But

\displaystyle \Delta \, t^{\prime} = \frac{2D}{c} \quad \text{eq (3.2)}

Therefore

\displaystyle \Delta \, t^{\prime} = \frac{2 \sqrt{\frac14 v^2 \cdot (\Delta \, t^{\prime})^2+L^2}}{c}       Square both sides

(\Delta \, t^{\prime})^2 = \displaystyle  \frac{4 \cdot \left( {\frac14 v^2 \cdot (\Delta \, t^{\prime})^2}+L^2 \right) }{c^2} = \frac{v^2 \cdot (\Delta \, t^{\prime})^2}{c^2} + \frac{4L^2}{c^2}       Subtract \displaystyle \frac{v^2 \cdot (\Delta \, t^{\prime})^2}{c^2} from both sides

(\Delta \, t^{\prime})^2 - \displaystyle \frac{v^2 \cdot (\Delta \, t^{\prime})^2}{c^2} = \displaystyle \frac{4L^2}{c^2}       Factor out (\Delta \, t^{\prime})^2 from the left side

\displaystyle (\Delta \, t^{\prime})^2 \left( 1 - \frac{v^2}{c^2} \right) = \frac{4L^2}{c^2}        Divide both sides by \displaystyle 1 - \frac{v^2}{c^2}

(\Delta \, t^{\prime})^2 = \frac{\displaystyle \frac{4L^2}{\displaystyle c^2}}{1 - \displaystyle \frac{v^2}{c^2}}        Take the square root of both sides

\Delta \, t^{\prime} = \displaystyle {\displaystyle \frac{\displaystyle \sqrt{\displaystyle \frac{\displaystyle 4L^2}{\displaystyle c^2}}}{\displaystyle \sqrt{1-\displaystyle \frac{\displaystyle v^2}{\displaystyle c^2}}}}

But we know from figure 3.1a that   \displaystyle \frac{2L}{c} = \Delta \, t,   so

    \begin{align*}\Delta \, t^{\prime} = \frac{\Delta \, t}{\displaystyle \sqrt{1 - \displaystyle \frac{\displaystyle v^2}{\displaystyle c^2} }} \quad \text{eq (3.6)} \end{align*}

In the literature, the term that multiplies \Delta t is referred to as gamma:

\displaystyle \gamma = \displaystyle \frac{1}{\sqrt{1 - \displaystyle \frac{\displaystyle v^2}{\displaystyle c^2} }} \quad \text{eq (3.7)}
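As a sanity check on the derivation, the sketch below evaluates eq (3.7) for a few speeds and confirms that the resulting \Delta t^{\prime} satisfies the Pythagorean relation in eq (3.4). The mirror height and the speeds are values I have assumed for illustration.

```python
import math

C = 299_792_458.0   # speed of light, m/s


def gamma(v):
    """Lorentz factor from eq (3.7); v must be smaller than C in magnitude."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def dilated_time(proper_time, v):
    """Eq (3.6): time Observer A measures for a tick of the moving clock."""
    return gamma(v) * proper_time


# Light-clock consistency check with assumed values.
L = 1.0                 # mirror height in metres (assumed)
v = 0.6 * C             # speed of the moving clock (assumed)
dt = 2 * L / C          # eq (3.1): tick in the clock's rest frame
dt_prime = dilated_time(dt, v)
D = C * dt_prime / 2    # eq (3.2) rearranged for D

# Eq (3.4): D^2 should equal (v * dt' / 2)^2 + L^2.
print(math.isclose(D**2, (0.5 * v * dt_prime) ** 2 + L**2))   # True

# Gamma grows slowly at first, then steeply as v approaches c.
for fraction in (0.1, 0.5, 0.75, 0.9, 0.99):
    print(fraction, round(gamma(fraction * C), 4))
```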

Let’s see how this equation can be applied with a dramatic example. Suppose the moving detector is traveling at 75% the speed of light (i.e., v=0.75c), and the clock traveling with it ticks off 1 second in its own frame (i.e., \Delta t = 1 sec). Plugging in these values, we get:

    \begin{align*}\Delta \, t^{\prime} & = \frac{1\,\text{sec}}{\sqrt{1 - \frac{(0.75c)^2}{c^2} }} \\ &= \frac{1\,\text{sec}}{\sqrt{1-0.5625}} \\  & = \frac{1\,\text{sec}}{\sqrt{0.4375}} \\ & \approx \frac{1\,\text{sec}}{0.661} \\ & \approx 1.51 \,\text{sec} \end{align*}

This means that, in the time the clock on the moving detector ticks 1 second, Observer A’s own clock ticks about 1.51 seconds. That is, Observer A perceives the clock on the moving detector as ticking slower than their own clock (i.e., time has “dilated”).

Note that the time measured by an observer relative to whom the clock is moving (in our case, \Delta t^{\prime}) is often referred to as coordinate time. On the other hand, the time measured by an observer at rest relative to the clock is called the proper time. In this case, it’s what we’re calling \Delta t. Proper time, in special relativity, is an invariant quantity. That is, observers in all frames of reference will agree on its value. Or stated in more formal terms, proper time is a Lorentz scalar. A scalar is an entity that’s just a number. It doesn’t change no matter what coordinate system one is using. A Lorentz scalar is an entity that doesn’t change under a coordinate transformation called a Lorentz transformation. More about this later.

The discussion presented in this section as well as the basic plan for figure 3.1 were taken from Wikipedia https://en.wikipedia.org/wiki/Time_dilation.

IV. Length Contraction

Figure 4.1

In special relativity, an object that is moving relative to an observer, Observer A, will be measured by Observer A as being shorter along its direction of motion than it is measured in its own rest frame. This is illustrated in figure 4.1.

In figure 4.1a, we have two observers: Observer B who is inside a train and Observer A who is outside the train. Both are at rest. The length of the train measures L_0 in B’s frame of reference. Because this length is being measured in the frame of reference in which the train is at rest, L_0 is referred to as a proper length. On the left-hand side of the train is a light source (yellow circle) which doubles as a photon receptor. On the right wall of the train is a mirror (vertically-oriented gray band). Since Observer B is not moving relative to Observer A, A also measures the length of the train as L_0.

In figure 4.1b, Observer B clicks on the light source which sends a light ray from the source to the mirror. The mirror reflects the light back to the source where it is detected. Observer B measures the time it takes for the light beam to complete this round trip as

    \[\Delta t_0 =\displaystyle  \frac{2L_0}{c} \quad \text{eq (4.1)} \]

Observer A (not shown in figure 4.1b) is still at rest with respect to B and thus measures the same time for the light’s round trip.

What happens if the train is set in motion, to the right, at velocity v? What does Observer A see? This is depicted in figure 4.1c. In this figure, to avoid clutter, the train is not shown. Only the light source and the mirror are shown (at various times/positions).

The light, whose course is depicted as the rightward brown arrow, will eventually hit the mirror. In her stationary frame of reference, Observer A measures the time it takes for this to occur as \Delta t_1. However, to hit the mirror, the light must travel the length of the train as measured by Observer A, L, plus the distance the train has traveled over \Delta t_1, which is given by v \cdot \Delta t_1. The point at which the light hits the mirror is at the right-hand red dot. Observer A measures the time that the light travels before hitting the mirror as:

    \[ \Delta t_1 = \frac{\displaystyle L + v \Delta t_1}{\displaystyle c}  \quad \text{eq (4.2)} \]

Rearranging, we find:

    \begin{align*} c \Delta t_1 &= L + v \Delta t_1 \\ c \Delta t_1 - v \Delta t_1 &= L \\ \Delta t_1(c - v) &= L \\ \Delta t_1 &= \displaystyle \frac{\displaystyle L}{\displaystyle c-v}  \quad \text{eq (4.3)}  \end{align*}

Note that we’re considering the possibility that the length of the train measured by A with the train moving (L) may not be the same as the length of the train measured by Observer B or Observer A with the train at rest (L_0).

At any rate, the light is reflected and eventually detected at the left-hand side of the train, at the left-hand blue dot. Observer A measures the time it takes for the light to get from the mirror to the detector as \Delta t_2. Notice that during this time, in A’s frame of reference, the left-hand side of the train where the detector sits has moved a distance v \Delta t_2 to the right. Therefore, per Observer A, the distance the light travels from the mirror to the detector is L - v \cdot \Delta t_2, and the time it takes to do so is

    \[ \Delta t_2 = \frac{\displaystyle L - v \cdot \Delta t_2}{\displaystyle c}  \quad \text{eq (4.4)}  \]

Manipulating this equation in a manner similar to what we did before, we ultimately wind up with:

    \[ \Delta t_2 = \displaystyle \frac{\displaystyle L}{\displaystyle c+v}   \quad \text{eq (4.5)} \]

Observer A measures the total time the light takes to go from the source to the mirror and back to the source as

    \begin{align*} \Delta t &= \frac{L}{c-v} + \displaystyle \frac{\displaystyle L}{\displaystyle c+v} \\    &= \displaystyle \frac{L}{c-v} \cdot \frac{c+v}{c+v} + \frac{L}{c+v} \cdot \frac{c-v}{c-v} \\   &= \displaystyle \frac{Lc+Lv+Lc-Lv}{c^2-v^2} \\  &= \displaystyle \frac{2Lc}{c^2-v^2}  \quad \text{eq (4.6)}  \end{align*}

We can relate \Delta t_0 and \Delta t using the equation we derived in the last section on time dilation:

    \[ \Delta \, t^{\prime} = \frac{\displaystyle \Delta \, t}{\displaystyle \sqrt{1 - \displaystyle \frac{\displaystyle v^2}{\displaystyle c^2} }}  \quad \text{eq (3.6)}  \]

However, we need to be careful about which time expressions we substitute where. In our current case, proper time – the time measured in the train’s frame of reference, i.e., that measured by Observer B (and Observer A with the train at rest) – is given as \Delta t_0. This corresponds to \Delta t in the equation above. On the other hand, so-called coordinate time – the time measured by Observer A for the moving train – is referred to as \Delta t in our current discussion. This corresponds to \Delta \, t^{\prime} in the time dilation equation.

Given these considerations, then, we can write:

    \[ \Delta \, t = \frac{\displaystyle \Delta \, t_0}{\displaystyle \sqrt{1 - \displaystyle \frac{\displaystyle v^2}{\displaystyle c^2} }}  \quad \text{eq (4.7)}  \]

Plugging in our values for \Delta \, t and \Delta \, t_0, then doing some algebra, we obtain:

    \begin{align*} \frac{2Lc}{c^2-v^2} &= \frac{2L_0}{c\sqrt{1-\frac{v^2}{c^2}}} \\ \frac{Lc}{c^2-v^2} &= \frac{L_0}{\sqrt{c^2}\sqrt{1-\frac{v^2}{c^2}}} \\ \frac{Lc}{c^2-v^2} &= \frac{L_0}{\sqrt{c^2-c^2 \cdot \frac{v^2}{c^2}}} \\ \frac{Lc}{c^2-v^2} &= \frac{L_0}{\sqrt{c^2-v^2}} \\ Lc &= \sqrt{c^2-v^2}L_0 \\ L^2 c^2 &= (c^2-v^2)L_0^2 \\ L^2 &= \frac{c^2-v^2}{c^2}L_0^2 \\ L^2 &= (1-\frac{v^2}{c^2})L_0^2 \\ L &= \sqrt{1-\frac{v^2}{c^2}}L_0   \quad \text{eq (4.8)} \end{align*}

Since \displaystyle \sqrt{1-\frac{v^2}{c^2}} is < 1 whenever v is nonzero, this equation indicates that L, the length of the moving train as measured by A, is smaller than L_0, the length of the train in the rest frame – the proper length. That is, length contraction has occurred.
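The algebra above can be verified numerically: if the moving train has the contracted length from eq (4.8), then the round-trip time from eq (4.6) should equal the dilated rest-frame round-trip time from eq (4.7). The sketch below uses an assumed proper length of 100 m and an assumed speed of 0.8c.

```python
import math

C = 299_792_458.0   # speed of light, m/s


def gamma(v):
    """Lorentz factor, eq (3.7)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def contracted_length(proper_length, v):
    """Eq (4.8): length of the moving train as measured by Observer A."""
    return math.sqrt(1.0 - (v / C) ** 2) * proper_length


def round_trip_time_moving(length, v):
    """Eq (4.6): out-and-back light travel time measured by Observer A."""
    return length / (C - v) + length / (C + v)


L0 = 100.0      # proper length of the train in metres (assumed)
v = 0.8 * C     # train speed (assumed)

dt0 = 2 * L0 / C                     # round trip (out and back) in the train's rest frame
L = contracted_length(L0, v)         # about 60 m at 0.8c
dt = round_trip_time_moving(L, v)    # round trip as measured by Observer A

print(L)
print(math.isclose(dt, gamma(v) * dt0))   # True: consistent with eq (4.7)
```

If the uncontracted length L_0 were used in eq (4.6) instead, the two sides of eq (4.7) would no longer agree.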

The arguments put forth in this section as well as figure 4.1 were patterned after the Faculty of Khan video at https://www.youtube.com/watch?v=DJKDF86Ebnw.

V. Spacetime Diagrams

A useful tool in the study of special relativity is the spacetime diagram. This will prove particularly useful in elucidating the relationship between coordinates in different frames of reference – the so-called Lorentz transformations.

One of the prime tenets of special relativity is that space and time are both parts of one continuum: spacetime. When we plot graphs of position versus time in regular classical mechanics, we think of three spatial dimensions, which we’ll call x, y and z, and a separate time dimension. In special relativity, on the other hand, we consider time on the same footing as spatial dimensions. Indeed, what we plot in a so-called spacetime diagram as time is actually the speed of light, c, multiplied by time, t, to give us ct, the “time” dimension in spacetime. When we do this, the “time” axis now has units of length. So, for example, if we’re working in units of light-seconds, then 1 unit of ct is 3.0\times10^8 meters/second × 1 second = 3.0\times10^8 meters. The distance that light travels in 1 second along the x direction is also 3.0\times10^8 meters. Thus, 1 unit in the ct dimension is the same as 1 unit in the x direction. We plot light rays on these spacetime diagrams, then, as lines at 45° angles, as shown in figure 5.1.

Figure 5.1

Now let’s consider Observer S floating in space. Like any observer in their own frame of reference, Observer S thinks he’s at rest. Observer is piloting a rocket, in a train of rockets, all traveling toward Observer S at 0.5 times the speed of light (0.5c), all 3 x 108 m/s (meters/second). Like all observers in their own frame of reference, because the rockets are all moving at the same speed, Observer considers herself and the rocket in front of her as stationary. She experiences “movement in time” but no movement in the x direction.

One second (or 3 x 10^8 m, since we’re working in spacetime units) before she passes Observer S, Observer S´ emits a light ray in the positive x direction. In figure 5.2a, this is depicted by the orange line, a line that makes an angle of 45° with the ct´-axis (and -45° with the x´-axis) since we’re working in units in which 1 unit on the ct´-axis equals 1 unit on the x´-axis.

Figure 5.2

After 1 second, Observer S´, in her own frame of reference, has “moved forward in time” to ct´ = 0, the time at which she and Observer S pass each other. (Note that, at time ct´ = 0, Observer S sees Observer S´ in her rocket pass him traveling to the right while Observer S´ sees Observer S passing her to the left at 0.5c.)

Also at time ct´ = 0, Observer S´’s light ray reflects off a mirror on the back of the rocket in front of her (point a) and reaches her again at ct´ = 1 second (point b).

Figure 5.2b shows the axes of Observer S’s frame of reference in black with the axes of Observer S´’s frame of reference superimposed in cyan. The orange line oriented at 45° represents the light ray she emits. Observer S´’s ct´-axis is angled to the right, indicating a component of velocity in the positive x direction relative to Observer S. The angle of Observer S´’s ct´-axis relative to Observer S’s ct-axis is given by tan 𝛼 = v/c. This makes sense: in a time t, Observer S´ moves a distance vt along the x-axis while advancing a distance ct along the ct-axis, so the tilt of her ct´-axis away from the ct-axis satisfies tan 𝛼 = vt/ct = v/c.

But what about Observer S´’s x´-axis? What angle does it make with Observer S’s x-axis? Well, we know from figure 5.2a that the point where the light ray hits the mirror, point a, lies on Observer S´’s x´-axis at x´ = 3.0 x 10^8 m. In this spacetime diagram, the reflected ray then makes an angle of 90° with the incoming ray and eventually arrives back at Observer S´ at ct´ = 3.0 x 10^8 m, at point b. So we’ve now defined 2 points on the x´-axis: x´ = 0, ct´ = 0 and x´ = 3.0 x 10^8 m, ct´ = 0. Two points are all we need to draw a line, in this case the x´-axis. Now for the angle it makes with the x-axis.

Referring to figure 5.2b, we know that extensions of the light path from the x´-axis to the ct´-axis (magenta dotted lines in the diagram) make 45° angles with the x- and ct-axes, respectively. Distances from the origin to point a and point b (depicted in purple in the diagram) form the sides of an isosceles triangle. From geometry, we know, then, that the angles labeled ß are equal. The angles supplementary to angle ß at points a and b equal 180° – ß. The triangles from the origin to point a to the x´-axis and from the origin to point b to the ct´-axis have 2 angles that are the same: 45° and 180° – ß. Since the angles of a triangle (at least in flat space) must sum to 180°, the angles 𝛼 between the ct´- and ct-axes and the x´- and x-axes must be equal. The primed axes are also always symmetric about the 45°-angled path of a light ray. The closer the speed of the moving frame of reference gets to the speed of light, the closer its axes get to the 45° light ray path. This variation in axis configuration in the moving frame reflects the effects of time dilation and length contraction needed to keep the speed of light constant in all reference frames.
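The relation tan 𝛼 = v/c is easy to explore numerically. The short Python sketch below is only an illustration; the function name axis_tilt_degrees and the choice to express speed as the fraction beta = v/c are my own assumptions, not something taken from the cited video.

```python
import math

def axis_tilt_degrees(beta):
    """Tilt angle alpha (degrees) of the primed axes for speed v = beta * c.

    Uses tan(alpha) = v/c = beta, as stated in the text.
    """
    if not 0 <= beta < 1:
        raise ValueError("beta = v/c must satisfy 0 <= beta < 1")
    return math.degrees(math.atan(beta))

# The tilt grows with speed but never reaches the 45 degree light ray path.
for beta in (0.1, 0.5, 0.9, 0.99):
    print(f"v = {beta:4.2f}c  ->  alpha = {axis_tilt_degrees(beta):6.3f} degrees")
```

For the rocket moving at 0.5c, the tilt comes out to roughly 26.6°, and the angle creeps toward 45° as beta approaches 1.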

VI. Lorentz Transformation

With knowledge of spacetime diagrams in hand, let’s make such a diagram to help us derive the formulas for the Lorentz transformations. For simplicity, we’ll consider only one spatial dimension in the following derivation because the same arguments we apply to this one spatial dimension can easily be applied to each of the other two spatial dimensions.

Spacetime diagram showing Lorentz transformation
Figure 6.1

The diagram on the left represents the frame of rest of an observer, Observer S, floating in space. Like any observer in their own frame of reference, Observer S believes she is at rest. The black coordinate axes in the diagram on the left represent the coordinates that correspond to Observer S.

Observer S sees a rocket go by at velocity v relative to her. Of course, the pilot of the rocket, Observer S´, thinks he’s at rest and sees Observer S moving to the left with velocity v. His coordinates are shown in cyan and are labeled as ct´ and x´ in the diagram on the right.

Superimposed on the coordinates of Observer S in frame S, in cyan, are the coordinates of Observer S´. Similarly, superimposed on the coordinates of S´, in black, in frame S´, are the coordinates of Observer S. Notice that they point in a direction opposite to the coordinates of Observer S´ in frame S. This is because Observer S´ sees Observer S moving to the left.

What we want to do, in this section, is to find expressions that relate the x and ct coordinates of Observer S to the x and ct coordinates of Observer S´ (which we’ll refer to as x´ and ct´). These expressions are called Lorentz transformations after Dutch physicist Hendrik Lorentz, who discovered them. There are a number of ways to derive such expressions. The derivation that follows is taken from Khan Academy, https://www.khanacademy.org/science/physics/special-relativity/lorentz-transformation/v/lorentz-transformation-derivation-part-1

Let’s begin by trying to come up with equations for x´ and x. A good start would be to use Galilean transformations x^{\prime}=x-vt for Observer S and x = x^{\prime} + vt^{\prime} for S´ (+\,vt for the S´ frame because Observer S´ sees Observer S moving to the left). But we know that time dilation and length contraction must be taken into account. Therefore, perhaps, we should add a scaling factor which we’ll call \gamma. We know that the lengths of units on coordinate axes in a given frame of reference are equal everywhere (although the length of those units will vary from inertial frame to inertial frame). Due to this homogeneity, we can make the assumption that our scaling factor will be linear. And because of the symmetry of frames S and S´, we will make the educated guess that the same scaling factor applies to expressions for both the x and x´. When we do, we get:

    \[x^{\prime}=\gamma(x-vt)\quad \quad \text{eq (6.1)}\]

and

    \[x = \gamma(x^{\prime} + vt^{\prime})\quad \quad \text{eq (6.2)}\]

Multiply eq (6.1) by x, then substitute eq (6.2) for the factor of x on the right-hand side. We get:

    \begin{align*}x\cdot x^{\prime} &= x \gamma (x-vt)\\  &= \gamma(x^{\prime} + vt^{\prime}) \gamma (x-vt)\\  &= \gamma^2(x^{\prime} + vt^{\prime})(x-vt)\\  &= \gamma^2(x^{\prime}x - x^{\prime}vt + vt^{\prime}x - v^2tt^{\prime})\quad \quad \quad \quad \text{eq (6.3)} \end{align*}

From the diagram, we know that the equation for the orange line representing the light ray in frame S is x=ct and frame S´ is x^{\prime}=ct^{\prime}. Substituting these values for x and x´ into eq (6.3) gives us:

    \begin{align*} x\cdot x^{\prime} &= \gamma^2(x^{\prime}x - x^{\prime}vt + vt^{\prime}x - v^2tt^{\prime})\\ ct\cdot ct^{\prime} &= \gamma^2(ct^{\prime}ct - ct^{\prime}vt + vt^{\prime}ct - v^2tt^{\prime})\\ c^2\cancel{tt^{\prime}} &= \gamma^2(c^2\cancel{tt^{\prime}} - \cancel{ctvt^{\prime}} + \cancel{ctvt^{\prime}} - v^2\cancel{tt^{\prime}})\\ c^2 &= \gamma^2(c^2 - v^2)\\ \gamma^2 &= \frac{c^2}{c^2 - v^2}\\ \gamma^2 &= \frac{c^2}{c^2 - v^2} \cdot \frac{\frac{1}{c^2}}{\frac{1}{c^2}}\\ \gamma^2 &= \frac{1}{1-\frac{v^2}{c^2}}\\ \gamma &= \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}\quad \quad \quad \quad \text{eq (6.4)} \end{align*}
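As a quick sanity check on eq (6.4), here is a minimal Python sketch that evaluates \gamma for v = 0.5c and confirms the intermediate relation c^2 = \gamma^2(c^2 - v^2) from the derivation. The function name gamma and the constant C are my own labels, and I use the exact SI value of c rather than the rounded 3.0 x 10^8 m/s used elsewhere on this page.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value; an assumption of this sketch)

def gamma(v, c=C):
    """Scaling factor from eq (6.4): 1 / sqrt(1 - v^2/c^2)."""
    if abs(v) >= c:
        raise ValueError("|v| must be less than c")
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

v = 0.5 * C
g = gamma(v)

# Intermediate step of the derivation: c^2 = gamma^2 (c^2 - v^2)
lhs = C ** 2
rhs = g ** 2 * (C ** 2 - v ** 2)
print(g, math.isclose(lhs, rhs, rel_tol=1e-12))
```

At half the speed of light the scaling factor is only about 1.15; it grows without bound as v approaches c.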

Now that we’ve got an expression for our scaling factor \gamma, we can plug it into equations (6.1) and (6.2) to obtain the Lorentz transformations for x and x´:

    \[x^{\prime} =\frac{1}{\sqrt{1-\frac{v^2}{c^2}}} (x - vt)\quad \quad \text{eq (6.5)}\]

and

    \[x = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}(x^{\prime} + vt^{\prime})\quad \quad \text{eq (6.6)}\]

What we need to do next is find the Lorentz transformations for t and t´. To do this, we begin with equations (6.5) and (6.6) then perform some algebraic manipulation. Let’s start by finding the Lorentz transformation for t´:

    \begin{align*} x &=\gamma(x^{\prime} + vt^{\prime})\\ \frac{x}{\gamma} &= x^{\prime} + vt^{\prime}\\ \frac{x}{\gamma} - x^{\prime} &= vt^{\prime}\\ \frac{x}{\gamma v} - \frac{x^{\prime}}{v} &=t^{\prime}\\ \frac{x}{\gamma v} - \frac{\gamma(x-vt)}{v} &= t^{\prime}\\ t^{\prime}&= \gamma\left(\frac{x}{\gamma^2 v} - \frac{x}{v} + t\right)  \quad \quad \text{eq (6.7)}\end{align*}

We can modify this equation further to put it into the form that is most recognized:

    \begin{align*} t^{\prime} &= \gamma(\frac{x}{\gamma^2 v} - \frac{x}{v} + t) \quad \quad \text{expand}\\  &= \frac{x}{\gamma v} - \frac{\gamma x}{v} +\frac{\cancel{v} t \gamma}{\cancel{v}}\\  &= \frac{x}{\gamma v} - \frac{\gamma x}{v} + \gamma t \quad \quad \text{factor out } \gamma\\  &= \gamma\left[ \left( \frac{x}{\gamma^2 v} - \frac{x}{v} \right) + t \right] \quad \quad \text{factor out } x \text{ from } ()\\  &= \gamma \left[ x\left( \frac{1}{\gamma^2 v} - \frac{1}{v} \right) + t \right] \quad \quad \text{multiply } 1 \times \frac{1}{v}\\  &= \gamma \left[ x\left( \frac{1}{\gamma^2 v} - \frac{\gamma^2}{\gamma^2} \cdot \frac{1}{v} \right) + t \right] \quad \quad \text{expand and simplify}\\  &= \gamma \left[ x\left( \frac{1-\gamma^2}{\gamma^2 v} \right) + t \right]  \quad \quad \text{eq (6.8)} \end{align*}

From here, we note that

    \[ \gamma = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}} \,\, \Rightarrow \,\, \gamma^2 = \frac{1}{1-\frac{v^2}{c^2}} \cdot  \frac{c^2}{c^2} = \frac{c^2}{c^2-v^2} \quad \quad \text{eq (6.9)} \]

We make use of this to modify the expression \displaystyle \frac{1-\gamma^2}{\gamma^2 v} in eq (6.8):

    \begin{align*} t^{\prime} &= \gamma\left[ x \left( \displaystyle  \frac{\displaystyle \frac{c^2-v^2}{c^2-v^2} - \displaystyle \frac{c^2}{c^2-v^2}}{\displaystyle \frac{c^2 v}{c^2-v^2}} \right) + t  \right]\\  \\  &= \gamma \left[ x \left( \displaystyle  \frac{\displaystyle \frac{-v^2}{c^2-v^2}}{\displaystyle \frac{c^2 v}{c^2-v^2}} \right) + t \right]\\  \\  &= \gamma \left[ x \left( \displaystyle \frac{-v^2}{\cancel{c^2-v^2}} \cdot \displaystyle \frac{\cancel{c^2-v^2}}{c^2 v} \right) + t \right]\\  \\  &= \gamma \left[ x \left( \displaystyle \frac{-v^2}{c^2 v} \right) + t \right]\\  \\  &= \gamma \left( t - \displaystyle \frac{vx}{c^2} \right) \end{align*}

So now we have an expression for the Lorentz transform for t^{\prime}:

    \[  t^{\prime} =  \gamma \left( t - \displaystyle \frac{vx}{c^2} \right) \quad \quad \text{eq (6.10)}\]

We can derive an expression for t as well:

    \[  t =  \gamma \left( t ^{\prime}+ \displaystyle \frac{vx^{\prime}}{c^2} \right) \quad \quad \text{eq (6.11)}\]

This can be derived in a manner similar to the way we derived the expression for t^{\prime}, except that we start with the equation x^{\prime}=\gamma(x - vt), substitute eq (6.2) for x and solve for t.
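For readers who want the intermediate steps, here is a sketch of one way to get eq (6.11). It starts from eq (6.1), substitutes eq (6.2) for x, and uses \gamma^2 = c^2/(c^2-v^2) from eq (6.9). The steps are my own reconstruction rather than a quotation from the cited source:

    \begin{align*} x^{\prime} &= \gamma(x - vt)\\ &= \gamma\left[\gamma(x^{\prime} + vt^{\prime}) - vt\right]\\ \gamma v t &= (\gamma^2 - 1)x^{\prime} + \gamma^2 v t^{\prime}\\ t &= \gamma t^{\prime} + \frac{\gamma^2 - 1}{\gamma v}x^{\prime}\\ &= \gamma t^{\prime} + \frac{\gamma v}{c^2}x^{\prime} \quad \quad \text{since } \gamma^2 - 1 = \frac{v^2}{c^2-v^2} = \frac{\gamma^2 v^2}{c^2}\\ &= \gamma\left(t^{\prime} + \frac{vx^{\prime}}{c^2}\right) \end{align*}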

Here is a summary of the Lorentz transform equations:

    \[x^{\prime} =\frac{1}{\sqrt{1-\frac{v^2}{c^2}}} (x - vt)\quad \quad \,\text{eq (6.5)}\]


    \[x = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}(x^{\prime} + vt^{\prime})\quad \quad \text{eq (6.6)}\]


    \[t^{\prime} =  \gamma \left( t - \displaystyle \frac{vx}{c^2} \right) \quad \quad \quad \quad \,\,\,\,\text{eq (6.10)}\]


    \[t =  \gamma \left( t ^{\prime}+ \displaystyle \frac{vx^{\prime}}{c^2} \right) \quad \quad \quad \quad \,\text{eq (6.11)}\]
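The four equations in this summary translate directly into code. The sketch below is a minimal illustration in Python; the function names to_primed and to_unprimed are hypothetical, and I assume units in which c = 1 by default (times in seconds, distances in light-seconds). It transforms an event from S to S´ and back again to show that the two pairs of equations undo each other.

```python
import math

def gamma(v, c):
    """Scaling factor of eq (6.4)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def to_primed(t, x, v, c=1.0):
    """Eqs (6.10) and (6.5): coordinates in S -> coordinates in S'."""
    g = gamma(v, c)
    return g * (t - v * x / c ** 2), g * (x - v * t)

def to_unprimed(t_p, x_p, v, c=1.0):
    """Eqs (6.11) and (6.6): coordinates in S' -> coordinates in S."""
    g = gamma(v, c)
    return g * (t_p + v * x_p / c ** 2), g * (x_p + v * t_p)

t, x, v = 2.0, 1.5, 0.5            # seconds, light-seconds, fraction of c
t_p, x_p = to_primed(t, x, v)
t_back, x_back = to_unprimed(t_p, x_p, v)
print((t_p, x_p), (t_back, x_back))
```

An event on a light ray (x = ct) should also land on the light ray in the primed frame (x´ = ct´), which is another easy check to run with these functions.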

A useful way to work with these equations is to put them into matrix form using the so-called Lorentz transformation matrix. Here’s what it looks like:

    \[ \begin{pmatrix} t^{\prime}\\ x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v/c^2 & 0 & 0\\ -\gamma v & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} t\\ x\\ y\\ z \end{pmatrix} \quad \quad \text{eq (6.12)}\]
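To make the matrix form concrete, here is a small Python sketch that builds the matrix of eq (6.12) and applies it to an event. It uses plain lists rather than a numerical library so that it runs anywhere; the helper names boost_x and apply are my own, and c = 1 is assumed by default.

```python
import math

def boost_x(v, c=1.0):
    """4x4 matrix of eq (6.12), acting on the column (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return [
        [g,      -g * v / c ** 2, 0.0, 0.0],
        [-g * v, g,               0.0, 0.0],
        [0.0,    0.0,             1.0, 0.0],
        [0.0,    0.0,             0.0, 1.0],
    ]

def apply(matrix, vector):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

event = [2.0, 1.5, 0.3, -0.7]              # (t, x, y, z) with c = 1
primed = apply(boost_x(0.5), event)        # coordinates in S'
restored = apply(boost_x(-0.5), primed)    # boosting by -v undoes the transformation
print(primed, restored)
```

Each row of the matrix reproduces one of the equations above: the first row gives eq (6.10), the second gives eq (6.5), and the last two rows leave y and z untouched.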


Eq (6.12) is the matrix equation that transforms coordinates when there is relative motion in the x-direction. Similar matrices can be constructed to transform vectors when there is relative motion in the y- and z-directions. For the y-direction, we have:

    \[B_y =\begin{pmatrix}\gamma & 0 & -\gamma v/c^2 & 0\\0 & 1 & 0 & 0\\-\gamma v & 0 & \gamma & 0\\0 & 0 & 0 & 1\end{pmatrix}\quad \text{eq (6.13)}\]

For relative motion in the z-direction, we use:

    \[B_z =\begin{pmatrix}\gamma & 0 & 0 & -\gamma v/c^2\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\-\gamma v & 0 & 0 & \gamma\end{pmatrix}\quad \text{eq (6.14)}\]

In eq (6.13) and eq (6.14), B_i stands for boost in the direction of i. A boost simply represents transformation to a frame of reference with motion in the i direction relative to some other frame of reference. So if coordinate system S^{\prime} is moving in the y-direction relative to coordinate system S, then that’s referred to as a boost in the y-direction.

Spatial rotations can also be represented by matrices in special relativity. Matrices representing pure spatial rotations include:

Around the x-axis:

    \[R_{\theta_x}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos \theta & -\sin \theta \\  0 & 0 & \sin \theta & \cos \theta \end{pmatrix} \quad \text{eq (6.15)} \]

Around the y-axis:

    \[R_{\theta_y}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & 0 & -\sin \theta \\ 0 & 0 & 1 & 0 \\  0 & \sin \theta & 0 & \cos \theta \end{pmatrix} \quad \text{eq (6.16)}\]

Around the z-axis:

    \[ R_{\theta_z}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta & 0 \\ 0 & \sin \theta & \cos \theta  & 0 \\  0 & 0 & 0 & 1 \end{pmatrix} \quad \text{eq (6.17)}\]

We can combine boosts, B_i, and rotations, R_j by performing matrix multiplication. For example, a boost in the x direction followed by a rotation around the z-axis would be accomplished mathematically by:

    \[  \vec{x}\,^{\prime} = R_{\theta_z} B_x\, \vec{x}\]

or

    \[ \begin{pmatrix} t^{\prime}\\ x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta & 0 \\ 0 & \sin \theta & \cos \theta  & 0 \\  0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma & -\gamma v/c^2 & 0 & 0\\ -\gamma v & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} t\\ x\\ y\\ z \end{pmatrix} \]
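Here is a hedged Python sketch of that composition. The helper names (boost_x, rotation_z, matmul, apply) are my own, c = 1 is assumed, and the rotation matrix has a 1 in its lower-right entry so that z is left unchanged. Note the order: the matrix written on the right (the boost) acts on the event first.

```python
import math

def boost_x(v, c=1.0):
    """Boost along x acting on (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return [[g, -g * v / c ** 2, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """Rotation about the z-axis; t and z are left unchanged."""
    cs, sn = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, cs, -sn, 0.0],
            [0.0, sn, cs, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(matrix, vector):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

# Boost along x first, then rotate by 90 degrees about z.
combined = matmul(rotation_z(math.pi / 2), boost_x(0.5))
event = [1.0, 0.0, 0.0, 0.0]            # (t, x, y, z): one second at the origin of S
result = apply(combined, event)
print(result)
```

In this example the boost gives the event a negative x´ coordinate, and the 90° rotation then moves that spatial component onto the y´-axis while leaving the time component alone.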

Another well-known example of this is the so-called Thomas-Wigner rotation. This refers to the fact that 2 successive boosts in different (non-collinear) directions can be shown to be equivalent to one boost and one rotation. Intuition as to why this is so can be gleaned from figure 6.2.

Thomas-Wigner rotation: 2 boosts equal 1 boost and 1 rotation
Figure 6.2

In figure 6.2, it can be seen that a boost in the x-direction, B_{x_1}, followed by a boost in the y-direction, B_y leads to vector \vec{S}. But so does a larger boost in the x-direction, B_{x_2}, followed by a rotation around the z-axis, R_{\theta_z}. The math that goes along with this gets hairy so I won’t reproduce it here. However, for those who are interested, a detailed mathematical description can be found at https://www.mathpages.com/home/kmath714/kmath714.htm.
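Without going through the full calculation, a simple numerical check hints that something more than a boost is involved. In coordinates (ct, x, y, z), which is what we get from eq (6.12) when c = 1, the matrix of a pure boost in any direction is symmetric. The Python sketch below (helper names are my own; speeds are given as beta = v/c) multiplies an x-boost by a y-boost and tests the product for symmetry.

```python
import math

def boost_x(beta):
    """Boost along x acting on (ct, x, y, z), with beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def boost_y(beta):
    """Boost along y acting on (ct, x, y, z), with beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0.0, -g * beta, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-g * beta, 0.0, g, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def is_symmetric(m, tol=1e-12):
    """True if the matrix equals its transpose to within tol."""
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(4) for j in range(4))

two_boosts = matmul(boost_y(0.5), boost_x(0.5))   # boost along x first, then along y
print(is_symmetric(boost_x(0.5)), is_symmetric(boost_y(0.5)), is_symmetric(two_boosts))
```

Each individual boost passes the symmetry test but their product does not, so the combined transformation cannot be a single pure boost. The leftover piece is the rotation.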

The main significance of the Lorentz transformation is that it leaves certain quantities unchanged (invariant) when we move from one inertial frame of reference to another. This is critical since the ability to do physics depends on its laws being the same in all inertial frames of reference. The concept of invariant quantities is central to special relativity. Therefore, we’ll have much more to say about this concept in the next section.

VII. Invariants in Special Relativity

Figure 7.1

A significant difference between the geometry that governs Newtonian mechanics – Euclidean geometry, the geometry that’s taught in high school – and the geometry that describes special relativity – Minkowski space – is the way time is viewed. This is shown in figure 7.1. In Euclidean space, shown in figure 7.1a, space and time are treated separately. We’re only considering one spatial dimension in figure 7.1a. However, I’ve drawn them on 2-D sheets to convey the idea that, in Euclidean space, “slices of space” progress through time. These slices of space are subject to Galilean relativity but time is considered to be the same in all frames of reference.

By contrast, in Minkowski space, time and space are considered to be on the same footing, as shown in figure 7.1b. In Minkowski space, time and space coordinates depend on each other (per the Lorentz transformation equations) and change in such a way that the speed of light is always constant, in keeping with Einstein’s second postulate.

In both, however, there are entities upon which observers in all frames of reference agree. We’ll call these invariant quantities. In Euclidean geometry, an example of such an invariant is the distance between 2 points in space. Specifically,

    \[ \Delta s ^2= \Delta x ^2 + \Delta y^2 \ \quad \quad \text{eq (7.0.1)} \quad \text{where}\]

\displaystyle \Delta s is the distance between the 2 points
\displaystyle \Delta x is the difference between the x-coordinates at the beginning and end of the line segment formed by the 2 points
\displaystyle \Delta y is the difference between the y-coordinates at the beginning and end of the line segment formed by the 2 points

Of course, this is just the well-known Pythagorean theorem.

Figure 7.2 provides an example.

Figure 7.2

In figure 7.2a, we have a vector that represents the displacement between the points (0,0) and (5,0). The length of this vector is given by:

    \[ \Delta s^2 = (5-0)^2 + (0-0)^2 \quad \Rightarrow \quad \Delta s = \sqrt{25 + 0} = 5  \]

In figure 7.2b, our coordinate system has been displaced 2 units upward and 2 units to the right with respect to the coordinate system we used in figure 7.2a. The blue vector, by eye, looks to be the same length as in figure 7.2a. Mathematically, we have:

    \[ \Delta s^2 = (3-(-2))^2 + (-2-(-2))^2 \quad \Rightarrow \quad \Delta s = \sqrt{25 + 0} = 5 \]

In figure 7.2c, the axes are rotated counterclockwise compared with those in figure 7.2a. To see that the length of the blue vector is the same as in the first 2 cases takes a little more work. In this case, the vector \displaystyle \begin{pmatrix} 5\\0\end{pmatrix} is multiplied by the rotation matrix \displaystyle \begin{pmatrix}\cos \theta & + \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix} to get a new vector \displaystyle \begin{pmatrix} x^{\prime}\\y^{\prime}\end{pmatrix} (where \theta, if we measure it, is ~36.87°):

    \[ \begin{pmatrix} x^{\prime}\\y^{\prime}\end{pmatrix} = \begin{pmatrix}\cos (36.87^{\circ}) & \sin (36.87^{\circ}) \\ -\sin (36.87^{\circ}) & \cos (36.87^{\circ}) \end{pmatrix} \begin{pmatrix} 5\\0\end{pmatrix} \]

When we perform the matrix multiplication, we get:

    \[  x^{\prime}=5\cdot \cos(36.87^{\circ}) + 0\cdot  \sin (36.87^{\circ}) =  5(0.8) + 0 = 4 \]

and

    \[  y^{\prime}=5\cdot \left(-\sin(36.87^{\circ})\right) + 0\cdot  \cos (36.87^{\circ}) =  -5(0.6) + 0 = -3 \]

We already know the lefthand-most point of the blue vector is at the origin, at (0,0). From our calculations above, the vector ends at the point (4, -3). Putting these values into our equation for \Delta s, we obtain:

    \[ \Delta s^2 = (4-0)^2 + (-3-0)^2 \,\,\Rightarrow \,\, \Delta s = \sqrt{16 +9} = \sqrt{25} = 5 \]

If desired, you can find further details about the rotation matrix here.
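The rotation example above can be reproduced in a few lines of Python. This is only a sketch: the function names rotate and length are my own, and I take the angle to be atan(3/4), which is the ~36.87° quoted in the text.

```python
import math

def rotate(vec, theta_deg):
    """Apply the rotation matrix used in the text to a 2-D vector (x, y)."""
    th = math.radians(theta_deg)
    x, y = vec
    return (x * math.cos(th) + y * math.sin(th),
            -x * math.sin(th) + y * math.cos(th))

def length(vec):
    """Euclidean length from the Pythagorean theorem, eq (7.0.1)."""
    return math.hypot(vec[0], vec[1])

theta = math.degrees(math.atan2(3, 4))   # about 36.87 degrees
original = (5.0, 0.0)
rotated = rotate(original, theta)
print(rotated, length(original), length(rotated))
```

The components change from (5, 0) to approximately (4, -3), but the length stays at 5, which is the whole point of calling it an invariant.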

We can make the interval \Delta s smaller and smaller, until it’s infinitesimal in length. We call this the line element, ds where

    \[ds^2 = dx^2 + dy^2\quad \quad \text{eq (7.0.2)}\]

dx and dy being infinitesimal displacements in the x- and y-directions, respectively.

In 3-dimensional space, eq (7.0.2) becomes:

    \[ds^2 = dx^2 + dy^2 + dz^2 \quad \quad \text{eq (7.0.3)}\]

We need to remember, though, that this only works when the coordinate system has what’s called an orthonormal basis. In general, the length of a vector is given by the square root of its dot product with itself. Let me explain.

Figure 7.3

Suppose, as shown in figure 7.3a, we have a Euclidean coordinate system with a column vector \displaystyle \begin{pmatrix}1 \\ 2 \end{pmatrix}. The 1 and 2 within the parentheses in this vector are just a part of the vector – the components – which represent the magnitude of the vector in a particular direction. The directions, in turn, are given by basis vectors, \vec{e_x} and \vec{e_y}. The combined vector, \vec{R} is:

    \[ \vec{R} = 1\vec{e_x} + 2\vec{e_y} \]

To find the length of \vec{R}, since we’re dealing with an orthonormal basis, we can apply the Pythagorean theorem and it works:

    \[ \lvert R \rvert ^2 = 1^2 + 2^2 = 5 \Rightarrow \lvert R \rvert = \sqrt{5}\]

By the dot product method:

    \begin{align*} \lvert \vec{R} \rvert ^2 &= \vec{R} \cdot \vec{R}\\  &= \left( 1\vec{e_x} + 2\vec{e_y} \right) \cdot \left( 1\vec{e_x} + 2\vec{e_y} \right)\\  & \\  &= (1)(1)\underbrace{\vec{e_x}\cdot\vec{e_x}}_{1} + (1)(2)\underbrace{\vec{e_x}\cdot\vec{e_y}}_{0} + (2)(1)\underbrace{\vec{e_y}\cdot\vec{e_x}}_{0} + (2)(2)\underbrace{\vec{e_y}\cdot\vec{e_y}}_{1}\\  &= 1 + 4\\ \text{so,} & \\ \lvert \vec{R} \rvert &= \sqrt{5} \end{align*}


This result agrees with the result we got by applying the Pythagorean theorem. So far, so good.

Now let’s look at figure 7.3b.

In the usual Cartesian description of Euclidean space, the basis vectors are orthonormal: ortho meaning orthogonal (their dot product is 0 because they are perpendicular to each other) and normal meaning their length is equal to 1. This is true for the basis vectors in figure 7.3a. To see this, take the dot product of \vec{e_x} and \vec{e_y}:

    \[ \vec{e_x} \cdot \vec{e_y} = \begin{pmatrix} 1\\0\end{pmatrix} \cdot \begin{pmatrix} 0\\1\end{pmatrix} = 1 \cdot 0 + 0 \cdot 1 = 0 + 0 = 0\]

This is not the case with \vec{e_x}^{\prime} and \vec{e_y}^{\prime}:

    \[ \vec{e_x}^{\prime} \cdot \vec{e_y}^{\prime} = \begin{pmatrix} 1\\1\end{pmatrix} \cdot \begin{pmatrix} 0\\1\end{pmatrix} = 1 \cdot 0 + 1 \cdot 1 = 0 + 1 = 1\]

In addition, \vec{e_x}^{\prime} is not normalized (i.e., its length is not equal to 1):

    \[ \lvert \vec{e_x}^{\prime}\rvert^2 = 1^2 + 1^2 = 2 \,\, \Rightarrow \,\, \lvert \vec{e_x}^{\prime} \rvert = \sqrt{2} \]

We can see from figure 7.3b that the equation describing the vector \vec{R^{\prime}} is:

    \[ \vec{R^{\prime}} = 1\vec{e_x}^{\prime} + 1\vec{e_y}^{\prime}\]


We describe the vectors \vec{e_x}^{\prime} and \vec{e_y}^{\prime} in terms of \vec{e_x} and \vec{e_y}:

    \[ \vec{e_x}^{\prime} = \vec{e_x} + \vec{e_y} \quad \text{and} \quad \vec{e_y}^{\prime} = \vec{e_y} \]

So

    \begin{align*} \vec{e_x}^{\prime} \cdot \vec{e_x}^{\prime} &= (\vec{e_x} + \vec{e_y}) \cdot (\vec{e_x} + \vec{e_y})\\ &= \vec{e_x} \cdot \vec{e_x} + \vec{e_x} \cdot \vec{e_y} + \vec{e_y} \cdot \vec{e_x} + \vec{e_y} \cdot \vec{e_y}\\ &= 1 + 0 + 0 +1 \\ &= 2\\ & \\ & \\ \vec{e_x}^{\prime} \cdot \vec{e_y}^{\prime} &= (\vec{e_x} + \vec{e_y}) \cdot \vec{e_y}\\ &= \vec{e_x} \cdot \vec{e_y} + \vec{e_y} \cdot \vec{e_y} \\ &= 0 + 1\\ &= 1 \\ & \\ \vec{e_y}^{\prime} \cdot \vec{e_y}^{\prime} &= \vec{e_y} \cdot \vec{e_y}\\ &= 1\end{align*}

The vector length-by-dot-product equation becomes:

    \begin{align*} \vec{R}^{\prime} &= 1\vec{e_x}^{\prime} + 1\vec{e_y}^{\prime}\\ \lvert \vec{R}^{\prime} \rvert^2 &= \vec{R}^{\prime} \cdot \vec{R}^{\prime}\\ &= \left( 1\vec{e_x}^{\prime} + 1\vec{e_y}^{\prime}\right) \cdot \left( 1\vec{e_x}^{\prime} + 1\vec{e_y}^{\prime}\right)\\ &= 1 \cdot 1(\vec{e_x}^{\prime} \cdot \vec{e_x}^{\prime}) + 1 \cdot 1(\vec{e_x}^{\prime} \cdot \vec{e_y}^{\prime}) + 1 \cdot 1(\vec{e_y}^{\prime} \cdot \vec{e_x}^{\prime}) + 1 \cdot 1(\vec{e_y}^{\prime} \cdot \vec{e_y}^{\prime})\\ &= 2 + 1 + 1 + 1\\ &= 5\\ \lvert \vec{R}^{\prime} \rvert &= \sqrt{5} \end{align*}

We can see, from this, that the dot product method of obtaining the length of a vector works in non-orthogonal bases.

In general, the formula for dot product (and thus, for the invariant length, \Delta s) can be written as a matrix equation. Here are the details:

    \begin{align*} \vec{R} &= x\vec{e_x} + y\vec{e_y}\\ \lvert \vec{R} \rvert^2 &= \vec{R} \cdot \vec{R}\\ &= (x\vec{e_x} + y\vec{e_y}) \cdot (x\vec{e_x} + y\vec{e_y})\\ &= xx(\vec{e_x} \cdot \vec{e_x}) + xy(\vec{e_x} \cdot \vec{e_y}) + yx(\vec{e_y} \cdot \vec{e_x}) + yy(\vec{e_y} \cdot \vec{e_y})\\ &= \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} \vec{e_x} \cdot \vec{e_x} & \vec{e_x} \cdot \vec{e_y}\\ \vec{e_y} \cdot \vec{e_x} & \vec{e_y} \cdot \vec{e_y}\end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}\\ &= \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} g_{xx} & g_{xy} \\ g_{yx} & g_{yy} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \end{align*}
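The matrix of basis-vector dot products can be built and used mechanically. The Python sketch below (function names metric_from_basis and squared_length are my own) does this for the two bases of figure 7.3, with the basis vectors written in ordinary Cartesian components, and recovers the same squared length in both cases.

```python
def metric_from_basis(basis):
    """g_ij = e_i . e_j for basis vectors given in Cartesian components."""
    return [[sum(a * b for a, b in zip(ei, ej)) for ej in basis] for ei in basis]

def squared_length(components, g):
    """|R|^2 = sum over i, j of g_ij * c_i * c_j."""
    n = len(components)
    return sum(g[i][j] * components[i] * components[j]
               for i in range(n) for j in range(n))

orthonormal = [(1, 0), (0, 1)]   # e_x, e_y of figure 7.3a
skewed = [(1, 1), (0, 1)]        # e_x', e_y' of figure 7.3b

g_ortho = metric_from_basis(orthonormal)
g_skew = metric_from_basis(skewed)

len_sq_ortho = squared_length((1, 2), g_ortho)   # components of R in the orthonormal basis
len_sq_skew = squared_length((1, 1), g_skew)     # components of the same vector in the skewed basis
print(g_skew, len_sq_ortho, len_sq_skew)
```

The components differ between the two bases, and so does the matrix g, but the combination of the two gives a squared length of 5 either way.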

VII.1 Spacetime Interval

So what is the spacetime analogue of this invariant in Euclidean space – the length of a vector? It’s the spacetime interval, also an invariant, which we’ll also represent by \Delta s. But what is it?

Since one of the fundamental postulates of special relativity is the invariance of the speed of light in all reference frames, you might guess that postulate might be the foundation on which the derivation of an invariant quantity in special relativity is built. And you would be correct.

Spacetime Invariant: Light beam in 2 frames of reference
Figure 7.1.1

Consider a spacetime diagram (figure 7.1.1) in which a light beam (blue arrow) travels from the origin to point x_1 in the x-direction in a frame of reference we’ll call the unprimed frame (black coordinate axes). We know the light beam travels a distance x_1 in time \Delta t_1 so:

    \[ x_1 = c \Delta t_1 \quad \Rightarrow \quad x^2_1 = (c \Delta t_1)^2 \quad \Rightarrow \quad -(c\Delta t_1)^2 + x_1 ^2= 0 \quad \text{eq (7.1.0.1)}\]

Now, consider a second frame of reference – the primed frame – whose axes are shown in red. We know that the speed of light is the same in all frames of reference. Therefore:

    \[  x_1^{\prime} = c\Delta t_1^{\prime} \quad \Rightarrow \quad {x_1^{\prime}}^2 = (c \Delta t_1^{\prime} )^2\quad \Rightarrow \quad -(c \Delta t_1^{\prime})^2 + {x_1^{\prime}}^2 = 0 \quad \text{eq (7.1.0.2)}\]

Notice that the components of the blue vector representing the world line of the light beam differ for each reference frame, i.e., (ct_1, x_1) versus (ct_1^{\prime},x_1^{\prime}). However, it’s obvious from the diagram that these are just different ways of describing the same entity. That is, the spacetime distance traversed by the light beam is invariant.

Note also that, since the expressions on the left side of eq (7.1.0.1) and eq (7.1.0.2) both equal zero, these expressions must be equal to each other:

    \[-(c \Delta t_1)^2 + x_1^2 = -(c \Delta t_1^{\prime})^2 + {x_1^{\prime}}^2 \quad \text{eq (7.1.0.3)}\]

And since c \Delta t _i= x_i for any point x_i, we have, in general:

    \[-(c\Delta t)^2 + \Delta x^2 = -(c \Delta t^{\prime})^2 + {\Delta x^{\prime}}^2  \quad \text{eq (7.1.0.4) for all x}\]

-(c \Delta t)^2 + \Delta x^2, then, is a candidate for an invariant in special relativity. Let’s call it the spacetime interval and, borrowing the variable name of the length interval in Euclidean geometry, denote it \Delta s^2.

Note that, as I’ll discuss later, there are two equally correct conventions for writing the invariant interval in special relativity and the Minkowski metric that’s derived from it. I’ve written the equation -(c\Delta t_1)^2 + x_1^2= 0. This leads to one of the conventions. I could’ve written (c\Delta t_1)^2 - x_1 ^2= 0 as well. This would lead to the other.

At any rate, we know, from what we just said, that \Delta s^2 = -(c \Delta t)^2 + \Delta x^2 has the same value (zero) in both frames for things traveling at the speed of light (i.e., photons). We’d like, next, to prove that \Delta s^2 is the same in both frames for arbitrary \Delta t and \Delta x, whatever the relative speed v of the two frames.

To do this, we consider events (x,\, ct) and (x^{\prime},\, ct^{\prime}) occurring in 2 different frames of reference: S and S´, respectively. We know that we can relate these events via the Lorentz transformations.

We start with ct^{\prime} and will not use units that result in c=1. The Lorentz transformation is:

    \[ c\Delta t^{\prime} = \gamma \left( c\Delta t - \Delta x \frac{v}{c} \right)  \quad \text{eq (7.1.0.5) for all x}\]

Squaring this equation, recognizing from eq (6.9) that \displaystyle \gamma^2 = \frac{c^2}{c^2-v^2} and doing some algebra, we get:

    \begin{align*} c^2{\Delta t^{\prime}}^2 &= \gamma^2 \left( c\Delta t - \Delta x \frac{v}{c} \right)^2 \\ &= \frac{c^2}{c^2-v^2}  \left( \frac{c^2 \Delta t - \Delta xv}{c} \right)^2 \\ &= \frac{c^2}{c^2-v^2} \left( \frac{c^4\Delta t^2 - 2c^2\Delta t\Delta xv + \Delta x^2v^2}{c^2} \right)  \quad \text{eq (7.1.0.6) for all x}\end{align*}

Next we perform a Lorentz transformation on x:

    \[ \Delta x^{\prime} = \gamma \left( -c\Delta t\frac{v}{c} + \Delta x \right)   \quad \text{eq (7.1.0.7) for all x}\]

Squaring eq (7.1.0.7), using our alternate expression for \gamma and doing some more algebra gives us:

    \begin{align*} {\Delta x^{\prime}}^2 &= \gamma^2 \left( -c\Delta t \frac{v}{c} + \Delta x \right) ^2 \\ &= \frac{c^2}{c^2-v^2}\left( \frac{c\Delta x -c\Delta tv}{c} \right)^2 \\ &= \frac{c^2}{c^2-v^2}\left( \frac{c^2\Delta x^2 -2c^2\Delta x\Delta tv + c^2\Delta t^2v^2}{c^2} \right) \quad \quad \text{eq (7.1.0.8)}\end{align*}

Now that we have expressions for {\Delta x^{\prime}}^2 and c^2 {\Delta t^{\prime}}^2, we can combine them to get an expression for -c^2 {\Delta t^{\prime}}^2 + {\Delta x^{\prime}}^2:

    \begin{align*} -c^2 {\Delta t^{\prime}}^2 + {\Delta x^{\prime}}^2 &= -\frac{c^2}{c^2-v^2}\left( \frac{c^4\Delta t^2 - 2c^2\Delta t\Delta xv + \Delta x^2v^2}{c^2} \right) \\ & \quad +\, \frac{c^2}{c^2-v^2}\left( \frac{c^2\Delta x^2 - 2c^2\Delta x\Delta tv + c^2\Delta t^2v^2}{c^2} \right)\\ \\ &= -\frac{c^2}{c^2-v^2}\left( c^2\Delta t^2 - 2\Delta t\Delta xv + \frac{\Delta x^2v^2}{c^2} \right) \\ & \quad +\, \frac{c^2}{c^2-v^2}\left( \Delta x^2 - 2\Delta x\Delta tv + \Delta t^2v^2 \right)\\ \\ &= \frac{c^2\Delta x^2 - \Delta x^2v^2 + c^2\Delta t^2v^2 - c^4\Delta t^2}{c^2-v^2}\\ \\ &= \frac{\Delta x^2(c^2-v^2) - c^2\Delta t^2(c^2-v^2)}{c^2-v^2}\\ \\ &= -c^2 \Delta t^2 + \Delta x^2 \quad \quad \text{eq (7.1.0.9)} \end{align*}

So {\Delta s^{\prime}}^2 = \Delta s^2, which is what we wanted to prove; i.e., \Delta s^2 = -(c\Delta t)^2 + \Delta x^2, the spacetime interval in 2 spacetime dimensions, is indeed invariant for all coordinate frames that differ by a velocity in the x-direction (so-called x-direction boosts).
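The algebra above can be spot-checked numerically. The following Python sketch (function names boost and interval_squared are my own; speeds are given as beta = v/c) applies eqs (7.1.0.5) and (7.1.0.7) to one pair of separations and evaluates the interval for several relative speeds.

```python
import math

def boost(ct, x, beta):
    """Eqs (7.1.0.5) and (7.1.0.7) with beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (ct - beta * x), g * (x - beta * ct)

def interval_squared(ct, x):
    """Spacetime interval -(c dt)^2 + dx^2 in 2 spacetime dimensions."""
    return -ct ** 2 + x ** 2

ct, x = 3.0, 1.0                       # separations c*dt and dx, in light-seconds
reference = interval_squared(ct, x)    # -9 + 1 = -8
for beta in (0.0, 0.3, 0.6, 0.9):
    ct_p, x_p = boost(ct, x, beta)
    print(beta, ct_p, x_p, interval_squared(ct_p, x_p))
```

The primed separations change with beta, but the interval printed in the last column stays at the reference value up to floating-point rounding.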

We’ve done this for the x direction but we can apply this to the y and z directions as well to get an invariant in 4D spacetime:

    \[\Delta s^2 = -(c\Delta t)^2 + \Delta x^2 + \Delta y^2 +\Delta z^2 \quad \text{eq (7.1.0.10)} \]

The spacetime interval is invariant under rotations of space as well by the following arguments:

  • Spatial rotations do not affect time so the -(c\Delta t)^2 term should remain unchanged
  • \Delta x^2 + \Delta y^2 +\Delta z^2 is just the squared length element in Euclidean space, which is invariant under rotation
  • Since all of the terms on the right side of eq (7.1.0.10) are invariant, \Delta s^2, too, must be invariant.

Thus, while observers in different frames of reference may disagree on time and spatial coordinates, all will agree on the separation of spacetime events.

And if we allow all of the elements in eq (7.1.0.10) to become infinitesimally small, we can come up with an equation that describes the length of infinitesimal distance (or line element) in special relativity:

    \[ds^2 = -(c\,dt)^2 + dx^2 + dy^2 + dz^2 \quad \text{eq (7.1.0.11)} \]

As alluded to above, we can express the length of any vector as its inner (or dot) product with itself, and if that length is infinitesimal, then what we get is the line element. We also said that we could express this line element in a matrix equation. For the space of special relativity, referred to as Minkowski space, the matrix used in such an expression is called the Minkowski metric:

    \[\displaystyle \eta_{\mu \nu} = \displaystyle \begin{bmatrix} g_{00} & g_{01} & g_{02} & g_{03} \\ g_{10} & g_{11} & g_{12} & g_{13} \\ g_{20} & g_{21} & g_{22} & g_{23}\\ g_{30} & g_{31} & g_{32} & g_{33}\end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1 \end{bmatrix} \quad \text{eq (7.1.0.12)}  \]

Following the format I used previously, the matrix equation is:

    \begin{align*} ds^2 &= \begin{matrix} [ct & x & y & z] \\ \,& \,& \,& \,\\  \,& \,& \,& \,\\ \,& \,& \,& \,\\ \end{matrix}\begin{bmatrix} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} ct \\ x \\ y \\ z\end{bmatrix} \\ \,\\ &= -(ct)^2 + x^2 +y^2 +z^2\\ \,\\ &=  \begin{matrix} [x_0 & x_1 & x_2 & x_3] \\ \,& \,& \,& \,\\  \,& \,& \,& \,\\ \,& \,& \,& \,\\ \end{matrix}\begin{bmatrix} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3\end{bmatrix} \\ \, \\ &= -(x_0)^2 + (x_1)^2 + (x_2)^2 + (x_3)^2 \quad \text{eq (7.1.0.13)}  \end{align*}

Notice that, in the lower 2 terms in eq (7.1.0.13), I’ve made the following replacements:

x_0=ct, \quad x_1=x, \quad x_2=y, \quad x_3=z, \quad

This is standard nomenclature in many texts. We’ll use both the (t, x, y, z) and (x_0, x_1, x_2, x_3) on this page, depending on what best illustrates the point I’m trying to make.

At this point, readers who have no familiarity with tensors and the Einstein summation convention would do well to obtain some knowledge about these topics since they pop up frequently in relativity. For example, eq (7.1.0.13) is often expressed as:

ds^2 = \eta_{\mu \nu}dx^{\mu} dx^{\nu} \quad \text{eq (7.1.0.14)}

I have a page on this website that gives an introduction to tensors. On that page, I list several additional resources for learning about tensors. Also on my tensor page is a section on the metric tensor. However, for the time being, hopefully readers unfamiliar with these topics will get the gist of eq (7.1.0.14) if I re-write it in the following form:

    \begin{align*}\displaystyle ds^2 &= g_{00}dx^0dx^0 + g_{01}dx^0dx^1 + g_{02}dx^0dx^2 + g_{03}dx^0dx^3 \\ &+  g_{10}dx^1dx^0 + g_{11}dx^1dx^1 + g_{12}dx^1dx^2 + g_{13}dx^1dx^3  \\   &+g_{20}dx^2dx^0 + g_{21}dx^2dx^1 + g_{22}dx^2dx^2 + g_{23}dx^2dx^3  \\   &+g_{30}dx^3dx^0 + g_{31}dx^3dx^1 + g_{32}dx^3dx^2 + g_{33}dx^3dx^3 \quad \text{eq (7.1.0.15)}  \end{align*}

We know from the Minkowski metric, eq (7.1.0.12), that the only g_{\mu \nu} terms that are nonzero are the ones where \mu = \nu.

Therefore, we have:

    \begin{align*}ds^2 &= g_{00}dx^0dx^0 + g_{11}dx^1dx^1 + g_{22}dx^2dx^2 + g_{33}dx^3dx^3 \\ &= (-1)(dx^0)^2 + (1)(dx^1)^2 + (1)(dx^2)^2 + (1)(dx^3)^2 \\ &= -(dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \\ &=  -(cdt)^2 + dx^2 +dy^2 + dz^2 \quad \text{eq (7.1.0.16)} \end{align*}
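For readers who find code easier to follow than index notation, here is a minimal sketch of what the double sum in eq (7.1.0.14) does. The displacement components are arbitrary illustrative numbers, and `line_element` is a name I made up for this example.

```python
# Minkowski metric eta_{mu nu} with the (-,+,+,+) signature, eq (7.1.0.12).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def line_element(dx):
    """Sum eta_{mu nu} dx^mu dx^nu over both repeated indices (16 terms)."""
    return sum(ETA[mu][nu] * dx[mu] * dx[nu]
               for mu in range(4) for nu in range(4))

# Illustrative displacement (dx^0, dx^1, dx^2, dx^3) = (c dt, dx, dy, dz).
dx = [2.0, 1.0, 0.5, 0.25]

full_sum = line_element(dx)
# Only the four diagonal terms survive, as in eq (7.1.0.16).
diagonal_only = -dx[0]**2 + dx[1]**2 + dx[2]**2 + dx[3]**2

print(full_sum, diagonal_only)
```

The twelve off-diagonal terms each contribute zero, so the full 16-term sum and the 4-term diagonal expression give the same number.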

Here are a few comments about tensors and Einstein summation notation that may be helpful:

  • Tensors are categorized according to the number of indices they have; the number of indices a tensor has is called its rank
  • Scalars (plain numbers) have 0 indices and are rank 0 tensors; vectors have 1 index and are rank 1 tensors; the metric tensor has 2 indices and is a rank 2 tensor
  • Superscripted (upper) indices are referred to as contravariant; Subscripted (lower) indices are referred to as covariant
  • When the same upper and lower index appears in a single expression, that index is summed over and yields a scalar
    • e.g. Dot product can be written: A_uA^u = A_1A^1 + A_2A^2 + A_3A^3
  • Indices that are summed over in an expression can be “cancelled”, kind of like the same number in the numerator and denominator when multiplying 2 fractions
    • e.g. g_{\cancel{\mu} \cancel{\nu}}dx^{\cancel{\mu}}dx^{\cancel{\nu}} = ds^2 (a scalar that has no indices) vs. \displaystyle \frac{5}{\cancel{3}} \frac{\cancel{3}}{7} =\frac57
  • Scalars are the same in all coordinate systems. Thus, if we can manipulate terms to create a scalar, that entity will be an invariant

VII.1.1 Spacetime Interval Categories

Spacetime Interval Categories
Figure 7.1.1.1

We know that light travels at, well, the speed of light, c. Light will travel a distance ct in the x-direction for each increment of ct on the ct axis. Therefore, on a spacetime diagram, the world line of a light beam is represented as a line at a 45° angle (if the light is moving in the +x direction) or a line at 135° (if the light is moving in the -x direction). Light world lines are depicted as blue lines in figure 7.1.1.1. The spacetime interval for a light beam from point (0,0) to (x,ct) is:

    \[  s^2 = (ct)^2 - x^2 \quad \text{eq (7.1.1.1)} \]

Here, I’m using the (+,-,-,-) form of the Minkowski metric rather than the (-,+,+,+) form we’ve used previously simply because it corresponds to what’s shown in figure 7.1.1.1, and figure 7.1.1.1 is the way the following discussion is most often presented. If I had used the (-,+,+,+) convention, then figure 7.1.1.1 would’ve had to have been rotated clockwise 90°.

At any rate, we just said that in time, ct, the light beam travels a distance ct. Therefore, the spacetime interval from (0,0) to (x,ct) is zero:

    \[  s^2 = (ct)^2 - x^2  = (ct)^2 - (ct)^2 = 0 \quad \text{eq (7.1.1.2)}\]

From this, an object traveling at the speed of light is said to follow a light-like course through spacetime.

On the other hand, if the object’s speed is < c, then \frac{x}{t}<\frac{ct}{t} which implies that x<ct. Therefore,

    \[  s^2 = (ct)^2 - x^2  > 0  \quad \text{eq (7.1.1.3)}\]

Such objects are said to follow a time-like course through spacetime. Since all objects that we currently know about in physics travel at a speed < c, all known objects follow time-like world lines.

We talked previously about how observers in different frames of reference will consider events to occur at different times. However, for all objects following time-like trajectories (which are all objects that we know of), there is no frame of reference that will change the sequence of events. This is a good thing because for event A to cause event B, event A has to occur before event B. If we could find a frame of reference where event B occurred before event A, then the effect (event B) would precede the cause (event A), a situation that has never been observed.

Finally, if an object’s speed were > c, then, by the same argument presented above, x>ct. In this case:

    \[  s^2 = (ct)^2 - x^2 < 0 \quad \text{eq (7.1.1.4)}\]

The world line of such an object is called space-like. Under space-like conditions, frames of reference can be found where the sequence of events is changed, and thus, causality can be “overturned.” Since we don’t think objects can travel with speeds > c, we don’t believe space-like spacetime trajectories exist.

To summarize:

s^2 > 0 \quad  \Rightarrow \quad \text{time-like}
s^2 = 0 \quad \Rightarrow \quad  \text{light-like}
s^2 < 0 \quad \Rightarrow \quad  \text{space-like}

This terminology applies when we use the Minkowski metric of the form (+,-,-,-). If we had used the opposite convention (-,+,+,+), then the categories of the spacetime interval would be:

s^2 < 0 \quad  \Rightarrow \quad \text{time-like}
s^2 = 0 \quad \Rightarrow \quad  \text{light-like}
s^2 > 0 \quad \Rightarrow \quad  \text{space-like}
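The two summaries above can be captured in a small helper. This is only a sketch: the function name, the signature strings and the tolerance are choices I made for illustration, and the event is assumed to be separated from the origin along x only.

```python
def classify_interval(ct, x, signature="+---", tol=1e-12):
    """Classify the separation between the origin and the event (x, ct)."""
    if signature == "+---":
        s2 = ct**2 - x**2
    elif signature == "-+++":
        s2 = -(ct**2) + x**2
    else:
        raise ValueError("signature must be '+---' or '-+++'")

    if abs(s2) <= tol:
        return "light-like"
    # Time-like intervals are positive in (+,-,-,-) and negative in (-,+,+,+).
    timelike_is_positive = signature == "+---"
    return "time-like" if (s2 > 0) == timelike_is_positive else "space-like"

# Three illustrative events: slower than light, at light speed, faster than light.
for ct, x in [(2.0, 1.0), (1.0, 1.0), (1.0, 2.0)]:
    print((ct, x),
          classify_interval(ct, x, "+---"),
          classify_interval(ct, x, "-+++"))
```

Whichever sign convention is chosen, the same event receives the same label; only the sign of s^2 that corresponds to each label flips.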

VII.1.2 Hyperbolic Geometry

Hyperbolic geometry of Minkowski spacetime
Figure 7.1.2.1

We’ve seen the equation for the spacetime interval in Minkowski space, the spacetime geometry where special relativity plays out:

    \[ s^2 =  (ct)^2 - x^2 \quad \text{eq (7.1.1.1)} \]

The graphs of this equation for different values of s are a series of hyperbolae, as depicted in figure 7.1.2.1.

Circle vs hyperbola with equations
Figure 7.1.2.2

At this point, it’s necessary for the reader to have at least basic familiarity with hyperbolic geometry. A brief introduction to hyperbolic geometry can be found here. However, to gain some intuition as to the meaning of figure 7.1.2.1, let’s compare the equations for a hyperbola and circle:

    \[ \underbrace{(ct)^2 - x^2 = s^2}_{\text{hyperbola}} \quad \text{vs} \quad \underbrace{(ct)^2 + x^2 = s^2}_{\text{circle}} \quad \text{eq (7.1.2.1)}  \]

In both cases, the quantity s represents a distance from the origin that’s the same. In the case of the circle, s equals the circle’s radius and the invariant quantities are represented by concentric circles. With the hyperbola, graphing the invariant “distances” results in the mirror-image boomerang-shaped objects seen in the diagram. The obvious differences in the equations for the hyperbola and circle that create these differing pictures are the minus sign in the hyperbola equation versus the plus sign in the equation for the circle.

Going back to figure 7.1.2.1, we note that the red, green and blue line segments all represent spacetime intervals of 2. They don’t look the same “length” because we’re not dealing with Euclidean space; we’re dealing with Minkowski space, which is a different entity. Note that the red line segment is oriented along the vertical time axis and represents an object at rest, moving in time but not in space. The green and blue segments are oriented along time axes for objects in frames of reference that are moving in space with respect to that of the red segment. The object represented by the blue line segment is moving faster than the object represented by the green segment.

With this background, we’re now ready to see how boosts (Lorentz transformations that relate coordinates moving relative to each other) can be described by rotations in the hyperbolic geometry that underlies Minkowski space.

VII.1.2.1 Lorentz Boosts as Hyperbolic Rotations
Graph of hyperbolic tangent
Figure 7.1.2.2.1

Figure 7.1.2.2.1 shows an invariant hyperbola with s=1. Two coordinate systems are pictured: ct, \,x and ct^{\prime}, x^{\prime}. (The x^{\prime} axis is not shown.) We’re looking at things from the nonprimed system’s viewpoint. The primed system is in motion relative to the nonprimed system (at quite high speed, actually). The ct^{\prime} axis makes a hyperbolic angle, \phi, with the ct axis. Similar to trigonometry in Euclidean space, the point at which the ct^{\prime} axis intersects the invariant hyperbola is given by the hyperbolic trigonometric functions \sinh \phi and \cosh \phi. And like in Euclidean trigonometry:

    \[\displaystyle \tanh \phi = \frac{\sinh \phi}{\cosh \phi} \quad \text{eq (7.1.2.1.1)} \]

From the diagram, we can see that \sinh \phi = x and \cosh \phi = ct. Therefore:

    \begin{align*}  \tanh \phi &=  \frac{\sinh \phi}{\cosh \phi} \\ &= \frac{x}{ct} \\ &= \frac{1}{c} \left ( \frac{x}{t} \right) \\ &= \frac{v}{c} \quad \text{eq (7.1.2.1.2)} \end{align*}

Next, let’s go back to the equations we derived for the Lorentz transformation in matrix form. I’ll reproduce it here except I’ll call the time axes ct and ct^{\prime} instead of t and t^{\prime}. I’ll also follow the lead of many authors and define:

\displaystyle \beta = \frac{v}{c}
\displaystyle \gamma = \displaystyle \frac{1}{\sqrt{1-\beta^2}}

We end up with:

    \begin{align*}x^{\prime} =\frac{1}{\sqrt{1-\frac{v^2}{c^2}}} (x - vt)&=\gamma(x - \beta ct)\\ &=\gamma x - \gamma \beta ct \quad \quad \,\text{eq (7.1.2.1.3)}\end{align*}


    \begin{align*}x = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}(x^{\prime} + vt^{\prime}) &=\gamma (x^{\prime} + \beta ct^{\prime}) \\ &= \gamma x^{\prime} + \gamma \beta ct^{\prime} \quad \quad \text{eq (7.1.2.1.4)}\end{align*}


    \begin{align*}ct^{\prime} = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}} \left( ct - \displaystyle \frac{vcx}{c^2}  \right) &= \gamma \left(ct - \displaystyle \frac{vx}{c}\right)\\ &= \gamma ct - \gamma   \beta x  \quad \quad \,\,\,\,\text{eq (7.1.2.1.5)}\end{align*}


    \begin{align*}ct = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}} \left( ct ^{\prime}+ \displaystyle \frac{vcx^{\prime}}{c^2} \right) &= \gamma \left( ct ^{\prime}+ \displaystyle \frac{vx^{\prime}}{c} \right)\\ &= \gamma ct ^{\prime}+ \gamma \beta  x^{\prime} \quad \quad \,\text{eq (7.1.2.1.6)}\end{align*}

We can write these equations in matrix form:

    \[ \begin{pmatrix} ct^{\prime}\\ x^{\prime} \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma \beta \\ -\gamma \beta & \gamma \end{pmatrix} \begin{pmatrix} ct\\ x \end{pmatrix} \quad \quad \,\text{eq (7.1.2.1.7)}\]

and

    \[  \begin{pmatrix} ct\\ x \end{pmatrix} = \begin{pmatrix} \gamma & \gamma \beta \\ \gamma \beta & \gamma \end{pmatrix} \begin{pmatrix} ct^{\prime}\\ x^{\prime} \end{pmatrix} \quad \quad \,\text{eq (7.1.2.1.8)}\]

Now we need to ask: can we express \gamma and \beta in terms of hyperbolic trigonometric functions? Happily, the answer is yes. We know that:

    \[ \tanh \phi = \frac{v}{c} = \beta \quad \,\text{eq (7.1.2.1.9)}\]

Therefore:

    \begin{align*}  \gamma =  \frac{1}{\sqrt{1 - \displaystyle \left( \frac{v}{c} \right)^2}} &= \frac{1}{\sqrt{1 - \beta^2}} \\ &= \frac{1}{\sqrt{1 - \tanh^2 \phi}} \quad \,\text{eq (7.1.2.1.10)}  \end{align*}

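Similar to the Euclidean identities 1 + \tan^2 \theta = \sec^2 \theta and \displaystyle \frac{1}{\sec \theta} = \cos \theta, the hyperbolic functions satisfy 1 - \tanh^2 \phi = \operatorname{sech}^2 \phi and \displaystyle \frac{1}{\operatorname{sech} \phi} = \cosh \phi. Therefore:

    \[ \frac{1}{\sqrt{1 - \tanh^2 \phi}} =  \frac{1}{\sqrt{\operatorname{sech}^2 \phi}} = \frac{1}{\operatorname{sech} \phi} = \cosh \phi  \quad \,\text{eq (7.1.2.1.11)}\]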

So:

    \[ \gamma = \cosh \phi  \quad \,\text{eq (7.1.2.1.12)} \]

Then:

    \[ \gamma \beta = \cosh \phi \tanh \phi = \cancel{\cosh \phi}\frac{\sinh \phi}{\cancel{\cosh \phi}} = \sinh \phi  \quad \,\text{eq (7.1.2.1.13)}\]

Substituting these values into matrix equations eq (7.1.2.1.7) and eq (7.1.2.1.8), we get:

    \[ \begin{pmatrix} ct^{\prime}\\ x^{\prime} \end{pmatrix} = \begin{pmatrix} \cosh \phi & -\sinh \phi \\ -\sinh \phi & \cosh \phi \end{pmatrix} \begin{pmatrix} ct\\ x \end{pmatrix} \quad \quad \,\text{eq (7.1.2.1.14)}\]

and

    \[  \begin{pmatrix} ct\\ x \end{pmatrix} = \begin{pmatrix} \cosh \phi & \sinh \phi\\ \sinh \phi & \cosh \phi \end{pmatrix} \begin{pmatrix} ct^{\prime}\\ x^{\prime} \end{pmatrix} \quad \quad \,\text{eq (7.1.2.1.15)}\]

The matrices

    \[\begin{pmatrix} \cosh \phi & -\sinh \phi \\ -\sinh \phi & \cosh \phi \end{pmatrix}\quad \text{and} \quad \begin{pmatrix} \cosh \phi & \sinh \phi\\ \sinh \phi & \cosh \phi \end{pmatrix}\quad \,\text{eq (7.1.2.1.16)}\]

resemble the Euclidean rotation matrix and its inverse:

    \[ \begin{bmatrix}\,\,\,\,\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix} \quad \text{and} \quad \begin{bmatrix}\cos \theta & -\sin \theta\\ \sin \theta & \cos \theta \end{bmatrix}\quad \,\text{eq (7.1.2.1.17)} \]

Thus, the matrices shown in eq (7.1.2.1.16) are usually referred to as hyperbolic rotation matrices.
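As a quick numerical sanity check, the sketch below builds the boost matrix both ways, once from \gamma and \beta as in eq (7.1.2.1.7) and once from the rapidity \phi = \text{arctanh}(\beta) as in eq (7.1.2.1.14). The value \beta = 0.8 and the function names are illustrative assumptions of mine.

```python
import math

def boost_matrix_gamma(beta):
    """2x2 boost matrix written with gamma and beta, eq (7.1.2.1.7)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta],
            [-gamma * beta, gamma]]

def boost_matrix_rapidity(phi):
    """The same matrix written with cosh and sinh, eq (7.1.2.1.14)."""
    return [[math.cosh(phi), -math.sinh(phi)],
            [-math.sinh(phi), math.cosh(phi)]]

beta = 0.8                  # illustrative speed, v = 0.8c
phi = math.atanh(beta)      # rapidity, from tanh(phi) = v/c

m_gamma = boost_matrix_gamma(beta)
m_phi = boost_matrix_rapidity(phi)

for row_g, row_p in zip(m_gamma, m_phi):
    print([round(a, 6) for a in row_g], [round(b, 6) for b in row_p])
```

The two matrices agree entry by entry up to rounding, which is just eq (7.1.2.1.12) and eq (7.1.2.1.13) in numerical form.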

However, the geometric meaning of hyperbolic rotations is decidedly different from that of rotations in Euclidean space.

Euclidean vs hyperbolic axis rotation
Figure 7.1.2.2.2

Figure 7.1.2.2.2a shows a polar coordinate system in Euclidean space containing a red vector with the length, s, equal to the radius of a circle. s is an invariant in Euclidean space (i.e., x^2 + y^2 = s^2). Line segments connecting points on the circle to ends of the coordinate axes form a blue square.

Figure 7.1.2.2.2b depicts a clockwise rotation of the coordinate axes. Note that the radius, s, remains the same. The area of the square also remains unchanged (i.e., it’s the same square; it’s just rotated).

Figure 7.1.2.2.2c and figure 7.1.2.2.2d represent Minkowski space. Unit hyperbolae represent invariant “surfaces” with s = 1. In figure 7.1.2.2.2c, the green box contains blue axes and a red vector with “length” of s = 1. Figure 7.1.2.2.2d depicts a hyperbolic rotation. In contrast to the Euclidean rotation, the green box changes shape (elongates in one direction and narrows in the other). However, its area and the invariant quantity, s, remain unchanged. Note that the coordinate axes in the green box “scissor together” as a result of the hyperbolic rotation. Of course, we’ve already seen that relative motion due to a velocity boost also yields coordinate axes that scissor together, again indicating the equivalence of the Lorentz boost and hyperbolic rotation.

Note also that the determinants of both the Euclidean and hyperbolic rotation matrices equal 1. The determinant is the factor by which a linear transformation scales areas, so the areas of the blue and green polygons are unchanged, reflecting the invariant nature of these operations. For more information about the determinant and its geometric interpretation, click here.
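The determinant statement is easy to verify numerically. A minimal sketch follows; the angles are arbitrary illustrative values and the helper names are mine.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def euclidean_rotation(theta):
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

def hyperbolic_rotation(phi):
    return [[math.cosh(phi), -math.sinh(phi)],
            [-math.sinh(phi), math.cosh(phi)]]

for angle in (0.3, 1.0, 2.5):
    # cos^2 + sin^2 = 1 and cosh^2 - sinh^2 = 1, so both determinants are 1.
    print(angle,
          round(det2(euclidean_rotation(angle)), 12),
          round(det2(hyperbolic_rotation(angle)), 12))
```

Both columns come out as 1 for every angle, even though the hyperbolic matrix entries themselves grow without bound as \phi increases.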

VII.1.2.2 Velocity Addition

I’ve waited until now to introduce the relativistic velocity formula for reasons that will become apparent soon. We’ll start with the standard derivation of velocity addition then look at it from the perspective of hyperbolic geometry.

This derivation is taken from Khan Academy although it’s similar to derivations found elsewhere.

Velocity addition in special relativity
Figure 7.1.2.2.1

In figure 7.1.2.2.1, we have an observer, floating in space, in frame of reference S, with coordinates (x,t). This observer sees a blue rocket moving to the right at relativistic speed v. We’ll call the blue rocket’s frame of reference S^{\prime} and its coordinates (x^{\prime},t^{\prime}). There’s also a red rocket moving to the left with velocity \displaystyle \frac{\Delta u}{\Delta t}=\frac{\Delta x}{\Delta t} in the S frame. What we want to find out is the red rocket’s velocity, \displaystyle \frac{\Delta x^{\prime}}{\Delta t^{\prime}}, as seen in the S^{\prime} frame.

We start by writing the terms \Delta x^{\prime} and \Delta t^{\prime} in terms of the S frame of reference coordinates using the Lorentz transformation:

    \begin{align*} \Delta x^{\prime} &= \gamma(\Delta x - \frac{v}{c} c\Delta t) \\ &= \gamma(\Delta x - v\Delta t) \quad \text{7.1.2.2.1}\end{align*}

and

    \[  \Delta t^{\prime} = \gamma \left( \Delta t - \frac{v}{c^2} \Delta x \right) \quad \text{7.1.2.2.2} \]

To get the velocity in the S^{\prime} frame, we divide eq (7.1.2.2.1) by eq (7.1.2.2.2), then do some algebra:

    \begin{align*}   \displaystyle \frac{\Delta x^{\prime}}{\Delta t^{\prime}} &= \displaystyle \frac{\gamma(\Delta x - v\Delta t)}{\gamma(\Delta t - \displaystyle \frac{v}{c^2} \Delta x)} \\ & \\ &= \displaystyle \frac{\cancel{\gamma}(\Delta x - v\Delta t)}{\cancel{\gamma}(\Delta t - \displaystyle \frac{v}{c^2} \Delta x)} \cdot \displaystyle \frac{\frac{1}{\displaystyle \Delta t}}{\displaystyle \frac{1}{\Delta t}} \\ & \\ &=\displaystyle \frac{\displaystyle \frac{\Delta x}{\Delta t}-v}{1-\displaystyle \frac{v}{c^2}\cdot\displaystyle \frac{\Delta x}{\Delta t}} \quad \text{7.1.2.2.3}  \end{align*}

But \displaystyle \frac{\Delta u}{\Delta t}=\frac{\Delta x}{\Delta t}. So:

    \[ \frac{\Delta x^{\prime}}{\Delta t^{\prime}}  = \frac{\displaystyle \frac{\Delta u}{\Delta t}-v}{1 - \displaystyle \frac{v}{c^2}\cdot\displaystyle \frac{\Delta u}{\Delta t}} \quad \text{7.1.2.2.4}   \]

Now we take the limit as \Delta t \rightarrow 0:

    \[  \lim_{\Delta t \rightarrow 0} \frac{\Delta x^{\prime}}{\Delta t^{\prime}} = v^{\prime} = \displaystyle \frac{u-v}{1-\displaystyle \frac{uv}{c^2}} \quad \text{7.1.2.2.5} \]

Notice that u-v is just the Galilean velocity addition rule. In special relativity, this is divided by the factor 1-\displaystyle \frac{uv}{c^2}.

We can put in some numbers to better understand how this works. Suppose the blue rocket is moving to the right at v=0.5c and the red rocket is moving to the left at u=-0.7c. What is the velocity of the red rocket as seen by the observer in the blue rocket (v^{\prime})? In Galilean relativity, this would be u-v=-0.7c-0.5c=-1.2c, i.e., a speed of 1.2c to the left. Of course, as far as we know, this is impossible since nothing has ever been observed to move faster than the speed of light. However, when we apply special relativity, we get:

    \begin{align*}v^{\prime} &= \displaystyle \frac{u-v}{1-\displaystyle \frac{uv}{c^2}}\\&=\displaystyle \frac{-0.7c-0.5c}{1-\displaystyle \frac{(-0.7c)(0.5c)}{c^2}}\\&=\displaystyle \frac{-1.2c}{1-(-0.35)}\\&=\displaystyle \frac{-1.2c}{1.35}\\&= -0.8889c\end{align*}

The negative sign means that the red rocket moves to the left relative to the blue rocket, at a speed of 0.8889c.
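Here is a minimal sketch of eq (7.1.2.2.5) as a function, assuming velocities are signed (rightward positive) and expressed as fractions of c. The function name is my own.

```python
def velocity_in_moving_frame(u, v, c=1.0):
    """Velocity of an object (u in frame S) as seen from a frame moving at v.

    Implements v' = (u - v) / (1 - u*v/c^2), eq (7.1.2.2.5).
    Velocities are signed: positive to the right, negative to the left.
    """
    return (u - v) / (1.0 - u * v / c**2)

# Blue rocket: v = +0.5c. Red rocket: u = -0.7c.
v_prime = velocity_in_moving_frame(u=-0.7, v=0.5)
print(round(v_prime, 4))   # -0.8889, i.e. a speed of 0.8889c to the left

# A light beam (u = c) is still measured at c from the blue rocket.
print(velocity_in_moving_frame(u=1.0, v=0.5))   # 1.0
```

The second call shows why the denominator matters: however fast the observer moves, light is still measured at c.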

Relativistic velocity addition can also be achieved using hyperbolic geometry. Recall:

    \begin{align*}  \tanh \phi &=  \frac{\sinh \phi}{\cosh \phi} \\ &= \frac{x}{ct} \\ &= \frac{1}{c} \left ( \frac{x}{t} \right) \\ &= \frac{v}{c} \quad \text{eq (7.1.2.1.2)} \end{align*}

Figure 7.1.2.2.2 shows a graph of y=\tanh \phi.

Graph of tanh
Figure 7.1.2.2.2

We can see from this graph:

  • Unlike Euclidean geometry, where angles of a circle repeat after 2\pi radians, \phi goes to infinity in both the + and – directions.
  • As \phi goes to \pm \infty, \tanh \phi approaches y=\pm 1 asymptotically (i.e., tanh gets close to but never reaches y=\pm 1). This makes sense since \displaystyle \tanh \phi = \frac{v}{c}, and as we know, the speed of a massive object can get close to but never equal c.

Figure 7.1.2.2.3 is designed to help illustrate how this works.

Hyperbolic velocity addition diagram
Figure 7.1.2.2.3

In figure 7.1.2.2.3:

  • The red curves represent “unit” invariant hyperbolae where (ct)^2 - x^2 = 1.
  • The blue curves are “unit” invariant hyperbolae where -(ct)^2 + x^2 = 1.
  • The 45° and -45° diagonal black lines represent the paths of light rays.
  • The right hand margin of the green “hyperbolic triangle” represents the ct-axis of the blue spaceship traveling in the +x-direction in figure 7.1.2.2.1. It makes a hyperbolic angle of \phi_1 with the ct-axis of frame of reference S in figure 7.1.2.2.1. \phi_1 is referred to as the rapidity. Looking back at eq (7.1.2.1.2), \tanh \phi_1 = \displaystyle \frac{x}{ct} = \frac{v}{c}, so at the height ct=1 we have \tanh \phi_1 = \displaystyle \frac{x_1}{1} = x_1.
  • Likewise, the lefthand margin of the magenta “hyperbolic triangle” represents the ct-axis of the red spaceship traveling in the -x direction in figure 7.1.2.2.1. It makes a hyperbolic angle of \phi_2 with the ct-axis of frame of reference S in figure 7.1.2.2.1. And per the same rationale as we just used, \tanh \phi_2 = \displaystyle \frac{u}{c} = x_2

In equation 7.1.2.2.5, we have an equation for velocity addition in special relativity:

    \[ v^{\prime} = \displaystyle \frac{u-v}{1-\displaystyle \frac{uv}{c^2}}  \]

Divide both sides of this equation by c:

    \[ \displaystyle \left(\frac{v^{\prime}}{c}\right) = \displaystyle \frac{\displaystyle \left(\frac{u}{c}\right)-\displaystyle \left(\frac{v}{c}\right)}{1-\displaystyle \left(\frac{u}{c}\right)\displaystyle \left(\frac{v}{c}\right)} \quad \text{eq (7.1.2.2.6)}   \]

But:

\displaystyle \frac{v^{\prime}}{c}=\tanh \phi_{v^{\prime}} \quad -1 \leq \displaystyle \frac{v^{\prime}}{c} \leq 1 \quad \text{eq (7.1.2.2.7)}

\displaystyle \frac{u}{c}=\tanh \phi_{u} \quad -1 \leq \displaystyle \frac{u}{c} \leq 1 \quad \text{eq (7.1.2.2.8)}

\displaystyle \frac{v}{c}=\tanh \phi_{v} \quad -1 \leq \displaystyle \frac{v}{c} \leq 1 \quad \text{eq (7.1.2.2.9)}

Substituting these values into eq (7.1.2.2.6) gives us:

    \[  \displaystyle \tanh \phi_{v^{\prime}} =  \displaystyle \frac{\tanh \phi_{u} -\tanh \phi_{v}}{1-\displaystyle \left(\tanh \phi_{u}\right)\displaystyle \left(\tanh \phi_{v}\right)} \quad \text{eq (7.1.2.2.10)} \]

However, the right-hand side of eq (7.1.2.2.10) is just the hyperbolic tangent difference formula:

    \[  \displaystyle \tanh (\phi_{u} - \phi_{v}) =  \displaystyle \frac{\tanh \phi_{u} -\tanh \phi_{v}}{1-\displaystyle \left(\tanh \phi_{u}\right)\displaystyle \left(\tanh \phi_{v}\right)} \quad \text{eq (7.1.2.2.11)} \]

(For a proof of this formula, click here.)

This means that the hyperbolic angles \phi_{v^{\prime}}, \phi_u and \phi_v (referred to as rapidities) combine linearly. The rapidity measured in the S^{\prime} frame is the rapidity in the S frame minus the rapidity of the S^{\prime} frame itself:

    \[  \phi_{v^{\prime}} = \phi_u - \phi_v  \quad \text{eq (7.1.2.2.12)} \]

We can apply these formulas to the problem we solved previously and compare the solutions. We’re given that:

v =  0.5c \,\, \Rightarrow \,\,\displaystyle \frac{v}{c}=\displaystyle \frac{0.5c}{c}=0.5

Therefore:

\tanh \phi_v = 0.5 \,\, \Rightarrow \,\, \phi_v = \text{arctanh}(0.5) = 0.5493

Similarly:

    \[u =  -0.7c \,\, \Rightarrow \,\,\displaystyle \frac{u}{c}=\displaystyle \frac{-0.7c}{c}=-0.7\]

Therefore:

    \[\tanh \phi_u = -0.7 \,\, \Rightarrow \,\, \phi_u = \text{arctanh}(-0.7) = -0.8673\]

Then:

    \begin{align*} \phi_{v^{\prime}} &= \phi_u - \phi_v  \\ &=  -0.8673 -  0.5493 = -1.4166 \\ \tanh(-1.4166) &= -0.8889\end{align*}

This implies that

    \[ \frac{v^{\prime}}{c}  = -0.8889 \,\, \Rightarrow \,\, v^{\prime} = -0.8889c \]

which is the same result we got from our other method of calculation: the red rocket moves to the left at a speed of 0.8889c relative to the blue rocket.
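The rapidity calculation can be reproduced in a few lines. This sketch assumes signed velocities in units of c (u = -0.7 for the leftward red rocket, v = 0.5 for the rightward blue rocket), so the rapidity seen from the blue rocket is \phi_u - \phi_v; the function name is mine.

```python
import math

def velocity_from_rapidities(u, v):
    """Combine signed velocities (fractions of c) by subtracting rapidities."""
    phi_u = math.atanh(u)        # rapidity of the object in frame S
    phi_v = math.atanh(v)        # rapidity of the moving frame S'
    return math.tanh(phi_u - phi_v)

u, v = -0.7, 0.5
via_rapidity = velocity_from_rapidities(u, v)
via_formula = (u - v) / (1.0 - u * v)    # eq (7.1.2.2.5) with c = 1

print(round(math.atanh(0.5), 4), round(math.atanh(0.7), 4))
print(round(via_rapidity, 4), round(via_formula, 4))
```

Both routes give the same velocity up to floating-point rounding; the rapidity route just replaces the fraction with a plain subtraction of hyperbolic angles.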

VII.2 Proper Time

Proper time, \tau, is defined as the time an observer measures in their own frame of reference. We know that in their own frame of reference, observers don’t think they’re moving. Therefore, in their own frame of reference, \Delta \tau = \Delta t and \Delta x^2  = \Delta y^2 = \Delta z^2 = 0. Plugging these values into our equation for the spacetime interval, we get:

\Delta s^2 = -(c\Delta \tau)^2 + 0 + 0 + 0 \quad \text{eq (7.2.1)}

Which implies that:

\Delta s^2 = -(c\Delta \tau)^2 \quad \text{eq (7.2.2)}

And since \Delta s^2 is invariant, -(c\Delta \tau)^2 is also invariant. -1 and c^2 are constants so \Delta \tau^2 is invariant, and thus, \sqrt{\Delta \tau^2} = \Delta \tau is invariant.
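Because \Delta s^2 is invariant, any observer can recover the proper time from their own \Delta t and \Delta x via \Delta s^2 = -(c\Delta \tau)^2. The sketch below assumes units where c = 1 (years and light-years) and uses illustrative numbers; the function name is mine.

```python
import math

def proper_time(dt, dx, dy=0.0, dz=0.0, c=1.0):
    """Proper time between two events from coordinates in any inertial frame.

    Uses Delta s^2 = -(c dt)^2 + dx^2 + dy^2 + dz^2 = -(c dtau)^2.
    """
    s2 = -(c * dt) ** 2 + dx**2 + dy**2 + dz**2
    if s2 > 0:
        raise ValueError("space-like separation: no proper time is defined")
    return math.sqrt(-s2) / c

# Illustrative: a clock that covers 1 light-year in 1.25 years (speed 0.8c).
print(proper_time(dt=1.25, dx=1.0))   # 0.75

# In the clock's own frame dx = 0, so proper time equals coordinate time.
print(proper_time(dt=0.75, dx=0.0))   # 0.75
```

The two calls describe the same pair of events from two frames, and both give the same proper time.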

VII.2.1 Twin Paradox

This discussion is adapted from Dr. John Simonetti, Dept. of Physics, Virginia Tech University, Frequently Asked Questions About Special Relativity – The Twin Paradox, 21 Oct, 1997.

Imagine twins Alice and Bob. Alice remains on earth while Bob makes a trip to a distant planet and back (figure 7.2.1.1). Since Bob is in a moving frame of reference, Alice sees Bob’s clock as ticking slower than hers. According to her, less time has elapsed for Bob during his trip. Therefore, she expects that Bob will be younger than her when he gets back. However, in special relativity, every inertial frame of reference is equally valid. Thus, Bob, in his frame of reference, sees himself at rest and the earth and distant planet as moving. Since Bob considers Alice to be in a moving frame of reference, he sees her clock as ticking slower than his, and thus, less time elapsing for Alice than for him during his trip. When he returns to earth, then, he expects that she will be younger than he is. But how can each twin be older than the other at the same time? This seems to be a contradiction. Actually, it isn’t. Here’s why.

For convenience, let’s change the names of our twins to Unprime and Prime. Obviously, to start motion, stop motion and change directions, Prime must accelerate. Thus, technically, Prime’s frame of reference is not inertial, so the rules we’ve derived for inertial frames don’t apply to it directly. However, for purposes of this discussion, we’ll assume that acceleration is instantaneous so we don’t have to account for it in our time calculation. We’ll take out acceleration altogether later, but it’ll be easier to understand the argument I’m about to make if we consider the duration of accelerations to be negligible. Also, to make things more comprehensible, we’ll put in some numbers. Specifically, we’ll say that, according to Unprime, Prime moves at a velocity of v = 0.8c and the distant planet, Planet A, is a distance L = 1 light-year away. Now let’s consider how Unprime and Prime view 3 different events:

  • At the start of the experiment
  • When Prime reaches Planet A
  • At the finish when Prime is back on Earth
Twin Paradox
Figure 7.2.1.1
Unprime’s Observations
At the Start

At the start of our experiment, Unprime and Prime synchronize their watches. Unprime says that her clock reads t=0 and Prime says that his clock reads t^{\prime} = 0. Unprime then sees Prime speed off toward Planet A at v=0.8c.

When Prime Reaches Planet A

A clock on Planet A, which is at rest in Unprime’s frame, measures \displaystyle \frac{L}{v}=\displaystyle \frac{1}{0.8}=1.25 \text{ years}. A photo is taken of this clock reading and sent, at the speed of light, to Unprime.

At Planet A, Prime takes a photo of his clock and transmits it to Unprime. The clock reads t^{\prime}=0.75 \text{ years}. This is because Unprime sees Prime’s clock as running slowly. Specifically,

    \begin{align*} t^{\prime}&=\frac{t}{\gamma}\\ &= \frac{L/v}{\gamma} \end{align*}

where

    \begin{align*} \gamma&=\frac{1}{\sqrt{1-\displaystyle \frac{v^2}{c^2}}}\\ &= \frac{1}{\sqrt{1-\displaystyle \frac{(0.8c)^2}{c^2}}}\\ &= \frac{1}{0.6} \end{align*}

so

    \begin{align*}t^{\prime} &= \frac{1/0.8}{\frac{1}{0.6}}\\&=(1.25)(0.6)\\&=0.75\end{align*}

At the Finish

When Prime arrives home, Unprime looks at her clock and notes that it reads t=2.5 \text{ years}. This is because, if it took 1.25 years for Prime to get to Planet A, and Prime travels the same distance back, at the same speed, then, to Unprime, the time of Prime’s return trip must be the same as his outbound trip. Thus, total time = 1.25 + 1.25 = 2.5 years.

Unprime looks at Prime’s clock and notes that it reads t^{\prime}=1.5 \text{ years}. She is not surprised since she received the photo of Prime’s clock on Planet A which read t^{\prime}=0.75 \text{ years}. Because Prime traveled the same distance, at the same speed, on his return trip as he did on his outbound trip, Unprime reasons that the return trip must also have taken 0.75 years and that the entire trip, as measured by Prime’s clock, must have taken 1.5 years.

Ultimately, the reason Unprime observes a longer time for the trip than Prime is that Prime’s clock ticked slower due to time dilation.
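Unprime’s bookkeeping can be summarized in a short sketch. It assumes the numbers used above (v = 0.8c, L = 1 light-year) and units where c = 1 light-year per year; the function and dictionary keys are names I chose for illustration.

```python
import math

def unprime_view(v, L, c=1.0):
    """Trip times as reckoned by the stay-at-home twin (Unprime)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_one_way = L / v                     # Earth/Planet A clock time per leg
    t_prime_one_way = t_one_way / gamma   # Prime's dilated clock time per leg
    return {
        "gamma": gamma,
        "unprime_one_way": t_one_way,
        "prime_one_way": t_prime_one_way,
        "unprime_total": 2 * t_one_way,
        "prime_total": 2 * t_prime_one_way,
    }

result = unprime_view(v=0.8, L=1.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the sketch reproduces the figures in the text: 1.25 and 0.75 years per leg, 2.5 and 1.5 years for the round trip.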

Prime’s Observations
At the Start

Prime, like Unprime, measures the time at the beginning of the experiment as zero (i.e. t=0 and t^{\prime}=0). Then Prime sees Unprime whizz off away from him at a velocity of v=0.8c and Planet A whizz toward him at v=0.8c.

When Prime Reaches Planet A

Or should I say, when Planet A reaches Prime. When it does, Prime notes that his clock measures t^{\prime}=0.75 \text{ years}. Why? Because Planet A is moving relative to Prime, and therefore, due to length contraction, Prime sees the distance Planet A travels as:

    \begin{align*} L^{\prime}&=\frac{L}{\gamma} \\ &= \frac{1}{1/0.6} \\ &= 0.6 \text{ light-years} \end{align*}

He sees the time his clock measured for Planet A to reach him as:

    \begin{align*} t^{\prime}&=\frac{L^{\prime}}{v} \\ &= \frac{0.6}{0.8} \\ &= 0.75 \text{ years} \end{align*}

Then Prime looks at the clock on Planet A so he can take a photo of it and send it back to Unprime on earth. He’s initially puzzled to find that the clock on Planet A measures t= 1.25  \text{ years}. Then he remembers his special relativity and comes up with the following explanation:

While Unprime sees her clock as synchronized with the clock on Planet A, Prime does not. While Prime sees his clock as reading t^{\prime}=0 at the start of his journey (as does Unprime), to Prime, Planet A has instantaneously accelerated to a velocity of 0.8c. Thus, from the Lorentz transformation, he knows that the clock on Planet A reads something different. Specifically:

    \begin{align*} t_{\text{PlanetAInitial}}&=t^{\prime} + \displaystyle \frac{v}{c^2}L \\ &= 0 + \frac{vL}{c^2} \\ &= \frac{vL}{c^2} \end{align*}

To get the time he will read on Planet A’s clock when the planet gets to him, Prime knows that he must add the time that he sees Planet A as taking to reach him to Planet A’s clock’s initial reading that we just calculated:

    \[t_{\text{PlanetAatPrime}} = t_{\text{PlanetAtoPrime}} + t_{\text{PlanetAInitial}}\]

The time it takes for Planet A to get to Prime is the length it has to travel divided by the velocity at which it travels. But we have to

1) adjust the distance between Prime and Planet A for length contraction i.e., Planet A is moving with respect to Prime so Prime sees the distance Planet A travels as:

    \[L^{\prime}=\frac{L}{\gamma}   \]

From this, we might expect that the travel time for Planet A to Prime is:

    \[ t_{\text{PlanetAtoPrime}} = \frac{L^{\prime}}{v} = \frac{L/ \gamma}{v}  \]

However, we also need to:

2) adjust Planet A’s travel time for time dilation i.e., Planet A is moving with respect to Prime, so Prime sees the planet’s actual travel time as:

    \[ t_{\text{PlanetAtoPrime}} = \frac{L^{\prime}/v}{\gamma} = \frac{(L/ \gamma)/v}{\gamma}  = \frac{L}{v} \cdot \frac{1}{\gamma^2}\]

So

    \begin{align*}t_{\text{PlanetAatPrime}} &= t_{\text{PlanetAtoPrime}} + t_{\text{PlanetAInitial}} \\ &= \frac{L}{v} \cdot \frac{1}{\gamma^2} + \frac{vL}{c^2} \end{align*}

But \gamma^2=\frac{1}{1-\displaystyle \frac{v^2}{c^2}}. Thus, \frac{1}{\gamma^2} = 1-\displaystyle \frac{v^2}{c^2} and therefore,

    \begin{align*}t_{\text{PlanetAatPrime}} &= t_{\text{PlanetAtoPrime}} + t_{\text{PlanetAInitial}} \\ &= \frac{L}{v} \cdot \left( 1-\displaystyle \frac{v^2}{c^2} \right) + \frac{vL}{c^2} \\ &= \frac{L}{v} - \frac{L}{v} \cdot \frac{v^2}{c^2} + \frac{vL}{c^2} \\ &= \frac{L}{v} - \cancel{\frac{Lv}{c^2}} + \cancel{\frac{vL}{c^2}} \\ &= \frac{L}{v} \\ &= \frac{1}{0.8} \\ &= 1.25 \text{ years} \end{align*}

Of course, 1.25 years is exactly what Planet A’s clock measured in Unprime’s frame. And 0.75 years, the time Prime measures at Planet A, is exactly the time that Unprime saw Prime’s clock measuring in her frame of reference.
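
Prime's bookkeeping can be checked numerically. The following is a minimal Python sketch, assuming units of years and light-years (so that c = 1), with L = 1 light-year and v = 0.8c as in the example. The variable names are illustrative only and are not taken from any reference.

```python
import math

# Assumed units: years and light-years, so that c = 1.
C = 1.0
L = 1.0   # Earth to Planet A distance in Unprime's frame (light-years)
V = 0.8   # relative speed as a fraction of c

gamma = 1.0 / math.sqrt(1.0 - (V / C) ** 2)

# Planet A's clock reading at the start, according to Prime
# (the relativity-of-simultaneity offset v*L/c^2).
t_planet_a_initial = V * L / C**2

# Time that elapses on Planet A's clock during its trip, according to Prime:
# the contracted distance divided by the speed, then slowed by time dilation.
t_planet_a_to_prime = ((L / gamma) / V) / gamma

# Reading on Planet A's clock when the planet reaches Prime.
t_planet_a_at_prime = t_planet_a_to_prime + t_planet_a_initial

# Time that elapses on Prime's own clock over the same leg.
t_prime_own_clock = (L / gamma) / V

print(f"gamma                        = {gamma:.4f}")
print(f"Planet A clock at start      = {t_planet_a_initial:.2f} years")
print(f"Planet A clock advance       = {t_planet_a_to_prime:.2f} years")
print(f"Planet A clock at arrival    = {t_planet_a_at_prime:.2f} years")
print(f"Prime's own clock at arrival = {t_prime_own_clock:.2f} years")
```

The offset (0.8 years) and the dilated travel time (0.45 years) add up to the 1.25 years that Prime photographs, while his own clock advances by 0.75 years.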

At the Finish

When planet Earth (with Unprime on it) reaches Prime, Prime’s clock measures t^{\prime}=0.75 \text{ years}, as he expected since it took 0.75 years for the planet to reach him, and Earth’s/Unprime’s speed traveling toward him is the same as Planet A’s speed. Therefore, the degree of time dilation Prime sees is the same as that which affected the time he observed when Planet A traveled to him.

Next, Prime looks at Unprime’s clock when Earth “re-reaches” him and sees that it measures t=2.5 \text{ years}. To explain this, he makes an argument similar to the one he used to find the time it took for Planet A to get to him.

Prime reasons that when the Earth (and Unprime) start their journey toward him, the clock on the planet and Unprime’s clock are not synchronized, for the same reason Prime’s and Planet A’s clocks are not synchronized at the start of the experiment. Unprime’s clock on Earth reads \displaystyle \frac{Lv}{c^2} years more than the clock on Planet A at the start of Unprime’s trip back to Prime. And the time, according to Prime, for Unprime to reunite with him is the length of the trip adjusted for length contraction, divided by the velocity of Earth/Unprime, all adjusted for the time dilation brought about by the Earth’s/Unprime’s motion relative to Prime, plus \displaystyle \frac{Lv}{c^2}. That is, it’s the same as the time interval measured after Planet A makes its trip to Prime: 1.25 years.

Thus, the total time for Planet A to get to Prime (who’s at rest in his own frame of reference) and the Earth/Unprime to return to Prime is 1.25 + 1.25 = 2.5 years, the same as the time Unprime measures for Prime to travel to and back from Planet A. This is due to a combination of time dilation and the relativity of simultaneity. In short, there is no paradox. Both Unprime and Prime agree that Prime is younger when he comes back to Earth after his journey.

It’s been said that it’s the acceleration experienced by Prime that somehow makes him age less than his twin sister. However, we can redo the above example without acceleration and get the same result. Here’s how:

Imagine Unprime sitting on Earth with a clock. Prime rushes by Unprime, on a rocket, moving at v=0.8c relative to Unprime. The rocket, for as long as we know about, has been moving at that constant velocity v. When Prime reaches Unprime, they both set their clocks to zero. Prime travels on to, then past, Planet A. At Planet A, Prime looks at his clock and notes that it reads 0.75 years, due to length contraction.

At the exact time Prime reaches Planet A, another observer, Doubleprime, flies by at a velocity v=-0.8c relative to Unprime. For as long as we know, Doubleprime has been moving at this velocity. As he passes Prime, Doubleprime sets his clock to match Prime’s reading of t^{\prime} = 0.75 years. Doubleprime continues on towards Earth, then past. At the moment Doubleprime passes Unprime, they look at their clocks. They both note that Doubleprime’s clock reads 1.5 years (0.75 years carried over from Prime plus 0.75 years for the return leg) and Unprime’s reads 2.5 years. Unprime’s explanation for the difference is time dilation and Doubleprime’s explanation is time dilation and lack of clock synchronization, just like in the example above with acceleration.

The difference between the clock readings of Doubleprime and Unprime in the latter example is not the result of acceleration (there is none in this example). Rather, it’s due to the fact that the entire experiment takes place in one inertial frame of reference for Unprime but requires two inertial frames of reference for the traveling clock reading (Prime’s, handed over to Doubleprime), just as it did for Prime in the previous example.
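
The clock readings in the acceleration-free version can be tallied with a short Python sketch. It assumes c = 1 (years and light-years), L = 1 light-year and speeds of 0.8c, and the helper name `proper_time` is hypothetical.

```python
import math

def proper_time(coordinate_time, speed):
    """Time elapsed on a clock moving at `speed` (fraction of c)
    while `coordinate_time` elapses in Unprime's frame."""
    return coordinate_time * math.sqrt(1.0 - speed**2)

L = 1.0   # Earth to Planet A distance in Unprime's frame (light-years)
V = 0.8   # speed of Prime and of Doubleprime relative to Unprime

leg_time = L / V                                   # one leg, Unprime's frame
prime_at_planet_a = proper_time(leg_time, V)       # Prime's reading at Planet A
# Doubleprime starts from Prime's reading and adds his own elapsed time.
doubleprime_at_earth = prime_at_planet_a + proper_time(leg_time, V)
unprime_at_reunion = 2.0 * leg_time

print(f"Prime at Planet A    : {prime_at_planet_a:.2f} years")
print(f"Doubleprime at Earth : {doubleprime_at_earth:.2f} years")
print(f"Unprime at Earth     : {unprime_at_reunion:.2f} years")
```

No acceleration enters this calculation. The handed-over reading is smaller simply because it is accumulated along two different inertial worldlines.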

Another way to look at the reason for this difference is given by eigenchris, beginning at 12:05 of his YouTube video, Relativity 105d: Acceleration – Twin Paradox and Proper Time Along Curves (Rindler Metric).

Lorentz transformations on inertial and noninertial frames of reference
Figure 7.2.1.2: a) Prime and Unprime are both in inertial frames of reference (green for Unprime, blue for Prime). Initially, Prime is in motion with respect to Unprime. A Lorentz transformation converts frames of reference such that Prime is at rest and Unprime is moving relative to him. b) Initially, Unprime is at rest (worldline along time axis). The worldline for Prime’s trip to Planet A and back to Earth is “zigzagged” (i.e., consists of 2 frames of reference). A Lorentz transformation cannot “straighten out” Prime’s worldline; it cannot make it into a single inertial frame representing Prime at rest (i.e., make Prime’s worldline run on a straight line along the time axis).

The video’s author points out that, to transform between two inertial frames of reference in special relativity, one must apply a Lorentz transformation. The assumption that leads to the twin paradox is that we can transform 1) the frame of reference in which Unprime is at rest (her worldline runs along the time axis) into 2) a single frame of reference in which Prime is at rest for his entire trip. The problem is that Prime’s worldline spans two inertial frames of reference, forming what eigenchris refers to as a “zigzag worldline.” And as eigenchris points out, a Lorentz transformation cannot “straighten out” a zigzag worldline (i.e., make the entire worldline follow the course of the spacetime time axis, the type of worldline required to represent a body at rest).
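
To illustrate the point, here is a small Python sketch that boosts the three key events of the trip (departure, turnaround, reunion) into the frame of Prime’s outbound leg. It assumes c = 1 and the numbers of the example above (L = 1 light-year, v = 0.8c). The function names `boost`, `interval` and `leg_velocities` are hypothetical helpers written for this sketch.

```python
import math

def boost(event, v):
    """Lorentz-boost an event (t, x) into a frame moving at velocity v (c = 1)."""
    t, x = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

def interval(a, b):
    """Proper time between two timelike-separated events given as (t, x)."""
    dt, dx = b[0] - a[0], b[1] - a[1]
    return math.sqrt(dt * dt - dx * dx)

def leg_velocities(events):
    """Velocity of each straight segment joining consecutive events."""
    return [(b[1] - a[1]) / (b[0] - a[0]) for a, b in zip(events, events[1:])]

V = 0.8
# Events in Unprime's frame: (t in years, x in light-years).
events = [(0.0, 0.0), (1.25, 1.0), (2.5, 0.0)]   # departure, turnaround, reunion
boosted = [boost(e, V) for e in events]           # same events, outbound-leg frame

unprime_velocities = leg_velocities(events)
boosted_velocities = leg_velocities(boosted)

prime_time_unprime_frame = interval(events[0], events[1]) + interval(events[1], events[2])
prime_time_boosted_frame = interval(boosted[0], boosted[1]) + interval(boosted[1], boosted[2])
unprime_time_boosted_frame = interval(boosted[0], boosted[2])

print("Prime's leg velocities, Unprime's frame :", unprime_velocities)
print("Prime's leg velocities, boosted frame   :", boosted_velocities)
print("Prime's elapsed time (either frame)     :", prime_time_unprime_frame, prime_time_boosted_frame)
print("Unprime's elapsed time, boosted frame   :", unprime_time_boosted_frame)
```

After the boost, Prime is at rest on the outbound leg but moves at roughly -0.976c on the return leg, so his worldline is still bent. The elapsed times along each worldline (1.5 years for Prime, 2.5 years for Unprime) come out the same in both frames.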

VII.3 4-Velocity

This derivation is adapted from Collier, Peter. “Special Relativity.” A Most Incomprehensible Thing: Notes Towards a Very Gentle Introduction to the Mathematics of Relativity, Incomprehensible Books, 2019, pp. 119-121. This book can be found on Amazon at https://www.amazon.com/Most-Incomprehensible-Thing-Introduction-Mathematics/dp/0957389469

Consider an object moving through 3D space. The path of the object can be parameterized with each of the spatial coordinates being functions of time:

x=f(t),\,y=g(t),\,z=h(t) \quad \text{eq (7.3.1)}

We can find the velocity in each direction of space at each time by differentiating and multiplying by the basis vector in each spatial direction:

    \[V(t) = \frac{dx}{dt}\vec{e}_x + \frac{dy}{dt}\vec{e}_y + \frac{dz}{dt}\vec{e}_z \quad \text{eq (7.3.2)} \]

Notice that what we consider the “normal” velocities are components of a vector, not the vector itself. And we know that vector components usually vary with coordinate transformations (whereas the entire vector does not). Therefore, regular everyday velocities are not invariants in special relativity. However, we can create a velocity vector that is form-invariant. We do this by defining an entity called the 4-velocity which consists of the rate of change in each coordinate “direction” in special relativity (ct, x, y and z, i.e., x^0, \,x^1, \,x^2 \text{ and } x^3, respectively) with respect to proper time (which we know is invariant). Thus:

    \[U^u = \displaystyle \frac{dx^u}{d\tau}  = \left(c\frac{dt}{d\tau},\, \frac{dx}{d\tau},\, \frac{dy}{d\tau},\, \frac{dz}{d\tau}\right) \quad \text{eq (7.3.3)}  \]

From our discussion of time dilation, we know that:

    \[ \Delta t = \gamma \Delta \tau  \quad \text{eq (7.3.4)} \]

That means:

    \[ x^0 = ct = c\gamma \tau  \quad \text{eq (7.3.5)} \]

Taking the derivative with respect to proper time gives:

    \[ U^0 = \displaystyle \frac{dx^0}{d\tau} = c\gamma \quad \text{eq (7.3.6)}  \]

Using the chain rule for the spatial components:

    \[ U^i = \displaystyle \frac{dx^i}{dx^0}\frac{dx^0}{d\tau} = \frac{dx^i}{dx^0}c\gamma = \frac{dx^i}{d(ct)}c\gamma = \frac{dx^i}{c\,dt}c\gamma =  \frac{dx^i}{dt}\gamma \quad \text{eq (7.3.7)}   \]

\displaystyle \frac{dx^i}{dt} is the regular spatial velocity where

    \[\displaystyle \vec{v} = \left( \frac{dx^1}{dt},\,  \frac{dx^2}{dt},\,  \frac{dx^3}{dt} \right) = (v_x,\, v_y,\, v_z) \quad \text{eq (7.3.8)} \]

The components of the 4-velocity are therefore:

    \[U^u = (U^0,\, U^1,\, U^2,\, U^3) = \displaystyle \frac{dx^u}{d\tau} = (c\gamma, \gamma \vec{v}) = \gamma(c,\vec{v}) \quad \text{eq (7.3.9)}  \]

We know from tensor algebra that the inner (dot) product – which is a scalar and is, thus, invariant – is given by:

    \[ \vec{A} \cdot  \vec{B} = g_{\mu \nu}A^{\mu}B^{\nu} \quad \text{eq (7.3.10)}  \]

The metric tensor, g_{\mu \nu}, for Minkowski space is \eta_{\mu \nu}.

If we take the inner product of the 4-velocity with itself, we get a scalar (which is invariant). Thus:

    \begin{align*} \vec{U} \cdot \vec{U} &= \eta_{\mu \nu}U^{\mu}U^{\nu} = -\gamma^2 c^2 + \gamma^2\left[ (v_x)^2 + (v_y)^2 + (v_z)^2 \right] \\ &= \gamma^2(v^2-c^2) \quad \text{eq (7.3.11)}  \end{align*}

But

    \begin{align*} \gamma^2 &= \left( \frac{1}{\sqrt{1-(v/c)^2}} \right)^2\\ &= \frac{1}{1-(v/c)^2} = \frac{c^2}{c^2-v^2} \\ &= -\frac{c^2}{v^2-c^2} \quad \text{eq (7.3.12)}   \end{align*}

Substituting this into eq (7.3.11), we have:

    \begin{align*} \eta_{\mu \nu}U^{\mu}U^{\nu} &=  \gamma^2(v^2-c^2) \\ &= -\frac{c^2}{v^2-c^2}(v^2-c^2)  = -c^2 \quad \text{eq (7.3.13)}  \end{align*}

Of course, the speed of light, c, is invariant. Therefore, \displaystyle \vec{U} \cdot \vec{U} = \eta_{\mu \nu}U^{\mu}U^{\nu} is invariant.
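
As a numerical sanity check of eq (7.3.13), the sketch below builds the 4-velocity from an ordinary 3-velocity and evaluates its Minkowski inner product with itself. It assumes the (-, +, +, +) signature used above and SI units. The function names are illustrative and not part of any library.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_velocity(vx, vy, vz):
    """Components (U^0, U^1, U^2, U^3) = gamma * (c, vx, vy, vz)."""
    speed_squared = vx * vx + vy * vy + vz * vz
    gamma = 1.0 / math.sqrt(1.0 - speed_squared / C**2)
    return (gamma * C, gamma * vx, gamma * vy, gamma * vz)

def minkowski_dot(a, b):
    """Inner product eta_{mu nu} a^mu b^nu with signature (-, +, +, +)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def norm_over_c_squared(vx, vy, vz):
    """U.U divided by c^2; eq (7.3.13) says this should equal -1."""
    u = four_velocity(vx, vy, vz)
    return minkowski_dot(u, u) / C**2

for velocity in [(0.0, 0.0, 0.0), (0.5 * C, 0.0, 0.0), (0.3 * C, 0.4 * C, 0.5 * C)]:
    print(velocity, norm_over_c_squared(*velocity))
```

Whatever 3-velocity is supplied (as long as its magnitude is below c), the printed ratio stays at -1 to within floating-point rounding.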

VII.4 Total Energy

This section is adapted largely from:

Collier, Peter. “Special Relativity.” A Most Incomprehensible Thing: Notes Towards a Very Gentle Introduction to the Mathematics of Relativity, Incomprehensible Books, 2019, pp. 121-123. This book can be found on Amazon at https://www.amazon.com/Most-Incomprehensible-Thing-Introduction-Mathematics/dp/0957389469

Perhaps the most important principle on which particle collider experiments are based is the conservation of total relativistic energy (together with momentum), which holds in every inertial frame. However, before we can talk about total relativistic energy, we need, first, to touch on a couple of preliminary topics.

VII.4.1 Relativistic Momentum

Newtonian momentum is defined as mass times velocity:

    \[ P_{\text{Newton}} = m\vec{v} \quad \text{eq (7.4.1.1)} \]

We can make this relativistic in the same way we made velocity relativistic:

    \[ \mathbf{p} = (p_x, p_y, p_z) = m\left( \displaystyle \frac{dx}{d\tau}, \frac{dy}{d\tau}, \frac{dz}{d\tau}\right) \quad \text{eq (7.4.1.2)}  \]

Knowing that \displaystyle \Delta \tau = \frac{\Delta t}{\gamma}, we can express \mathbf{p} in terms of coordinate time as:

    \[ \mathbf{p} = (p_x, p_y, p_z) = m\gamma\left( \displaystyle \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right) = m \gamma \vec{v}  \quad \text{eq (7.4.1.3)} \]

These are the spatial components of a 4-vector (the 4-momentum of section VII.5), so they transform between inertial frames under Lorentz transformations in the same way the spatial coordinates do.

VII.4.2 Relativistic Kinetic Energy

In Newtonian mechanics, the kinetic energy of a particle is equal to the work it takes to accelerate a particle of mass m from rest to a speed of v. Of course, it takes a force to do this. Thus, we can write:

    \[ W = \int_{s_0}^{s_1} F\,dx = \int_{s_0}^{s_1} ma\,dx \quad \text{eq (7.4.2.1)}  \]

    \[ ma\,dx = m\frac{dv}{dt}\,dx = m\frac{dx}{dt}\,dv = mv\,dv = p\,dv \quad \text{eq (7.4.2.2)} \]

Therefore:

    \[ W = \int_{v_0}^{v_1} p\,dv \quad \text{eq (7.4.2.3)} \]

We can now find the relativistic kinetic energy (KE_{\text{rel}}) by replacing v in the differential with its relativistic counterpart, \gamma v, while keeping p = mv:

    \[ KE_{\text{rel}} = W = \int_{v_0}^{v_1} p\,d\left( \frac{v}{\sqrt{1-(v/c)^2}}\right) \quad \text{eq (7.4.2.4)} \]

We use integration by parts to help solve this integral. Recall:

    \[ \int_a^b F(x)\frac{dG}{dx}\,dx=\eval{F(x)G(x)}_a^b - \displaystyle\int_a^b G(x)\frac{dF}{dx}\,dx  \quad \text{eq (7.4.2.5)} \]

Letting:

    \[ F(x) = p = mv \quad \text{and} \quad G(x) = \frac{v}{\sqrt{1-(v/c)^2}}\]

and substituting these values into eq (7.4.2.5), we have:

    \begin{align*} KE_{\text{rel}} &= \eval{mv\left( \frac{v}{\sqrt{1-(v/c)^2}}\right)}_{v_0}^{v_1} - \int_{v_0}^{v_1} \frac{v}{\sqrt{1-(v/c)^2}}\,dp \\ &=  \eval{\frac{mv^2}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} - \int_{v_0}^{v_1} \frac{mv}{\sqrt{1-(v/c)^2}}\,dv \quad \text{eq (7.4.2.6)} \end{align*}

We can solve the integral on the righthand side of eq (7.4.2.6) using the following substitution:

    \[ s = 1 - \frac{v^2}{c^2} \,\, \Rightarrow \,\, ds = \frac{-2v}{c^2}dv\]

Plugging this into the integral, we get:

    \begin{align*} m\int \frac{v}{\sqrt{1-(v/c)^2}}\,dv &= -\frac{mc^2}{2}\int \frac{1}{\sqrt{s}}\,ds \\ &= -mc^2\sqrt{s} + C \\ &= -mc^2\sqrt{1-(v/c)^2} + C \quad \text{eq (7.4.2.7)} \end{align*}

Putting this back into eq (7.4.2.6) gives us:

    \begin{align*} KE_{\text{rel}} &= \eval{\frac{mv^2}{\sqrt{1-(v/c)^2}} + mc^2\sqrt{1-(v/c)^2}}_{v_0}^{v_1} \\ &= \eval{\frac{mv^2}{\sqrt{1-(v/c)^2}} + mc^2\sqrt{1-(v/c)^2}\frac{\sqrt{1-(v/c)^2}}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} \\ &= \eval{\frac{mv^2 + mc^2(1-(v/c)^2)}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} \\ &= \eval{\frac{mv^2 + mc^2 - mc^2v^2/c^2}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} \\ &= \eval{\frac{mc^2}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} \quad \text{eq (7.4.2.8)} \end{align*}

The particle is at rest when we start so v_0 = 0. The ending velocity is arbitrary so let v_1 = v. Plugging in these values yields:

    \begin{align*} KE_{\text{rel}} &= \eval{\frac{mc^2}{\sqrt{1-(v/c)^2}}}_{v_0}^{v_1} \\ &=  \frac{mc^2}{\sqrt{1-(v_1/c)^2}} - \frac{mc^2}{\sqrt{1-(v_0/c)^2}} \\ &= \frac{mc^2}{\sqrt{1-(v/c)^2}} - \frac{mc^2}{\sqrt{1-(0/c)^2}} \\ &=  \frac{mc^2}{\sqrt{1-(v/c)^2}} - \frac{mc^2}{\sqrt{1}}  \\ &= mc^2\left( \frac{1}{\sqrt{1-(v/c)^2}} - 1 \right) \\ &= (\gamma -1)mc^2 \\ &= \frac{mc^2}{\sqrt{1-(v/c)^2}} - mc^2  \quad \text{eq (7.4.2.9)} \end{align*}
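
The result in eq (7.4.2.9) can be cross-checked by evaluating the work integral of eq (7.4.2.4) numerically rather than by parts. The sketch below is a rough check only: it assumes units with c = 1 and m = 1, and it uses a simple midpoint Riemann-Stieltjes sum (a choice made here for brevity, not a method taken from Collier).

```python
import math

C = 1.0  # assumed units in which c = 1
M = 1.0  # assumed unit mass

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def work_integral(v_final, steps=20_000):
    """Midpoint sum approximating the integral of (m v) d(gamma v) from 0 to v_final."""
    dv = v_final / steps
    total = 0.0
    for i in range(steps):
        v_lo = i * dv
        v_hi = (i + 1) * dv
        v_mid = 0.5 * (v_lo + v_hi)
        # p = m v evaluated mid-step, times the change in gamma*v over the step
        total += M * v_mid * (gamma(v_hi) * v_hi - gamma(v_lo) * v_lo)
    return total

def kinetic_energy_closed_form(v):
    """Eq (7.4.2.9): KE_rel = (gamma - 1) m c^2."""
    return (gamma(v) - 1.0) * M * C**2

for v in (0.1, 0.5, 0.8, 0.95):
    print(f"v = {v:.2f}c  numeric = {work_integral(v):.8f}  closed form = {kinetic_energy_closed_form(v):.8f}")
```

The two columns should agree to several decimal places, with the small residual difference coming from the finite step size.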

As Collier points out in his book, eq (7.4.2.9) looks nothing like KE_{\text{Newton}} = \frac12 mv^2, the expression for kinetic energy in Newtonian mechanics. Note, however, that in the expression

    \[ \frac{mc^2}{\sqrt{1-(v/c)^2}} - mc^2\]

if v \ll c (as is the case under conditions in which Newtonian mechanics is applicable) then the term \displaystyle \sqrt{1-(v/c)^2} is slightly less than 1, which means

  • the term \displaystyle \frac{mc^2}{\sqrt{1-(v/c)^2}} is only slightly greater than mc^2
  • which, in turn, means the expression \displaystyle \frac{mc^2}{\sqrt{1-(v/c)^2}} - mc^2, which is KE_{\text{rel}}, is minuscule when compared with mc^2

Collier, however, goes on to reconcile the relativistic kinetic energy equation with the Newtonian kinetic energy equation. Here is his proof:

He starts by expanding the Lorentz factor, \gamma, via Taylor’s theorem:

    \[ \gamma = \frac{1}{\sqrt{1-(v/c)^2}} = 1 + \frac{1v^2}{2c^2} + \frac38\left( \frac{v^2}{c^2}\right)^2 + \dots \quad \text{eq (7.4.2.10)} \]

Thus,

    \[ KE_{\text{rel}} = \left[ \left(  1 + \frac{1v^2}{2c^2} + \frac38\left( \frac{v^2}{c^2}\right)^2 + \dots \right)-1 \right]mc^2 \quad \text{eq (7.4.2.11)}  \]

In the nonrelativistic setting where Newtonian mechanics applies, v \ll  c, and therefore, \displaystyle \frac{v^2}{c^2} terms raised to powers higher than 1 will be close to zero and can be ignored. This leads to:

    \begin{align*} KE_{\text{rel}} &\approx \left[\left( 1 + \frac{1v^2}{2c^2} \right) - 1\right]mc^2 \\ &\approx mc^2 + \frac{mc^2v^2}{2c^2} - mc^2 \\ &\approx \frac12 mv^2 \approx KE_{\text{Newton}} \quad \text{eq (7.4.2.12)}  \end{align*}
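
To see how good the approximation in eq (7.4.2.12) is, the following sketch compares the two kinetic energy expressions at several speeds. It assumes SI units and a 1 kg mass chosen purely for illustration. The function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s
M = 1.0            # illustrative mass in kg

def kinetic_energy_relativistic(v):
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * M * C**2

def kinetic_energy_newton(v):
    return 0.5 * M * v**2

def kinetic_energy_ratio(beta):
    """KE_rel / KE_Newton for a speed v = beta * c."""
    v = beta * C
    return kinetic_energy_relativistic(v) / kinetic_energy_newton(v)

for beta in (0.001, 0.01, 0.1, 0.5, 0.9):
    print(f"v = {beta:5.3f}c  KE_rel / KE_Newton = {kinetic_energy_ratio(beta):.6f}")
```

At everyday speeds the ratio is indistinguishable from 1. The next term in the Taylor series, \frac38 (v^2/c^2)^2 mc^2, only becomes noticeable at a sizeable fraction of c.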

VII.4.3 Relativistic Total Energy

At this point in Collier’s book, the author writes

    \[ E = \gamma mc^2 = \frac{mc^2}{\sqrt{1-(v/c)^2}} = KE_{\text{rel}} + mc^2 \quad \text{eq (7.4.3.1)}  \]

(where E is the total relativistic energy) but gives no explanation of where E comes from. In their book, Susskind and Friedman derive the expression for E from first principles using Lagrangian and Hamiltonian mechanics.

Now consider a particle or other object at rest. “At rest” means v=0. Our expression for total energy becomes:

    \begin{align*} E = \gamma mc^2 &= \frac{mc^2}{\sqrt{1-(v/c)^2}} = KE_{\text{rel}} + mc^2  \\ &= \frac{mc^2}{\sqrt{1-(v/c)^2}} - mc^2 + mc^2 \\ &= \frac{mc^2}{\sqrt{1-(0/c)^2}} - mc^2 + mc^2 \\ &= mc^2 - mc^2 + mc^2 \quad \text{eq (7.4.3.2)}   \end{align*}

So we’re left with:

    \[E=mc^2 \quad \text{eq (7.4.3.3)} \]

which, of course, is Einstein’s famous mass-energy equation. In other words, mc^2 is the rest energy of an object: the maximum energy that could be extracted from the object or particle if all of its mass were converted into energy. That is, mass and energy are interchangeable.
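
To give a feel for the size of the rest energy, here is a short Python sketch that evaluates E = mc^2 for two illustrative masses. The electron mass is the CODATA 2018 value, and the joule-to-electronvolt factor is the exact SI definition. The choice of examples is mine, not Collier’s.

```python
C = 299_792_458.0                  # speed of light, m/s (exact by definition)
JOULES_PER_EV = 1.602176634e-19    # exact by SI definition
ELECTRON_MASS_KG = 9.1093837015e-31  # CODATA 2018 value

def rest_energy_joules(mass_kg):
    """Rest energy E = m c^2 in joules."""
    return mass_kg * C**2

one_kg_energy = rest_energy_joules(1.0)
electron_energy_mev = rest_energy_joules(ELECTRON_MASS_KG) / JOULES_PER_EV / 1.0e6

print(f"Rest energy of 1 kg        : {one_kg_energy:.4e} J")
print(f"Rest energy of an electron : {electron_energy_mev:.4f} MeV")
```

The electron figure of about 0.511 MeV is the value commonly quoted in particle physics.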

VII.5 4-Momentum

If we multiply the 4-velocity by mass, we get another invariant: 4-momentum:

    \[  P^u=mU^u \quad \text{eq (7.5.1)} \]

Given that 4-velocity is:

    \[U^u=(\gamma c,\gamma\vec{v}) \quad \text{eq (7.5.2)} \]

When we multiply by m, we get:

    \[ P^u=(\gamma mc,m\gamma\vec{v}) \quad \text{eq (7.5.3)} \]

But, from eq (7.4.3.1), the total energy is \displaystyle E=\gamma mc^2, and the relativistic momentum (the spatial components of the 4-momentum) is \displaystyle \mathbf{p}=m\gamma\vec{v}.

Therefore,

    \[ P^0=\gamma mc=\frac{\gamma mc^2}{c}=\frac{E}{c} \quad \text{eq (7.5.4)} \]

and

    \[ P^u=\left( \frac{E}{c},\vec{p} \right) =  \left( \frac{E}{c},p_x, p_y, p_z\right) \quad \text{eq (7.5.5)}  \]

4-momentum is especially important in general relativity, where it appears in the energy-momentum tensor, which is a measure of the rate of flow per unit area of 4-momentum.

VII.6 4-Force

As you might expect, we can create a Lorentz invariant force 4-vector by taking the derivative of the 4-momentum with respect to proper time:

    \[ F^u=\frac{dP^u}{d\tau} \quad \text{eq (7.6.1)} \]

VII.7 Energy-Momentum Relation

In the section on 4-velocity, in eq (7.3.13), we showed that the inner product of the 4-velocity with itself is -c^2:

    \[ \eta_{\mu \nu}U^{\mu}U^{\nu} = -c^2 \quad \text{eq (7.3.13)}  \]

Since P^u=mU^u

    \[\displaystyle \eta_{\mu \nu}P^{\mu}P^{\nu} = m^2\eta_{\mu \nu}U^{\mu}U^{\nu} =-m^2c^2 \quad \text{eq (7.7.1)} \]

However, we can also find the inner product of the 4-momentum directly as:

    \begin{align*} \eta_{\mu \nu}P^{\mu}P^{\nu}  &= \begin{bmatrix} \frac{E}{c}&p_x&p_y&p_z \end{bmatrix} \begin{bmatrix} -1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix} \begin{bmatrix} \frac{E}{c}\\p_x\\p_y\\p_z \end{bmatrix} \\  &= -\frac{E^2}{c^2} + p_x^2+p_y^2+p_z^2 \\ &= -\frac{E^2}{c^2} + \mathbf{p}^2 \quad \text{eq (7.7.2)}  \end{align*}

Equating the two equations, we have:

    \[ -\frac{E^2}{c^2} + \mathbf{p}^2 = -m^2c^2 \quad \text{eq (7.7.3)} \]

Rearranging:

    \[ \frac{E^2}{c^2}=\mathbf{p}^2 + m^2c^2 \quad \text{eq (7.7.4)} \]

and finally:

    \[E^2 = \mathbf{p}^2c^2 + m^2c^4 \quad \text{eq (7.7.5)} \]

or

    \[E=\sqrt{\mathbf{p}^2c^2+m^2c^4}  \quad \text{eq (7.7.6)} \]

We can see from eq (7.7.6) that:

If a massive particle is at rest (i.e., v=0 \,\Rightarrow\,\mathbf{p}c=0) then E=mc^2.

But if a particle is massless, like a photon, (i.e., m=0\,\Rightarrow\,mc^2=0) then its energy is given by E=|\mathbf{p}|c.
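
The energy-momentum relation and its two limiting cases can be checked numerically. The sketch below assumes units with c = 1 and motion along a single axis. The function names are illustrative.

```python
import math

C = 1.0  # assumed units in which c = 1

def energy_and_momentum(mass, v):
    """Total energy E = gamma m c^2 and momentum p = gamma m v of a massive particle."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * mass * C**2, gamma * mass * v

def energy_from_momentum(mass, p):
    """Energy from E^2 = p^2 c^2 + m^2 c^4 (eq 7.7.5)."""
    return math.sqrt((p * C) ** 2 + (mass * C**2) ** 2)

# Massive particle in motion: both routes to E should agree.
E, p = energy_and_momentum(mass=1.0, v=0.6)
print("E from gamma m c^2      :", E)
print("E from momentum relation:", energy_from_momentum(1.0, p))

# Limiting cases: particle at rest, and massless particle.
print("At rest  (m = 3, p = 0) :", energy_from_momentum(3.0, 0.0))
print("Massless (m = 0, p = 2) :", energy_from_momentum(0.0, 2.0))
```

Note that the square root of a sum is not the sum of the square roots: E coincides with |\mathbf{p}|c + mc^2 only in the two limiting cases, where one of the terms vanishes.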

VIII. EM and Special Relativity Revisited

In the introduction to this subject, we identified a problem that arises with electromagnetism in the setting of nonrelativistic classical physics. With the background we’ve laid in this article, we’re now in a position to see how special relativity solves this problem. I’ll reproduce figure 1.3 to help remind us of what that problem is.

Diagram showing problem posed by EM for Galilean relativity
Figure 1.3

From the frame of reference S, at rest with respect to the protons in a wire, a test electron moving to the right experiences a downward magnetic force. However, in frame of reference S^{\prime} moving at the same velocity as electrons in the wire, the test electron feels no force. This posed a dilemma for physicists since one of the most fundamental principles of physics is that the laws of physics should be the same in all inertial frames of reference. So how does special relativity fix this?

How special relativity fixes a problem in electromagnetism
Figure 8.1

Figure 8.1 will help illustrate the answer. When special relativity is being considered, things are the same as in figure 1.3 in the S frame of reference. We would expect some length contraction in the green electrons in relative motion and it doesn’t look like there is any in the diagram. However, we assume that if the electrons were at rest with respect to the S frame of reference, they would have been bigger. In fact, when we look at frame S^{\prime} in the setting of special relativity, we see this.

In frame S^{\prime}, our frame of reference is moving with the same rightward velocity as the red test electron and green electrons in the wire. Thus, these electrons appear to be at rest in this frame. Accordingly, the electrons in the wire are greater in diameter than they are in frame S (i.e., they return to their normal diameter). However, in frame S^{\prime}, the protons have relative velocity v to the left. Because they’re moving relative to an observer in this frame, to such an observer, their diameter is smaller than in frame S.

The effect of the larger diameter in the green electrons is that they become more spread out, decreasing their density, and thus, decreasing their charge density. On the other hand, the smaller diameter of the moving protons causes them to crowd together more. This crowding increases their charge density. The net effect is to produce a positive charge in the segment of wire we’re considering. And associated with this new positive charge density is a new electric field. This causes an attractive force on the test electron, pulling the test electron toward the wire. Note that the moving protons still cause a magnetic field. However, because the test electron is not moving in this frame of reference, it experiences no magnetic force.

The question then becomes, Does the downward force on the test electron caused by the electric field in frame S^{\prime} equal the downward force on the test electron caused by the magnetic field in S? Or put another way, is the path of the test electron the same in both frames? In fact, the answer to both of these questions is “yes.” The math involved in proving this is somewhat tedious.

We’ve considered simple cases in which the test electron is either completely at rest with respect to electrons in the wire or moving exactly at the velocity of the wire’s electrons, exactly perpendicular to any magnetic field generated by the wire. In other frames of reference where these simple conditions are not met, a more complicated combination of electric and magnetic fields exert force on the test electron. Indeed, a more complex mathematical object, the electromagnetic field tensor, is needed to capture this more complex physical behavior. I won’t address this issue here but I hope, in the future, to create a separate page on electromagnetism that does discuss this subject.

IX. Experimental Confirmation

The experimental evidence confirming the predictions of special relativity is overwhelming. No experiment has ever been performed which contradicts the predictions of special relativity. References that summarize some of the evidence supporting special relativity are:

Tom Roberts and Siegmar Schleif, What is the experimental basis of Special Relativity?, 2007, accessed 13 January 2024.

Wikipedia, Tests of special relativity, accessed 13 January, 2024.

Three of the classic experiments confirming the predictions of special relativity are described below.

IX.1 Alväger et al (Speed of Light is Always c)

In 1964, T. Alväger et al, working at CERN, published an often-cited paper showing that the speed of light emitted by a fast-moving source is still measured as c (= 2.99792458 \times 10^{10} \text{ cm/s}). Figure 9.1.1 is a schematic of the experimental setup. The actual details of the experiment are more complicated but figure 9.1.1 conveys the general idea.

Schematic of experimental setup of Alvager et al paper that proved speed of light is constant.
Figure 9.1.1

Protons were accelerated to high energy and used to bombard a beryllium plate, producing pi mesons. The resulting pi mesons (made up of a quark and an antiquark) had energies greater than 6 GeV and traveled at a speed of v = 0.99975c. The produced particles can be positively charged, negatively charged or electrically neutral. Magnets were used to sweep away all the charged varieties (not shown), leaving only neutral pi mesons (\pi^0). These particles decay quickly into high-energy photons in the gamma range. (The average lifetime of \pi^0 particles is about 8.5 \times 10^{-17} seconds.) The photons are detected on a screen. Since both 1) the distance between the beryllium target and the detector screen and 2) the time elapsed between beryllium target bombardment and photon detection were known, the speed of the photons could be calculated.

Nonrelativistic physics predicted that the speed of the photons should have been 0.99975c + 1c = 1.99975c. Special relativity predicted that the observed photon speed should have been c.
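
The two predictions can be reproduced in a few lines of Python. This sketch assumes the standard relativistic velocity-addition formula for collinear velocities, u = (u^{\prime} + v)/(1 + u^{\prime}v/c^2), and works in units with c = 1. The function names are illustrative.

```python
def galilean_sum(u_prime, v):
    """Nonrelativistic velocity addition."""
    return u_prime + v

def relativistic_sum(u_prime, v, c=1.0):
    """Relativistic addition of collinear velocities."""
    return (u_prime + v) / (1.0 + u_prime * v / c**2)

PION_SPEED = 0.99975               # speed of the decaying pion in the lab, in units of c
PHOTON_SPEED_IN_PION_FRAME = 1.0   # the photon moves at c relative to the pion

galilean_prediction = galilean_sum(PHOTON_SPEED_IN_PION_FRAME, PION_SPEED)
relativistic_prediction = relativistic_sum(PHOTON_SPEED_IN_PION_FRAME, PION_SPEED)

print(f"Galilean prediction    : {galilean_prediction:.5f} c")
print(f"Relativistic prediction: {relativistic_prediction:.5f} c")
```

Because the numerator and denominator of the relativistic formula are equal when u^{\prime} = c, the result is c no matter how fast the source moves.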

So what did Alväger et al find? They measured the velocity of the photons as (2.9977 \pm 0.0004) \times 10^{10} \text{ cm/s}, which is well within the limits of error of the known value of c.

IX.2 Muons (Time Dilation/Length Contraction)

A muon is a fundamental particle similar to an electron with an electric charge of -1 and a spin of 1/2 but with a much greater mass. Muons are formed when high-energy protons and atomic nuclei that move through space at near the speed of light collide with particles in our upper atmosphere, about 10,000 meters above the Earth. They move at a velocity of 0.98c. Therefore, one would expect that it would take 34 \times 10^{-6} seconds to travel from a height of 10,000 m to the Earth’s surface:

    \begin{align*} T &= \frac{10^4 \text{ m}}{(0.98)(3\times 10^8 \text{ m/s})}  \\ &= 34 \times 10^{-6} \text{ s}  \quad \text{(9.2.1)}\end{align*}

where

T is the time it takes to get from 10,000 m in the sky to the ground
m is meters
s is seconds

Muons are short-lived. Their half-life, as measured in the laboratory, is 1.56 \times 10^{-6} seconds. That means that, after 1.56 \times 10^{-6} \text{ s}, one-half of the muons decay (into an electron and two kinds of neutrinos), leaving one-half of the original number of muons intact. So, if we start with 1000 muons, after 1.56 \times 10^{-6} \text{ s}, 500 muons will have decayed and we’d be left with 500 muons. After another 1.56 \times 10^{-6} \text{ s}, half of 500 (i.e. 250) muons will decay and 250 muons will remain. After another 1.56 \times 10^{-6} \text{ s}, half of 250 (i.e. 125) muons will decay and 125 muons will be left, and so on. We can come up with an equation for this behavior as follows:

    \begin{align*} \text{\# of muons left }  (N) &= \text{\# of muons we start with } (N_0) \times (\frac12)(\frac12)(\frac12) \\ &= 1000(\frac12)(\frac12)(\frac12) \\ &= 1000(\frac18) \\ &= 125   \quad \text{(9.2.2)}\end{align*}

We can generalize this relationship as:

    \[  N = N_0(\frac12)^h = N_0(2)^{-h}   \quad \text{(9.2.3)}\]

where h is the number of half-lives that have elapsed.

We ultimately want to know how many muons make it to the ground. We’ll calculate this number under 3 conditions:

  • Using non-relativistic physics, from an earth-bound observer’s frame of reference
  • Using special relativity, from an earth-bound observer’s frame of reference
  • Using special relativity, from a frame of reference of an observer moving with a muon

Non-Relativistic, Earth Frame of Reference

Muon decay - nonrelativistic calculation, observer on Earth
Figure 9.2.1

We know the time it takes to get from 10,000 m in the sky to the ground. We also know how long the half-life of a muon is (as measured in a laboratory, on the ground). Therefore, we can figure out how many half-lives it takes for a muon to hit the ground:

    \begin{align*} \frac{\text{time to ground}}{\displaystyle \frac{\text{time}}{\text{half-life}}}  &= \frac{34 \times 10^{-6} \text{ s}}{1.56 \times 10^{-6} \, \displaystyle \frac{\text {s}}{\text{half-life}}} \\ &= 21.8 \text{ half-lives}   \quad \text{(9.2.4)}\end{align*}

Thus, if we start with 1,000,000 muons, then after 21.8 half-lives:

    \begin{align*} N &= N_0(\frac12)^h = N_0(2)^{-h} \\ &= 1,000,000(2)^{-21.8} \\ &= 1,000,000(0.27 \times 10^{-6}) \\ &\approx 0.27   \quad \text{(9.2.5)} \end{align*}

From these calculations, we’d expect to detect only about 0.3 out of every 1 million (or about 1 out of every 3.7 million) muons produced.

Relativistic, Earth Frame of Reference

Muon decay - relativistic calculations, Earth observer
Figure 9.2.2

Since the muons are moving relative to an observer on the Earth, the observer on the Earth sees the “internal clocks” of the muons as running slower than his clock. Thus, the time before a muon decays (which determines the muon’s half-life) should be longer. Specifically:

    \begin{align*}    T_{1/2,\,\text{relativistic}}&=\gamma\, T_{1/2,\,\text{nonrelativistic}} \\ &= \frac{1}{\sqrt{1 - \displaystyle \frac{v^2}{c^2}}} (1.56 \times 10^{-6} \text{ s}) \\ &= \frac{1}{\sqrt{1 - \displaystyle \frac{(0.98c)^2}{c^2}}} (1.56 \times 10^{-6} \text{ s}) \\ &= \frac{1}{\sqrt{1 - 0.9604}} (1.56 \times 10^{-6} \text{ s}) \\ &= \frac{1}{\sqrt{0.0396}} (1.56 \times 10^{-6} \text{ s}) \\ &\approx (5) (1.56 \times 10^{-6} \text{ s}) \\ &\approx 7.8  \times 10^{-6} \text{ s}   \quad \text{(9.2.6)} \end{align*}

Recall, it takes 34 \times 10^{-6} \text{ s} for a muon to traverse the 10,000 m from the atmosphere to the ground. Therefore, the number of half-lives elapsed during this trip is:

    \[ \frac{34 \times 10^{-6} \text{ s}}{\sim 7.8 \times 10^{-6} \, \displaystyle \frac{\text {s}}{\text{half-life}}} =  4.36 \text{ half-lives}   \quad \text{(9.2.7)}\]

Thus, the number of muons that reach the ground per million when time dilation is considered is:

    \begin{align*} N &= N_0(2)^{-h} \\ &= 1,000,000(2)^{-4.36} \\ &\approx 49,000   \quad \text{(9.2.8)}\end{align*}

Relativistic, Muon Frame of Reference

Muon decay - relativistic calculations, observer moving with muon
Figure 9.2.3

In the frame of reference of an observer traveling with the muons, the muons are at rest and the Earth is rushing toward them at 0.98c. Since we’re working under the assumptions of special relativity, the distance that the Earth must travel to reach the muon, from the muon’s point of view, is length-contracted. In particular:

    \begin{align*}    L &= \frac{L_0}{\gamma} \\ &= \frac{10^4 \text{ m}}{\displaystyle \frac{1}{\sqrt{1 - \displaystyle \frac{v^2}{c^2}}}} \\ &= 10^4 \text{ m} \sqrt{1 - \displaystyle \frac{v^2}{c^2}} \\ &= 10^4 \text{ m} \sqrt{1-0.9604} \\ &= 10^4 \text{ m} \sqrt{0.0396} \\ &\approx (10^4)(0.2) \text{ m} \\ &= 2000  \text{ m}   \quad \text{(9.2.9)} \end{align*}

The time it takes for the Earth to get to the muons is:

    \begin{align*}  T &= \frac{\sim 2000 \text{ m}}{(0.98)(3\times 10^8 \text{m/s})} \\ &\approx 6.8 \times 10^{-6} \text{ s}   \quad \text{(9.2.10)}\end{align*}

The number of half-lives this time represents is:

    \[ \frac{6.8 \times 10^{-6} \text{ s}}{1.56 \times 10^{-6} \, \displaystyle \frac{\text {s}}{\text{half-life}}} \approx 4.36 \text{ half-lives}   \quad \text{(9.2.11)}\]

The number of muons that survive when the Earth gets to them, then, is:

    \begin{align*} N &= N_0(2)^{-h} \\ &= 1,000,000(2)^{-4.36} \\ &\approx 49,000  \quad \text{(9.2.12)} \end{align*}
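
The three calculations above can be reproduced with a short Python sketch. It uses the same rounded inputs as the text (c = 3 \times 10^8 m/s, v = 0.98c, a height of 10,000 m and a rest-frame half-life of 1.56 \times 10^{-6} s) but keeps \gamma unrounded, so the relativistic survivor count comes out slightly above the hand-calculated 49,000. The names are illustrative.

```python
import math

C = 3.0e8             # speed of light in m/s, rounded as in the text
SPEED = 0.98 * C      # muon speed relative to the Earth
HEIGHT = 1.0e4        # production height in the Earth's frame, m
HALF_LIFE = 1.56e-6   # half-life in the muon's rest frame, s
N0 = 1_000_000        # muons we start with

gamma = 1.0 / math.sqrt(1.0 - 0.98**2)

def survivors(elapsed_time, half_life):
    """N = N0 * 2^(-h), where h is the number of elapsed half-lives."""
    return N0 * 2.0 ** (-elapsed_time / half_life)

# 1) Non-relativistic, Earth frame: full travel time, undilated half-life.
t_earth = HEIGHT / SPEED
n_nonrelativistic = survivors(t_earth, HALF_LIFE)

# 2) Relativistic, Earth frame: full travel time, dilated half-life.
n_relativistic_earth = survivors(t_earth, gamma * HALF_LIFE)

# 3) Relativistic, muon frame: contracted distance, undilated half-life.
t_muon = (HEIGHT / gamma) / SPEED
n_relativistic_muon = survivors(t_muon, HALF_LIFE)

print(f"gamma                        = {gamma:.3f}")
print(f"Non-relativistic survivors   = {n_nonrelativistic:.2f}")
print(f"Relativistic, Earth frame    = {n_relativistic_earth:.0f}")
print(f"Relativistic, muon frame     = {n_relativistic_muon:.0f}")
```

The two relativistic calculations give the same count, as they must: time dilation in the Earth’s frame and length contraction in the muon’s frame are two descriptions of the same physics.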

So have experiments been done to test whether muons behave according to the non-relativistic or relativistic paradigm? They have. The first of these, published in 1941, is described below.

Rossi and Hall Experiment

In 1941, Bruno Rossi and David B. Hall measured the number of muons detected 1) on the top of Mount Washington in New Hampshire, a peak that’s 2000 m high, and 2) on the ground below. They detected 570 muons per hr on the mountain top. Using calculations identical to those presented above, they predicted that they would measure approximately 25 muons per hour on the ground when they used the non-relativistic methods. Using special relativity, they predicted that they would detect about 400 muons per hour on the ground. What did they find? Results agreed with special relativity. Numerous experiments similar to this have been performed since. All agree with special relativity.

IX.3 Hafele and Keating (Twin Paradox)

In 1971, J.C. Hafele and R. E. Keating performed an experiment, reported in Science 177, 166 (1972), in which they loaded 4 ultra-accurate cesium atomic clocks onto 2 planes, flew them around the world in opposite directions, then compared the clocks’ readings to those of 4 cesium clocks that remained on the Earth, at the United States Naval Observatory. One of the planes flew eastward (Plane A, red in figure 9.3.1), with the direction of the Earth’s rotation. The other flew westward (Plane B, blue in figure 9.3.1), against the Earth’s rotational motion. Measured in a non-rotating frame centered on the Earth, the ground clocks were carried eastward by the Earth’s rotation at 463 m/s, the eastbound plane moved at 685 m/s, and the westbound plane moved at 241 m/s.

Schematic of Hafele & Keating experiment
Figure 9.3.1

Due to time dilation, it was expected that, with all speeds measured in a non-rotating frame centered on the Earth:

  • Because Plane A moved faster than the ground clocks, its clocks would tick slower than the ground clocks and thus would read an earlier time than the ground clocks
  • Because Plane B moved slower than the ground clocks, its clocks would tick faster than the ground clocks and thus would read a later time than the ground clocks

However, another factor had to be considered: gravitational time dilation, an effect of general relativity. The planes flew farther from the Earth’s center than the clocks on the Earth’s surface, and so sat higher in the Earth’s gravitational potential. Therefore, clocks on the planes were expected to tick faster than the clocks on the Earth’s surface. This factor had to be accounted for when examining results.
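To get a feel for the sizes and signs involved, the two effects can be estimated as fractional clock-rate offsets. This is a rough sketch, not the calculation Hafele and Keating performed. It assumes the quoted speeds (463, 685, and 241 m/s) are all measured in the same non-rotating frame, uses the low-speed approximation for time dilation, and uses the weak-field estimate gh/c² for gravity. The 10,000 m altitude and the function names are hypothetical, chosen only for illustration.

```python
C = 299_792_458.0   # speed of light in m/s

V_GROUND = 463.0    # ground clocks, carried by the Earth's rotation, m/s
V_EAST = 685.0      # eastbound plane (Plane A), m/s
V_WEST = 241.0      # westbound plane (Plane B), m/s


def kinematic_rate_offset(v_clock, v_reference):
    """Fractional rate of a moving clock relative to a reference clock.

    Low-speed approximation: 1/gamma ~ 1 - v^2 / (2 c^2), with both speeds
    taken in the same non-rotating frame. Negative means the clock runs slow.
    """
    return -(v_clock**2 - v_reference**2) / (2.0 * C**2)


def gravitational_rate_offset(height_m, g=9.81):
    """Weak-field estimate g*h/c^2 for a clock height_m above the ground."""
    return g * height_m / C**2


east = kinematic_rate_offset(V_EAST, V_GROUND)
west = kinematic_rate_offset(V_WEST, V_GROUND)
grav = gravitational_rate_offset(10_000.0)   # hypothetical cruising altitude

NS_PER_HOUR = 3600.0 * 1e9   # converts a fractional offset to ns per hour
for label, offset in [("east kinematic", east),
                      ("west kinematic", west),
                      ("gravitational", grav)]:
    print(f"{label}: {offset:.3e} ({offset * NS_PER_HOUR:+.2f} ns per hour)")
```

The eastbound kinematic offset comes out negative and the westbound one positive, while the gravitational offset is positive for both planes. Multiplying by an actual flight time and altitude profile, which are not given here, would be needed to turn these rates into total time differences.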

Results of the experiment are given in Table 9.3.1:

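Time Difference (compared with earthbound clocks) in nanoseconds

    \[ \begin{array}{llcc}
    & & \text{Eastward (Plane A)} & \text{Westward (Plane B)} \\ \hline
    \text{Predicted} & \text{Gravitational} & 144 \pm 14 & 179 \pm 18 \\
    & \text{Kinematic} & -184 \pm 18 & 96 \pm 10 \\
    & \text{Net effect} & -40 \pm 23 & 275 \pm 21 \\ \hline
    \text{Observed} & & -59 \pm 10 & 273 \pm 7
    \end{array} \]

Table 9.3.1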

In the table, a positive time difference means the clocks ticked faster than the clocks on Earth and therefore showed a later time than the earthbound clocks after the trip. A negative time difference means the clocks ticked slower than the clocks on Earth and thus showed an earlier time than the earthbound clocks after the trip. As noted above, the planes’ greater distance from the Earth’s center was calculated to make the clocks on both planes tick faster than earthbound clocks. Therefore, the effect due to gravity on both planes’ clocks was positive. The term “kinematic” in the table refers to special relativity effects. Within the limits of error, the observed time differences agreed with the predicted net effects.
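The arithmetic behind the “Net effect” row can be checked directly from the predicted values in Table 9.3.1. The sketch below assumes the gravitational and kinematic uncertainties are independent and combines them in quadrature. That is an assumption on my part, though it does reproduce the tabulated +/- 23 and +/- 21. The function name `net_effect` is made up for illustration, and only the central observed values are used.

```python
import math

# (value, uncertainty) in nanoseconds, copied from Table 9.3.1
predicted = {
    "east": {"gravitational": (144, 14), "kinematic": (-184, 18)},
    "west": {"gravitational": (179, 18), "kinematic": (96, 10)},
}
observed = {"east": -59, "west": 273}   # central values only


def net_effect(terms):
    """Sum the contributions and combine their uncertainties in quadrature
    (assumes the uncertainties are independent)."""
    total = sum(value for value, _ in terms.values())
    sigma = math.sqrt(sum(err**2 for _, err in terms.values()))
    return total, sigma


results = {}
for direction, terms in predicted.items():
    net, sigma = net_effect(terms)
    gap = observed[direction] - net   # observed minus predicted
    results[direction] = (net, sigma, gap)
    print(f"{direction}: predicted {net} +/- {sigma:.0f} ns, "
          f"observed minus predicted = {gap} ns")
```

In both directions the gap between the observed value and the predicted net effect is smaller than the predicted uncertainty alone, which is the sense in which the results agree within the limits of error.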

Thus, this experiment showed 1) that the time dilation predicted by special relativity is real and 2) that a clock making a round trip does not, on its return, agree with an initially synchronized clock that stayed behind. The differential aging in the twin paradox is therefore a real effect, not a paradox.

Numerous additional experiments have since corroborated these results. In fact, these results are confirmed every day, as GPS devices must correct for exactly these effects to function properly.