This explanation of the basic concepts of special relativity (for relativistic rockets part 2) has grown to the point that it needs to be brutally excised and stand on its own.

So, read on to learn about spacetime diagrams, worldlines, Lorentz transformations, four-vectors, metric tensors and some very powerful notation. Which I’m not sure if I’ll actually need for the relativistic rockets posts! But let’s put it all together so I can link back to it if necessary.

Posting this now also gives me a chance to clarify places where I’ve been unclear sooner. So ask if something seems confusing!

Let’s begin.

In Newtonian physics, time is the same for everyone no matter what. You follow a path through space as a function of that absolute time.

In special relativity, time is different for every observer! Oh dear. To accommodate all the coordinate transforms we might need, we’re going to have to put a time axis in all our diagrams, making a spacetime diagram.

Illustrations of four worldlines on a spacetime diagram: a red vertical line labelled stationary, a blue diagonal line labelled constant velocity, a green line labelled accelerated, and a 45 degree diagonal line labelled light.

Every spacetime diagram depicts a frame of reference, roughly the perspective of a particular observer or a system of coordinates - it’s easier to explain them by showing them in use. The paths of particles are represented as curves through spacetime, called world lines.

Spacetime diagrams aren’t unique to special relativity. You could just as happily draw one for Newtonian physics! It’s just less informative, really.

Suppose you look at things in a different frame of reference, the perspective of a different observer moving at a constant speed in the positive \(x\) direction relative to the first. Working out the spacetime positions of the events in this other frame is effectively the same as drawing a new coordinate system over our spacetime.

Conventionally we use primes (\(’\)) to distinguish reference frames, so the white frame is called \(S\) with coordinates \(x\) and \(t\), and the blue frame is called \(S’\) with coordinates \(x’\) and \(t’\).

Illustration of two spacetime coordinate systems, related by a Lorentz transformation. A white grid, labelled S, with horizontal axis labelled x and vertical axis labelled ct, has perpendicular lines. A blue grid, labelled S-prime, with axes x-prime and ct-prime, has the horizontal lines skewed upwards, and the vertical lines skewed to the right.

The blue lines show lines of constant \(x’\) (events which happen at the same location relative to the observer in \(S’\)) and constant \(t’\) (events which the observer in \(S’\) would consider to be simultaneous).

Note that the direction of \(x\) and \(x’\) in 3D space is the same. All that’s different between the frames is the relationship between space and time.
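To make the blue grid concrete, here is a minimal Python sketch of the standard Lorentz boost along \(x\), working with \(ct\) so that both coordinates have units of length. The function name `lorentz_boost` and the parameter `beta` (the speed of \(S’\) as a fraction of \(c\)) are labels I’ve made up for illustration.

```python
import math

def lorentz_boost(ct, x, beta):
    """Map an event (ct, x) in S to (ct', x') in S', which moves at speed beta*c along +x."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    ct_prime = gamma * (ct - beta * x)
    x_prime = gamma * (x - beta * ct)
    return ct_prime, x_prime

# Two events that are simultaneous in S (same ct) but at different places...
beta = 0.6
event_a = lorentz_boost(0.0, 0.0, beta)
event_b = lorentz_boost(0.0, 1.0, beta)
# ...get different ct' values, so they are not simultaneous in S'.
print(event_a, event_b)
```

Both coordinates get mixed together, which is why both sets of blue grid lines are skewed.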

For contrast, let’s use the rules of Newtonian physics instead. In Newtonian physics, the space axis doesn’t move, but the time axis becomes more slanty (it’s a shear transformation)…

Illustration of two spacetime coordinate systems, related by a Galilean transformation. A white grid, labelled S, with horizontal axis labelled x and vertical axis labelled ct, has perpendicular lines. A red grid, labelled S-prime, with axes x-prime and ct-prime, has the horizontal lines the same, but the vertical lines skewed to the right.

This makes intuitive sense (our intuitions being very Newtonian): if someone is moving, as they pass by various things, those things are in the same place as them, so in their frame \(S’\) those things will have \(x’=0\). So if we draw the line \(x’=0\) (the time axis), it will have to be slanted to pass by each of those things at the right time.
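As a rough sketch of that argument, here is the Galilean transformation in Python. The function name `galilean_transform` and the velocity `v` are my own illustrative choices; time is left untouched and only the position shifts.

```python
def galilean_transform(t, x, v):
    """Map an event (t, x) in S to (t', x') in S', which moves at velocity v along +x."""
    return t, x - v * t

# An observer moving at v is at x = v*t at time t,
# so everything they pass sits at x' = 0 in their own frame.
v = 2.0
for t in (0.0, 1.0, 2.0):
    t_prime, x_prime = galilean_transform(t, v * t, v)
    print(t_prime, x_prime)
```

Because \(t’ = t\) for every event, lines of constant time stay horizontal, and only the time axis slants.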

Returning to the relativistic case, we can plot the same diagram so that the \(S’\) frame’s axes are orthogonal. This is the spacetime diagram the observer in \(S’\) would draw.

Illustration of two spacetime coordinate systems, related by a Lorentz transformation, but this time with S-prime treated as the rest frame. A blue grid, labelled S-prime, with horizontal axis labelled x-prime and vertical axis labelled ct-prime, has perpendicular lines. A white grid, labelled S, with axes x and ct, has the horizontal lines skewed downwards, and the vertical lines skewed to the left.

OK, so what do those world lines look like in the \(S’\) frame, relativistically? Let’s have a look…

Illustration of the four world lines in the first picture, but transformed by a Lorentz transform in the positive x direction. See text for details.

We see…

OK, so, in summary, in Newtonian physics, we’d express object paths as a function of absolute time. In relativity, there’s no absolute time, just local time in different reference frames, called the coordinate time in each frame.

Proper time

The other important kind of time is the proper time, \(\tau\), for a particular object. This is the amount of time experienced by the particle as it traverses its path through spacetime. The proper time between two events on an object’s worldline can be calculated in any reference frame as the integral of the interval along the path.

The interval is a very useful quantity in special relativity, because it doesn’t change between inertial reference frames (we say it’s Lorentz invariant).

Let’s take two events (points in spacetime). Say they have Cartesian coordinates \((t_0,x_0,y_0,z_0)\) and \((t_1,x_1,y_1,z_1)\) in some reference frame. The interval (in any reference frame) is given by (assuming a -+++ sign convention) $${\Delta s}^2 = -c^2{\Delta t}^2 + {\Delta x}^2 + {\Delta y}^2 + {\Delta z}^2$$ (where \(\Delta x = x_1 - x_0\) etc.)

Suppose you look at the same two events in the coordinates of a different inertial reference frame: the interval you calculate will be the same.
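Here is a small numerical check of that claim, as a sketch rather than a proof. It assumes the standard Lorentz boost along \(x\), events are written as \((ct, x, y, z)\) so the factor of \(c\) is already absorbed into the time component, and the names `boost_x` and `interval_squared` are invented for this example.

```python
import math

def boost_x(event, beta):
    """Lorentz-boost an event (ct, x, y, z) along +x by speed beta*c."""
    ct, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def interval_squared(e0, e1):
    """Interval between two events in the -+++ convention (time components given as c*t)."""
    d_ct, dx, dy, dz = (b - a for a, b in zip(e0, e1))
    return -d_ct**2 + dx**2 + dy**2 + dz**2

e0 = (0.0, 0.0, 0.0, 0.0)
e1 = (5.0, 3.0, 1.0, 2.0)
s2_in_S = interval_squared(e0, e1)
s2_in_S_prime = interval_squared(boost_x(e0, 0.8), boost_x(e1, 0.8))
print(s2_in_S, s2_in_S_prime)  # the two agree up to floating-point rounding
```

The individual coordinate differences change a lot under the boost, but the combination with the minus sign on the time term does not.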

Why is this the same as the proper time? Well, in the particle’s rest frame, by definition \(\Delta x = \Delta y = \Delta z = 0\), so the interval is just \(-c^2 {\Delta t}^2\), and \(\Delta t\) in that frame is exactly the time the particle itself experiences. So, up to the factor of \(-c^2\), the interval is the proper time squared. If the particle’s accelerating, that will only be its rest frame for an instant, so you have to add up lots of little intervals for short bits of line to get the total proper time.
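Written out under the -+++ convention, the adding-up step looks roughly like this. Here \(v(t)\) is the particle’s speed as measured in whichever frame you are using, a symbol I’m introducing just for this formula:

$$\mathrm{d}\tau = \frac{1}{c}\sqrt{-\mathrm{d}s^2} = \sqrt{1 - \frac{v(t)^2}{c^2}}\,\mathrm{d}t, \qquad \Delta\tau = \int_{t_0}^{t_1} \sqrt{1 - \frac{v(t)^2}{c^2}}\,\mathrm{d}t$$

The middle step comes from dividing \(\mathrm{d}s^2 = -c^2\mathrm{d}t^2 + \mathrm{d}x^2 + \mathrm{d}y^2 + \mathrm{d}z^2\) through by \(\mathrm{d}t^2\). For a particle at rest, \(v = 0\) and the proper time is just the coordinate time.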

Because of this, it’s natural to write particle paths by expressing each time and space coordinate as a function of the proper time.

Four-vectors!

There’s a bit more mathematical apparatus to show you before we can fly our spaceship. These are four-vectors.

In Newtonian mechanics, we make a lot of use of Euclidean vectors, which are objects with a magnitude and a direction. When you transform your Euclidean space (rotate it, scale it, whatever) vectors all get transformed in the same way.

In special relativity, we have basically the same idea, but because we are working in a four-dimensional spacetime instead of a three-dimensional space, the vectors have an extra component corresponding to the time dimension. The defining property is that they all transform the same way under Lorentz transformations.

Four-vectors also have a magnitude, calculated in the same way as the interval: if you have a four-vector \(\mathbf{A}\) with components \(A^\mu = (A^0,A^1,A^2,A^3)\) then the magnitude squared is $$|\mathbf{A}|^2=A^\mu A_\mu=\eta_{\mu\nu} A^\mu A^\nu=-(A^0)^2+(A^1)^2+(A^2)^2+(A^3)^2$$ Just like the interval, the magnitude of a four-vector doesn’t change under Lorentz transformations.
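If it helps to see that formula as a computation, here is a minimal Python sketch. The constant `ETA` and the function `magnitude_squared` are names I’ve made up; the double loop runs each of the two indices over the values 0 to 3 and adds up every product.

```python
# The metric eta_{mu nu} in the -+++ sign convention, as a nested list.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def magnitude_squared(a):
    """Compute eta_{mu nu} A^mu A^nu for a four-vector given by its components (A^0, A^1, A^2, A^3)."""
    return sum(ETA[mu][nu] * a[mu] * a[nu] for mu in range(4) for nu in range(4))

example = (2.0, 1.0, 0.0, 0.0)
along_light_ray = (1.0, 1.0, 0.0, 0.0)  # equal time and space parts, like the 45 degree line
print(magnitude_squared(example))
print(magnitude_squared(along_light_ray))
```

Unlike a Euclidean length, this quantity can be negative or zero for a vector that isn’t zero, because of the minus sign on the time component.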

What’s with those funny superscript and subscript \(\mu\)s and \(\nu\)s? Those are the coordinate indices. In Euclidean space, you sometimes end up writing coordinates like \(x_1\), \(x_2\), \(x_3\) with an index in subscript instead of \(x\), \(y\) and \(z\). This lets you write things like $$r^2=\sum_{i=1}^3 {x_i}^2$$ instead of $$r^2=x^2+y^2+z^2$$, and comes in handy when you have other vectors like velocity and acceleration.

Using indices is standard in relativity, because often we want to sum over the components of these quantities. In fact, since we do that literally all the time, we have a convention where we don’t even bother to write the summation signs. If you have two copies of the same index in a product of some quantities, and one is a subscript (like \(x_\mu\)) and one is a superscript (like \(x^\mu\)), that indicates that you need to add up the values of that product for index values 0, 1, 2 and 3. This is called the Einstein summation convention. By convention, for spacetime indices like this, we use Greek letters (typically \(\mu\), \(\nu\), \(\lambda\)), and for 3-dimensional space indices, we use Latin letters (typically \(i\), \(j\), \(k\)).

So, for example, if you have some quantity \(x_\mu x^\mu\), that means the sum $$x_\mu x^\mu = \sum_{\mu=0}^3 x_\mu x^\mu = x_0 x^0 + x_1 x^1 + x_2 x^2 + x_3 x^3$$

Superscripts mark ‘contravariant’ components and subscripts mark ‘covariant’ ones. Most of the time we don’t use those mouthfuls though, and just talk about ‘upstairs’ and ‘downstairs’ indices. The distinction is very important in general relativity, and more important if you’re building this rigorously from the ground up.

For our purposes, things are much simpler! We just need to know that you can raise or lower an index by multiplying it by a thing called the metric tensor which (in special relativity) is written \(\eta_{\mu\nu}\) or \(\eta^{\mu\nu}\). With the -+++ sign convention, the components (in both cases) have the value 0 if \(\mu\neq\nu\), -1 if \(\mu=\nu=0\) and +1 if \(\mu=\nu\neq0\). And we have, in general, \(x_\mu = \eta_{\mu\nu}x^\nu\) and \(x^\mu = \eta^{\mu\nu}x_\nu\).

To expand that out more clearly, what this amounts to is that \(x_0=-x^0\), \(x_1=x^1\), \(x_2=x^2\) and \(x_3=x^3\).
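Lowering an index and then summing over it can be sketched in a few lines of Python. The names `ETA_DIAGONAL`, `lower_index` and `contract` are hypothetical, and the code leans on the fact that the metric is diagonal in special relativity, so only the diagonal entries are stored.

```python
# Diagonal entries of eta_{mu nu} in the -+++ convention; off-diagonal entries are zero.
ETA_DIAGONAL = (-1, 1, 1, 1)

def lower_index(x_up):
    """Return the downstairs components x_mu = eta_{mu nu} x^nu."""
    return tuple(ETA_DIAGONAL[mu] * x_up[mu] for mu in range(4))

def contract(x_down, y_up):
    """The implied sum x_mu y^mu, written out explicitly over mu = 0..3."""
    return sum(x_down[mu] * y_up[mu] for mu in range(4))

x_up = (3.0, 1.0, 2.0, 2.0)
x_down = lower_index(x_up)
print(x_down)                  # only the time component changes sign
print(contract(x_down, x_up))  # same as -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2
```

In general relativity the metric is not diagonal in general, so this shortcut would not carry over; there you would need the full double sum.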

Another important thing about the metric tensor: \(\eta_{\mu\nu} \eta^{\mu\nu} = (-1)\times(-1) + (1\times1) + (1\times1) + (1\times1)=4\).

ok but why

The reason to use this notation is that, because of the way covariant components and contravariant components transform under the Lorentz transform, any expression you write following the rules will be true in every inertial reference frame, not just a particular frame!

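The important rule: Every index…

either appears exactly once in every term, in the same position (upstairs or downstairs) each time, making it a free index,

or appears exactly twice within a single term, once upstairs and once downstairs, making it a dummy index that gets summed over.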

So for example, here’s a complicated expression that obeys the rules…
$$A^\alpha B^\beta C^\gamma = {D^\alpha}_\delta E^\delta F^{\beta\gamma} + E^\alpha G^{\beta\epsilon} {H_\epsilon}^\gamma$$

All the terms have upstairs \(\alpha\), \(\beta\) and \(\gamma\) free indices, and every other index is in a matched upstairs/downstairs pair in the same term. (If we wanted to, we could use the same symbol for both pairs of dummy indices in the two different terms, or we could use different symbols to make it extra clear they’re distinct).

Here’s another example (the formula for Christoffel symbols, which you might use in general relativity - I’m not going to try to explain what that is used for here!):$$\Gamma^\lambda{}_{\mu\nu} = \frac{1}{2}(g^{\lambda \xi}\partial _\nu g_{\xi\mu} + g^{\lambda \xi}\partial _\mu g_{\xi\nu} - g^{\lambda \xi}\partial_\xi g_{\mu\nu} )$$

Now you know roughly what a four-vector is and the notation that we use for them. That’s not the same as getting familiar with using this notation, for which you would need to practice a lot.

OK, so, I’m defining four-vectors here because it will likely feel natural to use four-vector notation in upcoming relativistic rockets posts. I don’t know yet! This might not be needed at all. But you’ve learned something about relativity I hope? Hmm.