Tensors

Table of Contents

Preface

These are my notes on tensors. Like many pages on my site, they are not meant to be an exhaustive treatise, but instead, emphasize topics – including proofs – that I would like to remember or that are needed for other pages on the site.

This page draws from a number of resources. Here are the main three:

  • eigenchris. Tensors For Beginners. YouTube, 10 Dec. 2017, https://www.youtube.com/watch?v=8ptMTLzV4-I&list=PLJHszsWbB6hrkmmq57lX8BV-o-YIOFsiG&index=1. : This is a free 18-part video series that goes into considerable detail regarding tensor theory. The explanations are excellent, moving methodically through topics step by step. It does, however, assume basic familiarity with linear algebra.
  • Collier, Peter. A Most Incomprehensible Thing. Incomprehensible Books, 2017. : Directed toward the interested amateur. Step-by-step derivations. Bargain price – Amazon Kindle version for $3.99.
  • Fleisch, Daniel A. A Student’s Guide to Vectors and Tensors. Cambridge University Press, 2011. : Great book geared toward undergraduate math and science students. This article is based mostly on this reference. Amazon Kindle version for $23.99. Well worth the money.

To navigate this page, click on a link in the table of contents and it will bring you to the specified section. Clicking on the title of a section will bring you back to the table of contents.

Regarding explanatory information, clicking on the button (often labeled “here”) for the explanatory note will unhide it. To re-hide the information, click again on the button that opened it.

I. Introduction

Tensors are mathematical objects that are widely used largely because they do not change under coordinate changes. This is important in a number of disciplines but is particularly critical in physics. Why? Because the laws of physics are thought to hold irrespective of the coordinate system being employed. This is particularly crucial in special and general relativity. Tensors are also helpful in understanding complex (non-Euclidean) geometry.

So what are tensors? There are several definitions.

  • A multidimensional array (or grid) of numbers: Practical but misses out on geometric meaning of tensors.
  • A mathematical object that is invariant under coordinate changes and whose components transform under coordinate changes in specific, predictable ways
  • A collection of vectors and covectors combined together under the tensor product

Hopefully, the meaning of these definitions will become clear as we go along.

From our first definition, we can come up with a classification of tensors. Tensors can be classified by the number of indices needed to label each entry in the grid that makes up the tensor. That number of indices is referred to as the tensor’s rank. Each index can range over any number of dimensions. Up to rank 3, we can think of the number of indices as the number of directions the grid extends into.

Figure I.1

Figure I.1 shows some examples of tensors that provide some intuition regarding how they are categorized.

A rank 0 tensor is just a number, also called a scalar. It doesn’t extend in any direction; it just sits at a single point on a graph.

Rank 1 tensors are vectors. They consist of just 1 column or 1 row of numbers extending out in one direction. The number of entries in that column or row specifies the number of dimensions of the vector.

Rank 2 tensors are analogous to matrices, arrays (or grids) of numbers that extend in two directions – along rows and columns. Note, however, that matrices are just collections of numbers. To specify a rank 2 tensor, we also need to specify the basis vectors we are using.

Rank 3 tensors would extend in three directions, like a Rubik’s Cube. Each block in the Rubik’s Cube can be thought of as containing a number.

Tensors of higher rank are harder to visualize (because we live in a 3-dimensional world) but you can imagine that we could put more numbers into more directions. For example, the Riemann curvature tensor used in general relativity has 4 indices and each index spans 4 spacetime dimensions (1 dimension of time and 3 dimensions of space). Thus, this mathematical object contains 4^4 = 256 pieces of information, or components (although only 20 of those components are independent).

II. Vectors

Vectors – rank 1 tensors – are a good starting point to examine the properties of tensors.

I introduced vectors in my page on linear algebra; I won’t rehash that here. Instead, in this article, I’ll expand on what I’ve discussed previously. I talked about the dot product of vectors in the vector section of my linear algebra page. We’ll start here by discussing some other important operations that can be performed on vectors. The first is the cross product.

II.A Cross Product

In my page on linear algebra, I discussed one method of multiplying vectors, the dot product, which, when applied, yields a scalar. A second method of multiplying vectors, the cross product, by contrast, yields another vector. The cross product of two vectors \vec{A} and \vec{B} is defined as follows:

\displaystyle \vec{A} \times \vec{B}  = (A_yB_z - A_zB_y)\hat{i} + (A_zB_x - A_xB_z)\hat{j} + (A_xB_y - A_yB_x)\hat{k}\quad \text{eq (1)}

where

î, ĵ and k̂ are unit vectors in the x, y and z directions of a classic Euclidean coordinate system.

The cross product can also be written as a determinant:

    \[\vec{A} \times \vec{B} = \begin{vmatrix}\hat{i} & \hat{j} & \hat{k}\\ A_x & A_y & A_z\\ B_x & B_y & B_z\end{vmatrix}\quad \text{eq (2)}\]

To obtain the same expression as eq (1) using the determinant, we employed the cofactor formula. You can find more information on determinants in general, and the cofactor formula specifically, on my linear algebra page.

We said that the cross product yields a vector. Therefore, it must have a direction. That direction is perpendicular to the plane of the vectors whose product is being taken. Of course, there are two possible directions that are perpendicular to a plane. Which one is correct? The correct direction is given by what’s called the right-hand rule. The right-hand rule has several forms. The one I like best is the one I consider the simplest; it is depicted in figure II.A.1:

Right-hand rule for the cross product
Figure II.A.1

To determine the direction of the cross product, we curl the fingers of our right hand from the first vector in the expression toward the second. For example, if our product is \vec{A} \times \vec{B}, as in figure II.A.1a, we place the pinky side of our right hand on \vec{A} and curl the tips of our fingers toward \vec{B}. Our thumb points upward; that’s the direction of the resultant vector. In figure II.A.1b, we place the index-finger side of our hand on the first vector in the expression, \vec{B}, and curl our fingertips toward \vec{A}. When we do this, we get our resultant vector direction: in this case, downward.

The magnitude of the cross product is also given by:

    \[ \lvert \lvert \vec{A} \times  \vec{B} \rvert  \rvert= \lvert \lvert \vec{A}\rvert  \rvert\,\lvert \lvert \vec{B}\rvert  \rvert \sin\theta \quad \text{eq (3)}\]

A proof of this, taken from Khan Academy, can be seen by clicking here.

Another interesting fact is that the magnitude of the cross product equals the area of the parallelogram formed by the two vectors being multiplied. Figure II.A.2 helps us see why.

Cross product magnitude = area of parallelogram
Figure II.A.2

In figure II.A.2, we have two vectors, \vec{A} and \vec{B}, at angle \theta to each other. We can extend line segments outward from the tips of both vectors to form a parallelogram. We then drop a perpendicular from the tip of \vec{B} to \vec{A}. This perpendicular segment represents the height of the parallelogram. We know that the area of a parallelogram is base times height which, in this case, is \lvert\lvert \vec{A} \rvert\rvert\,\lvert\lvert \vec{B} \rvert\rvert \sin \theta. But we saw above that \lvert\lvert \vec{A} \rvert\rvert\,\lvert\lvert \vec{B} \rvert\rvert \sin \theta is just \lvert\lvert\vec{A} \times \vec{B}\rvert\rvert.
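
If you would like to check these relationships numerically, here is a small NumPy sketch (my own addition, with arbitrary example vectors rather than values from the article) confirming the component formula, the perpendicularity of the result, and the ||A|| ||B|| sin θ magnitude/area relationship:

```python
import numpy as np

# Two arbitrary 3D vectors (hypothetical values, just for illustration)
A = np.array([2.0, -1.0, 3.0])
B = np.array([1.0, 4.0, 0.5])

C = np.cross(A, B)          # eq (1): the component formula for A x B

# The result is perpendicular to both inputs...
print(np.dot(C, A), np.dot(C, B))          # both ~0

# ...and its magnitude equals ||A|| ||B|| sin(theta), eq (3),
# which is also the area of the parallelogram spanned by A and B.
cos_theta = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
sin_theta = np.sqrt(1.0 - cos_theta**2)
print(np.linalg.norm(C))
print(np.linalg.norm(A) * np.linalg.norm(B) * sin_theta)   # same number
```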

II.B Triple Scalar Product

The triple scalar product, for three vectors \vec{A}, \vec{B} and \vec{C} , is defined as

    \[\vec{A}\cdot(\vec{B}\times\vec{C})\]

Since the cross product produces a vector, and we’re taking the dot product of that resultant vector with another vector (in this case, \vec{A}), the result is a scalar.

One way to express the triple scalar product is:

    \begin{align*} \vec{A} \cdot (\vec{B}\times\vec{C})&=A_x(B_yC_z-B_zC_y)\\ &+ A_y(B_zC_x-B_xC_z)\quad \text{eq (4)}\\ &+A_z(B_xC_y-B_yC_x) \end{align*}

Another handy way to write the triple scalar product is:

    \[ \vec{A} \cdot (\vec{B} \times \vec{C}) = \begin{vmatrix} A_x & A_y & A_z\\ B_x & B_y & B_z\\ C_x & C_y & C_z\\ \end{vmatrix}\quad \text{eq (5)}\]

Note, also, that the volume of the parallelepiped formed by the three vectors that make up the triple scalar product is given by the absolute value of the triple scalar product (figure II.B.1).

Triple scalar product equals volume of parallelepiped
Figure II.B.1
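
Here is a short NumPy check (my own example vectors, not the article’s) that the determinant form, the expanded form, and the parallelepiped-volume interpretation all agree:

```python
import numpy as np

# Hypothetical edge vectors of a parallelepiped
A = np.array([1.0, 0.0, 0.0])
B = np.array([1.0, 2.0, 0.0])
C = np.array([0.0, 1.0, 3.0])

# eq (5): the triple scalar product as a 3x3 determinant (rows A, B, C)
det_form = np.linalg.det(np.array([A, B, C]))

# eq (4): A . (B x C)
dot_form = np.dot(A, np.cross(B, C))

print(det_form, dot_form)    # both 6.0
# The parallelepiped's volume is the absolute value of this number.
print(abs(dot_form))
```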

II.C Triple vector product

To be added.

II.D Gradient

The gradient of a scalar field, \phi, is defined as:

    \[ \nabla \phi = \frac{\partial \phi}{\partial x}\hat{e}_x + \frac{\partial \phi}{\partial y}\hat{e}_y + \frac{\partial \phi}{\partial z}\hat{e}_z  \]

where

\nabla is the symbol for the gradient
\phi is a scalar field
\hat{e}_i is a unit vector in the ith direction

The result of the gradient is a vector field.

II.E Divergence

For a vector field

    \[ \vec{V}=V^x\hat{e}_x + V^y\hat{e}_y + V^z\hat{e}_z \]

The divergence is:

    \[ \nabla \cdot \vec{V} =  \frac{\partial V^x}{\partial x} + \frac{\partial V^y}{\partial y} + \frac{\partial V^z}{\partial z} \]

The result of the divergence is a scalar.

II.F Laplacian

If \vec{V} is the gradient of a scalar field, \phi, then the Laplacian is:

    \[  \nabla \cdot \vec{V} = \nabla \cdot \nabla \phi = \nabla^2 \phi = \frac{\partial^2 \phi}{\partial x^2} +  \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial^2 \phi}{\partial z^2} \]

The result of the Laplacian is a scalar.
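
A small SymPy sketch (my own, using a hypothetical scalar field) that exercises the gradient, divergence and Laplacian definitions above and confirms that the divergence of the gradient equals the Laplacian:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# A hypothetical scalar field, just to exercise the definitions
phi = x**2 * y + sp.sin(z)

# Gradient: the vector of partial derivatives
grad_phi = [sp.diff(phi, v) for v in (x, y, z)]

# Divergence of that gradient vs. the Laplacian of phi
div_grad = sum(sp.diff(g, v) for g, v in zip(grad_phi, (x, y, z)))
laplacian = sum(sp.diff(phi, v, 2) for v in (x, y, z))

print(grad_phi)                            # [2*x*y, x**2, cos(z)]
print(sp.simplify(div_grad - laplacian))   # 0: div(grad phi) == nabla^2 phi
```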

II.G Vector Identities

II.G.1 CAB-BAC Identity

This identity states that the curl of the curl equals the gradient of the divergence minus the Laplacian. The formula is:

    \[ \nabla \times (\nabla \times \vec{V}) = \nabla(\nabla \cdot \vec{V}) - \nabla^2\vec{V} \]

II.G.2 Divergence of the Curl is Zero

\nabla \cdot (\nabla \times \vec{V}) = 0

II.G.3 Curl of the Gradient is Zero

\nabla \times (\nabla \phi) = 0

where \phi is a scalar field.
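
As a quick sanity check of the two identities above, here is a SymPy sketch (my own, with a hypothetical vector field and scalar field) that verifies them symbolically in Cartesian coordinates:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def curl(F):
    Fx, Fy, Fz = F
    return [sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y)]

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

# A hypothetical vector field and scalar field
V = [x**2 * y, y * z, sp.cos(x * z)]
phi = x * y**2 + sp.exp(z)

print(sp.simplify(div(curl(V))))     # 0: divergence of the curl vanishes
print([sp.simplify(c) for c in curl([sp.diff(phi, v) for v in (x, y, z)])])   # [0, 0, 0]
```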

II.H Contravariant vs Covariant

The vectors discussed on my linear algebra page and the vectors we’ve discussed so far here are the classic vectors you may have learned about in high school – arrows that have a magnitude and direction in Euclidean space – a space where coordinate axes are at right angles and where spacing between units on these axes is the same everywhere. We can describe such vectors with components, numbers that tell us how many units we have to travel in each direction to get from the beginning of the vector to its end, as shown in figure II.H.1.

Furthermore, we can write these vectors as follows :

    \[ \vec{A} = \vec{A^x} + \vec{A^y}  = A^x\vec{e_x} + A^y\vec{e_y}\quad \text{eq (6)}\]

   where

  • \vec{A^x} and \vec{A^y} are component vectors of \vec{A} which, when added tip to tail, yield \vec{A}
  • \vec{e_x} and \vec{e_y} are basis vectors, 1 unit in length, pointing in the x and y directions, respectively
  • A^x and A^y are the magnitudes of the component vectors \vec{A^x} and \vec{A^y}; we simply call them the components of \vec{A}
Vector components in Euclidean coordinates
Figure II.H.1

Figure II.H.1, patterned after a diagram in Dan Fleisch’s text, shows how we can find the components of a vector in Euclidean space. To find A^x, we imagine shining a light perpendicular to the x-axis, or equivalently, parallel to the y-axis. The shadow it makes on the x-axis is \vec{A}^x. And since A^x is the magnitude of \vec{A}^x, A^x is the length of \vec{A}^x.

Likewise, to find A^y, we imagine shining a light perpendicular to the y-axis, or equivalently, parallel to the x-axis. The shadow it makes on the y-axis is \vec{A}^y. And since A^y is the magnitude of \vec{A}^y, A^y is the length of \vec{A}^y.

Notice that the components and component vectors contain superscripts and basis vectors have a subscripts. We’ll come back to this.

For now, let’s see how the components of \vec{A} transform under coordinate transformations. Let’s look specifically at the case of coordinate system rotation and its effect on components and basis vectors.

Effect of rotation on vector components and bases in Euclidean space
Figure II.H.2

In figure II.H.2, the black axes are the coordinates before transformation and the red axes are the coordinate system after a 30° counterclockwise rotation. Figure II.H.2a shows the effect of coordinate rotation on vector components while figure II.H.2b shows the effect on basis vectors. Two things are evident.

  • The vector \vec{A} does not change with coordinate rotation
  • The components of \vec{A} do change with coordinate rotation

It turns out that vector components transform by what Dan Fleisch calls the indirect matrix and what the eigenchris website calls the backward transformation matrix:

\begin{bmatrix}A^{{\prime}x}\\A^{{\prime}y} \end{bmatrix}=\begin{bmatrix}\,\,\,\,\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}A^x\\A^y \end{bmatrix}\quad \text{eq (7)}

One can find additional information elsewhere on this website regarding the effect of coordinate rotation on vector components and matrix multiplication.1

Of course, we know that vectors aren’t made of just components; they’re a combination of components and basis vectors. Therefore, when a vector is transformed from one coordinate system to another, one must transform not only the components, but also the basis vectors. In contrast to the way new vector components transform, basis vectors are transformed by multiplication by the inverse of the matrix used for the indirect or backward transformation. We’ll call this matrix the direct (per Fleisch) or forward (per eigenchris) transformation matrix. In the case of coordinate axis rotation, as we might expect, this matrix is the same as the matrix that finds new coordinates for a vector that’s rotated while keeping the coordinates still.

\displaystyle \begin{bmatrix}\cos \theta & -\sin \theta\\ \sin \theta & \cos \theta \end{bmatrix}\quad \text{eq (8)}

And this makes sense because that’s exactly what we’re doing: rotating the basis vectors. More details about this can be found in the section entitled Basis vector rotation in Cartesian coordinates on my page on coordinate transformations.

I mentioned above that the direct/forward transformation matrix is the inverse of the indirect/backward matrix:

\begin{bmatrix}\,\,\,\,\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}\cos \theta & -\sin \theta\\ \sin \theta & \cos \theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta &  -\sin\theta\cos\theta + \sin\theta\cos\theta\\ -\sin\theta\cos\theta + \sin\theta\cos\theta & \cos^2\theta + \sin^2\theta\end{bmatrix}

= \begin{bmatrix}1&0\\0&1 \end{bmatrix} \quad \text{eq (9)}

Why is this important? Because when we transform both the components and the basis vectors of a vector using these matrices, we get the same vector from which we started:

\vec{A}^{\prime}= A^{{\prime}x}\vec{e}_x^{\,\prime} + A^{{\prime}y}\vec{e}_y^{\,\prime}=\begin{bmatrix} A^x & A^y\end{bmatrix}\begin{bmatrix}\,\,\,\,\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}\cos \theta & -\sin \theta\\ \sin \theta & \cos \theta \end{bmatrix}\begin{bmatrix} \vec{e}_x\\ \vec{e}_y\end{bmatrix}

=\begin{bmatrix} A^x & A^y\end{bmatrix}\begin{bmatrix}1&0\\0&1 \end{bmatrix}\begin{bmatrix} \vec{e}_x\\ \vec{e}_y\end{bmatrix}= A^x\vec{e}_x + A^y\vec{e}_y=\vec{A}\quad \text{eq (10)}

In short, this means that while vector components and basis vectors are modified by a coordinate transformation, the vector, itself, remains unchanged – just as figure II.H.2 suggests.
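
The following NumPy sketch (my own, using the 30° rotation of figure II.H.2 and hypothetical components) illustrates eqs (7) through (10): components transform with the backward matrix, basis vectors with the forward matrix, and the vector itself is unchanged:

```python
import numpy as np

theta = np.deg2rad(30)                           # the 30 degree rotation of figure II.H.2
F = np.array([[np.cos(theta), -np.sin(theta)],   # direct / forward matrix, eq (8)
              [np.sin(theta),  np.cos(theta)]])
B = np.array([[ np.cos(theta), np.sin(theta)],   # indirect / backward matrix, eq (7)
              [-np.sin(theta), np.cos(theta)]])

A = np.array([2.0, 1.0])                         # hypothetical components in the old basis
e = np.array([[1.0, 0.0], [0.0, 1.0]])           # old basis vectors, stored as rows

A_new = B @ A          # components transform with the backward matrix
e_new = F.T @ e        # basis vectors e'_j = sum_i F_ij e_i

V_old = e.T @ A                       # the vector itself: sum_i A^i e_i
V_new = e_new.T @ A_new               # the same sum in the rotated system
print(np.allclose(V_old, V_new))      # True: the vector is unchanged, eq (10)
print(np.allclose(F @ B, np.eye(2)))  # True: forward and backward are inverses, eq (9)
```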

Notice that we’ve been using superscripts for vector components and subscripts for basis vectors. To see why, consider situations where the axes of the coordinate system being used are not parallel to each other or the length of units on these axes are not uniform from place to place. Let’s look at the first situation (figure II.H.3).

Parallel projection for vector components
Figure II.H.3

Unlike the so-called orthonormal basis we’ve been working with so far, when axes are not orthogonal, there are two ways one can determine vector components. The first is shown in figure II.H.3. As described in Dan Fleisch’s book, in what we’ll call the parallel method, the x-component is represented by the shadow made when one shines a light parallel to the y-axis (figure II.H.3a). And the y-component of a vector is represented by the shadow formed when one shines a light parallel to the x-axis (figure II.H.3b). Figure II.H.3c shows that the components formed in this manner add together to form \vec{A} in the same way that vector components add to make a vector when we use an orthonormal basis. If we were to change coordinate systems, the components would transform using the backward transformation matrix and the basis vectors would change using the forward matrix. The ensuing proof of this is patterned after the eigenchris tensor video series, especially https://www.youtube.com/watch?v=bpuE_XmWQ8Y.

Basis vector transformations
Figure II.H.4

We consider two vector bases, as depicted in figure II.H.4. We start with the basis \displaystyle \vec{e}_1=\begin{bmatrix}1\\0 \end{bmatrix} and \displaystyle \vec{e}_2=\begin{bmatrix}0\\1 \end{bmatrix}, shown in red. We want to describe the basis \displaystyle \widetilde{\vec{e}}_{\,1} and \displaystyle \widetilde{\vec{e}}_{\,2} (displayed in blue) in terms of \vec{e}_1 and \vec{e}_2. We can see from figure II.H.4a that:

\displaystyle \widetilde{\vec{e}}_{\,1} =\,\,\,\, 2\vec{e}_1+1\vec{e}_2

\displaystyle \widetilde{\vec{e}}_{\,2}=-1\vec{e}_1+1\vec{e}_2

We can write this in matrix form:

\begin{bmatrix} \displaystyle \widetilde{\vec{e}}_{\,1} & \displaystyle \widetilde{\vec{e}}_{\,2}\end{bmatrix}=\begin{bmatrix}\displaystyle \vec{e}_1 & \displaystyle \vec{e}_2 \end{bmatrix} \begin{bmatrix} 2 & -1\\ 1 & \,\,\,\,1\end{bmatrix}

We’ll call F=\begin{bmatrix} 2 & -1\\ 1 & \,\,\,\,1\end{bmatrix} the forward transformation matrix.

We can generalize the transformation equation for the basis vectors. For the 2-dimensional case:

\widetilde{\vec{e}}_{\,1}=F_{11}\vec{e}_{\,1} + F_{21}\vec{e}_{\,2}
\widetilde{\vec{e}}_{\,2}=F_{12}\vec{e}_{\,1} + F_{22}\vec{e}_{\,2}

where F=\begin{bmatrix} F_{11}&F_{12}\\F_{21}&F_{22}\end{bmatrix}

For the n-dimensional case:

\widetilde{\vec{e}}_{\,1}=F_{11}\vec{e}_{\,1} + F_{21}\vec{e}_{\,2}+\ldots+F_{n1}\vec{e}_{\,n}
\widetilde{\vec{e}}_{\,2}=F_{12}\vec{e}_{\,1} + F_{22}\vec{e}_{\,2}+\ldots+F_{n2}\vec{e}_{\,n}
              \displaystyle \vdots
\widetilde{\vec{e}}_{\,n}=F_{1n}\vec{e}_{\,1} + F_{2n}\vec{e}_{\,2}+\ldots+F_{nn}\vec{e}_{\,n}

where F=\begin{bmatrix} F_{11}&F_{12}&\dots&F_{1n}\\F_{21}&F_{22}&\dots&F_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ F_{n1}&F_{n2}&\dots&F_{nn}\end{bmatrix}

We can write this in more compact form using summation notation:

\widetilde{\vec{e}}_{\,j}=\displaystyle \sum_{i=1}^n F_{ij}\vec{e}_{\,i}

Next, we want to get the \vec{e} components in terms of the \displaystyle \widetilde{\vec{e}} components. From figure II.H.4b, we can see that:

\displaystyle \vec{e}_1=\displaystyle \frac13\widetilde{\vec{e}}_{\,1}+(-\frac13)\displaystyle \widetilde{\vec{e}}_{\,2}

\displaystyle \vec{e}_2=\displaystyle \frac13\widetilde{\vec{e}}_{\,1}+ \frac23\displaystyle \widetilde{\vec{e}}_{\,2}

We can put this in matrix form as well:

\begin{bmatrix}\displaystyle \vec{e}_1 & \displaystyle \vec{e}_2 \end{bmatrix}=\begin{bmatrix} \displaystyle \widetilde{\vec{e}}_{\,1}& \displaystyle \widetilde{\vec{e}}_{\,2}\end{bmatrix}\begin{bmatrix} \displaystyle \,\,\,1/3 & \displaystyle 1/3\\ \\ \displaystyle -1/3 & \,\,\,\displaystyle 2/3\end{bmatrix}

We’ll refer to B=\begin{bmatrix} \displaystyle \,\,\,1/3 & \displaystyle 1/3\\ \\ \displaystyle -1/3 & \,\,\,\displaystyle 2/3\end{bmatrix} as the backward transformation matrix. We can generalize the \widetilde{\vec{e}}\rightarrow\vec{e} transformation by using the same process as we used above and write it utilizing summation notation, as follows:

\vec{e}_{\,j}=\displaystyle \sum_{i=1}^n B_{ij}\widetilde{\vec{e}}_{\,i}

If we multiply the forward and backward matrices together, the result is the identity matrix:

\displaystyle \begin{bmatrix} 2 & -1\\ 1 & \,\,\,\,1\end{bmatrix} \begin{bmatrix} \displaystyle \,\,\,1/3 & \displaystyle 1/3\\ \displaystyle -1/3 & \displaystyle 2/3\end{bmatrix}=\begin{bmatrix} 2(\frac13) - 1(-\frac13) & 2(\frac13) - 1(\frac23) \\ 1(\frac13) + 1(-\frac13) & 1(\frac13) + 1(\frac23)\end{bmatrix}

                  =\begin{bmatrix} \frac23 + \frac13 & \frac23 - \frac23 \\ \frac13 - \frac13 & \frac13 + \frac23\end{bmatrix}

                  =\begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix}

That means that the forward and backward matrices are inverses of each other.
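
Here is a brief NumPy check (my own addition) that the backward matrix for this example really is the inverse of F:

```python
import numpy as np

# Forward matrix from figure II.H.4: its columns hold the new basis vectors
# expressed in the old basis (e~_1 = 2 e_1 + 1 e_2, e~_2 = -1 e_1 + 1 e_2)
F = np.array([[2.0, -1.0],
              [1.0,  1.0]])

B = np.linalg.inv(F)                   # backward matrix
print(B)                               # [[ 1/3, 1/3], [-1/3, 2/3]]
print(np.allclose(F @ B, np.eye(2)))   # True
```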

Old and new vector components
Figure II.H.5

Our next tasks are 1) to find the components of vector \vec{V} in the \vec{e} basis (we’ll call them the old components) and in the \displaystyle \widetilde{\vec{e}} basis (we’ll call them the new components) and 2) to see how the two sets of components relate to each other.

From figure II.H.5a, we see that the components of vector \vec{V} in the \vec{e} basis are \vec{V}=\begin{bmatrix} V^1 \\ V^2\end{bmatrix}=\begin{bmatrix}1 \\ 2 \end{bmatrix}.

From figure II.H.5b, we see that the components of vector \vec{V} in the \widetilde{\vec{e}} basis are \vec{V}=\begin{bmatrix} \widetilde{V}^1 \\ \widetilde{V}^2\end{bmatrix}=\begin{bmatrix}1 \\ 1 \end{bmatrix}.

So how are the two sets of components related? Well, to make things work out, if we start with components in the \vec{e} basis, and want to know the components in the \widetilde{\vec{e}} basis, we have:

\vec{V} = \widetilde{V}^1\widetilde{\vec{e}}_{\,1} + \widetilde{V}^2\widetilde{\vec{e}}_{\,2} = V^1\vec{e}_1 + V^2\vec{e}_2 = V^1(\displaystyle \frac13\widetilde{\vec{e}}_{\,1}+(-\frac13)\displaystyle \widetilde{\vec{e}}_{\,2}) + \displaystyle V^2(\displaystyle \frac13\widetilde{\vec{e}}_{\,1}+ \frac23\displaystyle \widetilde{\vec{e}}_{\,2})

                =V^1\frac13\widetilde{\vec{e}}_{\,1} + V^2\frac13\widetilde{\vec{e}}_{\,1} + V^1(-\frac13)\widetilde{\vec{e}}_{\,2} + V^2\frac23\widetilde{\vec{e}}_{\,2}

                =\left[V^1(\frac13) + V^2(\frac13)\right]\widetilde{\vec{e}}_{\,1} + \left[V^1(-\frac13) + V^2(\frac23)\right]\widetilde{\vec{e}}_{\,2}

We can break this down into two equations. First:

\widetilde{V}^1\widetilde{\vec{e}}_{\,1}= \left[V^1(\frac13) + V^2(\frac13)\right]\widetilde{\vec{e}}_{\,1}\,\,\Rightarrow\,\,\widetilde{V}^1=V^1(\frac13) + V^2(\frac13)

and

\widetilde{V}^2\widetilde{\vec{e}}_2=\left[V^1(-\frac13) + V^2(\frac23)\right]\widetilde{\vec{e}}_{\,2}\,\,\Rightarrow\,\,\widetilde{V}^2=V^1(-\frac13) + V^2(\frac23)

We can recombine these equations into matrix form:

\begin{bmatrix}\widetilde{V}^1 \\ \widetilde{V}^2 \end{bmatrix}=\begin{bmatrix} \displaystyle \,\,\,1/3 & \displaystyle 1/3\\ \\ \displaystyle -1/3 & \,\,\,\displaystyle 2/3\end{bmatrix}\begin{bmatrix} V^1 \\ V^2\end{bmatrix}

We can check to see if this agrees with the values for V^1, V^2, \widetilde{V}^1 and \widetilde{V}^2 we got from figure II.H.5:

\begin{bmatrix} 1 \\ 1 \end{bmatrix}=\begin{bmatrix} \displaystyle \,\,\,1/3 & \displaystyle 1/3\\ \\ \displaystyle -1/3 & \,\,\,\displaystyle 2/3\end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}

Indeed they do.

We can generalize the above transformation equation in a manner similar to what we’ve done before and express it with summation notation like so:

\widetilde{V}^i=\displaystyle \sum_{j=1}^n B_{ij}V^j

Transformation of the new components into the old ones follows an analogous process. The steps are:

Write \vec{V} as a linear combination of the \vec{e} and \widetilde{\vec{e}} bases, then make the appropriate substitutions:

\vec{V}=V^1\vec{e}_1 + V^2\vec{e}_2=\widetilde{V}^1\widetilde{\vec{e}}_{\,1} + \widetilde{V}^2\widetilde{\vec{e}}_{\,2}=\widetilde{V}^1(2\vec{e}_1+1\vec{e}_2) + \widetilde{V}^2(-1\vec{e}_1+1\vec{e}_2)

              =\widetilde{V}^1 2\vec{e}_1+\widetilde{V}^1 1\vec{e}_2 + \widetilde{V}^2(-1)\vec{e}_1+\widetilde{V}^2 1\vec{e}_2

              =\left[\widetilde{V}^1 (2) + \widetilde{V}^2(-1)\right]\vec{e}_1 + \left[\widetilde{V}^1 (1) +  \widetilde{V}^2 (1)\right]\vec{e}_2

Break this equation down into two equations:

V^1\vec{e}_1=\left[\widetilde{V}^1 (2) + \widetilde{V}^2(-1)\right]\vec{e}_1\,\,\Rightarrow\,\,V^1=\widetilde{V}^1 (2) + \widetilde{V}^2(-1)

and

V^2\vec{e}_2=\left[\widetilde{V}^1 (1) +  \widetilde{V}^2 (1)\right]\vec{e}_2\,\,\Rightarrow\,\,V^2=\widetilde{V}^1 (1) +  \widetilde{V}^2 (1)

Recombine these two equations into matrix form:

\begin{bmatrix} V^1 \\ V^2\end{bmatrix}=\begin{bmatrix} 2 & -1\\ 1 & \,\,\,\,1\end{bmatrix}\begin{bmatrix} \widetilde{V}^1 \\ \widetilde{V}^2 \end{bmatrix}

Confirm that this agrees with the values for the components we got from figure II.H.5:

\begin{bmatrix} 1 \\ 2\end{bmatrix}=\begin{bmatrix} 2 & -1\\ 1 & \,\,\,\,1\end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}

The summation equation for the \widetilde{V} \rightarrow V transformation is:

V^i=\displaystyle \sum_{j=1}^n F_{ij}\widetilde{V}^j
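
A NumPy sketch (my own) tying the pieces of this example together: the components (1, 2) in the e basis become (1, 1) in the new basis, the forward matrix takes us back, and the vector itself is unchanged:

```python
import numpy as np

F = np.array([[2.0, -1.0], [1.0, 1.0]])   # forward matrix from figure II.H.4
B = np.linalg.inv(F)                       # backward matrix

V_old = np.array([1.0, 2.0])   # components in the e basis (figure II.H.5a)
V_new = B @ V_old              # components transform with the backward matrix
print(V_new)                   # [1. 1.]  -- matches figure II.H.5b
print(F @ V_new)               # [1. 2.]  -- the forward matrix takes us back

# The vector itself is unchanged: the same linear combination in either basis
e = np.array([[1.0, 0.0], [0.0, 1.0]])    # old basis vectors as rows
e_new = F.T @ e                           # new basis vectors as rows
print(np.allclose(e.T @ V_old, e_new.T @ V_new))   # True
```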

I think it’s instructive to look at the long versions of the proofs about how basis vectors and components transform. However, there’s a much easier way to derive these transformations using linear algebra. If we start with the vector, \vec{V}, we know that to obtain \vec{V}^{\prime} we apply the backward transformation matrix, B:

\vec{V}^{\prime}=B\vec{V}

Applying the inverse of B to both sides of this equation, we obtain:

B^{-1}\vec{V}^{\prime}=B^{-1}B\vec{V}

But we know that B^{-1}B=I and that B^{-1} is the forward matrix, F. So

F\vec{V}^{\prime}=\vec{V}

Notice that the transformation matrices used to relate old and new components are opposite to the transformation matrices used to transform their corresponding basis vectors:

Old basis vectors \xrightarrow{F} New basis vectors
Old components \xrightarrow{B} New components
New basis vectors \xrightarrow{B} Old basis vectors
New components \xrightarrow{F} Old components

That’s why they’re called contravariant components.

Note, also, that we use a row vector and subscripts when manipulating the basis vectors as opposed to column vectors and superscripts when dealing with components, suggesting the two are intrinsically different entities.

As a consequence of components and basis vectors transforming in opposite ways, component values decrease as basis vector length increases, and vice versa. This is depicted in figures II.H.6a and II.H.6b. Figures II.H.6c and II.H.6d provide further intuition regarding the origin of the name “contravariant.” In figure II.H.6c, we can see that angles \theta_1 and \theta_2 are equal. If the coordinate system is rotated clockwise (figure II.H.6d), then, relative to the new coordinate axes, \vec{V} appears to rotate counterclockwise, i.e., contrary to the basis vector rotation.

Intuition as to why contravariant components are named as they are
Figure II.H.6

By convention, superscripts are used to designate contravariant components and subscripts are used for the basis vectors by which they are multiplied.

All of the component transformations we’ve talked about so far apply when the parallel method of evaluating vector components is employed. But the parallel method isn’t the only method available. As shown in figure II.H.7, we can also obtain x-components by shining a light perpendicular to the x-axis. Likewise, to find the y-components, we shine our light perpendicular to the y-axis.

Problems with perpendicular projection method to find vector components
Figure II.H.7

We can see that components created in this way are different than those created by the parallel method. In addition, figure II.H.7 shows us that if we try to add vector components made by the perpendicular method using the technique that we used for vectors in Cartesian coordinates, or for those derived from the parallel projection method, it doesn’t work. Still, we can’t help but believe that there has to be some way to make “components” from the perpendicular method. After all, the vantage point from which we view the axes shouldn’t matter. In fact, there is such a way.

What we need to do is to use different basis vectors. This is shown in figure II.H.8.

Perpendicular projection leading to covariant components
Figure II.H.8

There are 2 defining characteristics of these new basis vectors which are called dual basis vectors:

  1. Each dual basis vector must be perpendicular to all original basis vectors with different indices. This is evident in figure II.H.8c (i.e., \vec{e}^y is perpendicular to \vec{e}_x and \vec{e}^x is perpendicular to \vec{e}_y). Another way to say this is \vec{e}^y \cdot \vec{e}_x = 0 and \vec{e}^x \cdot \vec{e}_y = 0.
  2. The dot product of a dual basis vector with an original basis vector with the same index equals 1. Thus, in figure II.H.8c, \vec{e}^x \cdot \vec{e}_x = 1 and \vec{e}^y \cdot \vec{e}_y = 1.
  3. More generally, we can write this as \vec{e}^{\,i}\cdot\vec{e}_j=\delta^i_j where \delta^i_j  = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if }i \neq j\end{cases}

From figure II.H.8c, it’s harder to see, at least for me, why this second definition is true. We can see that \vec{e}^y and \vec{e}_y are separated by an angle \theta_2 and can be thought of as sides of a right triangle. From this, we know that:

\lvert\lvert \vec{e}^y \rvert\rvert \cos\theta = \lvert\lvert \vec{e}_y \rvert\rvert \quad \text{eq (11)}

From the second definition above, we have:

\displaystyle \vec{e}^y \cdot \vec{e}_y = \lvert\lvert \vec{e}^y \rvert\rvert \, \lvert\lvert \vec{e}_y \rvert\rvert \cos\theta = 1 \quad \text{eq (12)}

These two equations are difficult to reconcile except when \lvert\lvert\vec{e}_y\rvert\rvert = 1. If anyone can better explain the connection between the second defining property of dual basis vectors and the perpendicular method for obtaining these vectors, please leave a comment.

Evident from figure II.H.8d, though, is that if we make our perpendicular projections onto the directions defined by the dual basis vectors, the resulting vectors add up to make \vec{A}. That is, they add as components. To see an example with calculations, click here.

The components combined with dual basis vectors to create vector components are called covariant components, and to distinguish them from contravariant components, subscripts are used to designate them. Similarly, to distinguish dual basis vectors from the basis vectors that multiply contravariant components, superscripts are used.
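
To make the dual basis concrete, here is a NumPy sketch (my own, with a hypothetical non-orthogonal 2D basis and an ordinary Euclidean dot product, not the exact vectors of figure II.H.8) showing how dual basis vectors can be computed and how they produce contravariant and covariant components:

```python
import numpy as np

# Hypothetical non-orthogonal basis vectors (stored as the columns of E)
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 1.0])
E = np.column_stack([e1, e2])

# The dual basis vectors are the rows of E^{-1}, so that e^i . e_j = delta^i_j
dual = np.linalg.inv(E)
e1_dual, e2_dual = dual[0], dual[1]

print(np.dot(e1_dual, e1), np.dot(e1_dual, e2))   # 1.0  0.0
print(np.dot(e2_dual, e1), np.dot(e2_dual, e2))   # 0.0  1.0

# For a vector A: contravariant components come from dotting with the dual
# basis, covariant components from dotting with the original basis.
A = np.array([2.0, 1.0])
A_contra = np.array([np.dot(A, e1_dual), np.dot(A, e2_dual)])   # A^i
A_cov = np.array([np.dot(A, e1), np.dot(A, e2)])                # A_i

print(A_contra @ np.array([e1, e2]))             # sum_i A^i e_i  reconstructs A
print(A_cov @ np.array([e1_dual, e2_dual]))      # sum_i A_i e^i  also reconstructs A
```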

To understand why covariant components get their name, we need to see how covector components, and the dual-space basis vectors they multiply, actually transform. Here’s a summary:

\displaystyle \widetilde{\epsilon}^{\,i}=\sum_{j=1}^n B_{ij}\epsilon^{\,j}\quad \displaystyle \epsilon^{\,i}=\sum_{j=1}^n F_{ij}\widetilde{\epsilon}^{\,j}

\displaystyle \widetilde{\alpha}_j=\sum_{i=1}^n F_{ij}\alpha_i\quad \displaystyle \alpha_j=\sum_{i=1}^n B_{ij}\widetilde{\alpha}_i

where

  • \alpha\text{'s} and \widetilde{\alpha}\text{'s} are the covector components
  • \epsilon\text{'s} and \widetilde{\epsilon}\text{'s} are the basis vectors for the dual vector space
  • F_{ij} are the components of the forward transformation matrix
  • B_{ij} are the components of the backward transformation matrix

[For a proof of these expressions, patterned after the eigenchris videos (especially https://www.youtube.com/watch?v=d5da-mcVJ20), click here.]

Components of covectors are called covariant because their manner of transformation from old to new components “coincides” with (transforms in the same way as) that of basis vectors; they both use the forward transformation:

\widetilde{\vec{e}}_j=\displaystyle \sum_{i=1}^n F_{ij}\vec{e}_i   and   \widetilde{\alpha}_j=\displaystyle \sum_{i=1}^n F_{ij}\alpha_i

And in contrast to contravariant components, if dual basis vector length is increased, then the length of its associated covariant component is also increased. Likewise, when dual basis vectors are rotated in a given direction, the vector appears to rotate in the same direction (versus what’s shown in figure II.H.6c and II.H.6d).

Note that many authors refer to vectors as being contravariant or covariant, but strictly speaking, the vector itself (being a tensor) is an invariant object which can be described using either contravariant or covariant components.

Many mathematicians now consider the terms contravariant and covariant to be outdated. More modern terminology calls vectors with contravariant components simply vectors, and vectors with covariant components covectors, linear functionals or one-forms. The reason for this, they say, is that the mathematical objects they are calling covectors or one-forms are fundamentally different from vectors. They see them as operators that take in a vector as an argument and spit out a scalar:

O(\vec{v}) = s where O is the covector/one-form, acting as an operator, and s is a scalar (i.e. a number).

They also follow linearity:

  • \alpha(\vec{v}+\vec{w})=\alpha(\vec{v})+\alpha(\vec{w})
  • \alpha(n\vec{v})=n\alpha(\vec{v})

where

\alpha is a covector
\vec{v} and \vec{w} are vectors on which the covector operates
n is a scalar
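
To make the operator picture concrete, here is a tiny NumPy sketch (my own, with hypothetical component values) of a covector acting on vectors and satisfying the linearity properties just listed:

```python
import numpy as np

# A covector viewed as a linear map from vectors to numbers: a sketch in
# which the covector is represented by a row of components (hypothetical values)
alpha = np.array([3.0, -1.0])

def one_form(v):
    """Apply the covector to a vector and return a scalar."""
    return float(alpha @ v)

v = np.array([1.0, 2.0])
w = np.array([4.0, 0.5])
n = 2.5

# Linearity, as in the bullet list above
print(np.isclose(one_form(v + w), one_form(v) + one_form(w)))   # True
print(np.isclose(one_form(n * v), n * one_form(v)))             # True
```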

In this scheme, these operators (or covectors or one-forms or whatever you want to call them) form a separate vector space that’s referred to as a dual space. Furthermore, they visualize these covectors as a series of hyperplanes – or simple 2-dimensional planes if we’re dealing with just 2 dimensions, like we’ve been dealing with in the examples I’ve given – perpendicular to the direction of the dual basis vectors. The components of the covector, then, are given by the number of hyperplanes that are “pierced” in moving in the direction of the dual basis vector to the point where the perpendicular projection of the vector intersects the axis formed by the dual basis vector. Now, that’s a mouthful. Hopefully, figure II.H.9, which applies this alternate viewpoint to the example given in figure II.H.8, will clarify things.

Covector addition is given by the number of hyperplanes pierced
Figure II.H.9

In figure II.H.9, we see the \vec{A} of figure II.H.8 expressed with covariant components and dual basis vectors. The parallel black lines represent hyperplanes running in and out of the plane of the screen. The number of hyperplanes traversed by each component vector represents that component vector’s magnitude.

There are definitely situations where this hyperplane picture is helpful, such as applications of Stokes’ theorem. However, for me at least, seeing the hyperplanes as tick marks on an axis works fairly well.

Another way to view the difference between contravariant and covariant components may be even more useful. It’s simply to define the way each type of component changes under coordinate transformations. Contravariant components are said to transform as follows:

\displaystyle A^{\prime i} = a_{ij}A^j\quad \text{eq (13)}

Here I’ve used notation I haven’t used before – Einstein summation notation. We’ll also switch from using x, y … as subscripts and superscripts to the more commonly-used notation: numbers. Let me explain with an example. For simplicity, we’ll take \vec{A} to be a 3-dimensional vector. In keeping with the way we’ve been transforming vectors, we’ll start by using a transformation matrix:

\displaystyle \begin{bmatrix}  A^{\prime 1}\\A^{\prime 2}\\A^{\prime 3}\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12}&a_{13}\\a_{21} & a_{22} &a_{23}\\a_{31} & a_{32} &a_{33}\end{bmatrix}\begin{bmatrix}  A^1\\A^2\\A^3\end{bmatrix} \quad \text{eq (14)}

This matrix equation translates into 3 simultaneous equations:

\displaystyle A^{\prime 1} = a_{11}A^1 + a_{12}A^2 + a_{13}A^3
\displaystyle A^{\prime 2} = a_{21}A^1 + a_{22}A^2 + a_{23}A^3 \quad \text{eq (15)}
\displaystyle A^{\prime 3} = a_{31}A^1 + a_{32}A^2 + a_{33}A^3

Of course, this is a lot of writing so mathematicians developed summation notation to make things shorter:

\displaystyle A^{\prime 1} = \sum_{j=1}^3a_{1j}A^j
\displaystyle A^{\prime 2} = \sum_{j=1}^3a_{2j}A^j \quad \text{eq (16)}
\displaystyle A^{\prime 3} = \sum_{j=1}^3a_{3j}A^j

But we still have 3 equations. We can get it down to one equation as follows:

\displaystyle A^{\prime i} = \sum_{j=1}^3a_{ij}A^j \quad \text{eq (17)}

Einstein made the notation even more compact by noting that when there’s a term where an index is used as both a superscript (“upstairs” index) and a subscript (“downstairs” index) – like j is in eq (17) – then one can assume that this index is summed over and the summation sign, \sum, can be dropped. Applying Einstein notation to eq (17), we have:

\displaystyle A^{\prime i} = a_{ij}A^j \quad \text{eq (18)}

The j in eq (17), the index being summed over, is called a dummy index. That’s because we could replace it with any symbol without changing the meaning of the equation.
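
Here is a short NumPy illustration (my own, with hypothetical numbers) of what the summation convention is shorthand for; numpy.einsum implements exactly this kind of index-summation recipe:

```python
import numpy as np

# A hypothetical transformation matrix a_ij and vector components A^j
a = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])
A = np.array([1.0, -1.0, 2.0])

# eq (18): A'^i = a_ij A^j  -- the repeated index j is summed over
A_prime_loop = np.array([sum(a[i, j] * A[j] for j in range(3)) for i in range(3)])
A_prime_einsum = np.einsum('ij,j->i', a, A)   # the same summation as an index recipe

print(A_prime_loop)
print(np.allclose(A_prime_loop, A_prime_einsum))   # True
```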

The classic example of where contravariant components are used is in transforming the components of the differential displacement, (dx^1, dx^2, dx^3), from one coordinate system to another. Again, we’ll use the commonly employed convention where x=x^1, y=x^2 and z=x^3. We’ll start with a matrix equation:

\displaystyle \begin{bmatrix}  dx^{\prime 1}\\ \\dx^{\prime 2}\\ \\dx^{\prime 3}\end{bmatrix} = \begin{bmatrix} \displaystyle \frac{\partial x^{\prime 1}}{\partial x^1} &  \displaystyle \frac{\partial x^{\prime 1}}{\partial x^2} & \displaystyle \frac{\partial x^{\prime 1}}{\partial x^3} \\ & \\ \displaystyle \frac{\partial x^{\prime 2}}{\partial x^1} &  \displaystyle \frac{\partial x^{\prime 2}}{\partial x^2} &  \displaystyle \frac{\partial x^{\prime 2}}{\partial x^3} \\  & \\ \displaystyle \frac{\partial x^{\prime 3}}{\partial x^1} &  \displaystyle \frac{\partial x^{\prime 3}}{\partial x^2} &  \displaystyle \frac{\partial x^{\prime 3}}{\partial x^3} \end{bmatrix}\begin{bmatrix}  dx^1\\ \\dx^2\\ \\dx^3\end{bmatrix} \quad \text{eq (19)}

This matrix equation can be written as 3 simultaneous equations:

\displaystyle dx^{\prime 1}=\frac{\partial x^{\prime 1}}{\partial x^1}dx^1 +  \displaystyle \frac{\partial x^{\prime 1}}{\partial x^2}dx^2 + \displaystyle \frac{\partial x^{\prime 1}}{\partial x^3}dx^3

\displaystyle dx^{\prime 2}=\frac{\partial x^{\prime 2}}{\partial x^1}dx^1 +  \displaystyle \frac{\partial x^{\prime 2}}{\partial x^2}dx^2 + \displaystyle \frac{\partial x^{\prime 2}}{\partial x^3}dx^3 \quad \text{eq (20)}

\displaystyle dx^{\prime 3}=\frac{\partial x^{\prime 3}}{\partial x^1}dx^1 +  \displaystyle \frac{\partial x^{\prime 3}}{\partial x^2}dx^2 + \displaystyle \frac{\partial x^{\prime 3}}{\partial x^3}dx^3

As in eq (16), we can reduce eq (20) to 3 summation equations:

\displaystyle dx^{\prime 1}=\sum_{j=1}^3\frac{\partial x^{\prime 1}}{\partial x^j}dx^j,\quad dx^{\prime 2}=\sum_{j=1}^3\frac{\partial x^{\prime 2}}{\partial x^j}dx^j,\quad dx^{\prime 3}=\sum_{j=1}^3\frac{\partial x^{\prime 3}}{\partial x^j}dx^j\quad \text{eq (21)}

We can consolidate the 3 equations of eq (21) into one:

\displaystyle dx^{\prime i}=\sum_{j=1}^3\frac{\partial x^{\prime i}}{\partial x^j}dx^j\quad \text{eq (22)}

Finally, we can drop the summation sign and wind up with eq (22) in Einstein summation form:

\displaystyle dx^{\prime i}=\frac{\partial x^{\prime i}}{\partial x^j}dx^j\quad \text{eq (23)}

Indeed, many authors define the contravariant components of vector \vec{A} as components that transform as:

\displaystyle A^{\prime i}=\frac{\partial x^{\prime i}}{\partial x^j}A^j\quad \text{eq (24)}

Figure II.H.10 is intended to give some intuition regarding how eq (24) might be interpreted.

Meaning of contravariant component transformation equation
Figure II.H.10
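
As a concrete illustration of eqs (23) and (24), here is a NumPy sketch (my own choice of coordinate change, Cartesian to polar, which is not worked in the article) showing that the Jacobian applied to a small displacement reproduces the actual change in the new coordinates to first order:

```python
import numpy as np

# Cartesian -> polar as a concrete coordinate change (my example)
def to_polar(p):
    x, y = p
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

p = np.array([3.0, 4.0])            # a point
dx = np.array([1e-6, -2e-6])        # a small displacement, components dx^j

# Jacobian d x'^i / d x^j at p, for x'^1 = r and x'^2 = theta
x, y = p
r = np.hypot(x, y)
J = np.array([[x / r,      y / r],
              [-y / r**2,  x / r**2]])

dx_new = J @ dx                         # eq (23)/(24): contravariant transformation
print(dx_new)
print(to_polar(p + dx) - to_polar(p))   # agrees to first order
```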

As you might guess, covariant components can also be defined by the way they transform under coordinate changes. The equation for covariant components is:

\displaystyle A^{\prime}_i = a^{ij}A_j\quad \text{eq (25)}

where a^{ij} is a transformation matrix. Notice that whereas superscripts were used for the pre- and post-transformation vectors and subscripts for the transformation matrix with contravariant vector components, the opposite is true for covariant components.

The prototypical example of covariant component use involves the gradient of a scalar field, f(x,y,z) (such as room temperature). The change of the field with position is given by the gradient, \displaystyle \frac{\partial f}{\partial x^i}, where again, we use numerical indices instead of x, y, z. Notice that the gradient carries units of 1/length, the inverse of the units of the line element ds, which has units of length. We can follow a procedure similar to the one we utilized to derive the contravariant transformation equation for the line element.

First, write a matrix equation:

\displaystyle \begin{bmatrix}  \displaystyle \frac{\partial f}{\partial x^{\prime 1}}\\ \\\displaystyle \frac{\partial f}{\partial x^{\prime 2}}\\ \\\displaystyle \frac{\partial f}{\partial x^{\prime 3}}\end{bmatrix} = \begin{bmatrix} \displaystyle \frac{\partial x^1}{\partial x^{\prime 1}} &  \displaystyle \frac{\partial x^2}{\partial x^{\prime 1}} & \displaystyle \frac{\partial x^3}{\partial x^{\prime 1}} \\ & \\ \displaystyle \frac{\partial x^1}{\partial x^{\prime 2}} &  \displaystyle \frac{\partial x^2}{\partial x^{\prime 2}} & \displaystyle \frac{\partial x^3}{\partial x^{\prime 2}} \\  & \\ \displaystyle \frac{\partial x^1}{\partial x^{\prime 3}} &  \displaystyle \frac{\partial x^2}{\partial x^{\prime 3}} & \displaystyle \frac{\partial x^3}{\partial x^{\prime 3}} \end{bmatrix}\begin{bmatrix}  \displaystyle \frac{\partial f}{\partial x^1}\\ \\\displaystyle \frac{\partial f}{\partial x^2}\\ \\\displaystyle \frac{\partial f}{\partial x^3}\end{bmatrix} \quad \text{eq (26)}

We write eq (26) as 3 simultaneous equations:

\displaystyle \frac{\partial f}{\partial x^{\prime 1}}=\displaystyle \frac{\partial f}{\partial x^1}\displaystyle \frac{\partial x^1}{\partial x^{\prime 1}}+\displaystyle \frac{\partial f}{\partial x^2}\displaystyle \frac{\partial x^2}{\partial x^{\prime 1}}+\displaystyle \frac{\partial f}{\partial x^3}\displaystyle \frac{\partial x^3}{\partial x^{\prime 1}}

\displaystyle \frac{\partial f}{\partial x^{\prime 2}}=\displaystyle \frac{\partial f}{\partial x^1}\displaystyle \frac{\partial x^1}{\partial x^{\prime 2}}+\displaystyle \frac{\partial f}{\partial x^2}\displaystyle \frac{\partial x^2}{\partial x^{\prime 2}}+\displaystyle \frac{\partial f}{\partial x^3}\displaystyle \frac{\partial x^3}{\partial x^{\prime 2}}\quad \text{eq (27)}

\displaystyle \frac{\partial f}{\partial x^{\prime 3}}=\displaystyle \frac{\partial f}{\partial x^1}\displaystyle \frac{\partial x^1}{\partial x^{\prime 3}}+\displaystyle \frac{\partial f}{\partial x^2}\displaystyle \frac{\partial x^2}{\partial x^{\prime 3}}+\displaystyle \frac{\partial f}{\partial x^3}\displaystyle \frac{\partial x^3}{\partial x^{\prime 3}}

We can write eq (27) as 3 summation equations:

\displaystyle \frac{\partial f}{\partial x^{\prime 1}}=\sum_{j=1}^3\displaystyle \frac{\partial x^j}{\partial x^{\prime 1}}\displaystyle \frac{\partial f}{\partial x^j},   \displaystyle \frac{\partial f}{\partial x^{\prime 2}}=\sum_{j=1}^3\displaystyle \frac{\partial x^j}{\partial x^{\prime 2}}\displaystyle \frac{\partial f}{\partial x^j},   \displaystyle \frac{\partial f}{\partial x^{\prime 3}}=\sum_{j=1}^3\displaystyle \frac{\partial x^j}{\partial x^{\prime 3}}\displaystyle \frac{\partial f}{\partial x^j}\quad \text{eq (28)}

Eq (28) can be reduced to 1 summation equation:

\displaystyle \frac{\partial f}{\partial x^{\prime i}}=\sum_{j=1}^3\displaystyle \frac{\partial x^j}{\partial x^{\prime i}}\displaystyle \frac{\partial f}{\partial x^j}\quad \text{eq (29)}

The Einstein summation formula reduces this further to:

\displaystyle \frac{\partial f}{\partial x^{\prime i}}=\displaystyle \frac{\partial x^j}{\partial x^{\prime i}}\displaystyle \frac{\partial f}{\partial x^j}\quad \text{eq (30)}

In fact, many authors define the covariant components of vector \vec{A} as components that transform as:

\displaystyle A^{\prime}_i=\displaystyle \frac{\partial x^j}{\partial x^{\prime i}}\displaystyle A_j \quad \text{eq (31)}

The intuition for the meaning of this equation is shown in figure II.H.11:

Meaning of covariant component transformation equation
Figure II.H.11
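
And here is the companion sketch for eq (30) (again my own polar-coordinate example, with a hypothetical scalar field): the gradient components transform with the inverse Jacobian factors, and the result matches direct differentiation in the new coordinates:

```python
import numpy as np

# Same polar example (my choice): x = r cos(t), y = r sin(t)
r, t = 5.0, 0.3
x, y = r * np.cos(t), r * np.sin(t)

# A hypothetical scalar field f(x, y) = x^2 + 3y and its Cartesian gradient
grad_cart = np.array([2.0 * x, 3.0])            # (df/dx, df/dy)

# Jacobian d x^j / d x'^i, rows indexed by the primed (polar) coordinate
J = np.array([[np.cos(t),      np.sin(t)],      # dx/dr, dy/dr
              [-r * np.sin(t), r * np.cos(t)]]) # dx/dt, dy/dt

grad_polar = J @ grad_cart                      # eq (30): covariant transformation

# Check against direct differentiation of f(r, t) = (r cos t)^2 + 3 r sin t
df_dr = 2.0 * r * np.cos(t)**2 + 3.0 * np.sin(t)
df_dt = -2.0 * r**2 * np.cos(t) * np.sin(t) + 3.0 * r * np.cos(t)
print(grad_polar, np.array([df_dr, df_dt]))     # the two agree
```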

Having at our fingertips 1) these definitions of contravariant and covariant components given by transformation equations eq (24) and eq (31) and 2) Einstein summation technique, we are now ready to tackle tensors of rank > 1.

III. Tensors of Rank > 1

We spent a significant amount of time talking about vectors because, in this section, we’re going to 1) build rank 2 tensors from vectors and 2) show that the transformation characteristics that define vectors are equally applicable to these rank 2 tensors. Subsequently, we’ll show that higher ranking tensors can be created from lower ranking tensors and follow the same transformation equations. In so doing, this will establish that tensors, in general, are mathematical objects that remain unchanged under coordinate transformations despite changes in components and basis vectors that make them up.

III.A Tensor Creation and Transformations

So how can vectors be used to build rank 2 tensors? By using a vector multiplication method called the outer product. In my mind, the easiest way to visualize the outer product is via matrix multiplication. Specifically:

We start with 2 3-dimensional column vectors

\vec{A}^m=\begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix}   and   \vec{B}^n=\begin{bmatrix} b^1 \\ b^2 \\ b^3 \end{bmatrix}

The outer (or tensor) product is defined as:

\vec{A}^m \otimes (\vec{B}^n)^T = \begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix} \begin{bmatrix} b^1 & b^2 & b^3 \end{bmatrix} = \begin{bmatrix} a^1b^1 & a^1b^2 & a^1b^3 \\  a^2b^1 & a^2b^2 & a^2b^3 \\ a^3b^1 & a^3b^2 & a^3b^3 \end{bmatrix}=S^{mn}   eq (32)

In eq (32), the superscript T associated with \vec{B}^n denotes the transpose. It means make a column a row or vice versa. Once \vec{B}^n is converted from a column vector to a row vector, it multiplies each component of \vec{A}^m to yield a matrix that represents the rank 2 tensor, S^{mn}. Rather than writing \vec{A}^m \otimes (\vec{B}^n)^T or \begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix} \begin{bmatrix} b^1 & b^2 & b^3 \end{bmatrix}, it’s customary to write the tensor product as:

A^mB^n = S^{mn}   eq (33)

With its upper indices, it looks like S^{mn} is a contravariant object. Let’s see how S^{mn} transforms under coordinate transformation. We know that S^{mn} is made from \vec{A}^m and \vec{B}^n. Thus:

\displaystyle S^{{\prime}mn}=A^{{\prime}m}B^{{\prime}n}   eq (34)

We know how \vec{A}^m and \vec{B}^n transform so:

\displaystyle S^{{\prime}mn}=\displaystyle \frac{\partial x^{{\prime}m}}{\partial x^p}A^p \displaystyle \frac{\partial x^{{\prime}n}}{\partial x^q}B^q   eq (35)

    =\displaystyle \frac{\partial x^{{\prime}m}}{\partial x^p} \displaystyle \frac{\partial x^{{\prime}n}}{\partial x^q}A^pB^q   eq (36)

But

A^pB^q=S^{pq}   eq (37)

So

\displaystyle S^{{\prime}mn}=\displaystyle \frac{\partial x^{{\prime}m}}{\partial x^p} \displaystyle \frac{\partial x^{{\prime}n}}{\partial x^q}S^{pq}   eq (38)

From this, we can say that mathematical objects that transform this way, like S^{mn}, are tensors: rank 2 tensors with contravariant indices.
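
A NumPy sketch (my own, with hypothetical components and an arbitrary invertible matrix standing in for the partial-derivative factors in eq (38)) showing that the outer product built from transformed vectors equals the outer product transformed as a rank 2 tensor:

```python
import numpy as np

A = np.array([1.0, 2.0, -1.0])     # hypothetical contravariant components
Bv = np.array([0.5, 0.0, 3.0])

S = np.outer(A, Bv)                # eq (32)/(33): S^{mn} = A^m B^n

# A hypothetical invertible matrix L playing the role of the Jacobian factors.
L = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])

S_from_transformed_vectors = np.outer(L @ A, L @ Bv)
S_transformed_as_a_tensor = np.einsum('mp,nq,pq->mn', L, L, S)   # one factor of L per index
print(np.allclose(S_from_transformed_vectors, S_transformed_as_a_tensor))   # True
```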

We can make other types of rank 2 tensors from vectors with other types/combinations of components. Specifically,

A_mB_n=S_{mn}   eq (39)

which, by the same arguments presented for eq (33), transforms as

\displaystyle S^{\prime}_{mn}=\displaystyle \frac{\partial x^p}{\partial x^{{\prime}m}} \displaystyle \frac{\partial x^q}{\partial x^{{\prime}n}}S_{pq}   eq (40)

or

A^mB_n=S^m_n   eq (41)

which transforms as

\displaystyle S^{{\prime}m}_n=\displaystyle \frac{\partial x^{{\prime}m}}{\partial x^p} \displaystyle \frac{\partial x^q}{\partial x^{{\prime}n}}S^p_q   eq (42)

or

A_mB^n=S_m^n   eq (43)

which transforms as

\displaystyle S^{\prime}_m{}^n=\displaystyle \frac{\partial x^p}{\partial x^{{\prime}m}} \displaystyle \frac{\partial x^{{\prime}n}}{\partial x^q}S_p^q   eq (44)

Perhaps you’re beginning to see a pattern.

  • If the vectors we use to make the tensor have 2 contravariant indices, then the tensor they create has the same 2 contravariant indices
  • If the vectors we use to make the tensor have 2 covariant indices, then the tensor they create has the same 2 covariant indices
  • If the vectors we use to make the tensor have 1 contravariant and 1 covariant index, then the tensor they create has the same contravariant and covariant indices

And since we can make a rank 2 tensor out of two rank 1 tensors, as you might guess, we can make any higher rank tensor from lower rank tensors, the higher rank tensor acquiring the indices of the lower rank tensors from which it’s created. For example:

A_hB^iC^j = T_h{}^{ij}  eq (45)

A^h_i\,B_j{}^k{}_m = T^h{}_{ij}{}^k{}_m  eq (46)

And if we wanted to see how such tensors transform, we would simply insert the appropriate partial derivative for each index. For example, the tensor in eq (45) would transform as follows:

T^{\prime}_h{}^{ij} = \displaystyle \frac{\partial x^k}{\partial x^{{\prime}h}} \displaystyle \frac{\partial x^{{\prime}i}}{\partial x^l} \displaystyle \frac{\partial x^{{\prime}j}}{\partial x^m} T_k{}^{lm}  eq (47)

But note that

T^{mn} \neq T^{nm}  eq (48)

To see why, click here.

III.B Tensor Properties

III.B.1 Invariance of Tensor Equations

One of the most important properties of tensors is obvious from their transformation properties. Suppose all of the components of a tensor are 0 in one coordinate system. Then by the transformation equations we worked with in the previous section, that tensor must be zero in all coordinate systems:

T^{\prime}_h{}^{ij} = \displaystyle \frac{\partial x^k}{\partial x^{{\prime}h}} \displaystyle \frac{\partial x^{{\prime}i}}{\partial x^l} \displaystyle \frac{\partial x^{{\prime}j}}{\partial x^m} T_k{}^{lm}

If T_k{}^{lm}=0, then the transformed tensor in the new coordinate system, T^{\prime}_h{}^{ij}, must also be 0.

Now suppose T_k{}^{lm} is not 0 but that

T_k{}^{lm}=U_k{}^{lm}  eq (49)

Then

T_k{}^{lm}-U_k{}^{lm}=0  eq (50)

But we just showed that a tensor that’s 0 in one coordinate system is 0 in all coordinate systems. That means that tensors are invariant and tensor equations are the same in all coordinate systems. The implications of this are critical, especially in physics and engineering, most notably special and general relativity.

III.B.2 Addition and Subtraction

We know we can add vectors (i.e., rank 1 tensors):

\vec{C} = \vec{A} + \vec{B}  eq (51)

We can do this by adding components:

C_x = A_x +B_x
C_y = A_y +B_y  eq (52)
C_z = A_z +B_z

We can do something similar with higher ranking tensors:

C^{ij} = A^{ij} +B^{ij}
C_{ij} = A_{ij}+B_{ij}  eq (53)
C^i_j = A^i_j +B^i_j

To prove that the tensor we created by adding two tensors is itself a tensor, we check to see how it transforms. We’ll use C_{ij} as an example. It’s made by the addition of A_{ij} and B_{ij}. We’ve previously seen how A_{ij} and B_{ij} transform:

\displaystyle A^{\prime}_{kl}=\displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}A_{ij}   eq (54)

and

\displaystyle B^{\prime}_{kl}=\displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}B_{ij}   eq (55)

So then, A_{ij} + B_{ij} transforms as follows:

\displaystyle A^{\prime}_{kl}+\displaystyle B^{\prime}_{kl}=\displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}A_{ij} + \displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}B_{ij}

        =\displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}(A_{ij} + B_{ij})  eq (56)

But C_{ij} = A_{ij} + B_{ij}. Substituting this into eq (56), we see that C_{ij} transforms like a tensor:

\displaystyle C^{\prime}_{kl}=\displaystyle \frac{\partial x^i}{\partial x^{{\prime}k}} \displaystyle \frac{\partial x^j}{\partial x^{{\prime}l}}C_{ij}   eq (57)

Thus, the sum of A_{ij} and B_{ij} is another tensor.

We won’t bother to do it here, but I think it’s clear that we could show that the difference of two tensors is also a tensor, simply by replacing the + sign with a minus sign.

Note that to add or subtract tensors, these tensors have to have the same indices.

III.B.3 Multiplication

We’ve already seen one type of tensor multiplication – the outer product. When we performed the operations that allowed creation of higher ranking tensors from tensors of lower rank, it was the outer product that we used. The mechanics of this procedure is difficult to visualize for higher ranking tensors but we saw how it worked in our discussion of how vectors can be used to make rank 2 tensors.

If you want to see that the outer product of higher ranking tensors transforms like a tensor, click here.

Another way to multiply tensors is to take the inner product which we can think of as the generalization of the dot (or scalar) product, which is discussed in the dot product section of my linear algebra page.

When we take the dot product of two vectors, we get a scalar. We can think of the dot product as matrix multiplication:

Let \vec{A}^m=\begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix}   and   \vec{B}^n=\begin{bmatrix} b^1 \\ b^2 \\ b^3 \end{bmatrix}. Then the dot product is given by:

\vec{A} \cdot \vec{B} = \vec{A}^T \vec{B} = \begin{bmatrix} a^1 & a^2 & a^3 \end{bmatrix}\begin{bmatrix} b^1 \\ b^2 \\ b^3 \end{bmatrix} = a^1b^1 + a^2b^2 +a^3b^3   eq (58)

Putting some numbers in, we get:

\begin{bmatrix} 3 & -2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} = 3 \cdot 1 + -2 \cdot 2 +1 \cdot 4 = 3 +(-4) +4 = 3   eq (59)

So we start with two rank 1 tensors (total “rank” of 2) and end up with a scalar (rank 0).

We also note that a^1b^1 + a^2b^2 +a^3b^3 = \displaystyle \sum_{i=1}^3 a_ib^i. In Einstein notation, \vec{A} \cdot \vec{B} = a_ib^i.   eq (60)

The inner product of a rank 2 tensor with a rank 1 tensor (i.e., vector) can be represented as:

\displaystyle \begin{bmatrix} a_{11} & a_{12} & a_{13} \\  a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \\ b^3 \end{bmatrix} = \begin{bmatrix} a_{11}b^1 + a_{12}b^2 + a_{13}b^3 \\  a_{21}b^1 + a_{22}b^2 + a_{23}b^3 \\ a_{31}b^1 + a_{32}b^2 + a_{33}b^3 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}   eq (61)

The Einstein notation for this equation is:

c_i = a_{ij}b^j   eq (62)

Notice two things:

  • We start with a total rank of 3 (i.e., a rank 2 and a rank 1 tensor) and end up with a rank 1 tensor. Thus, as with the dot product of vectors (where the rank went from 2 to 0), applying the inner product reduces the rank by 2
  • When there is a covariant (downstairs) index and a contravariant (upstairs) index in the same expression, then, by Einstein summation convention, that index is summed over, becomes a scalar (i.e., a number) and “disappears” from the expression

This procedure of setting a covariant and a contravariant index in the same expression equal to each other, applying the inner product (i.e., summing over that index), and producing a tensor of rank 2 less than that of the product with which you began is called tensor contraction.

These rules of tensor contraction also apply to higher rank tensors. For example:

S^i{}_j{}^j{}_l = S^i{}_1{}^1{}_l + S^i{}_2{}^2{}_l + S^i{}_3{}^3{}_l  = T^i{}_l   eq (63)
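
Here is a NumPy illustration (my own, with arbitrary components) of contraction via einsum, covering both eq (62) and an eq (63)-style contraction of a rank 4 tensor:

```python
import numpy as np

a = np.arange(9.0).reshape(3, 3)   # components a_ij of a rank 2 tensor
b = np.array([1.0, 2.0, 3.0])      # components b^j of a vector

# eq (62): c_i = a_ij b^j  -- the inner product lowers the total rank by 2
c = np.einsum('ij,j->i', a, b)
print(c)

# eq (63)-style: contracting a rank 4 array over its 2nd and 3rd index slots
S = np.random.default_rng(0).normal(size=(3, 3, 3, 3))
T = np.einsum('ijjl->il', S)       # sum over the repeated index j
print(T.shape)                     # (3, 3): a rank 2 tensor remains
```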

So far, I’ve tried to give some of the theory behind tensors and derive some rules by which they are manipulated. I’ve done this so that, when readers see tensors in action, they will recognize that there are valid reasons for the way they’re being manipulated – that it’s not just magic. However, once the rules are established, the most efficient way to work with tensors is simply to apply them. Accordingly, at this point, let me summarize a few key points about these rules:

  • Tensors are, in part, collections of numbers (components), but what determines these components is the coordinate system in which one is working.
  • There are two types of components: contravariant and covariant. Contravariant components are represented with superscripted indices; covariant with subscripted indices.
  • The rank of a tensor is given by the number of indices it has.
  • Contravariant components are multiplied by basis vectors (which carry subscripted indices) to make tensors of rank \geq 1; covariant components are multiplied by dual basis vectors (which carry superscripted indices).
  • Tensors can be multiplied via the outer product to make new tensors. The new tensors inherit the indices of the tensors that are multiplied to make them (see the sketch after this list):

    Tensor Outer Product

  • Tensors can be multiplied via the inner product, a generalization of the vector dot product, to yield a tensor whose rank is 2 less than the total rank of the original tensors. This occurs when the same index appears as a subscript and a superscript in the same expression. In that case, the repeated upper and lower index is summed over and dropped from the final expression:



  • Tensors with the same indices can be added and subtracted, vector addition and subtraction being the model for how it’s done.
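Here is a minimal numpy sketch of the outer product rule (again, my own illustration): the ranks, i.e., the index letters, simply accumulate.

import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

T2 = np.einsum('i,j->ij', v, w)      # outer product of two vectors: a rank 2 array
T3 = np.einsum('ij,k->ijk', T2, v)   # another outer product: the indices keep adding up
print(T2.shape, T3.shape)            # (3, 3) (3, 3, 3)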

III.C Metric Tensor

III.C.1 Formation

One of the most important tensors in physics is the metric tensor. It’s used to define the geometry of spacetime. Let’s look at how it’s formed.

Consider a vector, \vec{V}. To find its length, we take the dot product with itself:

\displaystyle \vec{V} \cdot \vec{V} = V^i e_i \cdot V^j e_j
      =(e_i \cdot e_j)\, V^i V^j  eq (64)

Suppose we make \vec{V} very small. As its length approaches 0, we’ll call this short vector d\vec{r}. If we take its length, what we get is the infinitesimal length element ds. We find ds in a manner similar to the way we found the length of \vec{V}: we take the dot product of d\vec{r} with itself. But we know that:

d \vec{r} =  dx^i e_i  eq (65)

Thus,

\displaystyle ds^2  = dx^i e_i \cdot dx^j e_j
     =(e_i \cdot e_j)\, dx^i dx^j
     =g_{ij}\, dx^i dx^j   eq (66)

where g_{ij} are the components of the metric tensor. Note that the upstairs indices on the dx\text{'s} and the downstairs indices of the metric tensor produce a summation, i.e., a tensor contraction down to a tensor of rank 0, a scalar (a number).

For more intuition as to why e_i \cdot e_j is the metric tensor, click here.
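As a concrete illustration (my own, using sympy), we can build the polar-coordinate basis vectors as columns of the Jacobian of the Cartesian position vector and confirm that the array of dot products e_i \cdot e_j reproduces the polar metric that appears again in section III.C.4.a:

import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# Cartesian position expressed in polar coordinates
X = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])

J = X.jacobian([r, th])    # columns are the basis vectors e_r and e_theta
g = sp.simplify(J.T * J)   # g_ij = e_i . e_j
print(g)                   # Matrix([[1, 0], [0, r**2]])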

We can do the same for vectors with covariant component indices:

d \vec{r} =  dx_i e^i  eq (67)

Thus,

\displaystyle ds^2  = dx_i e^i \cdot dx_j e^j
    =(e^i \cdot e^j)\, dx_i dx_j
    =g^{ij}\, dx_i dx_j  eq (68)

When mixed index vectors are used, we have:

\displaystyle ds^2  = dx^i e_i \cdot dx_j e^j
    =(e_i \cdot e^j)\, dx^i dx_j
    =\delta_i^j\, dx^i dx_j = dx^i dx_i  eq (69)

The metric isn’t needed to obtain ds^2 in this case because, by the definition of dual basis vectors: \displaystyle e_i \cdot e^j=\delta_i^j.

In general, the metric helps define the inner product of two tensors (including the dot product of two vectors). And if we define the metric tensor at each point in space, it determines the infinitesimal length element at each point and, therefore, the space’s geometry. But more on this in my page on general relativity. For now, let’s start by proving that the metric is indeed a tensor.

III.C.2 Symmetry

We need to show that the metric transforms like the rank 2 tensors we’ve seen so far. Although observers using different coordinate systems won’t agree on the components dx^i of an infinitesimal displacement, we know that observers in all coordinate systems will agree on its length ds. That length is given by:

ds^2=g_{ij}dx^i dx^j

Now let’s change coordinates. The transformation equation is:

g_{ij}\,dx^i dx^j=g^{\prime}_{mn}\,dx^{\prime m} dx^{\prime n}  eq (70)

The transformations for the infinitesimal length vectors go like this:

\displaystyle dx^i dx^j=\displaystyle \frac{\partial x^i}{\partial x^{\prime m}}\frac{\partial x^j}{\partial x^{\prime n}}\,dx^{\prime m} dx^{\prime n}  eq (71)

Using eq (71) in eq (70), we have:

g_{ij}\displaystyle \frac{\partial x^i}{\partial x^{\prime m}}\frac{\partial x^j}{\partial x^{\prime n}}\,dx^{\prime m} dx^{\prime n}=g^{\prime}_{mn}\,dx^{\prime m} dx^{\prime n}  eq (72)

For both sides of eq (72) to be equal for arbitrary displacements, the coefficients of dx^{\prime m} dx^{\prime n} have to be equal. That means that:

g^{\prime}_{mn}=g_{ij}\displaystyle \frac{\partial x^i}{\partial x^{\prime m}}\frac{\partial x^j}{\partial x^{\prime n}}

So, the metric transforms like a tensor. Therefore, it is a tensor.

Next, let’s look at an important property of the metric: symmetry. The metric tensor is a symmetric tensor. That is,

g_{ij} = g_{ji}, \quad g^{ij} = g^{ji} \quad \text{and} \quad g_i^j = g_j^i  eq (73)

For those interested, here’s the proof:

III.C.3 Raising and Lowering Indices

Another important application of the metric tensor: raising and lowering indices. It works like this:

We’ve seen previously that:

\vec{T}\cdot\vec{W}=(T^ue_u)\cdot(W^ve_v)=(e_u \cdot e_v)\,T^uW^v=g_{uv}T^uW^v \quad \text{eq (III.C.3.1)}

A metric that multiplies only one vector can be regarded as a covector in that it

  • takes in a vector to produce a scalar
  • obeys linearity

Call that covector-like entity \tilde{T}. Using this, we can write eq (III.C.3.1) as:

\tilde{T}(\vec{W})=\vec{T} \cdot \vec{W} \quad \text{eq (III.C.3.2)}

The components of \tilde{T} are given by:

T_v=\tilde{T}(e_v)\quad \text{eq (III.C.3.3)}

We substitute e_v for \vec{W} in eq (III.C.3.2). We get:

T_v=\tilde{T}(e_v)=\vec{T}\cdot e_v = (T^u e_u) \cdot e_v = (e_u \cdot e_v)\, T^u = g_{uv}T^u. Therefore:

T_v=g_{uv}T^u \quad \text{eq (III.C.3.4)}

In other words, the metric converts the contravariant components of a vector to covariant components, i.e., it lowers the index.

If we start with:

T_v=g_{uv}T^u

Now multiply both sides by the inverse metric g^{vw} and sum over v. We get:

g^{vw}T_v=g^{vw}g_{uv}T^u \quad \text{eq (III.C.3.5)}

But g^{vw}g_{uv}, by the definition of the inverse metric, is the Kronecker delta \delta^w_u (the components of the identity matrix). Therefore:

g^{vw}T_v=\delta^w_u T^u which means:

g^{vw}T_v=T^w \quad \text{eq (III.C.3.6)}

In other words, the inverse metric converts the covariant components of a vector to contravariant components, i.e., it raises the index.

As you might expect, since we can make more complex tensors as the tensor product of simpler tensors like vectors, we can raise or lower each index of a complex tensor with a separate instance of the metric in the appropriate form. Here are some examples:

g^{uv}A_{uk}=A^v_k \quad \text{eq (III.C.3.7)}

D^i_{jk}=g_{js} D^{is}_k \quad \text{eq (III.C.3.8)}

S^{ijk}=g^{il}S^{jk}_l \quad \text{eq (III.C.3.9)}

When we work with such expressions, from a purely mechanistic point of view, we simply apply the following rules:

  • Cancel any index that appears as a superscript and subscript in the same expression
  • Indices that aren’t cancelled are the indices that remain in the final expression
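Purely as a mechanical illustration (my own, using numpy and the polar metric of section III.C.4.a evaluated at the point r = 2), here is how eq (III.C.3.4), eq (III.C.3.6) and eq (III.C.3.7) look as einsum contractions:

import numpy as np

r = 2.0
g = np.array([[1.0, 0.0], [0.0, r**2]])   # polar metric at a point, eq (III.C.4.a.2)
g_inv = np.linalg.inv(g)                  # the inverse metric g^{uv}

T_up = np.array([3.0, 5.0])                    # contravariant components T^u
T_down = np.einsum('uv,u->v', g, T_up)         # lower the index: T_v = g_{uv} T^u
T_back = np.einsum('vw,v->w', g_inv, T_down)   # raise it again: T^w = g^{vw} T_v
print(T_down, T_back)                          # [ 3. 20.] [3. 5.]

A_low = np.arange(4.0).reshape(2, 2)             # a rank 2 tensor A_{uk}
A_mixed = np.einsum('uv,uk->vk', g_inv, A_low)   # eq (III.C.3.7): g^{uv} A_{uk} = A^v_k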

III.C.4 Examples

In this section, we’ll examine the metric for commonly used coordinate systems, namely polar, cylindrical and spherical coordinates. For coordinate systems whose axes are orthogonal (like the ones just mentioned), the off-diagonal elements of the metric are zero because, in such cases, e_i \cdot e_j = 0 unless i=j. Thus, in these cases, to get the metric, we simply place the coefficients of the line element along the diagonal and put zeros everywhere else. Fortunately, I’ve derived the line elements for the coordinate systems under consideration elsewhere, so the exercise that follows should be simple.

III.C.4.a Polar Coordinates

The line element for polar coordinates is:

ds^2 = d\vec{l} \cdot d\vec{l} = dr^2 + r^2 d\theta^2 \quad \text{eq (III.C.4.a.1)}

Therefore, the metric in polar coordinates is:

\begin{bmatrix} 1 & 0 \\ 0 & r^2\end{bmatrix} \quad \text{eq (III.C.4.a.2)}

III.C.4.b Cylindrical Coordinates

The line element for cylindrical coordinates is:

ds^2 = dr^2 + r^2 d\theta^2 + dz^2 \quad \text{eq (III.C.4.b.1)}

Therefore, the metric in cylindrical coordinates is:

\begin{bmatrix} 1 & 0 & 0\\ 0 & r^2 &0\\ 0&0&1\end{bmatrix} \quad \text{eq (III.C.4.b.2)}

III.C.4.c Spherical Coordinates

The line element in spherical coordinates is:

ds^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2 \quad \text{eq (III.C.4.c.1)}

Thus, the metric in spherical coordinates looks like this:

\begin{bmatrix} 1 & 0 & 0\\ 0 & r^2 &0\\ 0&0&r^2\sin^2\theta\end{bmatrix} \quad \text{eq (III.C.4.c.2)}
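For readers who want to check this (a sketch of my own, using sympy), the same Jacobian construction used in the earlier polar sketch recovers eq (III.C.4.c.2) directly from the Cartesian coordinates:

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

# Cartesian position expressed in spherical coordinates
X = sp.Matrix([r*sp.sin(th)*sp.cos(ph),
               r*sp.sin(th)*sp.sin(ph),
               r*sp.cos(th)])

J = X.jacobian([r, th, ph])   # columns are e_r, e_theta, e_phi
g = sp.simplify(J.T * J)      # g_ij = e_i . e_j
print(g)                      # diag(1, r**2, r**2*sin(theta)**2), as in eq (III.C.4.c.2)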

IV. Covariant Derivative

IV.A Motivation

Vector fields describe many phenomena in science (e.g., wind speed, fluid movement, the electromagnetic force). We often need to know how these fields are changing at different points in time and space. This is a straightforward process in Cartesian coordinates in Euclidean space: one simply takes derivatives of the vector components. There’s no need to worry about the basis vectors because they have the same magnitude and direction everywhere. But what happens if the basis vectors of the coordinate system we’re using vary from place to place, as they do in spherical coordinates, for example, or if the space itself is nonEuclidean? In that case, we need to take the change in basis vectors, as well as the change in components, into consideration when we take a derivative.

Thus, suppose we have a 3-dimensional coordinate system whose basis vectors e_1, e_2 and e_3 vary at different points. Then, for a vector

\vec{A} = A^1e_1  + A^2e_2 + A^3e_3 \quad \text{eq (IV.A.1)}

To obtain the correct derivative, we need to apply the product rule:

\displaystyle \frac{\partial \vec{A}}{\partial x^j}=\displaystyle \frac{\partial (A^ie_i)}{\partial x^j} \quad \text{eq (IV.A.2)}

    =\displaystyle \frac{\partial A^i}{\partial x^j}\vec{e}_i+A^i\displaystyle \frac{\partial \vec{e}_i}{\partial x^j} \quad \text{eq (IV.A.3)}

This process of accounting for the change in basis vectors when taking a derivative is called covariant differentiation. Of course, since vectors are tensors, and higher rank tensors can be made out of vectors, this process can be generalized to higher rank tensors as well. However, we’ll stick with vectors to illustrate the principles.

The reason we need to take the basis vectors into account when we take the derivative of a tensor is that, if we don’t, we won’t get a tensor back. This creates significant problems for, say, physics, where the laws of physics, represented by tensor equations, should be the same in all coordinate systems. For a more in-depth illustration of this problem, click here.

IV.B Christoffel Symbols

IV.B.1 Definition

The Christoffel symbol is part of an alternative representation of the rightmost term in eq (IV.A.3), \displaystyle \frac{\partial \vec{e}_i}{\partial x^j}:

\Gamma^k_{ij}\vec{e}_k=\displaystyle \frac{\partial \vec{e}_i}{\partial x^j} \quad \text{eq (IV.B.1.1)}

It can be thought of as the component of the vector \displaystyle \frac{\partial \vec{e}_i}{\partial x^j} along the basis vector \vec{e}_k. And like other components, it changes under coordinate transformations – but not in the way tensor components do. Therefore, it’s not a tensor. Click on the link for a proof.

An explanation of what the indices in the Christoffel symbol mean is shown in figure IV.B.1.1.

Christoffel symbol explanation
Figure IV.B.1.1

A useful way to express the Christoffel symbol is in terms of the metric. There are two kinds of Christoffel symbols:

  • Christoffel symbols of the first kind, \Gamma_{lij}: the components of \displaystyle \frac{\partial \vec{e}_i}{\partial x^j} obtained by projecting onto the ordinary basis vectors, \Gamma_{lij}=\vec{e}_l \cdot \displaystyle \frac{\partial \vec{e}_i}{\partial x^j}; all three indices are downstairs.
  • Christoffel symbols of the second kind, \Gamma^l_{ij}: the components of \displaystyle \frac{\partial \vec{e}_i}{\partial x^j} obtained by projecting onto the dual basis vectors, \Gamma^l_{ij}=\vec{e}^{\,l} \cdot \displaystyle \frac{\partial \vec{e}_i}{\partial x^j}; these are the symbols that appear in eq (IV.B.1.1). The two kinds are related by raising or lowering the first index with the metric.

In terms of the metric, Christoffel symbols of the first kind are written as:

\Gamma_{lij}=\displaystyle \frac12 \left[\displaystyle \frac{\partial g_{li}}{\partial x^j} +  \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l}\right] \quad \text{eq (IV.B.1.2)}

In terms of the metric, Christoffel symbols of the second kind are written as:

\Gamma^l_{ij}=\displaystyle \frac12 g^{kl}\left[\frac{\partial g_{ik}}{\partial x^j} +  \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right] \quad \text{eq (IV.B.1.3)}

In this article, we’ll concern ourselves mainly with Christoffel symbols of the second kind. Derivations of eq (IV.B.1.3) can be seen by clicking here.

IV.B.2 Example

Because of the tedium involved in calculating Christoffel symbols, we’ll take a simple case involving only 2 dimensions, that of polar coordinates.

Because there are 3 indices in Christoffel symbols, the potential number of symbols one might have to calculate is N^3, where N is the number of dimensions you’re dealing with. However, it turns out that Christoffel symbols are symmetric in their lower indices (i.e., \Gamma^l_{ij}=\Gamma^l_{ji}). This is because Christoffel symbols are what are called torsion-free connection coefficients. I won’t explain this here but a conceptual explanation can be found at https://profoundphysics.com/christoffel-symbols-a-complete-guide-with-examples/.

In our case, there are 2^3 possible combinations of indices but, because of the symmetry of the lower indices, only 6 independent combinations need to be calculated:

\Gamma^1_{11}
\Gamma^1_{12}=\Gamma^1_{21}
\Gamma^1_{22}
\Gamma^2_{11}
\Gamma^2_{12}=\Gamma^2_{21}
\Gamma^2_{22}

As preparation for performing these calculations, recall the formula for the Christoffel symbol in terms of the metric:

    \[ \Gamma^l_{ij}=\displaystyle \frac12 g^{kl}\left[\frac{\partial g_{ik}}{\partial x^j} +  \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right] \quad \text{eq (IV.B.1.3)} \]

Remember also that we are summing over k. Because the polar metric is diagonal, g^{kl}=0 whenever k \neq l, so only the k = l term of that sum survives. We also know that the metric in polar coordinates is

\begin{bmatrix} 1&0\\0&r^2\end{bmatrix} \quad \text{eq (III.C.4.a.2)}

Thus, the only partial derivative that’s nonzero is \displaystyle \frac{\partial g_{22}}{\partial x^1}=\frac{\partial r^2}{\partial r}=2r.

Finally, the inverse metric (that appears in the term \displaystyle \frac12 g^{kl}) is

\begin{bmatrix} 1&0\\0&\displaystyle \frac{1}{r^2}\end{bmatrix} \quad \text{eq (IV.B.1.4)}

Considering these facts, we have:

\Gamma^1_{11}=\displaystyle \frac12 g^{11}\left[\displaystyle \frac{\partial g_{11}}{\partial x^1} +  \frac{\partial g_{11}}{\partial x^1} - \frac{\partial g_{11}}{\partial x^1}\right] + \cancel{g^{21}[\dots]}
  =\,\displaystyle \frac12 (1)\left[ \displaystyle \frac{\partial (1)}{\partial r} +  \frac{\partial (1)}{\partial r} - \frac{\partial(1)}{\partial r}\right]
  =0

\Gamma^1_{12}=\displaystyle \frac12 g^{11}\left[\displaystyle \frac{\partial g_{11}}{\partial x^2} +  \frac{\partial g_{21}}{\partial x^1} - \frac{\partial g_{12}}{\partial x^1}\right] + \cancel{g^{21}[\dots]}
  =\displaystyle \frac12 (1)\left[\displaystyle \frac{\partial (1)}{\partial \theta} +  \frac{\partial (0)}{\partial r} - \frac{\partial (0)}{\partial r}\right]
  =0

\Gamma^1_{22}=\displaystyle \frac12 g^{11}\left[\frac{\partial g_{21}}{\partial x^2} +  \frac{\partial g_{21}}{\partial x^2} - \frac{\partial g_{22}}{\partial x^1}\right] + \cancel{g^{21}[\dots]}
   =\displaystyle \frac12 (1)\left[\frac{\partial (0)}{\partial \theta} +  \frac{\partial (0)}{\partial \theta} - \frac{\partial r^2}{\partial r}\right]
  =\displaystyle (\frac12) \left[0 +  0 - 2r\right]
  =-r

\Gamma^2_{11}=\cancel{g^{12}[\dots]} + \displaystyle \frac12 g^{22}\left[\frac{\partial g_{12}}{\partial x^1} +  \frac{\partial g_{12}}{\partial x^1} - \frac{\partial g_{11}}{\partial x^2}\right]
  =\displaystyle \frac{1}{2r^2}\left[\frac{\partial (0)}{\partial r} +  \frac{\partial (0)}{\partial r} - \frac{\partial (1)}{\partial \theta}\right]
  =0

\Gamma^2_{12}=\cancel{g^{12}[\dots]} + \displaystyle \frac12 g^{22}\left[\frac{\partial g_{12}}{\partial x^2} +  \frac{\partial g_{22}}{\partial x^1} - \frac{\partial g_{12}}{\partial x^2}\right]
  =\displaystyle \frac{1}{2r^2}\left[\frac{\partial (0)}{\partial \theta} +  \frac{\partial r^2}{\partial r} - \frac{\partial (0)}{\partial \theta}\right]
  =\displaystyle \frac{1}{2r^2}\left[ 0+2r+0 \right]
  =1/r

\Gamma^2_{21}=\cancel{g^{12}[\dots]} + \displaystyle \frac12 g^{22}\left[\frac{\partial g_{22}}{\partial x^1} +  \frac{\partial g_{12}}{\partial x^2} - \frac{\partial g_{21}}{\partial x^2}\right]
  =\displaystyle \frac{1}{2r^2}\left[\frac{\partial r^2}{\partial r} +  \frac{\partial (0)}{\partial \theta} - \frac{\partial (0)}{\partial \theta}\right]
  =\displaystyle \frac{1}{2r^2}\left[ 2r + 0 - 0\right]
  =1/r

I include this calculation here to show that, indeed, \Gamma^2_{12}=\Gamma^2_{21}.

\Gamma^2_{22}=\cancel{g^{12}[\dots]} + \displaystyle \frac12 g^{22}\left[\frac{\partial g_{22}}{\partial x^2} +  \frac{\partial g_{22}}{\partial x^2} - \frac{\partial g_{22}}{\partial x^2}\right]
  =\displaystyle \frac{1}{2r^2}\left[\frac{\partial r^2}{\partial \theta} +   \frac{\partial r^2}{\partial \theta} -   \frac{\partial r^2}{\partial \theta}\right]
  =\displaystyle \frac{1}{2r^2}\left[0+0-0 \right]
  =0

From these calculations, we can see that only 3 terms are nonzero:

\Gamma^1_{22}=-r \quad \text{eq (IV.B.1.5)}
\Gamma^2_{12}=1/r \quad \text{eq (IV.B.1.6)}
\Gamma^2_{21}=1/r \quad \text{eq (IV.B.1.7)}

This is typical; in most cases, the majority of terms equal zero. We can also see how laborious even the simplest case is. Normally, such calculations are done by a computer. There is actually a quicker way to do these calculations that involves the Euler-Lagrange equation and the geodesic equation (which we’ll discuss shortly). I won’t go into this method now. However, it is discussed in the reference I alluded to previously: https://profoundphysics.com/christoffel-symbols-a-complete-guide-with-examples
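In that spirit, here is a minimal sympy sketch (my own, not taken from the references) that computes the polar Christoffel symbols of the second kind directly from eq (IV.B.1.3) and reproduces eqs (IV.B.1.5) – (IV.B.1.7):

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])   # polar metric, eq (III.C.4.a.2)
g_inv = g.inv()

def Gamma(l, i, j):
    # Christoffel symbols of the second kind, eq (IV.B.1.3)
    return sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[k, l]
        * (sp.diff(g[i, k], x[j]) + sp.diff(g[j, k], x[i]) - sp.diff(g[i, j], x[k]))
        for k in range(2)))

# Indices are 0-based here: 0 corresponds to r, 1 to theta
print(Gamma(0, 1, 1))   # Gamma^r_{theta theta} = -r,  eq (IV.B.1.5)
print(Gamma(1, 0, 1))   # Gamma^theta_{r theta} = 1/r, eq (IV.B.1.6)
print(Gamma(1, 1, 0))   # Gamma^theta_{theta r} = 1/r, eq (IV.B.1.7)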

IV.C Covariant Derivative Derivation

Now that we know something about Christoffel symbols, we can move on to deriving the covariant derivative.

We’ve seen how to take the derivative of a vector in non-Cartesian coordinates (where basis vectors are not the same everywhere):

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\frac{\partial V^i}{\partial x^j}e_i+V^i\frac{\partial e_i}{\partial x^j} \quad \text{eq (IV.C.1)}

We defined the Christoffel symbol as:

\displaystyle \frac{\partial e_i}{\partial x^j}=\Gamma^k_{ij}\,e_k \quad \text{eq (IV.C.2)}

Substituting eq (IV.C.2) into eq (IV.C.1) yields:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\frac{\partial V^i}{\partial x^j}e_i+V^i\Gamma^k_{ij}\,e_k\quad \text{eq (IV.C.3)}

In the second term, i and k are dummy indices and can therefore be relabeled (swap i and k). We have:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\frac{\partial V^i}{\partial x^j}e_i+V^k\Gamma^i_{kj}\,e_i\quad \text{eq (IV.C.4)}

We factor out e_i and get:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\left(\frac{\partial V^i}{\partial x^j}+V^k\Gamma^i_{kj}\right)e_i\quad \text{eq (IV.C.5)}

The term in parentheses is called the covariant derivative:

\displaystyle \frac{\partial V^i}{\partial x^j}+V^k\Gamma^i_{kj}\quad \text{eq (IV.C.6)}

It represents the component of the derivative of vector \vec{V} in the i direction and – as opposed to Christoffel symbols alone – it is a tensor. Click on the link for a proof.

Various notations are in use to denote the covariant derivative: V^i_{;j}, \nabla_jV^i and D_jV^i are all equivalent expressions.

For covectors we have:

V_{i;j}=\nabla_j V_i=\displaystyle \frac{\partial V_i}{\partial x^j}-V_k\Gamma^k_{ij}\quad \text{eq (IV.C.7)}

To see why we have a minus sign in this equation, click here.

To better see how covariant differentiation works in practice, we can take the example of the covariant derivative of \vec{V} with respect to \theta in polar coordinates:

\displaystyle V^i_{;j}=\frac{\partial V^i}{\partial x^j}+V^k\Gamma^i_{kj}\quad \text{eq (IV.C.8)}

In this case, x^1=r and x^2=\theta. Thus we have:

\displaystyle V^r_{; \theta}=\frac{\partial V^r}{\partial \theta} + V^r\Gamma^r_{r \theta} + V^{\theta}\Gamma^r_{\theta \theta} \quad \text{eq (IV.C.9)}

In the previous section on Christoffel symbols, we calculated that:

\Gamma^r_{r \theta}=\Gamma^1_{12}=0
\Gamma^r_{\theta \theta}=\Gamma^1_{22}=-r
\displaystyle\Gamma^{\theta}_{r \theta}=\Gamma^2_{12}=\frac1r
\Gamma^{\theta}_{\theta \theta}=0

So

\displaystyle V^r_{; \theta}=\frac{\partial V^r}{\partial \theta} + V^r(0) + V^{\theta}(-r)

   =\displaystyle \frac{\partial V^r}{\partial \theta} + V^{\theta}(-r) \quad \text{eq (IV.C.10)}

For the \theta component, we have:

\displaystyle V^{\theta}_{; \theta}=\frac{\partial V^{\theta}}{\partial \theta} + V^r\Gamma^{\theta}_{r \theta} + V^{\theta}\Gamma^{\theta}_{\theta \theta} \quad \text{eq (IV.C.11)}

So

\displaystyle V^{\theta}_{; \theta}=\frac{\partial V^{\theta}}{\partial \theta} + V^r(\frac1r) + V^{\theta}(0)

   =\displaystyle \frac{\partial V^{\theta}}{\partial \theta} + V^r(\frac1r) \quad \text{eq (IV.C.12)}

The complete covariant derivative of \vec{V} with respect to \theta is then:

\displaystyle \frac{\partial\vec{V}}{\partial \theta}=\left( \frac{\partial V^r}{\partial \theta} + V^{\theta}(-r) \right)\vec{e}_r + \left( \frac{\partial V^{\theta}}{\partial \theta} + V^r(\frac1r) \right)\vec{e}_{\theta} \quad \text{eq (IV.C.13)}
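As a sanity check (my own, using sympy), take the constant Cartesian field \vec{V}=\hat{x}, whose polar components are V^r=\cos\theta and V^{\theta}=-\sin\theta/r. Since this field doesn’t change from point to point, both components in eq (IV.C.13) should vanish, and they do:

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
Vr = sp.cos(th)        # V^r for the constant field x-hat
Vth = -sp.sin(th)/r    # V^theta for the same field

# The two components of the covariant derivative with respect to theta, eq (IV.C.13)
Dr  = sp.diff(Vr, th) + Vth*(-r)          # uses Gamma^r_{theta theta} = -r
Dth = sp.diff(Vth, th) + Vr*(1/r)         # uses Gamma^theta_{r theta} = 1/r
print(sp.simplify(Dr), sp.simplify(Dth))  # 0 0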

To apply covariant differentiation to higher rank tensors, one simply adds a Christoffel-symbol term for each index on the tensor, using a plus sign for upper indices and a minus sign for lower indices. For example:

\displaystyle A^{ij}_{;k}=\frac{\partial A ^{ij}}{\partial x^k}+ A^{lj}\Gamma^i_{lk} + A^{il}\Gamma^j_{lk} \quad \text{eq (IV.C.14)}

\displaystyle B_{ij;k}=\frac{\partial B _{ij}}{\partial x^k} -B_{lj}\Gamma^l_{ik} - B_{il}\Gamma^l_{jk} \quad \text{eq (IV.C.15)}

\displaystyle C^i_{j;k} = \frac{\partial C ^i_j}{\partial x^k} + C^l_j\Gamma^i_{lk} - C^i_l\Gamma^l_{jk} \quad \text{eq (IV.C.16)}
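Purely as an illustration of the index bookkeeping (my own sketch, with hypothetical array names), eq (IV.C.14) translates directly into einsum contractions; in flat Cartesian coordinates, where every Christoffel symbol vanishes, it reduces to the ordinary partial derivative:

import numpy as np

def cov_deriv_upper2(dA, A, Gamma):
    # eq (IV.C.14): A^{ij}_{;k} = d_k A^{ij} + A^{lj} Gamma^i_{lk} + A^{il} Gamma^j_{lk}
    # dA[i, j, k] holds the partial derivatives d_k A^{ij}; Gamma[i, l, k] holds Gamma^i_{lk}
    return (dA
            + np.einsum('lj,ilk->ijk', A, Gamma)
            + np.einsum('il,jlk->ijk', A, Gamma))

# Flat-space check: with Gamma = 0, the covariant derivative is just the partial derivative
n = 3
A = np.random.rand(n, n)
dA = np.random.rand(n, n, n)
Gamma = np.zeros((n, n, n))
assert np.allclose(cov_deriv_upper2(dA, A, Gamma), dA)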

IV.D Covariant Derivative of the Metric

Method 1

We can use the Leibniz product rule for the covariant derivative of a tensor product, together with metric compatibility, to get a general expression for the covariant derivative of the metric tensor:

\nabla_{\vec{w}}(T \otimes S) = (\nabla_{\vec{w}}T) \otimes S + T \otimes  (\nabla_{\vec{w}}S) \quad \text{eq (IV.D.1)}

So

\nabla_{\partial_i}(g) = \nabla_{\partial_i}(g_{rs}\epsilon^r \otimes \epsilon^s)

      =\nabla_{\partial_i}(g_{rs})(\epsilon^r \otimes \epsilon^s) + g_{rs}  \nabla_{\partial_i}(\epsilon^r \otimes \epsilon^s)

      =\partial_i(g_{rs})(\epsilon^r \otimes \epsilon^s) + g_{rs} \Bigl( \bigl(\nabla_{\partial_i} \epsilon^r \bigr) \otimes \epsilon^s + \epsilon^r \otimes \bigl(\nabla_{\partial_i} \epsilon^s \bigr) \Bigr)

      =\partial_i(g_{rs})(\epsilon^r \otimes \epsilon^s) + g_{rs} \Bigl( \bigl(-\Gamma^r_{ik} \epsilon^k \bigr) \otimes \epsilon^s + \epsilon^r \otimes \bigl(-\Gamma^s_{ik} \epsilon^k \bigr)  \Bigr)

      =\partial_i(g_{rs})(\epsilon^r \otimes \epsilon^s) -\Bigl( g_{rs} \Gamma^r_{ik} (\epsilon^k \otimes \epsilon^s) + g_{rs} \Gamma^s_{ik} (\epsilon^r \otimes \epsilon^k) \Bigr)

We change the names of dummy indices so, ultimately, we can factor out a common tensor product:

      =\partial_i(g_{rs})(\epsilon^r \otimes \epsilon^s) -\Bigl( g_{ks} \Gamma^k_{ir} (\epsilon^r \otimes \epsilon^s) + g_{rk} \Gamma^k_{is} (\epsilon^r \otimes \epsilon^s) \Bigr)

      =\displaystyle \left[ \frac{\partial g_{rs}}{\partial u^i} - g_{ks} \Gamma^k_{ir} -  g_{rk} \Gamma^k_{is} \right] (\epsilon^r \otimes \epsilon^s) \quad \text{eq (IV.D.2)}

By metric compatibility:

\nabla_{\vec{w}}(\vec{v} \cdot \vec{u}) = (\nabla_{\vec{w}}\vec{v}) \cdot \vec{u} + \vec{v}\cdot(\nabla_{\vec{w}}\vec{u}) \quad \text{eq (IV.D.3)}

Because \vec{v} \cdot \vec{u} is a scalar, we can convert the covariant derivative to the regular partial derivative. We know that:

g_{rs} = \vec{e}_r \cdot \vec{e}_s  \quad \text{eq (IV.D.4)}

Thus, applying metric compatibility, we get:

\displaystyle \frac{\partial g_{rs}}{\partial u^i}  = \frac{\partial (\vec{e}_r \cdot \vec{e}_s )}{\partial u^i}

    = \displaystyle \frac{\partial \vec{e}_r }{\partial u^i} \cdot \vec{e}_s + \vec{e}_r  \cdot \frac{\partial \vec{e}_s }{\partial u^i} \quad \text{eq (IV.D.5)}

But, by definition:

\displaystyle \frac{\partial \vec{e}_r }{\partial u^i} = \Gamma^k_{ir}\,\vec{e}_k   and   \displaystyle \frac{\partial \vec{e}_s }{\partial u^i} = \Gamma^k_{is}\,\vec{e}_k \quad \text{eq (IV.D.6)}

Substituting eq (IV.D.6) into eq (IV.D.5), we obtain:

\displaystyle \frac{\partial g_{rs}}{\partial u^i}  = \displaystyle \Gamma^k_{ir}\, (\vec{e}_k \cdot \vec{e}_s) + \Gamma^k_{is}\, (\vec{e}_r \cdot \vec{e}_k)

    =\displaystyle \Gamma^k_{ir} g_{ks} + \Gamma^k_{is} g_{rk} \quad \text{eq (IV.D.7)}

When we put eq (IV.D.7) back into eq (IV.D.2), we’re left with:

\nabla_{\partial_i}(g)  =  \displaystyle \left[ \frac{\partial g_{rs}}{\partial u^i} - (\underbrace{g_{ks} \Gamma^k_{ir} + g_{rk} \Gamma^k_{is}}_{\displaystyle \frac{\partial g_{rs}}{\partial u^i}}) \right] (\epsilon^r \otimes \epsilon^s)

    \displaystyle = 0(\epsilon^r \otimes \epsilon^s)

    = 0 \quad \text{eq (IV.D.8)}

In other words, the covariant derivative of the metric is zero in all directions.
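Here is a minimal sympy check of this result in polar coordinates (my own sketch), written with the component form g_{rs;k}=\partial_k g_{rs}-\Gamma^l_{rk}g_{ls}-\Gamma^l_{sk}g_{rl} that follows the pattern of eq (IV.C.15):

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])   # polar metric
g_inv = g.inv()

def Gamma(l, i, j):
    # Christoffel symbols of the second kind, eq (IV.B.1.3)
    return sum(sp.Rational(1, 2) * g_inv[k, l]
               * (sp.diff(g[i, k], x[j]) + sp.diff(g[j, k], x[i]) - sp.diff(g[i, j], x[k]))
               for k in range(2))

# Check g_{rs;k} = d_k g_rs - Gamma^l_{rk} g_ls - Gamma^l_{sk} g_rl = 0 for every r, s, k
for a in range(2):
    for b in range(2):
        for k in range(2):
            cov = (sp.diff(g[a, b], x[k])
                   - sum(Gamma(l, a, k)*g[l, b] + Gamma(l, b, k)*g[a, l] for l in range(2)))
            assert sp.simplify(cov) == 0

print("The covariant derivative of the polar metric vanishes, as claimed.")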

Method 2

From eq. IV.C.15, we know that the covariant derivative of a tensor like the metric is given by:

    \begin{align*} g_{\alpha \beta ; \gamma} & =g_{\alpha \beta, \gamma}-\Gamma_{\alpha \gamma}^{\mu} g_{\mu \beta}-\Gamma_{\beta \gamma}^{\mu} g_{\alpha \mu} \\ \\ & =g_{\alpha \beta, \gamma}-\frac{1}{2} g^{\mu \nu}\left(-g_{\alpha \gamma, \nu}+g_{\alpha \nu, \gamma}+g_{\gamma \nu, \alpha}\right) g_{\mu \beta} \\ &\quad -\frac{1}{2} g^{\mu \nu}\left(-g_{\beta \gamma, \nu}+g_{\beta \nu, \gamma}+g_{\gamma \nu, \beta}\right) g_{\alpha \mu} \\ \\ & =g_{\alpha \beta, \gamma}-\frac{1}{2} \delta^{\nu}_{\beta}\left(-g_{\alpha \gamma, \nu}+g_{\alpha \nu, \gamma}+g_{\gamma \nu, \alpha}\right) \\ &\quad -\frac{1}{2} \delta^{\nu}_{\alpha}\left(-g_{\beta \gamma, \nu}+g_{\beta \nu, \gamma}+g_{\gamma \nu, \beta}\right) \\ \\ & =g_{\alpha \beta, \gamma}-\frac{1}{2}\left(-g_{\alpha \gamma, \beta}+g_{\alpha \beta, \gamma}+g_{\gamma \beta, \alpha}\right) \\  &\quad -\frac{1}{2}\left(-g_{\beta \gamma, \alpha}+g_{\beta \alpha, \gamma}+g_{\gamma \alpha, \beta}\right) \\ \\ & =0 \end{align*}