Bell’s Inequality 2


Introduction

As alluded to in the first installment on Bell’s inequality, the debate between those who believed in local realism/local hidden variables (championed by Einstein) and those who accepted quantum mechanics (championed by Bohr) raged on for several decades, beginning in the late 1920s, seemingly without any hope of resolution. Then, in 1964, Irish physicist John Bell wrote a paper (published in 1966) outlining a method that promised to break this impasse, a paper that physicist Henry Stapp called “the most profound discovery of science.”[1] The goal of this article is to describe that paper, explaining the mathematics it contains step by step, so that readers without a mathematical background can still follow it.

Spin

Figure 1

Bell considers a thought experiment designed by Bohm and Aharonov that uses particles entangled with regard to a property called spin. Spin is a purely quantum phenomenon, akin to angular momentum, that, along with charge, gives particles their magnetic properties. The closest thing to it in classical mechanics is to imagine a spherical charged particle spinning on its axis. The rotating charge acts like a tiny loop of electric current, which creates a magnetic field, so the particle behaves in a magnetic field like a tiny bar magnet. The direction of the magnetic field (or magnetic moment) so created can be determined by the right-hand rule: curl the fingers of your right hand in the direction of a positively charged particle’s rotation and extend your thumb. Your thumb points toward the north pole of the magnet. When placed in a magnetic field, the north pole of the miniature bar magnet (or magnetic moment) tends to align with the field.

The spin of a particle can be measured with a Stern-Gerlach experiment. A Stern-Gerlach experiment employs an apparatus (we’ll call it an S-G device) consisting of a tunnel between two magnet pole pieces, a “north” pole on one side and a “south” pole on the opposite side, shaped so that the magnetic field between them is non-uniform. (In a perfectly uniform field, a tiny magnet would only be twisted, not pushed sideways.) Stern and Gerlach’s original 1922 experiment used a beam of silver atoms from an oven; here, to keep things simple, we’ll imagine a source that shoots neutrons into the S-G device. The neutrons have a magnetic moment. This may be surprising since, even if neutrons have spin, they are electrically neutral; they have no net electrical charge. Therefore, a spinning neutron should, at least according to classical physics, have no magnetic moment. But it does. That’s because it’s made up of quarks, which have spin and charge, and thus magnetic moments. Combine them and you get a net magnetic moment. At any rate, the magnetic moments of the neutrons shot out of the source are initially oriented in random directions. The neutrons pass through the S-G device, where the magnets deflect them, and they strike a detection screen at the other end of the apparatus. Since the distribution of magnetic moment directions is random, all directions should be equally likely. According to classical mechanics, the deflection of any given neutron should be proportional to how strongly the magnetic field varies across the S-G device (its gradient) and to the component of the neutron’s magnetic moment in the direction of the field:

Figure 2

Therefore, the pattern of neutron impressions on the detection screen should be a uniform straight line. What is actually found, however, are two lumps of neutron impressions, one toward the north pole side of the S-G device and one toward the south pole side of the S-G device:

Figure 3 [2]

This is because the laws of quantum mechanics, not classical mechanics, govern the behavior of such subatomic particles. According to quantum mechanics, each neutron’s spin is in a superposition of states that includes all possible angles, until it interacts with the S-G device and a measurement is made. At that point, because we’re measuring in a basis defined by the magnetic field in the S-G device, the neutron’s spin must assume one of two states: spin up (with its north pole aligned with the magnetic field of the S-G device) or spin down (with its north pole aligned against the magnetic field of the S-G device). Similar to the polarization of photons discussed in “Bell’s Inequality 1,” the probability that any given neutron will align with or against the magnetic field of the S-G device is given by the square of the magnitude of the probability amplitude associated with the spin up or spin down state. These amplitudes, in turn, are determined by the angle between the magnetic field of the S-G device and the direction of the neutron’s magnetic moment. So the likelihood that a given neutron will be deflected upward or downward depends on the angle of its magnetic moment. However, in any single measurement, the neutron is deflected only upward or downward, toward the same two spots on the detection screen, never at any other angle.
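The two-spot result can be mimicked with a short simulation. This is a minimal Python sketch, not a model of a real apparatus: it assumes the standard spin-1/2 rule (not derived in this article) that a moment tilted by an angle θ from the field comes out spin up with probability cos²(θ/2), and it draws the moment directions uniformly at random. The function names are illustrative only.

```python
import math
import random

def spin_up_probability(theta: float) -> float:
    """Chance of 'spin up' for a spin-1/2 moment tilted by theta radians from the
    S-G field. Assumed standard spin-1/2 rule: cos^2(theta / 2)."""
    return math.cos(theta / 2) ** 2

def stern_gerlach_run(n: int, seed: int = 0) -> dict:
    """Send n randomly oriented moments through an idealized S-G device and count
    the only two possible outcomes: +1 (deflected up) or -1 (deflected down)."""
    rng = random.Random(seed)
    counts = {+1: 0, -1: 0}
    for _ in range(n):
        # A direction drawn uniformly over a sphere has cos(theta) uniform on [-1, 1].
        theta = math.acos(rng.uniform(-1.0, 1.0))
        outcome = +1 if rng.random() < spin_up_probability(theta) else -1
        counts[outcome] += 1
    return counts

counts = stern_gerlach_run(100_000)
print(counts)  # two bins only, each holding roughly half of the neutrons
```

However the directions are spread, every simulated neutron lands in one of just two bins; the angle only changes the odds of which one.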

Experimental set-up

The hypothetical experiment that Bell considers in his paper uses entangled electrons, spin-1/2 particles that are anti-correlated. Spin-1/2 means that, when the spin is measured along any given direction (for example, with an external magnetic field), it can take only one of two possible values. Anti-correlated means that each entangled particle will always have a spin opposite to its partner’s when both are measured along the same direction. For example, if electron A is in the spin up state, then its entangled partner, electron B, will, with absolute certainty, be spin down, and vice versa.

The experiment that Bell considered is as follows. An entangled electron pair is created, and the two electrons are sent off in opposite directions to Stern-Gerlach devices and detection screens set up far enough apart that, if the measurements are made simultaneously (within the limits of experimental error), no signal traveling at the speed of light could get from one electron to the other to “tell” its entangled partner how to behave. Such measurements are said to be “space-like separated.”
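The space-like condition comes down to a single comparison: the detectors must be farther apart than light can travel in the time that separates the two measurements. Here is a minimal Python sketch of that check; the function name and the sample distance and timing are hypothetical, chosen only for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact by definition of the metre)

def space_like_separated(distance_m: float, time_difference_s: float) -> bool:
    """True if a light-speed signal could NOT cross distance_m in the time
    separating the two measurements, so neither could influence the other."""
    return distance_m > SPEED_OF_LIGHT * abs(time_difference_s)

# Hypothetical set-up: detectors 1 km apart, measurements 1 microsecond apart.
# Light covers only about 300 m in 1 microsecond, so this pair qualifies.
print(space_like_separated(1_000.0, 1e-6))
```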

That’s simple enough. Now on to the math.

Proof

Bell begins by expressing a measurement, mathematically, as \vec{\sigma}_i\cdot \vec{a}, where \vec{\sigma}_i represents the spin of entangled electron i and \vec{a} is a unit vector pointing in the direction of measurement. \vec{\sigma}_1 and \vec{\sigma}_2, then, refer to the two members of an entangled electron pair. Because the electrons are entangled, if \vec{\sigma}_1\cdot \vec{a} is measured as +1, then, with 100% certainty, \vec{\sigma}_2\cdot \vec{a} will be measured as -1, and vice versa. Bell notes that because 1) the measuring devices are far enough apart, 2) the measurements are performed nearly simultaneously, and 3) nothing in the universe can go faster than the speed of light, this result could not have come about due to one particle influencing the other. Then, for the purposes of his discussion, he takes the viewpoint of Einstein, Podolsky and Rosen (EPR), reasoning that there must have been some hidden variable (or collection of hidden variables), acting at the time the particles interacted, that “programmed” the particles to behave as they did. He calls such hidden variable(s) \lambda. Thus, he says, the result, A, of measuring \vec{\sigma}_1\cdot\vec{a} depends on \vec{a} and \lambda; and the result, B, of measuring \vec{\sigma}_2\cdot\vec{b} depends on \vec{b} and \lambda. In addition, as discussed in our brief introduction, the result of any measurement, A or B, can only be +1 or -1 (i.e., spin up or spin down). Mathematically, this is expressed as

A(\vec{a},\lambda) = \pm1,\ B(\vec{b},\lambda) = \pm1

And, as discussed above, an assumption vital to this argument is that the result B for particle 2 does not depend on the setting, \vec{a}, of the magnet for particle 1, nor A on \vec{b}.

He goes on to make the following arguments:

If \rho(\lambda) is the probability distribution of \lambda, then the expectation value (or average) of the product of the two components \vec{\sigma}_1\cdot\vec{a} and \vec{\sigma}_2\cdot\vec{b} is

P(\vec{a},\vec{b})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)B(\vec{b},\lambda)}

If the hidden variable picture is right, this should equal what quantum mechanics predicts, the so-called expectation value (i.e., the average or mean):

\left\langle(\vec{\sigma}_1\cdot\vec{a})(\vec{\sigma}_2\cdot\vec{b})\right\rangle=-\vec{a}\cdot\vec{b}

Before proceeding, it may be helpful to say a few words about probability distributions and their properties.

Probability distributions

We said that \rho(\lambda) is a probability distribution. A probability distribution assigns a probability of occurrence to each possible value of some quantity; it can be pictured as a plot of the values versus their probabilities. For example, say nine students take a test. The probability distribution of their scores is as follows:

Score | Probability
x_1 = 60 | 1/9 = 0.111 = 11.1%
x_2 = 70 | 2/9 = 0.222 = 22.2%
x_3 = 80 | 3/9 = 0.333 = 33.3%
x_4 = 90 | 2/9 = 0.222 = 22.2%
x_5 = 100 | 1/9 = 0.111 = 11.1%
Probability distribution of test scores
Figure 4

There are two things to note about the probability distribution:

First, the probabilities add up to 1 (since we’re expressing them as a decimal; if we were expressing them as percentages, then they would have to add up to 100%). 

\frac{1}{9}+\frac{2}{9}+\frac{3}{9}+\frac{2}{9}+\frac{1}{9}=1

Second, if we multiply each test score by its probability (expressed as a decimal) and add up the results, we wind up with the average (or mean) of the scores.

The mathematical shorthand for this is:

\displaystyle\sum_{i=1}^n\rho(x_i)\cdot{x_i}    where

\rho(x_i)\, represents the probability of x_i
x_i represents the values of the test scores
i is an index corresponding to one of the values in the table above
n is the maximum value of i, i.e., the number of distinct test scores (here, 5)

So \displaystyle\sum_{i=1}^n\rho(x_i)\cdot{x_i}=\rho(x_1)\cdot{x_1}+\rho(x_2)\cdot{x_2}+\rho(x_3)\cdot{x_3}+\rho(x_4)\cdot{x_4}+\rho(x_5)\cdot{x_5}

Putting in some numbers:

\begin{array}{rcl}\text{mean}&=&\frac19\cdot60\,+\,\frac29\cdot70\,+\,\frac39\cdot80\,+\,\frac29\cdot90\,+\,\frac19\cdot100\\&=&\,\frac{1\cdot60}{9}\,\,\,\,+\,\,\frac{2\cdot70}{9}\,\,\,\,+\,\,\,\frac{3\cdot80}{9}\,\,\,\,+\,\,\,\,\frac{2\cdot90}{9}\,\,\,+\,\,\,\frac{1\cdot100}{9}\\&=&\,\,\frac{60}{9}\,\,\,\,\,\,+\,\,\,\frac{140}{9}\,\,\,\,\,+\,\,\,\frac{240}{9}\,\,\,\,\,+\,\,\,\,\frac{180}{9}\,\,\,\,\,+\,\,\,\,\frac{100}{9}\\&=&\frac{\,\,\,60\,\,\,\,\,\,\,\,\,\,+\,\,\,\,\,\,\,\,\,140\,\,\,\,\,\,\,\,+\,\,\,\,\,\,\,240\,\,\,\,\,\,\,\,\,\,+\,\,\,\,\,\,\,\,180\,\,\,\,\,\,\,\,\,\,+\,\,\,\,\,\,\,\,100}{9}\\&=&\frac{720}{9}\\&=&80\end{array}
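The same arithmetic can be checked in a few lines of Python. This sketch uses exact fractions so that no rounding creeps in; the dictionary simply transcribes the table of test scores above.

```python
from fractions import Fraction

# Score -> probability, transcribed from the nine-student table.
distribution = {
    60: Fraction(1, 9),
    70: Fraction(2, 9),
    80: Fraction(3, 9),
    90: Fraction(2, 9),
    100: Fraction(1, 9),
}

total_probability = sum(distribution.values())          # should be exactly 1
mean = sum(p * x for x, p in distribution.items())      # sum of rho(x_i) * x_i

print(total_probability)  # 1
print(mean)               # 80
```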

In Bell’s paper, the entity being considered is \lambda, the values of the hidden variables; \lambda corresponds to x in the above formulas. Recall the distribution of results in Bell’s Inequality 1 (second column from the right of the table in that article). A summary is as follows:

From the above table, we can see that there are 24 possibilities, 3 from each of the 8 hidden variable programs (e.g., A+B-C+). A plot of the probability of each hidden variable program, analogous to the probability plot for the test scores, is shown below:

Figure 5

The probability plot shown above is for discrete variables: there are 8 discrete categories for which probabilities are plotted. In Bell’s paper, the variable whose probabilities are plotted is continuous. In the experimental setup in the paper, the variable under consideration, \lambda, would be the hidden “program” that “tells” an entangled electron what result to give at each angle at which a Stern-Gerlach device is pointed to measure it. It would be a number between 0˚ and 360˚, any number, not just integers, so there are infinitely many possible values. A plot of such a variable might look something like this:


Figure 6

Since it’s a probability distribution, the probabilities of the hidden programs associated with “each degree” in the graph above, like the probabilities in all of the previously considered diagrams, add up to 1. But as we noted, a continuous variable has infinitely many values, each with its associated probability. So how do we calculate the total probability? The answer is “by use of an integral.”

Mathematical Interlude: The Integral

So what is an integral? An integral is a mathematical tool to find the area under a curve. To see how it works, consider the following diagram:

Figure 7

We can at least get an estimate of the area under the curve in the graph shown above by drawing rectangles. Note that, in the graph, there is a considerable amount of “white” between the blue rectangles and the curve. This indicates that the accuracy of our estimate is limited.

We can make our estimate better by making the rectangles we use to measure narrower, like so:

Figure 8

The above estimate looks better, with less white space. This suggests that if we continue to make our rectangles narrower and narrower, eventually making them infinitesimally small, we should eventually get the true area under the curve:

Figure 9

The above graphs can be described mathematically as follows:

\displaystyle\int\limits_a^b f(x)dx\approx\sum_{i=1}^{n}f(x_i)\Delta x_i   where \Delta x_i=\Delta x = \frac{b-a}{n}

In the above equation,

  • f(x) is the value of the curve, specified on the y-axis.
  • dx is an infinitesimal displacement along the x-axis.
  • f(x)dx means multiply f(x) and dx together.
  • \int is the integral sign. It means, find the area under the curve specified by f(x) and the x axis that lies between x=b and x=a.
  • The right side of the equation tells us how to find the integral on the left side of the equation.
  • \approx means that the right side of the equation is an approximation of the left side.
  • The approximation is carried out by adding up the areas of a bunch of very narrow rectangles, that is, by taking their sum. The mathematical symbol that tells us to take a sum is \sum.
  • The thing that we’re to take the sum of is to the right of the \sum symbol. In this case, the equation says to take the sum of the product of f(x) times \Delta x at multiple sites along the x axis.
  • The product f(x)\Delta x gives the area under the very narrow rectangles.
  • i tells us where along the x axis to take those products.
  • \Delta x is the width of the rectangles whose areas we’re going to add up to approximate the integral. This is given by dividing b-a (the length along the x-axis for which we’re going to measure the area under f(x)) by n, the number of tiny rectangles whose areas we’re going to add together.
  • The numbers below and above the \sum sign refer to the various tiny rectangles we’re going to add up. i=1 refers to the first rectangle that contributes to our sum, under the left-most part of the curve.
  • n is an integer that refers to the right-most rectangle we’re going to use. 

The expression \displaystyle\sum_{i=1}^{n}f(x_i)\Delta x_i is called a Riemann sum (named after Bernhard Riemann, the nineteenth-century mathematician who introduced it). Applying the above methods to figure 7 may make it clearer how they work.

Figure 7 can be expressed mathematically as follows:

\text{area under the curve}\approx f(x_1)\Delta x_1+f(x_2)\Delta x_2+\cdots+f(x_n)\Delta x_n

Here are a few explanatory comments on the above equation:

  • f(x_1) is the value on the vertical (y) axis corresponding to the left-hand edge of the first (left-most) rectangle. It is the height of the rectangle.
  • \Delta x_1 is the width of the base of that rectangle.
  • The area of that rectangle is the height times the base, i.e., f(x_1)\Delta x_1.
  • Add up the areas of all of the rectangles and we have an estimate of the area under the curve.

We could go through the same exercise with figure 8 but it would be tedious and essentially the same as that given for figure 7. Therefore, let’s move on to figure 9 and give the mathematical expression for the exact value of the area under a curve:

\displaystyle\int\limits_a^b f(x)dx=\lim_{n\to\infty}\sum_{i=1}^{n}f(x_i)\Delta x_i   where \Delta x_i=\Delta x = \frac{b-a}{n}

In this expression, \displaystyle\lim_{n\to\infty} means that, in order to get an exact value for the area under the curve, f(x), we would need to take the sum of the area of an infinite number of very narrow rectangles. As the above diagrams show, as the number of rectangles becomes very large-and their widths become very narrow-they “point to” the exact value.
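Here is a minimal sketch of a Riemann sum in Python. It assumes a left-endpoint rule (each rectangle’s height is taken at its left edge, as with f(x_1) in Figure 7) and uses the illustrative curve f(x) = x², whose exact area between x = 0 and x = 1 is 1/3. As n grows, the sum creeps toward 1/3.

```python
def riemann_sum(f, a: float, b: float, n: int) -> float:
    """Left-endpoint Riemann sum: n rectangles of width dx = (b - a) / n, each as
    tall as f at its left edge."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

def square(x: float) -> float:
    """Illustrative curve; its exact area between x = 0 and x = 1 is 1/3."""
    return x * x

for n in (4, 16, 256, 4096):
    print(n, riemann_sum(square, 0.0, 1.0, n))  # approaches 0.3333... from below
```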

We can apply the above mathematical technique to a probability distribution. A probability distribution catalogues the chances that a single event will unfold in each of its possible ways. The event is certain to unfold in one of those ways, so the percentages associated with the different ways have to add up to 100%. Percentages can be converted to decimals by dividing by 100, and the total, 100%, becomes 100/100 = 1. The area of each of the little rectangles that we add up to get the area under the probability distribution curve corresponds to one of the probabilities we add up to get the total probability. The total probability, expressed as a decimal, is 1. Therefore, the area under a probability distribution curve is 1. I mention this fact now because it will come in handy later in this article.
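To see that the area under a continuous probability distribution comes out to 1, here is a small Python sketch. The density `rho` is made up for illustration (a hypothetical, non-uniform spread of λ over 0˚ to 360˚, measured per degree), and the area is estimated with narrow rectangles, as in the Riemann sum above.

```python
import math

def rho(lam_deg: float) -> float:
    """Hypothetical hidden-variable density over 0-360 degrees (per degree).
    Not uniform, but built so that its total area is exactly 1."""
    return (1 + 0.5 * math.cos(math.radians(lam_deg))) / 360.0

def area(f, a: float, b: float, n: int) -> float:
    """Approximate the area under f from a to b with n narrow rectangles."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

print(area(rho, 0.0, 360.0, 3600))  # very close to 1: the total probability
```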

Proof (cont’d)

Of course, the reason we went through the above primer on integrals is because they are an integral (pun intended) part of the remainder of the proof. 

Let’s go back to the integral with which Bell begins:

P(\vec{a},\vec{b})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)B(\vec{b},\lambda)}

Recall that A and B are measurements made at widely separated places. A depends on the vector \vec{a}, the angle at which the measurement at A is made, and on \lambda, the hidden variable. B depends on the vector \vec{b}, the angle at which the measurement at B is made, and on \lambda. Each of these measurements is either +1 or -1. And if the angle of measurement at both sites is the same, then whatever the measurement at A is, the measurement at B will be the opposite. (As we shall see, fortunately, this is not necessarily the case if the angles of measurement at A and B are different.)

The above integral is analogous to a combination of 1) the calculation of the mean of a discrete probability distribution and 2) calculation of the area under a continuous probability distribution, described previously:  

  • \rho(\lambda) is analogous to the value of f(x) on those probability distributions (i.e., the height of the infinitesimal rectangles that need to be added up)
  • d\lambda is the width of the base of those tiny rectangles
  • \rho(\lambda)d\lambda is the area of the infinitesimal rectangles; if we add them all up we’ll get 1
  • A and B are measurements, each equal to \pm1; we’re working under the assumption that there is some program specified by the hidden variables, \lambda, that determines what A and B will be for each pair of measurement directions, \vec{a} and \vec{b}
  • For each value of \lambda, we multiply the product of A and B by the area of the corresponding little rectangle (which represents the probability of that value of \lambda occurring) and add up the results. As in the mean calculation for the discrete probability distribution, this gives us a weighted average: the average (or mean) of the product AB.
  • That mean is the value on the left side of the equation, P(\vec{a},\vec{b})
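To see the integral at work, here is a minimal Python sketch of one made-up local hidden variable model (a toy for illustration, not Bell’s and not a claim about nature). Its assumptions are all hypothetical: λ is an angle spread evenly over 0˚ to 360˚, so ρ(λ) = 1/360 per degree; particle 1 answers +1 when λ lies within 90˚ of its magnet setting and -1 otherwise; particle 2 always carries the opposite program. The function `P` then carries out the integral as a Riemann sum.

```python
import math

def A(setting_deg: float, lam_deg: float) -> int:
    """Hypothetical hidden 'program' for particle 1: +1 if the hidden angle lambda
    lies within 90 degrees of the magnet setting, otherwise -1."""
    return 1 if math.cos(math.radians(lam_deg - setting_deg)) >= 0 else -1

def B(setting_deg: float, lam_deg: float) -> int:
    """Particle 2 carries the opposite program, so equal settings always disagree."""
    return -A(setting_deg, lam_deg)

def P(a_deg: float, b_deg: float, n: int = 36_000) -> float:
    """Riemann-sum version of P(a,b) = integral of rho(lambda) A(a,lambda) B(b,lambda),
    with lambda spread uniformly over 0-360 degrees (rho = 1/360 per degree)."""
    dlam = 360.0 / n
    rho = 1.0 / 360.0
    lams = ((i + 0.5) * dlam for i in range(n))  # midpoint of each narrow slice
    return sum(rho * A(a_deg, lam) * B(b_deg, lam) * dlam for lam in lams)

print(P(0, 0), P(0, 90), P(0, 45))  # approximately -1, 0 and -0.5
```

Equal settings give -1, the perfect anti-correlation that the proof relies on; other settings give a weighted average somewhere between -1 and +1.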

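Because the particles being measured at A and B are entangled, if they are measured at the same angle (i.e., \vec{a}=\vec{b}), the two results are always opposite: one is +1 and the other is -1. If that common angle of measurement is \vec{b}, then -A(\vec{b},\lambda)=B(\vec{b},\lambda). Since B depends only on \vec{b} and \lambda, and not on the setting at A (the vital assumption noted earlier), this relationship holds for every value of \lambda, whatever setting is actually used at A.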

Substituting this result into our original integral, we get

P(\vec{a},\vec{b})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)\cdot -A(\vec{b},\lambda)}\quad\Rightarrow

P(\vec{a},\vec{b})= -\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)\cdot A(\vec{b},\lambda)}

Now consider measuring particle 2 at another angle, given by another vector, \vec{c}, and call the result C(\vec{c},\lambda). Just as

P(\vec{a},\vec{b})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)B(\vec{b},\lambda)}

we have

P(\vec{a},\vec{c})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)C(\vec{c},\lambda)}

For the same reason that B(\vec{b},\lambda) was found to equal -A(\vec{b},\lambda), C(\vec{c},\lambda)=-A(\vec{c},\lambda). And thus,

P(\vec{a},\vec{c})= -\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)\cdot A(\vec{c},\lambda)}

Now subtract the equation for P(\vec{a},\vec{c}) from the equation P(\vec{a},\vec{b})= -\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)\cdot A(\vec{b},\lambda)}:

\begin{array}{rcl}P(\vec{a},\vec{b})-P(\vec{a},\vec{c})&=&-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)}\,-\,\left[-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{c},\lambda)}\right]\\&=&-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)}\,+\,\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{c},\lambda)}\\&=&\int{d\lambda\rho(\lambda)\left[-A(\vec{a},\lambda)A(\vec{b},\lambda)+A(\vec{a},\lambda)A(\vec{c},\lambda)\right]}\\&=&\int{d\lambda\rho(\lambda)\left[A(\vec{a},\lambda)A(\vec{c},\lambda)-A(\vec{a},\lambda)A(\vec{b},\lambda)\right]}\end{array}

Note that A(\vec{b},\lambda)A(\vec{b},\lambda)=1. This is because, if A(\vec{b},\lambda)=1, then A(\vec{b},\lambda)A(\vec{b},\lambda)=1\cdot1=1. Likewise, if A(\vec{b},\lambda)=-1, then A(\vec{b},\lambda)A(\vec{b},\lambda)=(-1)\cdot(-1)=1.

Therefore,

\begin{array}{rcl}A(\vec{a},\lambda)A(\vec{c},\lambda)&=&A(\vec{a},\lambda)\left[1\right]A(\vec{c},\lambda)\\&=&A(\vec{a},\lambda)\left[A(\vec{b},\lambda)A(\vec{b},\lambda)\right]A(\vec{c},\lambda)\end{array}

Now substitute the lower row of the right-hand side of the above equation into our previous equation:

\begin{array}{rcl}P(\vec{a},\vec{b})-P(\vec{a},\vec{c})&=&\int{d\lambda\rho(\lambda)\left[A(\vec{a},\lambda)A(\vec{c},\lambda)-A(\vec{a},\lambda)A(\vec{b},\lambda)\right]}\\&=&\int{d\lambda\rho(\lambda)\left[A(\vec{a},\lambda)\left[A(\vec{b},\lambda)A(\vec{b},\lambda)\right]A(\vec{c},\lambda)-A(\vec{a},\lambda)A(\vec{b},\lambda)\right]}\\&=&\int{d\lambda\rho(\lambda)\left[\left(A(\vec{a},\lambda)A(\vec{b},\lambda)\right)\left(A(\vec{b},\lambda)A(\vec{c},\lambda)\right)-A(\vec{a},\lambda)A(\vec{b},\lambda)\right]}\end{array}

Next, factor out A(\vec{a},\lambda)A(\vec{b},\lambda), which appears in both terms inside the brackets. We get:

P(\vec{a},\vec{b})-P(\vec{a},\vec{c})=\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)\left[A(\vec{b},\lambda)A(\vec{c},\lambda)-1\right]}

Factor out -1. That leaves

P(\vec{a},\vec{b})-P(\vec{a},\vec{c})=-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right]}
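Because each A can only be +1 or -1, the algebra inside the integral can be checked by brute force. This short Python sketch (an illustration, not part of Bell’s argument) runs through all eight sign combinations and confirms that A(\vec{a},\lambda)A(\vec{c},\lambda)-A(\vec{a},\lambda)A(\vec{b},\lambda) equals -A(\vec{a},\lambda)A(\vec{b},\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right] in every case.

```python
from itertools import product

# Each A value is +1 or -1, so (A(a,lambda), A(b,lambda), A(c,lambda)) has only
# 2**3 = 8 possible combinations. Check the algebra in each one.
checked = 0
for A_a, A_b, A_c in product((+1, -1), repeat=3):
    assert A_b * A_b == 1                          # squaring +1 or -1 gives 1
    original = A_a * A_c - A_a * A_b               # integrand before factoring
    factored = -A_a * A_b * (1 - A_b * A_c)        # integrand after factoring
    assert original == factored, (A_a, A_b, A_c)
    checked += 1
print(f"identity holds in all {checked} cases")
```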

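Recall that A(\vec{a},\lambda)=\pm1 and B(\vec{b},\lambda)=\pm1. Below is a table that depicts what happens when we multiply A(\vec{a},\lambda) and B(\vec{b},\lambda) together:

A(\vec{a},\lambda) | B(\vec{b},\lambda) | A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)
+1 | +1 | +1
+1 | -1 | -1
-1 | +1 | -1
-1 | -1 | +1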

From the table, we can see that the maximum that A(\vec{a},\lambda)\cdot B(\vec{b},\lambda) can be is +1 and the minimum it can be is -1. (The same would remain true even if A(\vec{a},\lambda) and B(\vec{b},\lambda) were allowed to take any value between -1 and +1: their product would then also lie between -1 and +1.) Mathematically, this is expressed as:

-1\leq A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)\leq+1

There’s a function called the absolute value. Its mathematical symbol is a vertical line to the left and right of the expression we want the absolute value of. Taking the absolute value of an expression makes it non-negative: positive expressions remain positive, negative expressions turn positive, and expressions that evaluate to 0 remain 0. Another way to say this is that, when we take the absolute value of a number, we’re asking “on a number line, how far from zero do we have to travel to get to the number?” For example, to find the absolute value of 3 or -3, you would start at 0 and travel 3 units along the number line, in the positive or negative direction, respectively. The number of units you travel, disregarding the direction, is the absolute value:

Figure 10

The left-hand part of the inequality above, -1\leq A(\vec{a},\lambda)\cdot B(\vec{b},\lambda), says that if the product is negative, we travel at most 1 unit from zero to reach it. The right-hand part, A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)\leq+1, says that if the product is positive, we again travel at most 1 unit. Either way, the distance from zero, which is the absolute value, is less than or equal to 1. In mathematical terms:

\mid A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)\mid\leq1

So, starting from -1\leq A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)\leq+1, we’ve established that \mid A(\vec{a},\lambda)\cdot B(\vec{b},\lambda)\mid \leq 1. By similar reasoning, \mid A(\vec{a},\lambda)\cdot A(\vec{b},\lambda)\mid \leq 1 and \mid A(\vec{b},\lambda)\cdot A(\vec{c},\lambda)\mid \leq 1. Since A(\vec{b},\lambda)\cdot A(\vec{c},\lambda) is never larger than 1, it follows that 1-A(\vec{b},\lambda)\cdot A(\vec{c},\lambda)\geq 0.
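A quick enumeration in Python confirms both facts used here: every product of two ±1 values is itself ±1, so its absolute value is at most 1, and the bracket 1-A(\vec{b},\lambda)A(\vec{c},\lambda) only ever equals 0 or 2, never a negative number.

```python
from itertools import product

signs = (+1, -1)
# Every possible product of two measurement results, e.g. A(a,lambda) * B(b,lambda).
products = {x * y for x, y in product(signs, repeat=2)}
# Every possible value of the bracket 1 - A(b,lambda) * A(c,lambda).
brackets = {1 - x * y for x, y in product(signs, repeat=2)}

print(sorted(products))  # [-1, 1]: never more than 1 unit from zero
print(sorted(brackets))  # [0, 2]: never negative
```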

Next, take the absolute value of

P(\vec{a},\vec{b})-P(\vec{a},\vec{c})=-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right]}

That gives us

\mid P(\vec{a},\vec{b})-P(\vec{a},\vec{c})\mid=\mid-\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)A(\vec{b},\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right]}\mid

The absolute value sign makes both sides of the equation non-negative, so the minus sign in front of the integral no longer matters. Now look at the pieces of the integrand. \rho(\lambda) is never negative (it is a probability distribution), 1-A(\vec{b},\lambda)A(\vec{c},\lambda) is never negative, and the magnitude of A(\vec{a},\lambda)A(\vec{b},\lambda) is at most 1. So each little rectangle, \rho(\lambda)d\lambda\,A(\vec{a},\lambda)A(\vec{b},\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right], has a magnitude no bigger than \rho(\lambda)d\lambda\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right]: the two are equal in size when A(\vec{a},\lambda)A(\vec{b},\lambda) has magnitude 1, and the first would be only a fraction of the second if that magnitude were less than 1. Adding up rectangles that may partly cancel one another can never give a bigger total than adding up their sizes. It follows, then, using the fact noted earlier that the area under a probability distribution, \int{d\lambda\rho(\lambda)}, equals 1, that

\begin{array}{rcl}\mid P(\vec{a},\vec{b})-P(\vec{a},\vec{c})\mid&\leq&\int{d\lambda\rho(\lambda)\left[1-A(\vec{b},\lambda)A(\vec{c},\lambda)\right]}\\&=&\int{d\lambda\rho(\lambda)(1)}-\int{d\lambda\rho(\lambda)A(\vec{b},\lambda)A(\vec{c},\lambda)}\\&=&1-\int{d\lambda\rho(\lambda)A(\vec{b},\lambda)A(\vec{c},\lambda)}\end{array}

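Now, measuring particle 1 at \vec{b} and particle 2 at \vec{c} gives P(\vec{b},\vec{c})=\int{d\lambda\rho(\lambda)A(\vec{b},\lambda)C(\vec{c},\lambda)}. And for the same reasons that P(\vec{a},\vec{b})= -\int{d\lambda\rho(\lambda)A(\vec{a},\lambda)\cdot A(\vec{b},\lambda)}, P(\vec{b},\vec{c})= -\int{d\lambda\rho(\lambda)A(\vec{b},\lambda)\cdot A(\vec{c},\lambda)}. In other words, the term -\int{d\lambda\rho(\lambda)A(\vec{b},\lambda)A(\vec{c},\lambda)} in the last line above is just P(\vec{b},\vec{c}).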

So

\mid P(\vec{a},\vec{b})-P(\vec{a},\vec{c})\mid\leq 1+P(\vec{b},\vec{c})

That’s the famous Bell inequality. It sets a limit on the expectation values that can be obtained from measurements of the spin of entangled electron pairs under conditions where local hidden variables predetermine the outcomes of those measurements.
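The inequality can be spot-checked numerically. The Python sketch below invents a simple local hidden variable model (hypothetical, chosen only for illustration): λ is an angle spread evenly over 0˚ to 360˚, particle 1 answers +1 when λ lies within 90˚ of its magnet setting and -1 otherwise, and particle 2 always gives the opposite answer. It evaluates the integrals as Riemann sums and tests the inequality at 100 random triples of settings. Because the derivation above works for any ±1 programs and any probability distribution, the margin should never come out negative, apart from rounding.

```python
import math
import random

def A(setting: float, lam: float) -> int:
    """Hypothetical hidden program, angles in degrees: +1 if lambda lies within
    90 degrees of the setting, otherwise -1. Particle 2 always answers -A."""
    return 1 if math.cos(math.radians(lam - setting)) >= 0 else -1

def P(x: float, y: float, n: int = 720) -> float:
    """Riemann-sum version of P(x, y) = integral of rho * A(x, lam) * (-A(y, lam)),
    with lambda uniform over 0-360 degrees, so each slice has weight 1/n."""
    step = 360.0 / n
    lams = [(i + 0.5) * step for i in range(n)]
    return sum(-A(x, lam) * A(y, lam) for lam in lams) / n

rng = random.Random(1)
smallest_margin = math.inf
for _ in range(100):
    a, b, c = (rng.uniform(0.0, 360.0) for _ in range(3))
    margin = (1 + P(b, c)) - abs(P(a, b) - P(a, c))  # right side minus left side
    smallest_margin = min(smallest_margin, margin)

print(smallest_margin)  # never below zero (up to rounding): the inequality holds
```

For this particular toy model, the settings 0˚, 90˚ and 45˚ used in the next section give exactly 0.5 on both sides, so the inequality is just barely satisfied there.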

Significance of Bell’s Inequality

In the last part of his paper, Bell proves that the expectation values predicted by quantum mechanics differ from those allowed by his inequality (which, of course, is based on the kind of local hidden variables theory that Einstein advocated). He does this by developing expressions for the difference between the expectation values allowed by Bell’s inequality and those given by quantum mechanics, and then proving that this difference cannot be made arbitrarily small. The proof is long and difficult. Its explanation would take longer than the one already provided above, and that explanation is already too long. Therefore, we’ll consider a shorter, informal demonstration given by Griffiths in his classic textbook on quantum mechanics.[3]

Imagine making measurements at 3 angles: \vec{a}=0^\circ, \vec{b}=90^\circ and \vec{c}=45^\circ. As Bell points out at the beginning of his paper, the expectation value for a pair of such measurements is given by minus the dot product of the unit vectors along which the 2 measurements are made, and the dot product of two unit vectors equals the cosine of the angle between them. Underlying this is the fact that the average value of a probability distribution (i.e., the mean, the expectation value) equals the sum (for discrete variables) or integral (for continuous variables) of each possible measurement times its probability of occurrence. Remember? We discussed this in the section of this article entitled Probability distributions:

\left\langle M \right\rangle = \displaystyle\sum_i\rho(M_i)\cdot M_i where

  • \left\langle M \right\rangle is the expectation value of M
  • For our purposes, M is the product of the measurements that result when one entangled electron is measured at A at angle \vec{a} and its entangled partner is measured at B at angle \vec{b}; or \vec{a} at A and \vec{c} at B; or \vec{b} at A and \vec{c} at B; or in each case, vice versa
  • \rho(M_i) is the probability that the value M_i occurs
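The sum above can be made concrete. The Python sketch below assumes the standard quantum-mechanical result for an anti-correlated (singlet) pair, which this article does not derive: when the two measurement directions differ by an angle θ, the results are opposite (M = -1) with probability cos²(θ/2) and the same (M = +1) with probability sin²(θ/2). Weighting each value of M by its probability and adding gives -cos θ, the relationship arrived at below.

```python
import math

def expectation(theta_deg: float) -> float:
    """<M> = sum of rho(M_i) * M_i, where M is the product of the two +/-1 results.
    Assumed singlet-state probabilities (standard QM, not derived here):
      opposite results (M = -1) with probability cos^2(theta / 2)
      same results     (M = +1) with probability sin^2(theta / 2)"""
    half = math.radians(theta_deg) / 2
    rho = {-1: math.cos(half) ** 2, +1: math.sin(half) ** 2}
    assert abs(sum(rho.values()) - 1) < 1e-12  # the probabilities add up to 1
    return sum(p * m for m, p in rho.items())

for theta in (0, 45, 90):
    # The weighted sum matches -cos(theta) at every angle.
    print(theta, round(expectation(theta), 3), round(-math.cos(math.radians(theta)), 3))
```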

From here, through a series of steps, we end up with the following relationships:

\left\langle a\cdot b \right\rangle=-\vec{a}\cdot\vec{b}; \left\langle a\cdot c \right\rangle=-\vec{a}\cdot\vec{c}; \left\langle b\cdot c \right\rangle=-\vec{b}\cdot\vec{c}; where, for example

  • \left\langle a\cdot b \right\rangle represents the expectation value (i.e., average or mean) of the product of the measurements when one entangled electron is measured at A at angle \vec{a} and its entangled partner is measured at B at angle \vec{b}
  • \vec{a}\cdot\vec{b} represents the dot product of \vec{a} and \vec{b}

It so happens that \vec{a}\cdot\vec{b}=\| \vec{a} \| \|\vec{b} \|\cos{\theta} where

  • \| \vec{a} \| and \|\vec{b} \| are the magnitudes of \vec{a} and \vec{b}
  • \theta is the angle between \vec{a} and \vec{b}
  • \vec{a} and \vec{b} are unit vectors, so their magnitudes (sometimes written as absolute values) equal 1: \| \vec{a} \|=\mid \vec{a} \mid = 1 and \| \vec{b} \|=\mid \vec{b} \mid = 1

That means that \vec{a}\cdot\vec{b}=\| \vec{a} \| \|\vec{b} \|\cos{\theta}=1\cdot 1 \cdot \cos{\theta}=\cos{\theta}

It follows, then, that

\left\langle a\cdot b \right\rangle=-\cos{\theta}

The minus sign reflects the fact that the entangled electrons we’re measuring are anti-correlated: when both measurements are made at the same angle (\theta=0), the two results are always opposite, so their product is always -1, and indeed -\cos{0}=-1.

The proof of this is somewhat involved, requiring considerable background in quantum mechanics and its mathematics, more than I wish to take on in this article. However, it will be discussed in a subsequent article, and a reference to that discussion will be added here. In the meantime, the following diagram may provide an intuitive feel for this:

Figure 11

In the above diagram, vectors \vec{a}, \vec{b} and \vec{c} have a magnitude of 1. \vec{a} represents measurement in the 0^\circ direction, \vec{b} represents measurement in the 90^\circ direction, and \vec{c} represents measurement in the 45^\circ direction. The vertical dotted line represents the projection of \vec{c} on \vec{a} (i.e., the component of \vec{c} in the 0^\circ direction). The horizontal dotted line represents the projection of \vec{c} on \vec{b} (i.e., the component of \vec{c} in the 90^\circ direction). For unit vectors, the projection of one vector on another equals the dot product of the two vectors, which is the cosine of the angle between them.

So, from the diagram

  • The cosine of the angle between \vec{a} and \vec{b} equals zero. That’s because there is no component of \vec{a} in the direction of \vec{b}
  • The cosine of the angle between \vec{a} and \vec{c} equals \frac{\sqrt2}2=0.707. That’s because the component of \vec{a} in the direction of \vec{c}=\cos45^\circ=\frac{\sqrt2}2
  • The cosine of the angle between \vec{b} and \vec{c} equals \frac{\sqrt2}2=0.707. That’s because the component of \vec{b} in the direction of \vec{c}=\cos45^\circ=\frac{\sqrt2}2
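The dot products in the list above can also be computed from components. This Python sketch places the three unit vectors in a plane at 0˚, 90˚ and 45˚, as in Figure 11 (the helper names `unit` and `dot` are just for illustration), and confirms that each dot product equals the cosine of the angle between the vectors.

```python
import math

def unit(deg: float) -> tuple:
    """Unit vector in the plane pointing at the given angle (degrees)."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

def dot(u: tuple, v: tuple) -> float:
    """Dot product: multiply matching components and add."""
    return u[0] * v[0] + u[1] * v[1]

a, b, c = unit(0), unit(90), unit(45)
print(round(dot(a, b), 3), round(dot(a, c), 3), round(dot(b, c), 3))  # about 0, 0.707 and 0.707
```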

Of course, as stated above, the spins of the entangled particles in Bell’s paper are anti-correlated. Therefore,

  • \left\langle a\cdot b \right\rangle=-\cos{90^\circ}=0
  • \left\langle a\cdot c \right\rangle=-\cos{45^\circ}=-\frac{\sqrt2}2=-0.707
  • \left\langle b\cdot c \right\rangle=-\cos{45^\circ}=-\frac{\sqrt2}2=-0.707

In Bell’s notation,

  • the expectation value of the product of measurements made on entangled electrons at A in the \vec{a} direction and at B in the \vec{b} direction (\left\langle a\cdot b \right\rangle) is given by the expression P(\vec{a}, \vec{b})
  • the expectation value of the product of measurements made on entangled electrons at A in the \vec{a} direction and at B in the \vec{c} direction (\left\langle a\cdot c \right\rangle) is given by the expression P(\vec{a}, \vec{c})
  • the expectation value of the product of measurements made on entangled electrons at A in the \vec{b} direction and at B in the \vec{c} direction (\left\langle b\cdot c \right\rangle) is given by the expression P(\vec{b}, \vec{c})

From this, we get

  • P(\vec{a}, \vec{b})=0
  • P(\vec{a}, \vec{c})=-\frac{\sqrt2}2=-0.707
  • P(\vec{b}, \vec{c})=-\frac{\sqrt2}2=-0.707

According to Bell’s inequality

\mid P(\vec{a}, \vec{b})-P(\vec{a}, \vec{c})\mid\,\,\leq \,\,1+P(\vec{b}, \vec{c})

Putting in the above numbers:

\mid 0-(-0.707)\mid\leq 1+(-0.707)

\mid 0.707\mid \leq 1-0.707=0.293   (This is NOT true!)
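The arithmetic above can be reproduced in a few lines of Python. This sketch simply plugs the quantum-mechanical prediction P = -cos θ (as given above, not derived here) into both sides of Bell’s inequality for the settings 0˚, 90˚ and 45˚; the function name `P_qm` is an illustrative choice.

```python
import math

def P_qm(x_deg: float, y_deg: float) -> float:
    """Quantum prediction for anti-correlated spins: -cos(angle between settings)."""
    return -math.cos(math.radians(x_deg - y_deg))

a, b, c = 0.0, 90.0, 45.0
lhs = abs(P_qm(a, b) - P_qm(a, c))   # left side of Bell's inequality
rhs = 1 + P_qm(b, c)                 # right side of Bell's inequality
print(f"{lhs:.3f} <= {rhs:.3f}? {lhs <= rhs}")  # prints: 0.707 <= 0.293? False
```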

That means that the predictions of the classical local hidden variables theory on which Bell’s inequality is based do not agree with the predictions of quantum mechanics.

That was good news when it was discovered, because it provided a way to test whether local hidden variables or quantum mechanics is correct. The above experiment could be done: if the results agreed with Bell’s inequality, that would support the existence of local hidden variables; if, on the other hand, the results agreed with the predictions of quantum mechanics (and so violated the inequality), that would indicate that quantum mechanics is correct and local hidden variables are not.

So which is it? Which theory do results of the above experiment support? Well, now you’re probably going to throw eggs and rotten tomatoes at your screen because, at least to my knowledge, the above experiment has not been done. However, variants of the above experiment have been done that test different forms of Bell’s inequality. The results of these experiments are discussed in the next installment of this series, Bell’s Inequality III.

References

1.  https://link.springer.com/article/10.1007%2FBF02728310

2.  https://en.wikipedia.org/wiki/Stern–Gerlach_experiment

3. Griffiths, David J. Introduction to Quantum Mechanics. Prentice Hall, 1995, p. 379.
