Borel–Kolmogorov paradox

In probability theory, the Borel–Kolmogorov paradox (sometimes known as Borel's paradox) is a paradox relating to conditional probability with respect to an event of probability zero (also known as a null set). It is named after Émile Borel and Andrey Kolmogorov.

A great circle puzzle

Suppose that a random variable has a uniform distribution on a unit sphere. What is its conditional distribution on a great circle? Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results. First, note that choosing a point uniformly on the sphere is equivalent to choosing the longitude $\lambda$ uniformly from $[-\pi ,\pi ]$ and choosing the latitude $\varphi$ from ${\textstyle [-{\frac {\pi }{2}},{\frac {\pi }{2}}]}$ with density ${\textstyle {\frac {1}{2}}\cos \varphi }$ .^[1] Then we can look at two different great circles:

If the coordinates are chosen so that the great circle is an equator (latitude $\varphi =0$ ), the conditional density for a longitude $\lambda$ defined on the interval $[-\pi ,\pi ]$ is $f(\lambda \mid \varphi =0)={\frac {1}{2\pi }}.$
If the great circle is a line of longitude with $\lambda =0$ , the conditional density for $\varphi$ on the interval ${\textstyle [-{\frac {\pi }{2}},{\frac {\pi }{2}}]}$ is $f(\varphi \mid \lambda =0)={\frac {1}{2}}\cos \varphi .$

One distribution is uniform on the circle, the other is not. Yet both seem to be referring to the same great circle in different coordinate systems.
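The two conditional densities can be reproduced numerically. The following is a rough Monte Carlo sketch, not part of the original argument: it draws uniform points on the sphere in a coordinate-free way, then conditions on a thin band around the equator and on a thin lune around the meridian $\lambda =0$. The function names, the sample size, and the width `eps` are illustrative assumptions.

```python
import math
import random


def sample_sphere(rng):
    """Return a uniform point on the unit sphere (normalized Gaussian vector)."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # guard against a degenerate draw
            return x / norm, y / norm, z / norm


def conditional_fractions(n=200_000, eps=0.05, seed=0):
    """Estimate two conditional probabilities by conditioning on thin sets.

    Returns the pair
      (P(|lon| < pi/3 given |lat| < eps), P(|lat| < pi/6 given |lon| < eps)).
    Each target interval covers one third of its coordinate range.
    """
    rng = random.Random(seed)
    band = band_hits = 0   # thin band around the equator, lat = 0
    lune = lune_hits = 0   # thin lune around the meridian, lon = 0
    for _ in range(n):
        x, y, z = sample_sphere(rng)
        lat = math.asin(max(-1.0, min(1.0, z)))
        lon = math.atan2(y, x)
        if abs(lat) < eps:
            band += 1
            band_hits += abs(lon) < math.pi / 3
        if abs(lon) < eps:
            lune += 1
            lune_hits += abs(lat) < math.pi / 6
    return band_hits / band, lune_hits / lune
```

Calling `conditional_fractions()` should return a first value near 1/3, as the uniform density ${\frac {1}{2\pi }}$ predicts, and a second value near 1/2, as the density ${\frac {1}{2}}\cos \varphi$ predicts. A uniform density on the meridian would give 1/3 for the second value as well.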

Many quite futile arguments have raged — between otherwise competent probabilists — over which of these results is 'correct'.
— E.T. Jaynes^[1]

Mathematical explication

Measure theoretic perspective

To understand the problem, we need to recognize that the distribution of a continuous random variable is described by a density f only with respect to some measure μ. Both are needed for a full description of the probability distribution. Equivalently, we need to fully specify the space on which f is defined.

Let Φ and Λ denote two random variables taking values in Ω₁ = ${\textstyle \left[-{\frac {\pi }{2}},{\frac {\pi }{2}}\right]}$ and Ω₂ = [−π, π], respectively. An event {Φ = φ, Λ = λ} gives a point on the sphere S(r) with radius r. We define the coordinate transform

{\begin{aligned}x&=r\cos \varphi \cos \lambda \\y&=r\cos \varphi \sin \lambda \\z&=r\sin \varphi \end{aligned}}

for which we obtain the volume element

\omega _{r}(\varphi ,\lambda )=\left\|{\partial (x,y,z) \over \partial \varphi }\times {\partial (x,y,z) \over \partial \lambda }\right\|=r^{2}\cos \varphi \ .

Furthermore, if either φ or λ is fixed, we get the volume elements

{\begin{aligned}\omega _{r}(\lambda )&=\left\|{\partial (x,y,z) \over \partial \varphi }\right\|=r\ ,\quad {\text{respectively}}\\[3pt]\omega _{r}(\varphi )&=\left\|{\partial (x,y,z) \over \partial \lambda }\right\|=r\cos \varphi \ .\end{aligned}}

Let

\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )=f_{\Phi ,\Lambda }(\varphi ,\lambda )\omega _{r}(\varphi ,\lambda )\,d\varphi \,d\lambda

denote the joint measure on ${\mathcal {B}}(\Omega _{1}\times \Omega _{2})$ , which has a density $f_{\Phi ,\Lambda }$ with respect to $\omega _{r}(\varphi ,\lambda )\,d\varphi \,d\lambda$ and let

{\begin{aligned}\mu _{\Phi }(d\varphi )&=\int _{\lambda \in \Omega _{2}}\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )\ ,\\\mu _{\Lambda }(d\lambda )&=\int _{\varphi \in \Omega _{1}}\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda )\ .\end{aligned}}

If we assume that the density $f_{\Phi ,\Lambda }$ is uniform, then

{\begin{aligned}\mu _{\Phi \mid \Lambda }(d\varphi \mid \lambda )&={\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda ) \over \mu _{\Lambda }(d\lambda )}={\frac {1}{2r}}\omega _{r}(\varphi )\,d\varphi \ ,\quad {\text{and}}\\[3pt]\mu _{\Lambda \mid \Phi }(d\lambda \mid \varphi )&={\mu _{\Phi ,\Lambda }(d\varphi ,d\lambda ) \over \mu _{\Phi }(d\varphi )}={\frac {1}{2r\pi }}\omega _{r}(\lambda )\,d\lambda \ .\end{aligned}}

Hence, $\mu _{\Phi \mid \Lambda }$ has a uniform density with respect to $\omega _{r}(\varphi )\,d\varphi$ but not with respect to the Lebesgue measure. On the other hand, $\mu _{\Lambda \mid \Phi }$ has a uniform density with respect to $\omega _{r}(\lambda )\,d\lambda$ and the Lebesgue measure.
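As a worked step, the measures above can be written out explicitly. This sketch assumes the uniform density is normalized as $f_{\Phi ,\Lambda }=1/(4\pi r^{2})$, the reciprocal of the surface area of S(r); that normalization is not stated in the text and is introduced here only for illustration.

```latex
\begin{aligned}
\mu_{\Phi,\Lambda}(d\varphi,d\lambda) &= \frac{1}{4\pi r^{2}}\,r^{2}\cos\varphi\,d\varphi\,d\lambda = \frac{1}{4\pi}\cos\varphi\,d\varphi\,d\lambda,\\
\mu_{\Phi}(d\varphi) &= \int_{-\pi}^{\pi}\frac{1}{4\pi}\cos\varphi\,d\lambda\,d\varphi = \frac{1}{2}\cos\varphi\,d\varphi,\\
\mu_{\Lambda}(d\lambda) &= \int_{-\pi/2}^{\pi/2}\frac{1}{4\pi}\cos\varphi\,d\varphi\,d\lambda = \frac{1}{2\pi}\,d\lambda,\\
\mu_{\Phi\mid\Lambda}(d\varphi\mid\lambda) &= \frac{1}{2}\cos\varphi\,d\varphi = \frac{1}{2r}\,\omega_{r}(\varphi)\,d\varphi,\\
\mu_{\Lambda\mid\Phi}(d\lambda\mid\varphi) &= \frac{1}{2\pi}\,d\lambda = \frac{1}{2r\pi}\,\omega_{r}(\lambda)\,d\lambda.
\end{aligned}
```

Read against the Lebesgue measure, the first conditional has the non-constant density ${\frac {1}{2}}\cos \varphi$ and the second has the constant density ${\frac {1}{2\pi }}$, matching the two densities in the great circle puzzle.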

Proof of contradiction

Consider a random vector $(X,Y,Z)$ that is uniformly distributed on the unit sphere $S^{2}$ .

We begin by parametrizing the sphere with the usual spherical polar coordinates:

{\begin{aligned}x&=\cos(\varphi )\cos(\theta )\\y&=\cos(\varphi )\sin(\theta )\\z&=\sin(\varphi )\end{aligned}}

where ${\textstyle -{\frac {\pi }{2}}\leq \varphi \leq {\frac {\pi }{2}}}$ and $-\pi \leq \theta \leq \pi$ .

We can define random variables $\Phi$ , $\Theta$ as the values of $(X,Y,Z)$ under the inverse of this parametrization, or more formally using the arctan2 function:

{\begin{aligned}\Phi &=\arcsin(Z)\\\Theta &=\arctan _{2}\left({\frac {Y}{\sqrt {1-Z^{2}}}},{\frac {X}{\sqrt {1-Z^{2}}}}\right)\end{aligned}}

Using the formulas for the surface areas of a spherical cap and a spherical wedge, the surface area of the intersection of a cap and a wedge on the unit sphere is given by

\operatorname {Area} (\Theta \leq \theta ,\Phi \leq \varphi )=(1+\sin(\varphi ))(\theta +\pi )

Since $(X,Y,Z)$ is uniformly distributed, the probability is proportional to the surface area, giving the joint cumulative distribution function

F_{\Phi ,\Theta }(\varphi ,\theta )=P(\Theta \leq \theta ,\Phi \leq \varphi )={\frac {1}{4\pi }}(1+\sin(\varphi ))(\theta +\pi )

The joint probability density function is then given by

f_{\Phi ,\Theta }(\varphi ,\theta )={\frac {\partial ^{2}}{\partial \varphi \partial \theta }}F_{\Phi ,\Theta }(\varphi ,\theta )={\frac {1}{4\pi }}\cos(\varphi )

Note that $\Phi$ and $\Theta$ are independent random variables.
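The closed form for $F_{\Phi ,\Theta }$ can be checked against simulation. The sketch below is illustrative only: the function names, sample size, and seed are assumptions. It also uses the fact that $\arctan _{2}$ is unchanged when both arguments are divided by the same positive number, so $\Theta$ can be computed as `atan2(y, x)`.

```python
import math
import random


def joint_cdf(phi, theta):
    """Closed form F(phi, theta) = (1 + sin(phi)) * (theta + pi) / (4 * pi)."""
    return (1.0 + math.sin(phi)) * (theta + math.pi) / (4.0 * math.pi)


def empirical_cdf(phi, theta, n=100_000, seed=1):
    """Fraction of uniform sphere samples with Phi <= phi and Theta <= theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Normalizing a Gaussian vector gives a uniform point on the sphere.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        lat = math.asin(max(-1.0, min(1.0, z / norm)))  # Phi = arcsin(Z)
        lon = math.atan2(y, x)                          # Theta
        hits += (lat <= phi) and (lon <= theta)
    return hits / n
```

For example, `joint_cdf(math.pi / 6, math.pi / 2)` evaluates the formula at a point where it equals 9/16, and `empirical_cdf(math.pi / 6, math.pi / 2)` should agree up to sampling error.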

For simplicity, we do not calculate the full conditional distribution on a great circle, only the conditional probability that the random vector lies in the first octant of that circle, that is, that its longitude lies between $0$ and ${\textstyle {\frac {\pi }{4}}}$. We attempt to calculate the conditional probability $P(A\mid B)$ with

{\begin{aligned}A&=\left\{0<\Theta <{\frac {\pi }{4}}\right\}&&=\{0<X<1,0<Y<X\}\\B&=\{\Phi =0\}&&=\{Z=0\}\end{aligned}}

We attempt to evaluate the conditional probability as a limit of conditioning on the events

B_{\varepsilon }=\{|\Phi |<\varepsilon \}

Since $\Phi$ and $\Theta$ are independent, so are the events $A$ and $B_{\varepsilon }$; therefore

P(A\mid B)\mathrel {\stackrel {?}{=}} \lim _{\varepsilon \to 0}{\frac {P(A\cap B_{\varepsilon })}{P(B_{\varepsilon })}}=\lim _{\varepsilon \to 0}P(A)=P\left(0<\Theta <{\frac {\pi }{4}}\right)={\frac {1}{8}}.

Now we repeat the process with a different parametrization of the sphere:

{\begin{aligned}x&=\sin(\varphi )\\y&=\cos(\varphi )\sin(\theta )\\z&=-\cos(\varphi )\cos(\theta )\end{aligned}}

This is equivalent to the previous parametrization rotated by 90 degrees around the y axis.

Define new random variables

{\begin{aligned}\Phi '&=\arcsin(X)\\\Theta '&=\arctan _{2}\left({\frac {Y}{\sqrt {1-X^{2}}}},{\frac {-Z}{\sqrt {1-X^{2}}}}\right).\end{aligned}}

Rotation is measure preserving so the density of $\Phi '$ and $\Theta '$ is the same:

f_{\Phi ',\Theta '}(\varphi ,\theta )={\frac {1}{4\pi }}\cos(\varphi ).

The expressions for A and B are:

{\begin{aligned}A&=\left\{0<\Theta <{\frac {\pi }{4}}\right\}&&=\{0<X<1,\ 0<Y<X\}&&=\left\{0<\Theta '<\pi ,\ 0<\Phi '<{\frac {\pi }{2}},\ \sin(\Theta ')<\tan(\Phi ')\right\}\\B&=\{\Phi =0\}&&=\{Z=0\}&&=\left\{\Theta '=-{\frac {\pi }{2}}\right\}\cup \left\{\Theta '={\frac {\pi }{2}}\right\}.\end{aligned}}
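The rewriting of $A$ in the rotated coordinates can be sanity-checked by sampling. This is a minimal sketch with hypothetical helper names: it tests random points of the sphere for membership in $A$ under both descriptions and counts disagreements. As before, $\Theta '$ is computed as `atan2(y, -z)`, because dividing both arguments by ${\sqrt {1-X^{2}}}$ does not change the angle.

```python
import math
import random


def in_A_original(x, y, z):
    """A = {0 < Theta < pi/4}, with Theta = atan2(Y, X)."""
    return 0.0 < math.atan2(y, x) < math.pi / 4


def in_A_rotated(x, y, z):
    """A expressed through Phi' = arcsin(X) and Theta' = atan2(Y, -Z)."""
    phi_p = math.asin(max(-1.0, min(1.0, x)))
    theta_p = math.atan2(y, -z)
    return (0.0 < theta_p < math.pi
            and 0.0 < phi_p < math.pi / 2
            and math.sin(theta_p) < math.tan(phi_p))


def count_mismatches(n=50_000, seed=3):
    """Count sampled points on which the two descriptions of A disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n):
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        x, y, z = x / norm, y / norm, z / norm
        mismatches += in_A_original(x, y, z) != in_A_rotated(x, y, z)
    return mismatches
```

If the two descriptions are equivalent, `count_mismatches()` should return zero, apart from points that fall within rounding error of the boundary of $A$, which are extremely unlikely to be sampled.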

We again attempt to evaluate the conditional probability as a limit, now conditioning on the events

B_{\varepsilon }^{\prime }=\left\{\left|\Theta '+{\frac {\pi }{2}}\right|<\varepsilon \right\}\cup \left\{\left|\Theta '-{\frac {\pi }{2}}\right|<\varepsilon \right\}.

Using L'Hôpital's rule and differentiation under the integral sign:

{\begin{aligned}P(A\mid B)&\mathrel {\stackrel {?}{=}} \lim _{\varepsilon \to 0}{\frac {P(A\cap B_{\varepsilon }^{\prime })}{P(B_{\varepsilon }^{\prime })}}\\&=\lim _{\varepsilon \to 0}{\frac {1}{\frac {4\varepsilon }{2\pi }}}P\left({\frac {\pi }{2}}-\varepsilon <\Theta '<{\frac {\pi }{2}}+\varepsilon ,\ 0<\Phi '<{\frac {\pi }{2}},\ \sin(\Theta ')<\tan(\Phi ')\right)\\&={\frac {\pi }{2}}\lim _{\varepsilon \to 0}{\frac {\partial }{\partial \varepsilon }}\int _{{\pi }/{2}-\varepsilon }^{{\pi }/{2}+\varepsilon }\int _{0}^{{\pi }/{2}}1_{\sin(\theta )<\tan(\varphi )}f_{\Phi ',\Theta '}(\varphi ,\theta )\mathrm {d} \varphi \mathrm {d} \theta \\&=\pi \int _{0}^{{\pi }/{2}}1_{1<\tan(\varphi )}f_{\Phi ',\Theta '}\left(\varphi ,{\frac {\pi }{2}}\right)\mathrm {d} \varphi \\&=\pi \int _{\pi /4}^{\pi /2}{\frac {1}{4\pi }}\cos(\varphi )\mathrm {d} \varphi \\&={\frac {1}{4}}\left(1-{\frac {1}{\sqrt {2}}}\right)\neq {\frac {1}{8}}\end{aligned}}
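Both limits can be approximated by simulation. The sketch below is illustrative and makes several assumptions: the function name, the sample size, and the fixed width `eps` that stands in for the limit $\varepsilon \to 0$. It estimates $P(A\mid B_{\varepsilon })$ and $P(A\mid B_{\varepsilon }^{\prime })$ from the same uniform sample of the sphere.

```python
import math
import random


def estimate_limits(n=300_000, eps=0.05, seed=2):
    """Return Monte Carlo estimates of (P(A | B_eps), P(A | B'_eps))."""
    rng = random.Random(seed)
    band = band_in_A = 0   # B_eps  = {|Phi| < eps}
    lune = lune_in_A = 0   # B'_eps = {Theta' within eps of +pi/2 or -pi/2}
    for _ in range(n):
        # Normalizing a Gaussian vector gives a uniform point on the sphere.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        x, y, z = x / norm, y / norm, z / norm
        in_A = 0.0 < math.atan2(y, x) < math.pi / 4   # A = {0 < Theta < pi/4}
        phi = math.asin(max(-1.0, min(1.0, z)))       # Phi = arcsin(Z)
        theta_p = math.atan2(y, -z)                   # Theta' of the rotated chart
        if abs(phi) < eps:
            band += 1
            band_in_A += in_A
        if abs(abs(theta_p) - math.pi / 2) < eps:
            lune += 1
            lune_in_A += in_A
    return band_in_A / band, lune_in_A / lune
```

The first estimate should lie close to $1/8=0.125$ and the second close to ${\frac {1}{4}}(1-{\frac {1}{\sqrt {2}}})\approx 0.073$, even though both conditioning sets shrink to the same great circle $\{Z=0\}$.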

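This shows that the conditional density cannot be interpreted as conditioning on an event of probability zero: the two limiting procedures approach the same event $B$ but give different values. See the section "Conditioning on an event of probability zero" of the article on conditional probability.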

This article uses material from the Wikipedia article Borel's_paradox, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] Jaynes 2003, pp. 1514–1517.
[2] Originally Kolmogorov (1933), translated in Kolmogorov (1956). Sourced from Pollard (2002).
