Basel Problem Solution

April 22nd, 2017

Hey folks, I’ve been pretty busy lately, but I have an incredibly elegant proof to show you today. In 1734, a man named Leonhard Euler gained immediate recognition in the world of mathematics for solving the Basel problem. The problem was to find the value of the following sum

$$\sum_{n=1}^{\infty}\frac{1}{n^2}$$

Euler’s initial solution relied on manipulating the Taylor series for $\sin(x)$, plus some algebraic wizardry that had not been rigorously justified at the time. Euler’s solution is incredibly elegant and cool, but we aren’t going to show his proof, because I don’t think it’s as cool as this solution. This solution uses Cauchy’s Residue Theorem. I will explain briefly how this theorem works before using it extensively to solve this problem.

I. Residue Theorem

In complex analysis, one can show that the integral of a function around a closed curve is determined only by the singularities, or undefined points, of the function in the interior of the curve (the function has to be well behaved on the curve itself). The integral around a closed curve enclosing no singularities is 0, so we can essentially shrink our curve down to a bunch of small loops around the singular points. The integral around each small loop, divided by $2\pi i$, is called the residue of $f$ at that point. Here is a sketch of the idea

Since our curve encloses no singular points, the integral must be 0. As we let the gaps shrink to zero, we get that the integral around the entire region (black), minus the integrals around the singular points (pink), must be 0. Therefore, the sum of the integrals around the singular points is equal to the entire integral. Each of those small integrals is $2\pi i$ times the residue at that point, so we can use Cauchy’s Residue Theorem to write

$$\oint_\gamma f(z) \text{d}z = 2\pi i \sum_k\underset{z\rightarrow z_k}{\text{Res}}\hspace{2mm}f(z)$$

This is a fancy way of writing that the integral of $f(z)$ over a closed curve $\gamma$ is equal to $2\pi i$ times the sum of all the residues inside $\gamma$. For a simple pole, we can compute the residue using

$$\underset{z\rightarrow z_o}{\text{Res}}\hspace{2mm}f(z) = \underset{z\rightarrow z_o}{\text{lim}}f(z)(z-z_o)$$

II. Adaptation to Basel Problem

Well, that’s a 5 minute course in complex analysis. You are surely asking how this can possibly help us with the Basel problem, a real-valued problem which appears to want nothing to do with complex numbers. Let’s consider the function

$$f(z) = \pi\cot(\pi z) = \frac{\pi\cos(\pi z)}{\sin(\pi z)}$$

$\sin(z)$ is 0 exactly at the integer multiples of $\pi$. So $\sin(\pi z)$ is 0 exactly when $z$ is an integer.

$$ \sin(\pi z) = 0 \hspace{10mm} z = n \in \mathbb{Z}$$

This means that for any integer $n$, our function has a singular point at $z=n$. Each zero of $\sin(\pi z)$ is a simple zero (its derivative $\pi\cos(\pi n)$ is nonzero there), and the numerator $\pi\cos(\pi n)$ is nonzero as well, so all of the singular points are simple poles. This is crucial. Let’s calculate the residues using the limit definition.

$$\underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) = \underset{z\rightarrow n}{\text{lim}}\frac{\pi\cos(\pi z)(z-n)}{\sin(\pi z)}$$

Using L’Hopital’s rule

$$ = \pi\cdot \underset{z\rightarrow n}{\text{lim}}\frac{\frac{\text{d}}{\text{d}z}\hspace{1mm}\big[z\cos(\pi z) - n\cos(\pi z)\big]}{\frac{\text{d}}{\text{d}z}\hspace{1mm}\sin(\pi z)}$$

$$ = \pi\cdot \underset{z\rightarrow n}{\text{lim}}\frac{\cos(\pi z) - \pi z \sin(\pi z) + n\pi\sin(\pi z)}{\pi \cos(\pi z)}$$

$$ = \pi\cdot \frac{\cos(\pi n) - 0 + 0}{\pi\cos(\pi n)}$$

$$ = 1 $$

So, the residue at each pole is 1. Now we will use the residue theorem in a very odd way.
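
If you would rather see this than trust L’Hopital, here is a small numerical sanity check. It is only a sketch: the helper names `f` and `simple_pole_limit` are my own, and it simply evaluates $f(z)(z-n)$ at points closer and closer to the pole.

```python
import cmath

def f(z):
    """pi * cot(pi z), written as pi * cos(pi z) / sin(pi z)."""
    return cmath.pi * cmath.cos(cmath.pi * z) / cmath.sin(cmath.pi * z)

def simple_pole_limit(func, pole, eps):
    """Evaluate (z - pole) * func(z) at a point a distance eps from the pole."""
    z = pole + eps
    return (z - pole) * func(z)

# The printed values should settle toward 1 as eps shrinks, for every integer n.
for n in (-3, 0, 4):
    for eps in (1e-2, 1e-4, 1e-6):
        print(n, eps, simple_pole_limit(f, n, eps))
```

The values drift toward 1 no matter which integer you pick, which is what the limit computation above claims.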

III. Kill The Integral

We showed that $$\underset{z\rightarrow n}{\text{Res}}\hspace{2mm}\pi\cot(\pi z) = 1\hspace{5mm} n\in\mathbb{Z}$$

Using the residue theorem, let’s integrate over a circle of radius $R$ centered at the origin. We will call our curve $\gamma_R$. Let $R$ lie strictly between some positive integer $n$ and $n+1$, so that the circle passes through no poles.

$$\oint_{\gamma_R} \pi\cot(\pi z) \hspace{1mm}\text{d}z = 2\pi i \sum_{k}\underset{z\rightarrow z_k}{\text{Res}}\hspace{2mm}\pi\cot(\pi z)$$

We just showed that all of the residues are 1, so we can rewrite as

$$ = 2\pi i \sum_k 1$$

The trick now is that we actually know exactly how many residues we are summing. Here’s a picture.

We have exactly $2n+1$ poles inside our circle. Therefore

$$\oint_{\gamma_R} \pi\cot(\pi z) \hspace{1mm}\text{d}z = 2\pi i \sum_{k=1}^{2n+1}1$$

$$\Big|\oint_{\gamma_R} \pi\cot(\pi z) \hspace{1mm}\text{d}z \Big|= 4\pi n + 2\pi $$

$$\Big|\oint_{\gamma_R} \pi\cot(\pi z) \hspace{1mm}\text{d}z \Big| < 4\pi R + 2\pi$$

So, the integral of $\pi\cot(\pi z)$ on its own grows like $O(R)$. Dividing the integrand by $z^2$ is what kills it. To see this, we bound the integrand itself. Take $R = n+\frac{1}{2}$, so that every point of $\gamma_R$ is at least $\frac{1}{2}$ away from every pole. Since $\cot(\pi z)$ is periodic in the real direction and approaches $\mp i$ as $\text{Im}(z)\rightarrow\pm\infty$, there is a constant $M$, independent of $n$, such that $|\cot(\pi z)|\leq M$ on all of these circles. On $\gamma_R$ we also have $|z^2| = R^2$, and the curve has length $2\pi R$, so

$$\Big|\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z \Big| \leq 2\pi R\cdot\frac{\pi M}{R^2} = \frac{2\pi^2 M}{R}$$

As $R\rightarrow\infty$

$$\underset{R\rightarrow \infty}{\text{lim}}\Big|\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z \Big| \leq \underset{R\rightarrow \infty}{\text{lim}}\frac{2\pi^2 M}{R} = 0$$

As we let $R$ go to $\infty$, our integral vanishes, so we have

$$ \underset{R\rightarrow \infty}{\text{lim}}\Big|\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z \Big| = 0$$

$$ \underset{R\rightarrow \infty}{\text{lim}}\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z = 0$$
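
To watch the integral die off, here is a rough numerical sketch. It approximates the contour integral with a plain trapezoid rule on the circle; the function names `g` and `circle_integral`, the step count, and the choice of radii $n + \frac{1}{2}$ are all my own assumptions, picked so that the circle never touches a pole.

```python
import cmath

def g(z):
    """The integrand pi * cot(pi z) / z^2."""
    return cmath.pi * cmath.cos(cmath.pi * z) / (cmath.sin(cmath.pi * z) * z * z)

def circle_integral(func, radius, steps=20000):
    """Trapezoid-rule approximation of the contour integral around |z| = radius."""
    dtheta = 2 * cmath.pi / steps
    total = 0j
    for k in range(steps):
        z = radius * cmath.exp(1j * k * dtheta)
        total += func(z) * 1j * z * dtheta   # dz = i z dtheta on the circle
    return total

# The magnitude of the integral should shrink as the circle grows.
for n in (1, 5, 20):
    radius = n + 0.5          # stays half a unit away from every integer
    print(radius, abs(circle_integral(g, radius)))
```

The printed magnitudes fall as the radius grows, consistent with the limit being 0. Very large radii would overflow the naive `cos / sin` formula, so this sketch sticks to modest ones.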

IV. Reformulation of Residues

Simple Poles

We started by finding the poles of $f(z) = \pi\cot(\pi z)$, and finding the residues to be 1. What about the poles of our new function

$$ f(z) = \frac{\pi \cot(\pi z)}{z^2}$$

Well, since $\frac{1}{z^2}$ is analytic (well defined) and nonzero at every integer $z\neq 0$, the poles other than $z = 0$ are still simple. Let’s compute the residues

$$\underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) = \underset{z\rightarrow n}{\text{lim}}\frac{\pi\cos(\pi z)(z-n)}{\sin(\pi z)z^2}$$

Using L’Hopital’s rule

$$ = \pi\cdot \underset{z\rightarrow n}{\text{lim}}\frac{\frac{\text{d}}{\text{d}z}\hspace{1mm}\big[z\cos(\pi z) - n\cos(\pi z)\big]}{\frac{\text{d}}{\text{d}z}\hspace{1mm}\big[\sin(\pi z)z^2\big]}$$

$$ = \pi\cdot \underset{z\rightarrow n}{\text{lim}}\frac{\cos(\pi z) - \pi z \sin(\pi z) + n\pi\sin(\pi z)}{\pi \cos(\pi z)z^2 + 2z\sin(\pi z)}$$

$$ = \pi\cdot \frac{\cos(\pi n) - 0 + 0}{\pi n^2\cos(\pi n)}$$

$$ = \frac{1}{n^2}$$

The residue at every pole other than $z=0$ is equal to $\frac{1}{n^2}$.
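
Here is a second way to check these residues numerically, this time straight from the residue theorem: integrate around a tiny circle that encloses only the pole at $z=n$ and divide by $2\pi i$. This is a sketch under my own assumptions (the helper name `residue_by_contour`, a radius of 0.25, and a trapezoid rule with 2000 steps).

```python
import cmath

def g(z):
    """The function pi * cot(pi z) / z^2."""
    return cmath.pi * cmath.cos(cmath.pi * z) / (cmath.sin(cmath.pi * z) * z * z)

def residue_by_contour(func, center, radius=0.25, steps=2000):
    """(1 / (2 pi i)) times the integral of func around a small circle about center."""
    dtheta = 2 * cmath.pi / steps
    total = 0j
    for k in range(steps):
        offset = radius * cmath.exp(1j * k * dtheta)
        total += func(center + offset) * 1j * offset * dtheta   # dz = i * offset * dtheta
    return total / (2j * cmath.pi)

# Compare the numerical residue at z = n with the claimed value 1 / n^2.
for n in (1, -2, 5):
    print(n, residue_by_contour(g, n), 1 / n**2)
```

The radius is kept below 1 so that no neighboring pole sneaks inside the circle.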

At z = 0

We now have to find the residue at $z=0$. Since $z=0$ was a pole of order 1 for $\pi\cot(\pi z)$, and dividing by $z^2$ raises the order by 2, we know that it is a pole of order 3 for $f(z) = \frac{\pi\cot(\pi z)}{z^2}$.

We will use the limit definition for a 3rd order pole

$$\underset{z\rightarrow 0}{\text{Res}}f(z) = \frac{1}{2}\cdot\underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\frac{\text{d}^2}{\text{d}z^2}\hspace{2mm}\frac{\pi\cos(\pi z)z^3}{\sin(\pi z)z^2}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\frac{\text{d}^2}{\text{d}z^2}\hspace{2mm}\frac{\pi\cot(\pi z)z}{2}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\frac{\text{d}}{\text{d}z}\hspace{2mm}\frac{-\pi^2z\csc^2(\pi z)+\pi\cot(\pi z)}{2}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\frac{-\pi^2\csc^2(\pi z) + 2\pi^3z\csc^2(\pi z)\cot(\pi z) - \pi^2\csc^2(\pi z)}{2}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\pi^2\frac{\pi z \cos(\pi z) - \sin(\pi z)}{\sin^3(\pi z)}$$

Using L’Hopital’s rule

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\pi^2\frac{-\pi^2 z \sin(\pi z) + \pi\cos(\pi z) - \pi\cos(\pi z)}{3\pi\sin^2(\pi z)\cos(\pi z)}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\pi^2\frac{-\pi z}{3\sin(\pi z)\cos(\pi z)}$$

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\pi^2\frac{-2\pi z}{3\sin(2\pi z)}$$

Using L’Hopital’s rule again

$$ = \underset{z\rightarrow 0}{\text{lim}}\hspace{2mm}\frac{-2\pi^3}{6\pi\cos(2\pi z)}$$

$$ = -\frac{\pi^2}{3}$$
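
Since that was a lot of algebra, here is a quick numerical cross-check of the third-order pole formula. It is a sketch, not a proof: `h` is my name for $z^3 f(z) = \pi z\cot(\pi z)$ (with the value 1 filled in at $z=0$), and the second derivative is approximated with a central difference whose step size I picked by hand.

```python
import math

def h(z):
    """z^3 * f(z) = pi * z * cot(pi z), with the removable singularity at 0 filled in."""
    if z == 0:
        return 1.0
    return math.pi * z * math.cos(math.pi * z) / math.sin(math.pi * z)

def residue_at_zero(step=1e-3):
    """(1/2) * h''(0), with h'' approximated by a central difference."""
    second_derivative = (h(step) - 2 * h(0) + h(-step)) / step**2
    return second_derivative / 2

# The two printed numbers should agree to several decimal places.
print(residue_at_zero(), -math.pi**2 / 3)
```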

Wow, alright, that was pretty long. But I promise we are almost there. We just need to put it all together.

V. The Big Reveal

All in all, we’ve established the following

$$\underset{z\rightarrow n}{\text{Res}}\hspace{2mm}\frac{\pi\cot(\pi z)}{z^2}= \frac{1}{n^2} \hspace{7mm} n\in\mathbb{Z}, n\neq 0$$

$$\underset{z\rightarrow 0}{\text{Res}}\hspace{2mm}\frac{\pi\cot(\pi z)}{z^2}= -\frac{\pi^2}{3}$$

$$ \underset{R\rightarrow \infty}{\text{lim}}\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z= 0$$

We know that the integral is equal to $2\pi i$ times the sum of the residues

$$ \underset{R\rightarrow \infty}{\text{lim}}\oint_{\gamma_R} \frac{\pi\cot(\pi z)}{z^2} \hspace{1mm}\text{d}z= 2\pi i \sum \underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z)$$

But we know the integral is 0, so

$$2\pi i \sum \underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) = 0$$

$$\sum \underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) = 0$$

Now, we can rewrite the sum of the residues, since we’ve computed them all.

$$\sum \underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) = \sum_{n\neq 0}\underset{z\rightarrow n}{\text{Res}}\hspace{2mm}f(z) \hspace{2mm} + \hspace{2mm} \underset{z\rightarrow 0}{\text{Res}}\hspace{2mm}f(z) = 0$$

$$\sum_{n=-\infty}^{\infty,  n\neq 0}\frac{1}{n^2} \hspace{2mm} - \hspace{2mm} \frac{\pi^2}{3}= 0$$

But, $\frac{1}{n^2}$ is an even function, so we can rewrite it as

$$\sum_{n=-\infty}^{\infty,  n\neq 0}\frac{1}{n^2} = 2\sum_{n=1}^{\infty}\frac{1}{n^2}$$


$$2\sum_{n=1}^{\infty}\frac{1}{n^2} \hspace{2mm} - \hspace{2mm} \frac{\pi^2}{3}= 0$$

$$\bbox[10px,border:1px solid black]{\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}}$$
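
If you want to see the number appear, here is a tiny sketch that adds up the first few terms and compares against $\frac{\pi^2}{6}$. The function name `basel_partial_sum` and the cutoffs are just my choices for illustration.

```python
import math

def basel_partial_sum(terms):
    """Sum of 1 / n^2 for n = 1, ..., terms."""
    return sum(1 / n**2 for n in range(1, terms + 1))

target = math.pi**2 / 6
# The gap to pi^2 / 6 shrinks roughly like 1 / terms.
for terms in (10, 1000, 100000):
    partial = basel_partial_sum(terms)
    print(terms, partial, target - partial)
```

The convergence is slow, which is part of why the exact value was so hard to guess from the numbers alone.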

That’s it folks. As always, thanks for reading! I know this was much more mathematically intense than the rest of my posts, but this is such an incredible formula that I was left with no choice but to present it.





Memory-less Processes

January 10th, 2017

Today we will investigate a very interesting property of the exponential distribution. Often used to model waiting times, the exponential distribution has a probability density function of

$$P(x) = \lambda e^{-\lambda x}$$

It has an expected value of $\frac{1}{\lambda}$. If we use this model for a waiting time for a bus, then we expect to wait $\frac{1}{\lambda}$ minutes (for our model, we will use minutes, but it really doesn’t matter).


I. Expected Value

We start by computing the expected value. Let $T$ have a distribution of $$P(T) = \lambda e^{-\lambda T}$$

We compute the expected value of $T$, $\langle T\rangle$ via

$$\langle T\rangle = \int_{T_{\text{min}}}^{T_{\text{max}}}T\cdot P(T) \text{d}T$$

We could wait a minimum of 0 minutes (the bus is there when we arrive), and could wait at most, well, infinity, because the bus may break and never arrive. The exponential distribution is defined for $T\in[0,\infty)$, so our integral becomes

$$\langle T\rangle = \int_{0}^{\infty}T\lambda e^{-\lambda T} \text{d}T $$

$$= -Te^{-\lambda T}\Big|_{T=0}^{T=\infty}+\int_{0}^{\infty} e^{-\lambda T}\text{d}T$$

$$ = \Big(-Te^{-\lambda T}-\frac{1}{\lambda}e^{-\lambda T}\Big)\Big|_{T=0}^{T=\infty}$$

$$ = 0 - \Big(-\frac{1}{\lambda}\Big) = \frac{1}{\lambda}$$
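
As a sanity check on the integration by parts, here is a sketch that approximates the same integral numerically with a midpoint rule. The names `exponential_pdf` and `expected_wait`, the cutoff at $50/\lambda$, and the example rate are all assumptions of mine.

```python
import math

def exponential_pdf(t, rate):
    """Density lambda * exp(-lambda * t) for t >= 0."""
    return rate * math.exp(-rate * t)

def expected_wait(rate, steps=200000):
    """Midpoint-rule approximation of the integral of t * pdf(t) over [0, 50 / rate]."""
    upper = 50 / rate            # the tail beyond this point is negligible
    width = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * width
        total += t * exponential_pdf(t, rate) * width
    return total

rate = 0.2                       # hypothetical: a bus every 5 minutes on average
print(expected_wait(rate), 1 / rate)
```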

II. Conditional Distributions Intro

Now we will examine a conditional distribution. This sounds confusing, but it’s a pretty easy concept to understand. Let’s say you have a red ball and a yellow ball, and you pick a ball from a hat, then pick the other ball out of the hat. What’s the probability that you pick the red ball second? Hopefully it’s obvious that it’s 50%. Well, what if I tell you that you pick the yellow ball first, now, what’s the probability that you pick the red ball second given that you pick the yellow ball first? 100%. This is what a conditional distribution is; the probability of some event occurring, given some prior knowledge.

In our case, our event is the probability that we wait $T$ minutes, and the knowledge we are given is that we’ve already waited $W$ minutes. Let’s say you are waiting for a bus, how long do you expect until the next bus comes given that you have already waited $W$ minutes and no bus has arrived?

We will use the definition of conditional probability (the identity behind Bayes’ Theorem), which states that the probability of $A$ given $B$, or $P(A|B)$, can be expressed as

$$P(A|B) = \frac{P(A\hspace{1mm}\text{and}\hspace{1mm}B)}{P(B)}$$
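
To make the formula concrete, here is the two-ball example worked out by brute force. It is just a sketch: the helper `probability` and the event functions are names I made up, and the two draw orders are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import permutations

# The two equally likely draw orders: (first ball, second ball).
outcomes = list(permutations(["red", "yellow"]))

def probability(event):
    """Fraction of outcomes for which event(outcome) is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def red_second(o):
    return o[1] == "red"

def yellow_first(o):
    return o[0] == "yellow"

p_b = probability(yellow_first)
p_a_and_b = probability(lambda o: red_second(o) and yellow_first(o))

print(probability(red_second))   # 1/2
print(p_a_and_b / p_b)           # 1
```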

III. Conditional Distribution Calculations

In our case, we want the probability that we wait $T$ minutes given we’ve waited $W$ minutes already. It is very important to understand how to interpret the information given. By waiting $W$ minutes, we know that $T\geq W$. We know nothing else, other than the fact that $T$ must be greater than or equal to $W$, since we have already waited that long. Therefore,

$$P(B) = P(T\geq W) = \int_{W}^{\infty} \lambda e^{-\lambda T}\text{d}T$$

$$ = -e^{-\lambda T}\hspace{4mm}|_{T=W}^{T=\infty}$$

$$ = e^{-\lambda W}$$

Now, we need to calculate $P(A\hspace{1mm}\text{and}\hspace{1mm}B)$. But in fact, we already know it! The probability density for waiting $T$ minutes AND having $T\geq W$ is just the original density $\lambda e^{-\lambda T}$ for $T\geq W$, and 0 for $T<W$.

Therefore, we have

$$P(T|T\geq W) = \frac{\lambda e^{-\lambda T}}{e^{-\lambda W}} \hspace{5mm} T\geq W$$

$$= \lambda e^{-\lambda (T-W)}\hspace{5mm} T\geq W$$

IV. Conditional Expected Value

So, how long do we expect to wait, given that we’ve already waited $W$ minutes? The expected value can be calculated the same way as before.

$$\langle T|T\geq W\rangle = \int_{W}^{\infty}T\cdot P(T|T\geq W) \text{d}T$$

$$ = \int_{W}^{\infty}T\lambda  \frac{e^{-\lambda T}}{e^{-\lambda W}}\text{d}T $$

$$= e^{\lambda W}\int_{W}^{\infty}T\lambda e^{-\lambda T}\text{d}T $$

$$ = e^{\lambda W}\cdot (-Te^{-\lambda T}-\frac{1}{\lambda}e^{-\lambda T}\hspace{4mm}|_{T=W}^{T=\infty})$$

$$ = e^{\lambda W}\cdot (We^{-\lambda W}+ \frac{e^{-\lambda W}}{\lambda})$$

$$ = W + \frac{1}{\lambda}$$

This shows that the expected total wait is $W + \frac{1}{\lambda}$ minutes. What’s significant about this? Well, given no information, we expected to wait $\frac{1}{\lambda}$ minutes. Given we’ve already waited $W$ minutes, we now expect a total wait of $W + \frac{1}{\lambda}$ minutes, which means we simply expect to wait another $\frac{1}{\lambda}$ minutes. No matter how long we have already waited, the bus is no more likely to show up soon than it was when we first arrived. Pretty interesting, I think.
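
You can also watch the memory-less property show up in a simulation. The sketch below draws a lot of exponential waiting times, keeps only the ones that lasted at least $W$ minutes, and averages the remaining wait $T - W$. The function name `mean_remaining_wait`, the sample size, the seed, and the example rate are all my own assumptions.

```python
import random

def mean_remaining_wait(rate, already_waited, samples=200000, seed=0):
    """Average of T - W over simulated waits T that lasted at least W."""
    rng = random.Random(seed)
    remaining = [
        t - already_waited
        for t in (rng.expovariate(rate) for _ in range(samples))
        if t >= already_waited
    ]
    return sum(remaining) / len(remaining)

rate = 0.5   # hypothetical: a bus every 2 minutes on average
# Each average should land near 1 / rate, regardless of how long we already waited.
for waited in (0, 3, 6):
    print(waited, mean_remaining_wait(rate, waited))
```

Up to sampling noise, the average remaining wait comes out close to $\frac{1}{\lambda}$ in every case.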

That’s all folks. Thanks for reading!



