Distributions


Part of the Statistics for Machine Learning series.

Discrete Distributions

:::Definition (Bernoulli distribution).

A single trial of an experiment with $d = 2$ possible outcomes: success ($X = 1$) or failure ($X = 0$).

  • $X \sim \text{ber}(p)$ if $P(X = 1) = p$ and $P(X = 0) = 1-p$
  • $\mathbb{E}(X) = p$
  • $\mathbb{V}ar(X) = p-p^2 = p(1-p)$

:::

:::Definition (Binomial distribution).

Number of successes in $n$ independent $\text{ber}(p)$ trials.

  • $X \sim \text{bin}(n,p)$ if for $x \in \{0, 1, \dots, n\}$: $$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}$$
  • $\mathbb{E}(X) = np$
  • $\mathbb{V}ar(X) = np(1-p)$

:::
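The binomial formulas can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` and the values $n = 10$, $p = 0.3$ are illustrative choices, not part of the definition.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Sum of the pmf, mean and variance computed directly from the pmf.
total = sum(pmf)
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

print(total, mean, variance)  # close to 1, n*p and n*p*(1-p)
```

Up to floating-point rounding, the three printed numbers should agree with $1$, $np$ and $np(1-p)$.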

:::Definition (Multinomial distribution).

Counts of each of $d$ possible outcomes over $n$ independent trials.

  • A random vector $\mathcal{X} = (X_1, \dots, X_d)$ is called multinomially distributed with $n$ trials and probabilities $p = (p_1, \dots, p_d) \in (0,1)^d$, denoted $\mathcal{X} \sim \text{mult}(n, p)$, if $$\begin{split} P(\mathcal{X} = (x_1, \dots, x_d)) &= \frac{n!}{x_1!\cdots x_d!} p_1^{x_1} \times \cdots \times p_d^{x_d} \\ &= \binom{n}{x_1, \dots, x_d} \prod_{k=1}^d p_k^{x_k} \end{split}$$ where $(x_1, \dots, x_d) \in \mathbb{N}^d$ with $\sum_{k=1}^d x_k = n$ and $\sum_{k=1}^d p_k = 1$
  • $\mathbb{E}(X_k) = np_k$
  • $\mathbb{V}ar(X_k) = np_k(1-p_k)$

:::
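As a rough illustration of the multinomial pmf, the sketch below evaluates it with the standard library. The helper name `multinomial_pmf` and the fair-die example are assumptions made for this illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X = counts) for X ~ mult(n, probs), where n = sum(counts)."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_d!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# Hypothetical example: a fair die rolled 6 times, every face showing once.
fair_die = [1 / 6] * 6
each_face_once = multinomial_pmf([1] * 6, fair_die)
print(each_face_once)  # equals 6! / 6**6

# With d = 2 the formula reduces to the binomial pmf.
two_outcomes = multinomial_pmf([3, 7], [0.3, 0.7])
print(two_outcomes)
```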

:::Definition (Geometric distribution).

Number of failed attempts before the first success.

  • $X \sim \text{geo}(p)$ if for $x \in \{0, 1, 2, \dots\}$: $$P(X = x) = (1-p)^x\cdot p$$
  • $\mathbb{E}(X) = \frac{1-p}{p}$
  • $\mathbb{V}ar(X) = \frac{1-p}{p^2}$

:::
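The pmf used here counts the failures before the first success, so the support starts at $0$. A small sketch, with an illustrative $p = 0.25$ and a truncated support (an assumption that is harmless because the tail is negligible), recovers the stated mean and variance:

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): x failures followed by the first success."""
    return (1 - p) ** x * p

p = 0.25               # illustrative value
support = range(2000)  # truncation; the mass beyond this point is negligible

mean = sum(x * geometric_pmf(x, p) for x in support)
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in support)

print(mean, variance)  # close to (1-p)/p = 3 and (1-p)/p**2 = 12
```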

:::Definition (Poisson distribution).

Number of events in a fixed interval of time or space when events occur with average intensity $\lambda > 0$.

  • $X \sim \text{Poi}(\lambda)$ if for $x \in \{0, 1, 2, \dots\}$: $$P(X = x) = \frac{\lambda^x}{x!} \cdot e^{-\lambda}$$
  • $\mathbb{E}(X) = \lambda$
  • $\mathbb{V}ar(X) = \lambda$

:::

:::Definition (Discrete uniform distribution).

A finite number $n = b - a + 1$ of values are equally likely to be observed.

  • $X \sim \mathcal{U}(a,b)$ for some integers $a, b$ with $a \leq b$
  • $P(X = x) = 1/n$ for $x \in \{a, a+1, \dots, b-1, b\}$
  • $F(k) = \frac{\lfloor k \rfloor - a + 1}{n}$ for real $k \in [a, b]$
  • $\mathbb{E}(X) = \frac{a+b}{2}$
  • $\mathbb{V}ar(X) = \frac{(b-a+1)^2-1}{12}$

:::
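A fair die is the standard example with $a = 1$ and $b = 6$. The sketch below checks the cdf and variance formulas for that case; the function name `discrete_uniform_cdf` is a hypothetical helper, and the values outside $[a, b]$ are filled in the usual way (0 below $a$, 1 from $b$ onward).

```python
from math import floor

def discrete_uniform_cdf(k: float, a: int, b: int) -> float:
    """P(X <= k) for X uniform on the integers a, ..., b."""
    n = b - a + 1
    if k < a:
        return 0.0
    if k >= b:
        return 1.0
    return (floor(k) - a + 1) / n

a, b = 1, 6  # a fair die
n = b - a + 1
values = range(a, b + 1)

mean = sum(values) / n
variance = sum((v - mean) ** 2 for v in values) / n

print(mean, variance, ((b - a + 1) ** 2 - 1) / 12)
print(discrete_uniform_cdf(3.7, a, b))  # three of six faces are <= 3.7
```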

Continuous Distributions

:::Definition (Continuous uniform distribution).

An experiment whose outcome is equally likely to fall anywhere between two bounds $a < b$.

  • $X \sim U(a,b)$ if its pdf is $$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in (a,b) \\ 0 & \text{otherwise} \end{cases}$$
  • The cdf is given by $$F(x) = \begin{cases} 0 & x \leq a \\ \frac{x-a}{b-a} & a < x < b \\ 1 & x \geq b \end{cases}$$
  • $\mathbb{E}(X) = \frac{a+b}{2}$
  • $\mathbb{V}ar(X) = \frac{(b-a)^2}{12}$

:::
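For reference, the variance of the continuous uniform distribution follows from a short calculation with the second moment:

$$\begin{split} \mathbb{E}(X^2) &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \\ \mathbb{V}ar(X) &= \mathbb{E}(X^2) - \mathbb{E}(X)^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12} \end{split}$$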

:::Definition (Exponential distribution).

Continuous analogue of geometric distribution.

  • $X \sim \exp(\lambda)$ for $\lambda > 0$
  • pdf is of the form $$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
  • cdf is $$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
  • $\mathbb{E}(X) = 1/\lambda$
  • $\mathbb{V}ar(X) = 1/\lambda^2$

:::
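Because the exponential cdf can be inverted in closed form, samples can be drawn by inverse-transform sampling: solve $F(x) = u$ for a uniform $u$. The sketch below is one way to do this; the function names, the seed and $\lambda = 2$ are illustrative assumptions.

```python
import random
from math import exp, log

def exponential_cdf(x: float, lam: float) -> float:
    """F(x) for X ~ exp(lam)."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

def sample_exponential(lam: float, rng: random.Random) -> float:
    """Inverse-transform sampling: x = -log(1 - u) / lam."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return -log(1 - u) / lam

lam = 2.0                # illustrative rate
rng = random.Random(0)   # fixed seed so the run is repeatable
draws = [sample_exponential(lam, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be near 1/lam = 0.5
```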

:::Definition ($\chi^2$ distribution).

Let $Z_1, \dots, Z_d$ be i.i.d. random variables with $Z_i \sim N(0,1)$. A random variable $X$ is called $\chi^2$ distributed with $d$ degrees of freedom, written $X \sim \chi^2(d)$, if it has the same distribution as the sum of squares

$$Z_1^2 + \cdots + Z_d^2$$

As a sum of squares, $X \geq 0$.

  • $\mathbb{E}(X) = d$
  • $\mathbb{V}ar(X) = 2d$

:::
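The definition suggests a direct simulation: draw $d$ standard normal values, square them and add them up. The following sketch does this with the standard library; $d = 5$, the seed and the sample size are arbitrary choices for illustration.

```python
import random

def sample_chi_square(d: int, rng: random.Random) -> float:
    """One draw of Z_1^2 + ... + Z_d^2 with Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

d = 5                     # degrees of freedom, illustrative
rng = random.Random(42)   # fixed seed so the run is repeatable
draws = [sample_chi_square(d, rng) for _ in range(50_000)]

mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print(mean, variance)  # should be near d = 5 and 2d = 10
```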

Normal Distribution

:::Definition (Normal distribution).

A random variable is said to be (univariate) Gaussian or normal $X \sim N(\mu, \sigma^2)$ if its pdf is of the form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Alternatively, following the Herschel-Maxwell derivation, we can start from the standard normal distribution $N(0, 1)$, whose pdf is

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp \left (-\frac{1}{2}x^2 \right )$$

All normal density functions can be obtained from $\phi$ by introducing the standard deviation $\sigma > 0$ and the mean $\mu$ as follows

$$P_{\text{norm}}(x\mid\mu,\sigma) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)$$

A random variable is called Gaussian $X \sim N(\mu, \sigma^2)$ if

$$P(a < X < b \mid \mu,\sigma) = \int_a^b P_{\text{norm}}(x\mid\mu,\sigma) dx$$

Some properties

  • Let $X \sim N(\mu, \sigma^2)$. If $Y = a + bX$ then $Y \sim N(a+b\mu, b^2\sigma^2)$. This is called an affine transformation.

:::
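The scaling relation between the standard normal pdf and the general normal pdf can be checked against Python's `statistics.NormalDist`, which also supports adding and multiplying by constants. This is a minimal sketch; the parameter values and the names `phi` and `normal_pdf` are chosen here for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def phi(x: float) -> float:
    """Standard normal pdf."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """General normal pdf obtained by shifting and scaling phi."""
    return phi((x - mu) / sigma) / sigma

mu, sigma = 1.5, 2.0  # illustrative parameters
reference = NormalDist(mu, sigma)
points = [-3.0, 0.0, 1.5, 4.0]
max_gap = max(abs(normal_pdf(x, mu, sigma) - reference.pdf(x)) for x in points)
print(max_gap)  # expected to be at floating-point noise level

# Affine transformation Y = a + bX of X ~ N(mu, sigma^2).
a, b = 3.0, -2.0
y = a + b * reference
print(y.mean, y.variance)  # a + b*mu and b**2 * sigma**2
```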

:::Theorem (Law of large numbers).

Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\mathbb{E}(X_i) = \mu$ and $\mathbb{V}ar(X_i) = \sigma^2$. Then the sample mean

$$\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$$

satisfies, for any $a > 0$,

$$\lim_{n\to\infty} P(|\overline{X}_n - \mu| < a) = 1$$

:::
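A small simulation can make the statement tangible: the fraction of sample means that land within $a$ of $\mu$ should approach 1 as $n$ grows. The sketch below uses fair die rolls ($\mu = 3.5$); the tolerance $a = 0.2$, the seed and the number of repetitions are arbitrary assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
mu = 3.5                # expected value of one fair die roll
a = 0.2                 # tolerance from the theorem, chosen arbitrarily

def sample_mean(n: int) -> float:
    """Mean of n simulated fair die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

def fraction_within(n: int, repetitions: int = 200) -> float:
    """Estimate P(|sample mean - mu| < a) from repeated experiments."""
    hits = sum(abs(sample_mean(n) - mu) < a for _ in range(repetitions))
    return hits / repetitions

results = {n: fraction_within(n) for n in (10, 100, 1000)}
print(results)  # the estimated probability should increase towards 1
```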

:::Theorem (Central limit theorem).

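Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\mathbb{E}(X_i) = \mu$ and finite $\mathbb{V}ar(X_i) = \sigma^2 < \infty$. Then the standardized sum converges in distribution to the standard normal:

$$\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

Consequently, for large $n$ the sum and the sample mean are approximately normally distributed:

$$\sum_{i=1}^n X_i \overset{\text{approx.}}{\sim} N(n\mu, n\sigma^2) \qquad \text{and} \qquad \frac{1}{n} \sum_{i=1}^n X_i \overset{\text{approx.}}{\sim} N \left (\mu, \frac{\sigma^2}{n} \right )$$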

:::
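To see the theorem at work, one can standardize sums of non-normal variables and compare their empirical cdf with the standard normal cdf. The sketch below uses sums of $n = 50$ draws from $U(0,1)$; these choices, the seed and the variable names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(7)         # fixed seed so the run is repeatable
n = 50                         # summands per experiment
repetitions = 20_000
mu, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of U(0, 1)

def standardized_sum() -> float:
    """(sum - n*mu) / (sigma * sqrt(n)) for n uniform draws."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sigma * sqrt(n))

z_values = [standardized_sum() for _ in range(repetitions)]

def empirical_cdf(z: float) -> float:
    return sum(v <= z for v in z_values) / repetitions

for z in (-1.0, 0.0, 1.0):
    print(z, empirical_cdf(z), NormalDist().cdf(z))
```

The two printed columns should agree to roughly two decimal places, even though the individual summands are uniform rather than normal.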

:::Lemma (Standardization).

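A random variable $Z$ follows the standard normal distribution, $Z \sim N(0,1)$, if it is normal with $\mu = 0$ and $\sigma^2 = 1$. Its cdf is denoted by $\Phi(z) = P(Z \leq z)$. To standardize a random variable $X \sim N(\mu, \sigma^2)$ we calculate $Z = \frac{X-\mu}{\sigma}$. Then $Z \sim N(0,1)$, and therefore $P(X \leq x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$.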

:::
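Standardization reduces every normal probability to an evaluation of the standard normal cdf, which can be written with the error function from Python's `math` module. A minimal sketch, where the function names and the example $X \sim N(100, 15^2)$ are hypothetical:

```python
from math import erf, sqrt

def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1), expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma
    return standard_normal_cdf(z)

# Hypothetical example: X ~ N(100, 15^2), so x = 130 corresponds to z = 2.
probability = normal_cdf(130, 100, 15)
print(probability)  # about 0.977
```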

:::Lemma (68-95-99.7 rule).

Let $X \sim N(\mu, \sigma^2)$. Then

  • $P(|X - \mu| \leq 1\sigma) \approx 0.68$
  • $P(|X - \mu| \leq 2\sigma) \approx 0.95$
  • $P(|X - \mu| \leq 3\sigma) \approx 0.997$

:::
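The three percentages can be reproduced directly: for a normal variable, the probability of falling within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$. The helper name `within_k_sigma` below is chosen for this sketch.

```python
from math import erf, sqrt

def within_k_sigma(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
```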

:::Lemma (Normal approximation of binomial).

Let $X \sim \text{bin}(n,p)$ be a binomially distributed random variable with $\mu = np$ and $\sigma^2 = np(1-p)$. If $\mu$ and $\sigma^2$ are large (typically $\min\{\mu, \sigma^2\} \geq 10$), then for integers $a \leq b$ the probability $P(a \leq X \leq b)$ is fairly well approximated using a continuity correction. With $Y \sim N(\mu, \sigma^2)$ and $\Phi$ denoting the standard normal cdf,

$$\begin{split} P(a \leq X \leq b) &\approx P\left(a - \frac{1}{2} \leq Y \leq b + \frac{1}{2}\right) \\ &= \Phi\left(\frac{b + \frac{1}{2}-\mu}{\sigma}\right) - \Phi\left(\frac{a-\frac{1}{2}-\mu}{\sigma}\right) \\ &= \Phi\left(\frac{b + \frac{1}{2}-np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a-\frac{1}{2}-np}{\sqrt{np(1-p)}}\right) \end{split}$$

We can use this, for example, if $X_1, \dots, X_n \sim \text{ber}(p)$ are i.i.d. and $S_n = \sum_{i=1}^n X_i \sim \text{bin}(n,p)$. Then

$$P(S_n \leq x) \approx \Phi\left(\frac{x + \frac{1}{2}-np}{\sqrt{np(1-p)}}\right)$$

:::
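How good the approximation is can be judged by comparing it with the exact binomial sum. The sketch below does so for $n = 100$, $p = 0.4$ (so $\mu = 40$ and $\sigma^2 = 24$) and the interval from 35 to 45; all of these values and the function names are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_interval_exact(a: int, b: int, n: int, p: float) -> float:
    """Exact P(a <= X <= b) for X ~ bin(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a, b + 1))

def binomial_interval_normal(a: int, b: int, n: int, p: float) -> float:
    """Normal approximation of P(a <= X <= b) with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    cdf = NormalDist().cdf  # standard normal cdf
    return cdf((b + 0.5 - mu) / sigma) - cdf((a - 0.5 - mu) / sigma)

n, p = 100, 0.4  # mu = 40 and sigma^2 = 24, both above 10
exact = binomial_interval_exact(35, 45, n, p)
approx = binomial_interval_normal(35, 45, n, p)
print(exact, approx, abs(exact - approx))
```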

:::Lemma (Normal approximation of Poisson).

Let $X \sim \text{Poi}(\lambda)$, so that $\mu = \sigma^2 = \lambda$. If $\lambda > 15$ we can use the same method as for the binomial distribution: apply a continuity correction and, with $\Phi$ denoting the standard normal cdf, obtain for integers $a \leq b$

$$P(a \leq X \leq b) \approx \Phi\left(\frac{b + \frac{1}{2}-\lambda}{\sqrt{\lambda}}\right) - \Phi\left(\frac{a-\frac{1}{2}-\lambda}{\sqrt{\lambda}}\right)$$

:::
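The same kind of comparison works for the Poisson case. The sketch below uses $\lambda = 25$ and the interval from 20 to 30, both chosen only for illustration, and sets the exact Poisson sum against the continuity-corrected normal approximation.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_interval_exact(a: int, b: int, lam: float) -> float:
    """Exact P(a <= X <= b) for X ~ Poi(lam)."""
    return sum(lam**x / factorial(x) * exp(-lam) for x in range(a, b + 1))

def poisson_interval_normal(a: int, b: int, lam: float) -> float:
    """Normal approximation of P(a <= X <= b) with continuity correction."""
    cdf = NormalDist().cdf  # standard normal cdf
    return cdf((b + 0.5 - lam) / sqrt(lam)) - cdf((a - 0.5 - lam) / sqrt(lam))

lam = 25.0  # illustrative intensity above the threshold of 15
exact = poisson_interval_exact(20, 30, lam)
approx = poisson_interval_normal(20, 30, lam)
print(exact, approx, abs(exact - approx))
```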