Distributions

Fri 27 March 2026

by Daniel Melichar

in Crash Course

tagged statistics, probability, distributions, machine learning

Part of the Statistics for Machine Learning series.

Discrete Distributions
Continuous Distributions
Normal Distribution

Discrete Distributions

:::Definition (Bernoulli distribution).

A single trial with $d = 2$ possible outcomes: success ($X = 1$) or failure ($X = 0$).

$X \sim \text{ber}(p)$ if $P(X = 1) = p$ and $P(X = 0) = 1-p$
$\mathbb{E}(X) = p$
$\mathbb{V}ar(X) = p-p^2 = p(1-p)$

:::

:::Definition (Binomial distribution).

Number of successes in $n$ independent $\text{ber}(p)$ trials.

$X \sim \text{bin}(n,p)$ if for $x \in \{0, 1, \dots, n\}$: $$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}$$
$\mathbb{E}(X) = np$
$\mathbb{V}ar(X) = np(1-p)$

:::
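As a quick numerical check of the binomial formulas, here is a minimal Python sketch that evaluates the pmf and compares the mean and variance computed from it with $np$ and $np(1-p)$. The function name `binomial_pmf` and the values $n = 10$, $p = 0.3$ are illustrative assumptions, not part of the definition.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # illustrative values (assumption)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

# Compare with 1, n*p and n*p*(1-p).
print(total, mean, variance)
```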

:::Definition (Multinomial distribution).

$n$ independent trials, each of which can result in one of $d$ outcomes; $X_k$ counts how often outcome $k$ occurs.

A random vector $\mathcal{X} = (X_1, \dots, X_d)$ is called multinomially distributed with $n$ trials and probabilities $p = (p_1, \dots, p_d) \in (0,1)^d$, denoted $\mathcal{X} \sim \text{mult}(n, p)$, if $$\begin{split} P(\mathcal{X} = (x_1, \dots, x_d)) &= \frac{n!}{x_1!\cdots x_d!} p_1^{x_1} \times \cdots \times p_d^{x_d} \\ &= \binom{n}{x_1, \dots, x_d} \prod_{k=1}^d p_k^{x_k} \end{split}$$ where $(x_1, \dots, x_d) \in \mathbb{N}_0^d$ with $\sum_{k=1}^d x_k = n$ and $\sum_{k=1}^d p_k = 1$
$\mathbb{E}(X_k) = np_k$
$\mathbb{V}ar(X_k) = np_k(1-p_k)$

:::
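The multinomial pmf can be evaluated directly from the factorial form. The following is a minimal sketch; the helper name `multinomial_pmf` and the probabilities `(0.5, 0.3, 0.2)` are assumptions chosen for illustration. It also checks that the pmf sums to 1 over all count vectors with $x_1 + x_2 + x_3 = n$.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X = counts) for X ~ mult(n, probs), with n = sum(counts)."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)  # exact: every partial quotient is an integer
    return coeff * prod(p**x for p, x in zip(probs, counts))


probs = (0.5, 0.3, 0.2)  # illustrative probabilities (assumption), sum to 1
example = multinomial_pmf((2, 1, 1), probs)

# Sum over all (x1, x2, x3) with x1 + x2 + x3 = n; this should give 1.
n = 4
total = sum(
    multinomial_pmf((x1, x2, n - x1 - x2), probs)
    for x1 in range(n + 1)
    for x2 in range(n - x1 + 1)
)
print(example, total)
```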

:::Definition (Geometric distribution).

Number of failures before the first success in repeated independent $\text{ber}(p)$ trials.

$X \sim \text{geo}(p)$ if for $x \in \{0, 1, 2, \dots\}$: $$P(X = x) = (1-p)^x\cdot p$$
$\mathbb{E}(X) = \frac{1-p}{p}$
$\mathbb{V}ar(X) = \frac{1-p}{p^2}$

:::
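Because the geometric pmf has infinite support, a numerical check has to truncate the sum. The sketch below does that with a cutoff that is an assumption chosen so the neglected tail is negligible for $p = 0.25$; the names `geometric_pmf` and `cutoff` are illustrative.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): x failures followed by one success."""
    return (1 - p) ** x * p


p = 0.25       # illustrative success probability (assumption)
cutoff = 2000  # truncation point (assumption); the tail mass beyond it is negligible

pmf = [geometric_pmf(x, p) for x in range(cutoff)]
total = sum(pmf)
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

# Compare with 1, (1-p)/p and (1-p)/p**2.
print(total, mean, variance)
```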

:::Definition (Poisson distribution).

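Number of events occurring in a fixed interval of time or space when events happen independently at an average rate (intensity) $\lambda > 0$.

$X \sim \text{Poi}(\lambda)$ if for $x \in \{0, 1, 2, \dots\}$: $$P(X = x) = \frac{\lambda^x}{x!} \cdot e^{-\lambda}$$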
$\mathbb{E}(X) = \lambda$
$\mathbb{V}ar(X) = \lambda$

:::
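A small sketch can confirm that the mean and the variance of a Poisson variable coincide. The value $\lambda = 4$ and the truncation at 100 terms are assumptions for illustration; for this $\lambda$ the neglected tail is far below floating-point precision.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poi(lam)."""
    return lam**x / factorial(x) * exp(-lam)


lam = 4.0  # illustrative intensity (assumption)
pmf = [poisson_pmf(x, lam) for x in range(100)]  # truncated support

total = sum(pmf)
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

# Compare with 1, lam and lam.
print(total, mean, variance)
```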

:::Definition (Discrete uniform distribution).

A finite number $n = b - a + 1$ of values are equally likely to be observed.

$X \sim \mathcal{U}(a,b)$ for some integers $a, b$ with $a \leq b$
$P(X = x) = 1/n$ for $x \in \{a, a+1, \dots, b-1, b\}$
$F(k) = \frac{\lfloor k \rfloor - a + 1}{n}$ for $k \in [a, b]$
$\mathbb{E}(X) = \frac{a+b}{2}$
$\mathbb{V}ar(X) = \frac{(b-a+1)^2-1}{12} = \frac{n^2-1}{12}$

:::
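A fair six-sided die is the standard example of a discrete uniform variable with $a = 1$ and $b = 6$. The sketch below, which uses exact fractions, is an illustration under that assumption; it computes the mean, the variance and one cdf value so they can be compared with $(a+b)/2$, $\frac{(b-a+1)^2-1}{12}$ and the floor formula.

```python
from fractions import Fraction
from math import floor

a, b = 1, 6  # a fair die, chosen for illustration (assumption)
n = b - a + 1

pmf = {x: Fraction(1, n) for x in range(a, b + 1)}


def cdf(k):
    """F(k) = P(X <= k) for a <= k <= b."""
    return Fraction(floor(k) - a + 1, n)


mean = sum(x * q for x, q in pmf.items())
variance = sum((x - mean) ** 2 * q for x, q in pmf.items())

print(mean, variance, cdf(4.5))
```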

Continuous Distributions

:::Definition (Continuous uniform distribution).

An outcome that is equally likely to fall anywhere between two bounds $a < b$.

$X \sim U(a,b)$ if its pdf is $$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in (a,b) \\ 0 & \text{otherwise} \end{cases}$$
The cdf is given by $$F(x) = \begin{cases} 0 & x \leq a \\ \frac{x-a}{b-a} & a < x < b \\ 1 & x \geq b \end{cases}$$
$\mathbb{E}(X) = \frac{a+b}{2}$
$\mathbb{V}ar(X) = \frac{(b-a)^2}{12}$

:::
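The moments and the cdf of the continuous uniform distribution can be checked by simulation. This is a rough sketch, assuming bounds $a = 2$, $b = 5$, a fixed seed and 100,000 samples (all illustrative choices); the empirical values only approximate the theoretical ones.

```python
import random

a, b = 2.0, 5.0  # illustrative bounds (assumption)


def uniform_cdf(x: float) -> float:
    """F(x) for X ~ U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.uniform(a, b) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
frac_below_3 = sum(s <= 3.0 for s in samples) / len(samples)

# Compare with (a+b)/2, (b-a)**2/12 and F(3).
print(sample_mean, sample_var, frac_below_3, uniform_cdf(3.0))
```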

:::Definition (Exponential distribution).

Continuous analogue of the geometric distribution: the waiting time until the first event when events occur at a constant rate $\lambda$.

$X \sim \exp(\lambda)$ for $\lambda > 0$
pdf is of the form $$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
cdf is $$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
$\mathbb{E}(X) = 1/\lambda$
$\mathbb{V}ar(X) = 1/\lambda^2$

:::
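Since the exponential cdf can be inverted in closed form, samples can be drawn by inverse transform sampling: if $U$ is uniform on $[0,1)$, then $-\ln(1-U)/\lambda$ has cdf $1 - e^{-\lambda x}$. This technique is not part of the definition above; the sketch below uses it only as an illustration, with $\lambda = 2$, a fixed seed and the sample size as assumptions.

```python
import random
from math import exp, log

lam = 2.0  # illustrative rate (assumption)


def exp_cdf(x: float) -> float:
    """F(x) for X ~ exp(lam)."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0


def sample_exponential(rng: random.Random) -> float:
    """Invert the cdf: solve u = 1 - exp(-lam * x) for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is in (0, 1]
    return -log(1.0 - u) / lam


rng = random.Random(1)  # fixed seed for reproducibility
samples = [sample_exponential(rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
frac_below_half = sum(s <= 0.5 for s in samples) / len(samples)

# Compare with 1/lam, 1/lam**2 and F(0.5).
print(sample_mean, sample_var, frac_below_half, exp_cdf(0.5))
```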

:::Definition ($\chi^2$ distribution).

Let $Z_1, \dots, Z_d$ be i.i.d. random variables with $Z_i \sim N(0,1)$. A random variable $X$ is called $\chi^2$ distributed with $d$ degrees of freedom, $X \sim \chi^2(d)$, if it has the same distribution as the sum of their squares:

$$X = Z_1^2 + \cdots + Z_d^2$$

In particular, $X \geq 0$.

$\mathbb{E}(X) = d$
$\mathbb{V}ar(X) = 2d$

:::
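The definition translates directly into a simulation: draw $d$ standard normal values, square them and add them up. The following sketch assumes $d = 3$, a fixed seed and 50,000 repetitions (illustrative choices) and compares the empirical mean and variance with $d$ and $2d$.

```python
import random

d = 3  # illustrative degrees of freedom (assumption)
rng = random.Random(2)  # fixed seed for reproducibility


def sample_chi2(d: int, rng: random.Random) -> float:
    """One draw of Z_1^2 + ... + Z_d^2 with Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))


samples = [sample_chi2(d, rng) for _ in range(50_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

# Compare with d and 2*d.
print(sample_mean, sample_var)
```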

Normal Distribution

:::Definition (Normal distribution).

A random variable is said to be (univariate) Gaussian or normal $X \sim N(\mu, \sigma^2)$ if its pdf is of the form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Alternatively, we can use the following construction, based on the Herschel-Maxwell derivation: the standard normal distribution $N(0, 1)$ has the pdf

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp \left (-\frac{1}{2}x^2 \right )$$

All normal density functions can be obtained by introducing the standard deviation $\sigma > 0$ and the mean $\mu$ as follows

$$P_{\text{norm}}(x\mid\mu,\sigma) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)$$

A random variable is called Gaussian $X \sim N(\mu, \sigma^2)$ if for all $a < b$

$$P(a < X < b \mid \mu,\sigma) = \int_a^b P_{\text{norm}}(x\mid\mu,\sigma) dx$$

Some properties

Let $X \sim N(\mu, \sigma^2)$. If $Y = a + bX$ with $b \neq 0$, then $Y \sim N(a+b\mu, b^2\sigma^2)$. This is called an affine transformation.

:::
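Both the scaling formula for the density and the affine-transformation property can be checked with Python's standard library. In this sketch the helper names `phi` and `normal_pdf` and all numeric values are assumptions for illustration; `statistics.NormalDist` supports adding and multiplying by constants, which is used here for $Y = a + bX$.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def phi(x: float) -> float:
    """Standard normal pdf."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2), built from phi by shifting and scaling."""
    return phi((x - mu) / sigma) / sigma


mu, sigma = 1.5, 2.0  # illustrative parameters (assumption)
X = NormalDist(mu, sigma)
pdf_gap = abs(normal_pdf(0.7, mu, sigma) - X.pdf(0.7))

a, b = 3.0, -2.0  # illustrative affine map Y = a + b*X (assumption)
Y = a + b * X

# Compare Y.mean with a + b*mu and Y.variance with b**2 * sigma**2.
print(pdf_gap, Y.mean, Y.variance)
```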

:::Theorem (Law of large numbers).

Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\mathbb{E}(X_i) = \mu$ and $\mathbb{V}ar(X_i) = \sigma^2 < \infty$. Then the sample mean

$$\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$$

satisfies, for any $a > 0$,

$$\lim_{n\to\infty} P(|\overline{X}_n - \mu| < a) = 1$$

:::
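The law of large numbers can be illustrated by tracking the running mean of simulated die rolls, whose expectation is $3.5$. The sketch below is a single simulated run with a fixed seed and hypothetical checkpoints; it shows typical behaviour, not a proof, and the deviation is not guaranteed to shrink monotonically.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility
mu = 3.5                # expected value of a fair six-sided die
checkpoints = (10, 1_000, 100_000)  # illustrative sample sizes (assumption)

running_sum = 0
deviations = {}
for n in range(1, max(checkpoints) + 1):
    running_sum += rng.randint(1, 6)
    if n in checkpoints:
        # |sample mean - mu| after n rolls
        deviations[n] = abs(running_sum / n - mu)

print(deviations)
```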

:::Theorem (Central limit theorem).

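Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\mathbb{E}(X_i) = \mu$ and finite variance $0 < \mathbb{V}ar(X_i) = \sigma^2 < \infty$. Then the standardized sum converges in distribution to the standard normal:

$$\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

Equivalently, for large $n$ the sum is approximately normally distributed,

$$\sum_{1 \leq i \leq n} X_i \overset{\text{approx.}}{\sim} N(n\mu, n\sigma^2)$$

and so is the sample mean:

$$\frac{1}{n} \sum_{1 \leq i \leq n} X_i \overset{\text{approx.}}{\sim} N \left (\mu, \frac{\sigma^2}{n} \right )$$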

:::
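To see the central limit theorem at work, one can standardize sums of non-normal variables and check how often they fall within one or two units of zero. The sketch below assumes $U(0,1)$ summands (mean $1/2$, variance $1/12$), $n = 30$ terms per sum, a fixed seed and 20,000 repetitions; all of these are illustrative choices.

```python
import random
from math import sqrt

n = 30                          # terms per sum (assumption)
mu, sigma = 0.5, sqrt(1 / 12)   # mean and standard deviation of U(0, 1)
rng = random.Random(4)          # fixed seed for reproducibility


def standardized_sum() -> float:
    """(S_n - n*mu) / (sigma * sqrt(n)) for one simulated sum S_n."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * sqrt(n))


z_values = [standardized_sum() for _ in range(20_000)]

within_1 = sum(abs(z) <= 1 for z in z_values) / len(z_values)
within_2 = sum(abs(z) <= 2 for z in z_values) / len(z_values)

# For a standard normal these fractions are roughly 0.68 and 0.95.
print(within_1, within_2)
```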

:::Lemma (Standardization).

A random variable $Z$ follows the standard normal distribution, $Z \sim N(0,1)$, if it is normal with $\mu = 0$ and $\sigma^2 = 1$. Its cdf is denoted $\Phi(z) = P(Z \leq z)$. To standardize a random variable $X \sim N(\mu, \sigma^2)$ we calculate $Z = \frac{X-\mu}{\sigma}$. Then $Z \sim N(0,1)$, so $P(X \leq x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$.

:::
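Standardization means that any normal probability can be read off the standard normal cdf. A minimal sketch, assuming $X \sim N(100, 15^2)$ and the threshold $x = 130$ purely for illustration:

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative parameters (assumption)
X = NormalDist(mu, sigma)
Z = NormalDist()         # standard normal N(0, 1)

x = 130.0
z = (x - mu) / sigma     # standardized value

direct = X.cdf(x)        # P(X <= x) computed on the original scale
via_z = Z.cdf(z)         # the same probability via the standard normal cdf

print(z, direct, via_z)
```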

:::Lemma (68-95-99.7 rule).

Let $X \sim N(\mu, \sigma^2)$. Then

$P(|X - \mu| \leq 1\sigma) \approx 0.68$
$P(|X - \mu| \leq 2\sigma) \approx 0.95$
$P(|X - \mu| \leq 3\sigma) \approx 0.997$

:::
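The three percentages follow from the standard normal cdf, since the probability of landing within $k$ standard deviations of the mean is the cdf at $k$ minus the cdf at $-k$. A short sketch using the standard library (the dictionary name `coverage` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, so one standard deviation equals 1

# P(|Z| <= k) = cdf(k) - cdf(-k) for k = 1, 2, 3
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(k, round(prob, 4))
```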

:::Lemma (Normal approximation of binomial).

Let $X \sim \text{bin}(n,p)$ be a binomially distributed random variable with $\mu = np$ and $\sigma^2 = np(1-p)$. If $\mu$ and $\sigma^2$ are large (typically $\min\{\mu, \sigma^2\} \geq 10$), then for integers $a \leq b$ the probability $P(a \leq X \leq b)$ is fairly well approximated using a continuity correction. With $F$ the cdf of $X$ and $\Phi$ the cdf of the standard normal distribution,

$$\begin{split} P(a \leq X \leq b) &= P\left(a - \frac{1}{2} \leq X \leq b + \frac{1}{2}\right) = F\left(b + \frac{1}{2}\right) - F\left(a - \frac{1}{2}\right) \\ &\approx \Phi\left(\frac{b + \frac{1}{2}-\mu}{\sigma}\right) - \Phi\left(\frac{a-\frac{1}{2}-\mu}{\sigma}\right) \\ &= \Phi\left(\frac{b + \frac{1}{2}-np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a-\frac{1}{2}-np}{\sqrt{np(1-p)}}\right) \end{split}$$

We can use this, for example, if $X_1, \dots, X_n$ are i.i.d. $\text{ber}(p)$ and $S_n = \sum_{i=1}^n X_i \sim \text{bin}(n,p)$; then for an integer $x$

$$P(S_n \leq x) \approx \Phi\left(\frac{x + \frac{1}{2}-np}{\sqrt{np(1-p)}}\right)$$

:::
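To get a feeling for the quality of the approximation, the exact binomial probability can be compared with the continuity-corrected normal value. The sketch below assumes $n = 100$, $p = 0.4$ and the interval $35 \leq X \leq 45$ as an illustration; `Phi` is shorthand for the standard normal cdf from the standard library.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4  # illustrative parameters (assumption)
a, b = 35, 45    # interval of interest (assumption)

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact probability: sum the binomial pmf over a, ..., b.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a, b + 1))

# Normal approximation with continuity correction.
Phi = NormalDist().cdf
approx = Phi((b + 0.5 - mu) / sigma) - Phi((a - 0.5 - mu) / sigma)

print(exact, approx, abs(exact - approx))
```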

:::Lemma (Normal approximation of Poisson).

Let $X \sim \text{Poi}(\lambda)$. If $\lambda > 15$ we can use the same method as above with $\mu = \sigma^2 = \lambda$: applying a continuity correction gives, for integers $a \leq b$ and with $\Phi$ the cdf of the standard normal distribution,

$$P(a \leq X \leq b) = P\left(a - \frac{1}{2} \leq X \leq b + \frac{1}{2}\right) \approx \Phi\left(\frac{b + \frac{1}{2}-\lambda}{\sqrt{\lambda}}\right) - \Phi\left(\frac{a-\frac{1}{2}-\lambda}{\sqrt{\lambda}}\right)$$

:::
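The same comparison works for the Poisson case. This sketch assumes $\lambda = 25$ and the interval $20 \leq X \leq 30$ for illustration and contrasts the exact probability with the continuity-corrected normal value; `Phi` again stands for the standard normal cdf.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 25.0     # illustrative intensity (assumption), above the threshold of 15
a, b = 20, 30  # interval of interest (assumption)

# Exact probability: sum the Poisson pmf over a, ..., b.
exact = sum(lam**x / factorial(x) * exp(-lam) for x in range(a, b + 1))

# Normal approximation with mean lam, variance lam and continuity correction.
Phi = NormalDist().cdf
approx = Phi((b + 0.5 - lam) / sqrt(lam)) - Phi((a - 0.5 - lam) / sqrt(lam))

print(exact, approx, abs(exact - approx))
```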