Random Variables & Probability Functions

Fri 27 March 2026

by Daniel Melichar

in Crash Course

tagged statistics, probability, machine learning

Part of the Statistics for Machine Learning series.

Probability Functions


:::Definition (Random variable).

A (measurable) function $X: \Omega \to \mathbb{R}$ is called a random variable. It assigns to every outcome $\omega \in \Omega$ a real number $X(\omega) = x$. We call $x$ a realization of $X$, and the set of all values of $X$ the image or feature space of $X$. If the image is finite or countably infinite, we call $X$ discrete; if it consists of one or more intervals (and is therefore uncountably infinite), we call $X$ continuous.

:::
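To make "a random variable is just a function on $\Omega$" concrete, here is a minimal Python sketch using only the standard library. It assumes a toy experiment that is not in the text: two fair coin tosses, with $X$ counting the number of heads. The names `omega`, `prob` and `X` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses: ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T').
omega = list(product("HT", repeat=2))

# Assumption: all outcomes are equally likely.
prob = {w: Fraction(1, len(omega)) for w in omega}


def X(w):
    """Random variable: map an outcome to its number of heads."""
    return w.count("H")


# Image of X: the set of values that X actually takes.
image = sorted({X(w) for w in omega})

# P(X = x) is the total probability of all outcomes that X maps to x.
pmf = {x: sum(prob[w] for w in omega if X(w) == x) for x in image}

print(image)  # [0, 1, 2]
print(pmf)    # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

The image $\{0, 1, 2\}$ is finite, so this $X$ is discrete in the sense of the definition above.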

:::Definition (PMF and PDF).

The probability mass function of a discrete random variable is the function

$$P(X = a) = p(a)$$

for all $a \in \mathbb{R}$. It holds that $0 \leq p(a) \leq 1$ and $\sum_a p(a) = 1$, where the sum runs over the (countably many) values of $X$.

The probability density function $f(x)$ of a continuous random variable $X$ must satisfy, for all $c \leq d$,

$$P(c \leq X \leq d) = \int_c^d f(x) dx$$

Furthermore, $f$ must be non-negative, and the total area under $f$ must equal one: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$.

:::
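Both sets of conditions can be checked numerically. The sketch below assumes two example distributions chosen for illustration only: a fair six-sided die for the PMF, and an exponential density $f(x) = \lambda e^{-\lambda x}$ (for $x \geq 0$) with an arbitrary rate $\lambda = 2$ for the PDF. The helper `integrate` is a hypothetical midpoint-rule approximation, and its result for $P(c \leq X \leq d)$ is compared with the closed form $e^{-\lambda c} - e^{-\lambda d}$.

```python
import math
from fractions import Fraction


# PMF of a fair six-sided die: p(a) = 1/6 on {1, ..., 6}, 0 elsewhere.
def p(a):
    return Fraction(1, 6) if a in {1, 2, 3, 4, 5, 6} else Fraction(0)


LAM = 2.0  # illustrative rate of the exponential density (an assumption)


def f(x):
    """Exponential PDF: non-negative everywhere and zero for x < 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def integrate(g, c, d, n=100_000):
    """Approximate the integral of g over [c, d] with the midpoint rule."""
    h = (d - c) / n
    return h * sum(g(c + (k + 0.5) * h) for k in range(n))


pmf_total = sum(p(a) for a in range(1, 7))  # exactly 1 with Fraction arithmetic
prob_cd = integrate(f, 0.5, 1.5)            # P(0.5 <= X <= 1.5)
exact_cd = math.exp(-LAM * 0.5) - math.exp(-LAM * 1.5)
# Area under f; the tail beyond x = 20 is about e^(-40), i.e. negligible.
area = integrate(f, 0.0, 20.0)

print(pmf_total, round(prob_cd, 6), round(exact_cd, 6), round(area, 6))
```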

:::Observation

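Both the PMF and the PDF describe the distribution of $X$, a probability measure on $\mathbb{R}$ with total mass $1$: the PMF sums to $1$ and the PDF integrates to $1$.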

:::

:::Definition (CDF).

The cumulative distribution function of a random variable $X$ is the function $F_X$ defined by

$$F_X(x) = P(X \leq x)$$

for all $x \in \mathbb{R}$. Writing $F = F_X$, it satisfies the following properties:

$0 \leq F(x) \leq 1$ for all $x \in \mathbb{R}$
$x \leq y \implies F(x) \leq F(y)$ (monotonically non-decreasing)
$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$
$F(x)$ is right-continuous
$P(X > x) = 1 - P(X \leq x) = 1 - F(x)$
$P(a < X \leq b) = F(b) - F(a)$ for $a \leq b$, which equals $P(a \leq X \leq b)$ when $X$ is continuous

The (generalized) inverse of a CDF is given by

$$F^{-1}(p) = \min \{x \mid F(x) \geq p\} \qquad \text{for } p \in (0,1)$$

where the minimum is attained because $F$ is right-continuous.

:::
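A discrete CDF is a step function, which makes the properties above easy to inspect. The sketch assumes a fair die as the example; `F` and `F_inv` are illustrative names. For a discrete variable the endpoints matter: $F(4) - F(2) = P(2 < X \leq 4) = 1/3$, whereas $P(2 \leq X \leq 4) = 1/2$. `F_inv` implements the generalized inverse by scanning the support points in increasing order, which suffices because a step function only jumps at those points.

```python
from fractions import Fraction

VALUES = [1, 2, 3, 4, 5, 6]  # fair die, each value with probability 1/6


def F(x):
    """CDF of a fair die: F(x) = P(X <= x), a right-continuous step function."""
    return Fraction(sum(1 for v in VALUES if v <= x), len(VALUES))


def F_inv(p):
    """Generalized inverse min{x | F(x) >= p} for p in (0, 1).

    F only jumps at the support points, so the minimum is attained at the
    first support point (in increasing order) where F reaches p.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    return next(v for v in VALUES if F(v) >= p)


print(F(2.999), F(3), F(3.001))  # 1/3 1/2 1/2  (jump at 3, right-continuous)
print(F(4) - F(2))               # 1/3, i.e. P(2 < X <= 4)
print(F_inv(0.5), F_inv(0.51))   # 3 4
```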

:::Definition (Quantile).

The function that assigns the value $F^{-1}(p)$ to every $p \in (0, 1)$ is called the quantile function of $F$:

$$x_p = F^{-1}(p) \quad \text{for } p \in (0,1) \qquad \Leftrightarrow \qquad F(x_p) = p \quad \text{for } p \in (0,1)$$

where the equivalence holds when $F$ is continuous and strictly increasing. We call $x_p$ a $p$-quantile of $F$. Common quantiles are the median $x_{0.5}$, the lower quartile $x_{0.25}$, and the upper quartile $x_{0.75}$.

:::
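For a continuous, strictly increasing CDF the quantile function often has a closed form. As an assumed example, take the exponential distribution with rate $\lambda$ (set to $2$ purely for illustration): $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$, so solving $F(x_p) = p$ gives $x_p = -\ln(1-p)/\lambda$. The sketch computes the median and quartiles and checks $F(x_p) = p$.

```python
import math

LAM = 2.0  # illustrative rate parameter (an assumption)


def F(x):
    """CDF of the exponential distribution with rate LAM."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def quantile(p):
    """Quantile function x_p = F^{-1}(p) = -ln(1 - p) / LAM for p in (0, 1)."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    return -math.log1p(-p) / LAM  # log1p(-p) = ln(1 - p), accurate for small p


median = quantile(0.5)  # equals ln(2) / LAM
lower_q, upper_q = quantile(0.25), quantile(0.75)

for p in (0.25, 0.5, 0.75):
    print(p, round(quantile(p), 4), round(F(quantile(p)), 12))
```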

:::Definition (Expected value).

For discrete $X$ taking the values $x_1, x_2, \dots$ we have

$$\mathbb{E}(X) = \sum_{i = 1}^\infty x_i \cdot p(x_i)$$

For $X$ continuous we have

$$\mathbb{E}(X) = \int_{-\infty}^{+\infty} x \cdot f(x) dx$$

Provided the sum (respectively the integral) converges absolutely, $\mathbb{E}(X)$ is the moment of order one of $X$. It has the following properties:

$\mathbb{E}(aX+b) = a\mathbb{E}(X)+b$
$\mathbb{E}(aX+bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)$
$\mathbb{E}(X \cdot Y) = \mathbb{E}(X) \cdot \mathbb{E}(Y)$ if $X, Y$ independent

The expected value of $X$ is commonly denoted by

$$\mu = \mu_X = \mathbb{E}(X)$$

:::
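The discrete formula translates directly into a weighted sum. This sketch assumes a fair die and uses exact `Fraction` arithmetic so that the properties can be checked with `==` instead of a tolerance; `expect` is an illustrative helper, not a library function.

```python
from fractions import Fraction
from itertools import product

# Fair die: values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, pmf):
    """E(g(X)) = sum of g(x) * p(x) over the values x of a discrete X."""
    return sum(g(x) * px for x, px in pmf.items())


mu = expect(lambda x: x, pmf)           # E(X) = 7/2
lhs = expect(lambda x: 2 * x + 3, pmf)  # E(2X + 3)

# Two independent dice: the joint pmf is the product of the marginals.
joint = {(x, y): pmf[x] * pmf[y] for x, y in product(pmf, pmf)}
e_xy = sum(x * y * pxy for (x, y), pxy in joint.items())

print(mu, lhs, 2 * mu + 3)  # 7/2 10 10
print(e_xy, mu * mu)        # 49/4 49/4
```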

:::Definition (Covariance).

The covariance measures the linear dependence between two random variables $X$ and $Y$. It is given by

$$\begin{split} \mathbb{C}ov(X,Y) = \sigma_{XY} &= \mathbb{E}( (X-\mathbb{E}(X)) \cdot (Y - \mathbb{E}(Y))) \\ &= \mathbb{E}(X \cdot Y) - \mathbb{E}(X) \cdot \mathbb{E}(Y) \end{split}$$

It has the following properties

$\mathbb{C}ov(X,X) = \sigma_{XX} = \sigma_X^2 = \mathbb{V}ar(X)$
$\mathbb{C}ov(aX+b,cY+d) = ac\mathbb{C}ov(X,Y)$
$\mathbb{C}ov(X_1+X_2, Y_1+Y_2) = \mathbb{C}ov(X_1,Y_1)+\mathbb{C}ov(X_1,Y_2)+\mathbb{C}ov(X_2,Y_1)+\mathbb{C}ov(X_2,Y_2)$
$X, Y$ are independent $\implies$ $\mathbb{C}ov(X,Y) = 0$. BUT (!!!): $\mathbb{C}ov(X,Y) = 0 \not\implies$ $X, Y$ are independent

:::
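The classic example behind the warning above is $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$: $Y$ is completely determined by $X$, yet $\mathbb{C}ov(X,Y) = \mathbb{E}(X^3) - \mathbb{E}(X)\mathbb{E}(X^2) = 0$. The sketch below (an assumed example, exact `Fraction` arithmetic) computes the covariance with both formulas and shows that $P(X = 0, Y = 0) = 1/3 \neq 1/9 = P(X = 0)P(Y = 0)$.

```python
from fractions import Fraction

# Joint pmf of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)

cov_def = E(lambda x, y: (x - mu_x) * (y - mu_y))  # E((X - E(X))(Y - E(Y)))
cov_short = E(lambda x, y: x * y) - mu_x * mu_y    # E(XY) - E(X)E(Y)

# Marginal and joint probabilities at a single point, to test independence.
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(cov_def, cov_short)      # 0 0
print(p_x0_y0, p_x0 * p_y0)    # 1/3 1/9
```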

:::Definition (Variance).

$$\begin{split} \mathbb{V}ar(X) = \sigma_X^2 &= \mathbb{E}((X-\mathbb{E}(X))^2) \\ &= \mathbb{E}(X^2) - \mathbb{E}(X)^2 \end{split}$$

This is the second central moment of $X$, and it has the following properties:

$\mathbb{V}ar(aX+b) = a^2\mathbb{V}ar(X)$
$\mathbb{V}ar(X+Y) = \mathbb{V}ar(X) + \mathbb{V}ar(Y)$ if $X, Y$ independent
$\mathbb{V}ar(X+Y) = \mathbb{V}ar(X) + \mathbb{V}ar(Y) + 2\mathbb{C}ov(X,Y)$ in general (for independent $X, Y$ the covariance term vanishes)

:::
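Continuing with a fair die as an assumed example, the sketch computes $\mathbb{V}ar(X) = 35/12$ with both formulas, checks $\mathbb{V}ar(aX+b) = a^2\mathbb{V}ar(X)$ for the arbitrary choice $a = 3$, $b = -4$, and contrasts an independent pair (two separate dice) with a fully dependent one ($Y = X$, where $\mathbb{C}ov(X,Y) = \mathbb{V}ar(X)$).

```python
from fractions import Fraction
from itertools import product

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die (assumed example)


def E(g):
    """E(g(X)) for a single die."""
    return sum(g(x) * p for x, p in pmf.items())


mu = E(lambda x: x)
var_def = E(lambda x: (x - mu) ** 2)      # E((X - E(X))^2)
var_short = E(lambda x: x * x) - mu ** 2  # E(X^2) - E(X)^2

# Var(aX + b) = a^2 Var(X): the shift b drops out.
a, b = 3, -4
var_affine = E(lambda x: (a * x + b - (a * mu + b)) ** 2)

# Independent: two separate dice X and Y, so Var(X + Y) = Var(X) + Var(Y).
var_sum_indep = sum(
    (x + y - 2 * mu) ** 2 * pmf[x] * pmf[y] for x, y in product(pmf, pmf)
)

# Dependent: Y = X, so Var(X + Y) = Var(2X) and Cov(X, Y) = Var(X).
var_sum_dep = E(lambda x: (2 * x - 2 * mu) ** 2)

print(var_def, var_short)          # 35/12 35/12
print(var_sum_indep, var_sum_dep)  # 35/6 35/3
```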

:::Definition (Correlation coefficient).

$\rho(X,Y)$ is the covariance of the standardized versions of $X$ and $Y$, i.e.

$$\begin{split} \rho(X,Y) &= \mathbb{C}ov \left (\frac{X-\mu_X}{\sigma_X}, \frac{Y-\mu_Y}{\sigma_Y} \right ) \\ &= \frac{\mathbb{C}ov(X,Y)}{\sqrt{\mathbb{V}ar(X)}\cdot\sqrt{\mathbb{V}ar(Y)}} \\ &= \frac{\sigma_{XY}}{\sigma_X\sigma_Y} \end{split}$$

:::
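Both expressions for $\rho$ can be evaluated side by side. The sketch assumes a small hypothetical joint PMF on $\{0,1\}^2$ in which $X$ and $Y$ agree with probability $0.8$, and computes $\rho$ once from the covariance formula and once as the covariance of the standardized variables; `rho_two_ways` is an illustrative helper. As a sanity check, an exact decreasing linear relation $Y = -3X + 1$ gives $\rho = -1$.

```python
import math


def rho_two_ways(joint):
    """Return rho(X, Y) for a joint pmf {(x, y): p}, computed
    (1) as Cov(X, Y) / (sigma_X * sigma_Y) and
    (2) as the covariance of the standardized variables."""

    def E(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())

    mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
    sd_x = math.sqrt(E(lambda x, y: (x - mu_x) ** 2))
    sd_y = math.sqrt(E(lambda x, y: (y - mu_y) ** 2))
    cov = E(lambda x, y: (x - mu_x) * (y - mu_y))
    # The standardized variables have mean 0, so their covariance is E(product).
    cov_std = E(lambda x, y: ((x - mu_x) / sd_x) * ((y - mu_y) / sd_y))
    return cov / (sd_x * sd_y), cov_std


# Hypothetical joint pmf: X and Y agree with probability 0.8.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
r_formula, r_std = rho_two_ways(joint)

# Sanity check: Y = -3X + 1 with X uniform on {-1, 0, 1}.
linear = {(x, -3 * x + 1): 1 / 3 for x in (-1, 0, 1)}
r_lin, _ = rho_two_ways(linear)

print(round(r_formula, 10), round(r_std, 10), round(r_lin, 10))  # 0.6 0.6 -1.0
```

For this joint PMF both routes give $\rho = 0.6$, since $\mathbb{C}ov(X,Y) = 0.4 - 0.25 = 0.15$ and $\sigma_X = \sigma_Y = 0.5$.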