Independence and Uncorrelated Normal Random Variables

Introduction

While I was studying for my Probabilistic Machine Learning class, I came across an interesting post about a question asked in the Comprehensive Examinations for the PhD in Statistics at the University of Toronto. The question is about two normal random variables that are uncorrelated with each other, that is, \(\mathrm{Cov}{[X, Y]} = 0\). We are simply asked whether these variables are necessarily independent or not. This question made me realize an important distinction between individually (but not jointly) normally distributed random variables and jointly normally distributed random variables. Before we answer this question, let’s start with the essentials.

Definitions

Assuming we are all familiar with the concepts listed below, let’s go through the definitions once more to have a concrete understanding of what we are dealing with.

  • Expectation: The expectation of a random variable \(X\) is defined as \(\mathbb{E}{[X]} = \int_{-\infty}^{\infty} x f_X(x) dx\), where \(f_X(x)\) is the probability density function of \(X\). Similarly, for a discrete random variable \(X\), the expectation is defined as \(\mathbb{E}{[X]} = \sum_{x} x P(X = x)\). We will use this definition of expectation to define covariance in the next point.

  • Variance: The variance of a random variable \(X\) is defined as \(\sigma_X^2 = \mathbb{E}{[(X - \mathbb{E}{[X]})^2]}\). Variance is used to measure the spread of the data points around the mean.

  • Covariance: The covariance of two random variables \(X\) and \(Y\) is defined as \(\mathrm{Cov}{[X, Y]} = \mathbb{E}{[(X - \mathbb{E}{[X]})(Y - \mathbb{E}{[Y]})]}\). Covariance measures how changes in one variable are associated with changes in the other.

  • Correlation: The correlation of two random variables \(X\) and \(Y\) is defined as \(\rho_{X, Y} = \frac{\mathrm{Cov}{[X, Y]}}{\sigma_X \sigma_Y}\), where \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of \(X\) and \(Y\) respectively. Correlation measures the strength and direction of a linear relationship between two variables and can be thought of as a normalized version of covariance: covariance carries the units of \(X\) times the units of \(Y\), whereas correlation is a unitless quantity. (A small numerical sketch of both quantities follows this list.)

In statistics, correlation refers to a linear relationship between the variables.

  • Independence: Two random variables \(X\) and \(Y\) are said to be independent if \(P(X = x, Y = y) = P(X = x) P(Y = y)\) for all \(x\) and \(y\). In the continuous case, this becomes the factorization of the joint density into the marginal densities: \(f_{X, Y}(x, y) = f_X(x) f_Y(y)\).
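
Since covariance and correlation will come up repeatedly, here is a minimal NumPy sketch of how both can be estimated from samples. It is purely illustrative; the distributions, the coefficient, the sample size, and the seed are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Y is a noisy linear function of X, so the two are clearly correlated.
x = rng.normal(loc=0.0, scale=1.0, size=100_000)
y = 2.0 * x + rng.normal(loc=0.0, scale=0.5, size=100_000)

# Sample covariance: average of (X - E[X]) * (Y - E[Y]).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: covariance normalized by both standard deviations (unitless).
corr_xy = cov_xy / (x.std() * y.std())

print(cov_xy)   # close to Cov[X, Y] = 2
print(corr_xy)  # close to 2 / sqrt(1 * 4.25) ≈ 0.97
```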

Correlation and Independence

First of all, let’s establish the relationship between independence and correlation. If two random variables are independent, then they are uncorrelated. Let’s prove this first for the continuous case (the proof is essentially the same for the discrete case, with sums in place of integrals):

If \(X\) and \(Y\) are independent continuous random variables, then \(f_{X, Y}(x, y) = f_X(x) f_Y(y)\) by definition. \(\mathbb{E}{[XY]}\) can be simplified as follows:

\[\begin{aligned} \mathbb{E}{[XY]} &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X, Y}(x, y) dx dy \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_X(x) f_Y(y) dx dy \\ &= \int_{-\infty}^{\infty} x f_X(x) dx \int_{-\infty}^{\infty} y f_Y(y) dy \\ &= \mathbb{E}{[X]} \mathbb{E}{[Y]} \end{aligned}\]

Expanding the definition of covariance gives \(\mathrm{Cov}{[X, Y]} = \mathbb{E}{[(X - \mathbb{E}{[X]})(Y - \mathbb{E}{[Y]})]} = \mathbb{E}{[XY]} - \mathbb{E}{[X]} \mathbb{E}{[Y]}\). So if \(X\) and \(Y\) are independent, then \(\mathrm{Cov}{[X, Y]} = 0\). Great!
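
Before moving on, here is a quick Monte Carlo sanity check of this direction of the implication. The distributions, sample size, and seed below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independently generated samples; the distributions are arbitrary.
x = rng.normal(size=1_000_000)
y = rng.exponential(size=1_000_000)

# E[XY] - E[X]E[Y] should be zero up to Monte Carlo noise.
print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ≈ 0
```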

Now comes an important question: if \(X\) and \(Y\) are uncorrelated, are they necessarily independent? The answer is no, and we can disprove it by providing a counterexample. Let’s say we have a random variable \(X\) with a symmetric probability density, that is, \(p(x) = p(-x)\). Also assume that there is another random variable \(Y = f(X)\) such that \(f\) is an even function, that is, \(f(x) = f(-x)\). My claim is that \(X\) and \(Y\) are uncorrelated, yet not necessarily independent. Let’s prove it:

Since the random variable \(X\) has a symmetric probability density, \(\mathbb{E}{[X]} = 0\) (assuming the mean exists, the negative and positive contributions cancel each other out). Let’s calculate \(\mathbb{E}{[XY]}\):

\[\begin{aligned} \mathbb{E}{[XY]} &= \mathbb{E}{[X f(X)]} \\ &= \int_{-\infty}^{\infty} x f(x) p(x) dx \\ &= \int_{-\infty}^{0} x f(x) p(x) dx + \int_{0}^{\infty} x f(x) p(x) dx \\ &= \underbrace {\int_{-\infty}^{0} x f(-x) p(-x) dx}_\textrm{let $u=-x$, then $du = -dx$} + \int_{0}^{\infty} x f(x) p(x) dx \;\;\;\; \text{(using symmetry)} \\ &= \int_{\infty}^{0} (-u) f(u) p(u) (-du) + \int_{0}^{\infty} x f(x) p(x) dx \\ &= \int_{\infty}^{0} u f(u) p(u) du + \int_{0}^{\infty} x f(x) p(x) dx \\ &= -\int_{0}^{\infty} x f(x) p(x) dx + \int_{0}^{\infty} x f(x) p(x) dx \\ &= 0 \\ \end{aligned}\]

We have shown that \(\mathbb{E}{[XY]} = 0\), and we know that \(\mathbb{E}{[X]} = 0\); as a result, \(\mathrm{Cov}{[X, Y]} = \mathbb{E}{[XY]} - \mathbb{E}{[X]} \mathbb{E}{[Y]} = 0\).

However, \(X\) and \(Y\) are not necessarily independent. Note that we cannot directly conclude that \(X\) and \(Y\) are dependent just because \(Y\) is a function of \(X\): we could set \(f(\cdot)\) to be a constant function, which is still even, and then \(Y\) would be independent of \(X\). It is therefore better to construct a concrete example showing that \(X\) and \(Y\) do not have to be independent. Consider \(X \sim U[-2, 2]\) and \(Y = X^2\), and observe that this example complies with our construction above. Now, is \(P(Y > 1 \mid -1 < X < 1) = P(Y > 1)\)? The answer is no: \(P(Y > 1 \mid -1 < X < 1) = 0\), but \(P(Y > 1) = \frac{1}{2}\). Therefore, \(X\) and \(Y\) are uncorrelated but not independent. A quick numerical check of this example is sketched below.
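
Here is a minimal simulation of this counterexample (the sample size and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# The counterexample: X ~ U[-2, 2] (symmetric density), Y = X^2 (even function).
x = rng.uniform(-2.0, 2.0, size=1_000_000)
y = x ** 2

# Uncorrelated: the sample covariance is close to zero.
print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ≈ 0

# ... but not independent: conditioning on X changes the distribution of Y.
print(np.mean(y[np.abs(x) < 1] > 1))  # P(Y > 1 | -1 < X < 1) = 0
print(np.mean(y > 1))                 # P(Y > 1) = 1/2
```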

Independence implies zero correlation. However, (linearly) uncorrelated random variables do not have to be independent.

Gaussian Distribution

Now, to focus on the original question, let’s remind ourselves of the univariate and multivariate Gaussian distributions. In the one-dimensional case, a random variable \(X\) is said to have a Gaussian distribution if its probability density function is given by:

\[f_X(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp{\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)}\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the random variable \(X\). The Gaussian distribution is also known as the normal distribution. The multivariate Gaussian, on the other hand, is a generalization of the univariate Gaussian distribution to higher dimensions. A random vector \(\mathbf{X} = [X_1, X_2, \ldots, X_d]^T\) is said to have a multivariate Gaussian distribution if its probability density function is given by:

\[f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp{\left(-\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \Sigma^{-1} (\mathbf{x} - \mathbf{\mu})\right)}\]

where \(\mathbf{\mu} = [\mu_1, \mu_2, \ldots, \mu_d]^T\) is the mean vector, \(\Sigma\) is the covariance matrix, and \(|\Sigma|\) denotes its determinant. When the different dimensions of the random vector are uncorrelated, the covariance matrix is diagonal, so we can write \(\Sigma = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2) = \text{diag}(\sigma^2)\). The inverse of a diagonal matrix is again a diagonal matrix with each entry inverted: \(\Sigma^{-1} = \text{diag}(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \ldots, \frac{1}{\sigma_d^2}) = \text{diag}(\frac{1}{\sigma^2})\). The determinant of a diagonal matrix is the product of its diagonal entries: \(\det(\Sigma) = \sigma_1^2 \sigma_2^2 \ldots \sigma_d^2\). We can plug these values into the multivariate Gaussian distribution to simplify it when the dimensions are uncorrelated:

\[f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} (\sigma_1^2 \sigma_2^2 \ldots \sigma_d^2)^{1/2}} \exp{\left( -\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \text{diag}(\frac{1}{\sigma^2}) (\mathbf{x} - \mathbf{\mu})\right)}\]

Now, we can actually decompose this expression into a product of univariate Gaussian distributions. Let’s start with the left part:

\[\begin{aligned} \frac{1}{(2\pi)^{d/2} (\sigma_1^2 \sigma_2^2 \ldots \sigma_d^2)^{1/2}} &= \frac{1}{(2\pi)^{d/2} \sigma_1 \sigma_2 \ldots \sigma_d} \\ &= \frac{1}{\sqrt{2\pi} \sigma_1} \frac{1}{\sqrt{2\pi} \sigma_2} \ldots \frac{1}{\sqrt{2\pi} \sigma_d} \\ &= \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi} \sigma_i} \\ \end{aligned}\]

Great, let’s not forget that. Now, let’s focus on the exponent part:

\[\begin{aligned} (\mathbf{x} - \mathbf{\mu})^T \text{diag}(\frac{1}{\sigma^2}) (\mathbf{x} - \mathbf{\mu}) &= \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 & \ldots & x_d - \mu_d \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & \ldots & 0 \\ 0 & \frac{1}{\sigma_2^2} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \frac{1}{\sigma_d^2} \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ \vdots \\ x_d - \mu_d \end{bmatrix} \\ &= \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 & \ldots & x_d - \mu_d \end{bmatrix} \begin{bmatrix} \frac{x_1 - \mu_1}{\sigma_1^2} \\ \frac{x_2 - \mu_2}{\sigma_2^2} \\ \vdots \\ \frac{x_d - \mu_d}{\sigma_d^2} \end{bmatrix} \\ &= \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} + \ldots + \frac{(x_d - \mu_d)^2}{\sigma_d^2} \\ &= \sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{\sigma_i^2} \\ \end{aligned}\]

Inserting our findings back to the equation yields:

\[f_{\mathbf{X}}(\mathbf{x}) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi} \sigma_i} \exp{\left(-\frac{1}{2}\frac{(x_i - \mu_i)^2}{\sigma_i^2}\right)}\]

This is the product of univariate Gaussian distributions. We have shown that when the random variables are uncorrelated, the multivariate Gaussian distribution can be decomposed into a product of univariate Gaussian distributions. But what does it mean? It means that \(f_{\mathbf{X}}(\mathbf{x}) = f_{X_1}(x_1) f_{X_2}(x_2) \ldots f_{X_d}(x_d)\), which implies that the random variables are independent (this is exactly the definition of independence).
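
To make the factorization concrete, here is a small numerical check with SciPy. The dimension, mean vector, standard deviations, and evaluation point below are arbitrary choices of mine:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)

# Illustrative mean vector and *diagonal* covariance (uncorrelated dimensions).
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.7, 1.3, 2.0])   # per-dimension standard deviations
cov = np.diag(sigma ** 2)

x = rng.normal(size=3)              # an arbitrary evaluation point

# Joint density from the multivariate Gaussian ...
joint = multivariate_normal(mean=mu, cov=cov).pdf(x)

# ... equals the product of the d univariate Gaussian densities.
product = np.prod(norm.pdf(x, loc=mu, scale=sigma))

print(joint, product)               # the two values agree up to floating-point error
```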

If two random variables \(X\) and \(Y\) are bivariate normal (jointly normal) and uncorrelated, then they are independent.

Uncorrelated Normal Random Variables

Now comes our question: if two normal random variables are uncorrelated, are they necessarily independent? The answer is no. We have shown that if two jointly normal random variables are uncorrelated, they are independent. But when we relax the assumption of joint normality, the statement does not hold. Even if two (individually) normally distributed random variables are uncorrelated, they do not have to be independent. Let’s show this with the counterexample from the original post:

Say we have three random variables \(X\), \(Y\) and \(Z\). Let \(X \sim \mathcal{N}(0,\,1)\), let \(Z \in \{-1, 1\}\) with equal probabilities and independent of \(X\), and let \(Y = Z X\). Now, is \(Y\) normally distributed? Since the standard normal distribution is symmetric around zero, both \(X\) and \(-X\) are standard normal random variables. \(Y\) is an equal-probability mixture of \(X\) and \(-X\), so \(Y\) is itself a standard normal random variable. Having said that, now let’s calculate the covariance of \(X\) and \(Y\):

\[\begin{aligned} \mathrm{Cov}{[X, Y]} &= \mathbb{E}{[XY]} - \mathbb{E}{[X]} \mathbb{E}{[Y]} \\ &= \mathbb{E}{[X^2 Z]} - 0 \cdot 0 \\ &= \mathbb{E}{[X^2 Z]} \\ &= \mathbb{E}{[X^2]} \mathbb{E}{[Z]} \\ &= 1 \cdot 0 \\ &= 0 \end{aligned}\]

So we have two (individually) normally distributed and uncorrelated random variables. Are they independent? No! \(P(Y > 0.5 \mid -0.5 < X < 0.5) = 0\), because \(|Y| = |X| < 0.5\) on that event, but \(P(Y > 0.5) \neq 0\). Therefore, \(X\) and \(Y\) are uncorrelated but not independent. A small simulation of this construction is sketched below.
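
Here is a minimal simulation of this construction (the sample size and seed are my own choices, and the Kolmogorov–Smirnov test is just one convenient way to check that \(Y\) is marginally standard normal):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)

n = 1_000_000
x = rng.normal(size=n)
z = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of X
y = z * x

# Y is marginally standard normal; the KS p-value is typically large (H0 is true).
print(kstest(y, "norm").pvalue)

# X and Y are uncorrelated ...
print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ≈ 0

# ... but not independent: |Y| = |X|, so conditioning on X constrains Y.
print(np.mean(y[np.abs(x) < 0.5] > 0.5))  # P(Y > 0.5 | -0.5 < X < 0.5) = 0
print(np.mean(y > 0.5))                   # P(Y > 0.5) ≈ 0.31
```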

In summary, it’s important to note that two random variables that are each normally distributed do not always follow a bivariate normal distribution jointly, and a covariance of zero does not imply independence. But when we do have a bivariate normal distribution, uncorrelated random variables are independent.

References

Rosenthal, J. S. (2005). Uncorrelated does not imply Independence. Retrieved from http://probability.ca/jeff/teaching/uncornor.html