This is an overview of how to derive the maximum likelihood estimates (MLE) of a multivariate normal distribution from a dataset.
Suppose we have a set of data points scattered in a higher-dimensional space that we want to fit with a multivariate normal (Gaussian) distribution.
The pdf of such a distribution in $d$ dimensions is:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu} \in \mathbb{R}^d$ is the mean vector and $\Sigma \in \mathbb{R}^{d \times d}$ is the (symmetric, positive-definite) covariance matrix.
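As a quick sanity check, here is a minimal sketch (assuming numpy and scipy are available; the values of mu, sigma, and x are arbitrary) that evaluates this density directly and compares it against scipy.stats.multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate normal density at x using the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(gaussian_pdf(x, mu, sigma))                      # direct formula
print(multivariate_normal.pdf(x, mean=mu, cov=sigma))  # library reference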
Suppose we have a dataset $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ of $n$ independent, identically distributed samples. Because the samples are independent, we can write the joint probability of the data (the likelihood) as a product of the individual probabilities:

$$L(\boldsymbol{\mu}, \Sigma) = \prod_{i=1}^{n} f(\mathbf{x}_i)$$

Because we only care about where the maximum occurs and the logarithm is monotonically increasing, it is helpful to look at the log of the likelihood, which turns the product into a sum and simplifies the calculations:

$$\ell(\boldsymbol{\mu}, \Sigma) = \ln L = -\frac{nd}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})$$
To maximize the log-likelihood function, we will set its derivative to zero. The partial derivative with respect to the mean looks like this:

$$\frac{\partial \ell}{\partial \boldsymbol{\mu}} = \Sigma^{-1} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}) = 0$$

Assuming that $\Sigma$ is invertible, we can cancel $\Sigma^{-1}$ and solve, which gives the sample mean as the MLE of the mean:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$
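As a numerical sanity check (a sketch assuming numpy and scipy; the synthetic dataset is arbitrary), the sample mean should score a higher log-likelihood than any perturbed mean:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # synthetic dataset
mu_hat = X.mean(axis=0)                # the candidate MLE
cov = np.cov(X, rowvar=False)          # any fixed covariance works here

def log_likelihood(mu):
    return multivariate_normal.logpdf(X, mean=mu, cov=cov).sum()

print(log_likelihood(mu_hat))          # the maximum
print(log_likelihood(mu_hat + 0.1))    # strictly smaller
print(log_likelihood(mu_hat - 0.1))    # strictly smaller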
For the covariance we have to understand some properties of matrix calculus that will be relevant. To compute the derivative of a matrix inverse, we refer back to the defining identity of invertible matrices:

$$A A^{-1} = I$$

Taking the derivative on both sides (product rule) and solving for the derivative of the inverse gives:

$$\frac{\partial A^{-1}}{\partial t} = -A^{-1} \frac{\partial A}{\partial t} A^{-1}$$
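We can spot-check this identity with a finite difference (a sketch assuming numpy; the matrices are arbitrary, with A shifted to be well-conditioned):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)  # well-conditioned matrix
dA = rng.normal(size=(3, 3))                 # direction of perturbation
h = 1e-6

numeric = (np.linalg.inv(A + h * dA) - np.linalg.inv(A)) / h
analytic = -np.linalg.inv(A) @ dA @ np.linalg.inv(A)
print(np.max(np.abs(numeric - analytic)))    # close to zero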
We also need the derivative of a determinant. We can verify the pattern directly for an arbitrary 2x2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ whose entries depend on $t$; the product rule applied to $\det A = ad - bc$ gives:

$$\frac{\partial}{\partial t} \det A = a'd + ad' - b'c - bc'$$

If we rearrange the terms by pairing every other term, we get:

$$\frac{\partial}{\partial t} \det A = (a'd - b'c) + (ad' - bc') = \operatorname{tr}\left(\operatorname{adj}(A) \frac{\partial A}{\partial t}\right)$$

This is known as Jacobi's formula, and the general formula goes like this:

$$\frac{\partial}{\partial t} \det A = \det(A) \operatorname{tr}\left(A^{-1} \frac{\partial A}{\partial t}\right)$$

where $\operatorname{tr}(\cdot)$ is the trace (the sum of the diagonal entries) and we used $\operatorname{adj}(A) = \det(A) A^{-1}$, valid for invertible $A$.
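Jacobi's formula is also easy to verify numerically (a sketch assuming numpy, with an arbitrary well-conditioned matrix):

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)  # well-conditioned matrix
dA = rng.normal(size=(4, 4))                 # direction of perturbation
h = 1e-6

numeric = (np.linalg.det(A + h * dA) - np.linalg.det(A)) / h
analytic = np.linalg.det(A) * np.trace(np.linalg.inv(A) @ dA)
print(numeric, analytic)                     # agree to several decimal places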
The trace has a nice cyclic property:

$$\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$$

NOTE: the trace of a scalar is the scalar itself. Combined with the cyclic property, this lets us rewrite the generalized quadratic form using matrix algebra:

$$\mathbf{x}^\top A \mathbf{x} = \operatorname{tr}(\mathbf{x}^\top A \mathbf{x}) = \operatorname{tr}(A \mathbf{x} \mathbf{x}^\top)$$
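Both trace facts can be checked in a few lines (a sketch assuming numpy; the matrices and vector are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Cyclic property: tr(ABC) = tr(BCA) = tr(CAB)
print(np.trace(A @ B @ C), np.trace(B @ C @ A), np.trace(C @ A @ B))

# A quadratic form is a scalar, so it equals its own trace,
# and cycling the factors turns it into tr(A x x^T)
print(x @ A @ x, np.trace(A @ np.outer(x, x)))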
Suppose we have a scalar-valued function $f(A)$ of a matrix argument. To convert the output to the dimensionality required for the derivative, we collect the partial derivative with respect to every entry of $A$; it turns out that the combined partial sums of the derivative are equal to a trace:

$$df = \sum_{i,j} \frac{\partial f}{\partial A_{ij}} \, dA_{ij} = \operatorname{tr}\left(\left(\frac{\partial f}{\partial A}\right)^\top dA\right)$$
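For the concrete case $f(A) = \mathbf{x}^\top A \mathbf{x}$, the gradient with respect to $A$ is $\mathbf{x}\mathbf{x}^\top$, which a finite-difference check confirms (a sketch assuming numpy; the values are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)
h = 1e-6

# Finite-difference gradient of f(A) = x^T A x, one entry at a time
grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = h
        grad[i, j] = (x @ (A + E) @ x - x @ A @ x) / h

print(np.max(np.abs(grad - np.outer(x, x))))  # close to zero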
Now that we have all the tools we need, we can set the partial derivative of the log-likelihood with respect to the covariance $\Sigma$ to zero. Using the trace identity above, the quadratic term of the log-likelihood becomes:

$$\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) = \operatorname{tr}\left(\Sigma^{-1} S\right), \qquad S = \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^\top$$

Remembering the derivative of the inverse of a matrix, Jacobi's formula (which gives $\partial \ln|\Sigma| / \partial \Sigma = \Sigma^{-1}$), and the cyclic trace result, we arrive at the following:

$$\frac{\partial \ell}{\partial \Sigma} = -\frac{n}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1}$$

Now we can set the derivative equal to zero:

$$-\frac{n}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1} = 0$$

Right-multiplying by $\Sigma$, then left-multiplying by $\Sigma$, and solving for $\Sigma$ gives the MLE of the covariance:

$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^\top$$

Note the $1/n$ factor: the MLE is the biased covariance estimator, whereas the familiar $1/(n-1)$ version is the unbiased sample covariance.
Here is the Python code that computes the MLE for a given dataset:
from pprint import pprint


def mean_vector(data: list[list[float]]) -> list[float]:
    """
    Compute the mean vector of the data.

    Parameters:
        data (list of list of floats): A list of lists where each inner list is a data point.

    Returns:
        list of floats: The mean vector.
    """
    # Number of samples and features
    num_samples = len(data)
    num_features = len(data[0])

    # Initialize mean vector with zeros
    mean = [0.0] * num_features

    # Accumulate the sum for each feature
    for i in range(num_samples):
        for j in range(num_features):
            mean[j] += data[i][j]

    # Compute the mean by dividing by the number of samples
    mean = [x / num_samples for x in mean]
    return mean
def covariance_matrix(data: list[list[float]], mean: list[float]) -> list[list[float]]:
    """
    Compute the MLE covariance matrix of the data given the mean vector.

    Parameters
    ----------
    data : list[list[float]]
        A list of lists where each inner list is a data point.
    mean : list[float]
        The mean vector.

    Returns
    -------
    list[list[float]]
        The covariance matrix.
    """
    num_samples = len(data)
    num_features = len(data[0])

    # Initialize covariance matrix with zeros
    covariance = [[0.0] * num_features for _ in range(num_features)]

    # Compute the covariance matrix
    for i in range(num_features):
        for j in range(num_features):
            cov_sum = 0.0
            for k in range(num_samples):
                cov_sum += (data[k][i] - mean[i]) * (data[k][j] - mean[j])
            # Divide by n for the MLE; dividing by n - 1 instead would give
            # the unbiased sample covariance rather than the MLE.
            covariance[i][j] = cov_sum / num_samples
    return covariance
# Example usage
data = [
    [2.5, 3.5, 4.5],
    [3.0, 3.0, 4.0],
    [2.0, 4.0, 5.0],
    [3.5, 3.5, 4.5],
    [3.0, 4.5, 5.5],
]

mean = mean_vector(data)
covariance = covariance_matrix(data, mean)

print("MLE Mean Vector:")
pprint(mean)
print("MLE Covariance Matrix:")
# Round for display to suppress floating-point noise
pprint([[round(v, 10) for v in row] for row in covariance])
Output:

MLE Mean Vector:
[2.8, 3.7, 4.7]
MLE Covariance Matrix:
[[0.26, -0.06, -0.06], [-0.06, 0.26, 0.26], [-0.06, 0.26, 0.26]]
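As a cross-check (a sketch assuming numpy), the same estimates fall out of numpy's built-ins; bias=True selects the 1/n normalization that the MLE calls for:

import numpy as np

X = np.array([
    [2.5, 3.5, 4.5],
    [3.0, 3.0, 4.0],
    [2.0, 4.0, 5.0],
    [3.5, 3.5, 4.5],
    [3.0, 4.5, 5.5],
])

print(X.mean(axis=0))                      # matches the MLE mean vector
print(np.cov(X, rowvar=False, bias=True))  # matches the MLE covariance (1/n)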