Maximum likelihood estimation (MLE) is a method for estimating the parameters $\theta$ of a model $p(x\mid\theta)$ by choosing the values that maximize the probability of the observed data:

$$ \hat\theta_\text{MLE}=\argmax_\theta L(\theta) $$

where $L(\theta)$ is the likelihood.

Equivalently, since $\log$ is monotone increasing, one can maximize the log-likelihood $\ell(\theta)=\log L(\theta)$:

$$ \hat\theta_\text{MLE}=\argmax_\theta \ell(\theta) $$

Example: Gaussian mean

Suppose $x_1,\dots,x_n\sim\cal N(\mu,\sigma^2)$ with known variance $\sigma^2$. Maximizing the log-likelihood:

$$ \ell(\mu) = -\frac1{2\sigma^2}\sum(x_i-\mu)^2 + \text{const} $$

is equivalent to minimizing squared error.
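Setting the derivative of $\ell(\mu)$ to zero makes the minimizer explicit:

$$ \frac{d\ell}{d\mu} = \frac1{\sigma^2}\sum_{i=1}^n (x_i-\mu) = 0 \quad\Longrightarrow\quad \hat\mu = \frac1n\sum_{i=1}^n x_i $$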

The result is that the MLE is the sample mean:

$$ \hat\mu_\text{MLE}=\frac1n\sum_{i=1}^n x_i $$
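A quick numerical sanity check (illustrative setup, with simulated data and an assumed true mean of 3.0): the sample mean should attain a lower negative log-likelihood than any nearby value of $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
x = rng.normal(loc=3.0, scale=sigma, size=1000)  # simulated data, true mean 3.0

def neg_log_lik(mu):
    # Negative Gaussian log-likelihood in mu, with constants dropped
    return np.sum((x - mu) ** 2) / (2 * sigma**2)

mu_hat = x.mean()  # closed-form MLE: the sample mean
print(mu_hat)
```

Perturbing `mu_hat` in either direction increases `neg_log_lik`, consistent with the sample mean being the maximizer of the likelihood.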

Machine Learning

In supervised learning, one fits a conditional model $p(y\mid x,\theta)$, and training by minimizing the negative log-likelihood over the training pairs is exactly MLE; least-squares regression, for instance, is MLE under a Gaussian noise model, as in the example above.
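As a sketch of this connection (the data, weights, and dimensions here are assumptions for illustration): for a linear model with Gaussian noise, $y = Xw + \varepsilon$, maximizing the likelihood in $w$ is the same as minimizing squared error, so the MLE is the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))           # simulated design matrix
w_true = np.array([2.0, -1.0, 0.5])     # assumed true weights
y = X @ w_true + rng.normal(scale=0.1, size=200)  # Gaussian noise

# MLE under Gaussian noise = ordinary least squares,
# solved here via the normal equations: (X^T X) w = X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
print(w_mle)
```

With low noise and 200 samples, `w_mle` recovers the generating weights closely.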