Maximum likelihood estimation (MLE) is a method for estimating the parameters $\theta$ of a model $p(x\mid\theta)$ by choosing the values that maximize the probability of the observed data:

$$ \hat\theta_\text{MLE}=\argmax_\theta L(\theta) $$

where $L(\theta)$ is the likelihood.

Equivalently, since $\log$ is monotone increasing, one can maximize the log-likelihood $\ell(\theta)=\log L(\theta)$:

$$ \hat\theta_\text{MLE}=\argmax_\theta \ell(\theta) $$

Example: Gaussian mean

Suppose $x_1,\dots,x_n\sim\cal N(\mu,\sigma^2)$ with known variance $\sigma^2$. Maximizing the log-likelihood:

$$ \ell(\mu) = -\frac1{2\sigma^2}\sum(x_i-\mu)^2 + \text{const} $$

is equivalent to minimizing squared error.
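Setting the derivative of $\ell(\mu)$ to zero makes the minimizer explicit:

$$ \frac{d\ell}{d\mu} = \frac1{\sigma^2}\sum_{i=1}^n (x_i-\mu) = 0 \quad\Longrightarrow\quad \hat\mu = \frac1n\sum_{i=1}^n x_i $$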

The result is that the MLE is the sample mean:

$$ \hat\mu_\text{MLE}=\frac1n\sum_{i=1}^n x_i $$
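A quick numerical sanity check (illustrative setup, with simulated data and an assumed true mean of 3.0): the sample mean should attain a lower negative log-likelihood than any nearby value of $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
x = rng.normal(loc=3.0, scale=sigma, size=1000)  # simulated data, true mean 3.0

def neg_log_lik(mu):
    # Negative Gaussian log-likelihood in mu, with constants dropped
    return np.sum((x - mu) ** 2) / (2 * sigma**2)

mu_hat = x.mean()  # closed-form MLE: the sample mean
print(mu_hat)
```

Perturbing `mu_hat` in either direction increases `neg_log_lik`, consistent with the sample mean being the maximizer of the likelihood.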

Machine Learning

In supervised learning, one fits a conditional model $p(y\mid x,\theta)$, and training by minimizing the negative log-likelihood over the training pairs is exactly MLE; least-squares regression, for instance, is MLE under a Gaussian noise model, as in the example above.
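As a sketch of this connection (the data, weights, and dimensions here are assumptions for illustration): for a linear model with Gaussian noise, $y = Xw + \varepsilon$, maximizing the likelihood in $w$ is the same as minimizing squared error, so the MLE is the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))           # simulated design matrix
w_true = np.array([2.0, -1.0, 0.5])     # assumed true weights
y = X @ w_true + rng.normal(scale=0.1, size=200)  # Gaussian noise

# MLE under Gaussian noise = ordinary least squares,
# solved here via the normal equations: (X^T X) w = X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
print(w_mle)
```

With low noise and 200 samples, `w_mle` recovers the generating weights closely.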