Maximum likelihood estimation (MLE) estimates the parameters $\theta$ of a model $p(x\mid\theta)$ by choosing the values that maximize the probability of the observed data:
$$ \hat\theta_\text{MLE}=\argmax_\theta L(\theta) $$
where $L(\theta)=\prod_{i=1}^n p(x_i\mid\theta)$ is the likelihood of i.i.d. observations $x_1,\dots,x_n$.
Equivalently, since $\log$ is monotone increasing, one can maximize the log-likelihood $\ell(\theta)=\log L(\theta)$, which is usually simpler to work with.
Suppose $x_i\sim\mathcal N(\mu,\sigma^2)$ with known variance $\sigma^2$. Maximizing the log-likelihood:
$$ \ell(\mu) = -\frac1{2\sigma^2}\sum(x_i-\mu)^2 + \text{const} $$
is equivalent to minimizing squared error.
Setting $\frac{d\ell}{d\mu}=\frac1{\sigma^2}\sum_i(x_i-\mu)=0$ shows that the MLE is the sample mean:
$$ \hat\mu_\text{MLE}=\frac1n\sum_{i=1}^n x_i $$
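As a quick numerical sanity check (a sketch using synthetic data; the distribution parameters and grid range below are arbitrary choices), minimizing the negative log-likelihood over a grid of candidate means recovers the sample mean:

```python
import numpy as np

# Synthetic data from N(3, 1); loc, scale, and size are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1_000)
sigma2 = 1.0  # known variance

def neg_log_likelihood(mu):
    # (1 / (2 sigma^2)) * sum_i (x_i - mu)^2, dropping the constant term
    return np.sum((x - mu) ** 2) / (2.0 * sigma2)

# Grid search for the minimizer; spacing 1e-4 bounds the grid error at 5e-5.
grid = np.linspace(2.0, 4.0, 20_001)
mu_hat = grid[np.argmin([neg_log_likelihood(m) for m in grid])]

# The numerical minimizer coincides with the sample mean, as derived above.
assert abs(mu_hat - x.mean()) < 1e-3
```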
In supervised learning, the same argument shows that least-squares regression is the MLE for a model $y_i=f(x_i;\theta)+\varepsilon_i$ with Gaussian noise $\varepsilon_i\sim\mathcal N(0,\sigma^2)$.
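The connection between MLE and squared error carries over to regression. A minimal sketch, assuming a linear model and synthetic data (all names and values below are illustrative): the least-squares solution is also the minimizer of the Gaussian negative log-likelihood, i.e. of the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data: y = 2x + 0.5 + Gaussian noise (illustrative values).
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 0.5 + rng.normal(scale=0.3, size=x.shape)

# Least-squares fit (slope and intercept) via the design matrix.
A = np.column_stack([x, np.ones_like(x)])
theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

def sse(theta):
    # Sum of squared residuals: the Gaussian negative log-likelihood
    # up to a positive scale factor and an additive constant.
    return np.sum((y - A @ theta) ** 2)

# Perturbing the least-squares solution in any coordinate direction
# cannot decrease the objective, so it is the Gaussian MLE.
for d in np.eye(2):
    assert sse(theta_ls) <= sse(theta_ls + 1e-3 * d)
    assert sse(theta_ls) <= sse(theta_ls - 1e-3 * d)
```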