A loss function is a measurable function that induces a risk functional whose minimizer defines the learned decision rule.
A loss function:
Given an input space $X$, an output space $Y$, and a parametric model $f_\theta\colon X\to Y$, a pointwise loss function $\ell$ has the form:
$$ \ell\colon Y\times Y\to\R_{\ge 0} $$
The expected risk $\cal R(\theta)$ of parameter $\theta$ is then defined as:
$$ \cal R(\theta)=\Bbb E_{(x,y)\sim\cal D}\left[ \ell(y,f_\theta(x)) \right] $$
Since the true distribution $\cal D$ is unknown, we instead minimize the empirical risk over a sample $(x_1,y_1),\dots,(x_n,y_n)$ drawn from $\cal D$:
$$ \widehat{\cal R}(\theta)=\frac1n\sum_{i=1}^n\ell(y_i,f_\theta(x_i)) $$
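As a minimal sketch of the empirical risk, the following Python averages a pointwise loss over a toy sample. The squared loss, the one-parameter model `linear_model`, and the data are illustrative assumptions, not part of the definition.

```python
from typing import Callable, Sequence


def squared_loss(y: float, y_hat: float) -> float:
    """Pointwise loss l(y, y_hat) = (y - y_hat)^2; always non-negative."""
    return (y - y_hat) ** 2


def linear_model(theta: float, x: float) -> float:
    """Hypothetical parametric model f_theta(x) = theta * x."""
    return theta * x


def empirical_risk(
    theta: float,
    xs: Sequence[float],
    ys: Sequence[float],
    model: Callable[[float, float], float],
    loss: Callable[[float, float], float],
) -> float:
    """Average of loss(y_i, f_theta(x_i)) over the n sample points."""
    return sum(loss(y, model(theta, x)) for x, y in zip(xs, ys)) / len(xs)


# Toy sample generated by y = 2x, so theta = 2 fits it exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

risk_at_2 = empirical_risk(2.0, xs, ys, linear_model, squared_loss)  # 0.0
risk_at_1 = empirical_risk(1.0, xs, ys, linear_model, squared_loss)  # (1 + 4 + 9) / 3
```

Passing `model` and `loss` as arguments mirrors the definition: the same average works for any choice of $f_\theta$ and $\ell$.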
Training then amounts to empirical risk minimization:
$$ \theta_*=\argmin_\theta \widehat{\cal R}(\theta) $$
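One way to approximate this minimizer is gradient descent on the empirical risk. The sketch below assumes a squared loss and a hypothetical one-parameter model $f_\theta(x)=\theta x$; the step size, iteration count, and data are arbitrary illustrative choices. For this particular setup a closed-form minimizer exists, which is used only as a sanity check.

```python
from typing import Sequence


def empirical_risk(theta: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared loss of the assumed model f_theta(x) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def risk_gradient(theta: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Derivative of the empirical risk with respect to theta."""
    return sum(-2.0 * x * (y - theta * x) for x, y in zip(xs, ys)) / len(xs)


def minimize_risk(
    xs: Sequence[float],
    ys: Sequence[float],
    theta0: float = 0.0,
    lr: float = 0.05,
    steps: int = 200,
) -> float:
    """Plain gradient descent: repeatedly step against the gradient."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * risk_gradient(theta, xs, ys)
    return theta


# Toy sample: roughly y = 2x with small perturbations.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

theta_star = minimize_risk(xs, ys)

# Closed-form least-squares solution for this specific model, for comparison.
closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

Here the risk is a convex quadratic in $\theta$, so gradient descent with a small enough step converges to the unique minimizer; for general models the result may be only a local minimum.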
A loss function assigns a non-negative penalty to predicting $\hat y$ when the true outcome is $y$:
$$ \ell(y,\hat y) $$
A learner seeks a decision rule minimizing expected loss.
The optimal predictor $f^*$ under loss $\ell$ is: