A metric or distance function is a function \(d(x,y)\) that defines the distance between elements of a set as a nonnegative real number. If the distance is zero, both elements are equivalent under that specific metric. Distance functions thus provide a way to measure how close two elements are, where elements do not have to be numbers but can also be vectors, matrices or arbitrary objects. Distance functions are often used as error or cost functions to be minimized in an optimization problem.
There are multiple ways to define a metric on a set. A typical distance for real numbers is the absolute difference, \(d : (x, y) \mapsto |x-y|\). But a scaled version of the absolute difference, or even the discrete metric \(d(x, y) = \begin{cases} 0 &\mbox{if } x = y \\ 1 & \mbox{if } x \ne y, \end{cases}\) is a valid metric as well. Every normed vector space induces a distance given by \(d(\vec x, \vec y) = \|\vec x - \vec y\|\).
Math.NET Numerics provides the following distance functions on vectors and arrays:
The sum of absolute difference is equivalent to the \(L_1\)-norm of the difference, also known as the Manhattan or Taxicab norm.
The abs function makes this metric a bit complicated to deal with analytically, but it is more robust than SSD.
\[d_{\mathbf{SAD}} : (x, y) \mapsto \|x-y\|_1 = \sum_{i=1}^{n} |x_i-y_i|\]
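As a reference, the sum of absolute difference can be computed directly from its definition. The following Python sketch (not the Math.NET API) illustrates it:

```python
def sad(x, y):
    """Sum of absolute difference: the L1 norm of x - y."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# sad([1.0, 2.0, 3.0], [4.0, 0.0, 3.0]) sums |1-4| + |2-0| + |3-3| = 5.0
```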

The sum of squared difference is equivalent to the squared \(L_2\)-norm, also known as the Euclidean norm.
It is therefore also known as Squared Euclidean distance.
This is the fundamental metric in least squares problems and linear algebra. The absence of the abs
function makes this metric convenient to deal with analytically, but the squares cause it to be very
sensitive to large outliers.
\[d_{\mathbf{SSD}} : (x, y) \mapsto \|x-y\|_2^2 = \langle x-y, x-y\rangle = \sum_{i=1}^{n} (x_i-y_i)^2\]
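A minimal Python sketch of the definition (an illustration, not the Math.NET implementation):

```python
def ssd(x, y):
    """Sum of squared difference: the squared L2 norm of x - y."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# ssd([1.0, 2.0], [3.0, 5.0]) sums (1-3)^2 + (2-5)^2 = 13.0
```

Note the sensitivity to outliers: a single entry that is off by 10 contributes 100 to the sum, while ten entries off by 1 contribute only 10 in total.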

The mean absolute error is a normalized version of the sum of absolute difference.
\[d_{\mathbf{MAE}} : (x, y) \mapsto \frac{d_{\mathbf{SAD}}}{n} = \frac{\|x-y\|_1}{n} = \frac{1}{n}\sum_{i=1}^{n} |x_i-y_i|\]

The mean squared error is a normalized version of the sum of squared difference.
\[d_{\mathbf{MSE}} : (x, y) \mapsto \frac{d_{\mathbf{SSD}}}{n} = \frac{\|x-y\|_2^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i-y_i)^2\]
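Both normalized variants are straightforward to sketch in Python (illustrative code, not the library API):

```python
def mae(x, y):
    """Mean absolute error: SAD divided by the number of entries."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)

def mse(x, y):
    """Mean squared error: SSD divided by the number of entries."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# mae([0.0, 2.0], [1.0, 4.0]) averages |0-1| and |2-4| to 1.5
# mse([0.0, 2.0], [1.0, 4.0]) averages (0-1)^2 and (2-4)^2 to 2.5
```

Because of the division by \(n\), these values are comparable across vectors of different lengths, which SAD and SSD are not.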

The Euclidean distance is the \(L_2\)-norm of the difference, a special case of the Minkowski distance with p=2. It is the natural distance in a geometric interpretation.
\[d_{\mathbf{2}} : (x, y) \mapsto \|x-y\|_2 = \sqrt{d_{\mathbf{SSD}}} = \sqrt{\sum_{i=1}^{n} (x_i-y_i)^2}\]
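A Python sketch of the definition (for illustration; Math.NET exposes its own routines for this):

```python
import math

def euclidean(x, y):
    """Euclidean distance: square root of the sum of squared difference."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# euclidean([0.0, 0.0], [3.0, 4.0]) recovers the 3-4-5 triangle: 5.0
```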

The Manhattan distance is the \(L_1\)-norm of the difference, a special case of the Minkowski distance with p=1, and is equivalent to the sum of absolute difference.
\[d_{\mathbf{1}} \equiv d_{\mathbf{SAD}} : (x, y) \mapsto \|x-y\|_1 = \sum_{i=1}^{n} |x_i-y_i|\]

The Chebyshev distance is the \(L_\infty\)-norm of the difference, a special case of the Minkowski distance where p goes to infinity. It is also known as the Chessboard distance.
\[d_{\mathbf{\infty}} : (x, y) \mapsto \|x-y\|_\infty = \lim_{p \rightarrow \infty}\bigg(\sum_{i=1}^{n} |x_i-y_i|^p\bigg)^\frac{1}{p} = \max_{i} |x_i-y_i|\]
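In code the limit collapses to a simple maximum, as this illustrative Python sketch shows:

```python
def chebyshev(x, y):
    """Chebyshev distance: the largest absolute difference in any coordinate."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

# chebyshev([1.0, 5.0], [4.0, 6.0]) takes max(|1-4|, |5-6|) = 3.0
```

The name Chessboard distance comes from the king in chess: this is the number of moves a king needs to travel between two squares.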

The Minkowski distance is the generalized \(L_p\)-norm of the difference. The contour plot on the left demonstrates the case of p=3.
\[d_{\mathbf{p}} : (x, y) \mapsto \|x-y\|_p = \bigg(\sum_{i=1}^{n} |x_i-y_i|^p\bigg)^\frac{1}{p}\]
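The general form can be sketched in a few lines of Python (an illustration of the formula, not the library implementation); choosing p=1 or p=2 recovers the Manhattan and Euclidean distances:

```python
def minkowski(p, x, y):
    """Minkowski distance: the L_p norm of x - y, for p >= 1."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

# minkowski(2, [0.0, 0.0], [3.0, 4.0]) reduces to the Euclidean distance 5.0
# minkowski(1, [0.0, 0.0], [3.0, 4.0]) reduces to the Manhattan distance 7.0
```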

The Canberra distance is a weighted version of the Manhattan distance, introduced and refined in 1967 by Lance, Williams and Adkins. It is often used for data scattered around an origin, as it is biased for measures around the origin and very sensitive to values close to zero.
\[d_{\mathbf{CAD}} : (x, y) \mapsto \sum_{i=1}^{n} \frac{|x_i-y_i|}{|x_i|+|y_i|}\]
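A Python sketch of the definition. Note that a convention is needed for coordinates where both entries are zero (a 0/0 term); this sketch follows the common choice of counting such terms as zero, which may or may not match a particular library:

```python
def canberra(x, y):
    """Canberra distance: Manhattan distance with each term weighted by
    the coordinate magnitudes. Terms where both entries are 0 count as 0."""
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom != 0.0:
            total += abs(xi - yi) / denom
    return total

# canberra([1.0, 0.0], [0.0, 0.0]) gives 1/1 + 0 = 1.0: each nonzero-vs-zero
# coordinate contributes a full 1, regardless of its magnitude
```

This weighting is what makes the metric so sensitive near zero: the same absolute difference contributes far more when the coordinates themselves are small.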

The cosine distance is one minus the dot product of the two vectors, scaled by the product of their Euclidean norms. It represents the angular distance of two vectors while ignoring their scale.
\[d_{\mathbf{cos}} : (x, y) \mapsto 1-\frac{\langle x, y\rangle}{\|x\|_2\|y\|_2} = 1-\frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}\]
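A Python sketch of the formula (illustrative only; it assumes neither vector is the zero vector, which would make the denominator vanish):

```python
import math

def cosine_distance(x, y):
    """Cosine distance: one minus the cosine of the angle between x and y.
    Assumes both vectors are nonzero."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (nx * ny)

# Parallel vectors give 0, orthogonal vectors give 1, regardless of length:
# cosine_distance([1.0, 0.0], [5.0, 0.0]) == 0.0
# cosine_distance([1.0, 0.0], [0.0, 1.0]) == 1.0
```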

The Pearson distance is a correlation distance based on Pearson's product-moment correlation coefficient of the two sample vectors. Since the correlation coefficient falls between [-1, 1], the Pearson distance lies in [0, 2] and measures the linear relationship between the two vectors.
\[d_{\mathbf{Pearson}} : (x, y) \mapsto 1 - \mathbf{Corr}(x, y)\]
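One way to see the connection to the cosine distance: the Pearson distance is the cosine distance of the mean-centered vectors. A Python sketch (illustrative; assumes neither vector is constant, which would make a standard deviation zero):

```python
def pearson_distance(x, y):
    """Pearson distance: one minus the Pearson product-moment correlation,
    i.e. the cosine distance of the mean-centered vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return 1.0 - cov / (sx * sy)

# Perfectly linearly correlated vectors give 0, anti-correlated give 2
```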

The Hamming distance represents the number of entries in the two sample vectors which differ. It is a fundamental distance measure in information theory, but less relevant in non-integer numerical problems.
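A Python sketch of the definition (illustrative; exact equality comparison is why this metric suits discrete data better than floating-point vectors):

```python
def hamming(x, y):
    """Hamming distance: the number of positions where the entries differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# hamming([1, 0, 1, 1], [1, 1, 0, 1]) counts 2 differing positions
```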
