Regression is all about fitting a low-order parametric model or curve to data, so we can reason about it or make predictions on points not covered by the data. Both the data and the model are known, but we'd like to find the model parameters that make the model fit the data best, or well enough, according to some metric.
We may also be interested in how well the model supports the data, or whether we should look for another, more appropriate model.
In a regression, a lot of data is reduced and generalized into a few parameters. The resulting model can obviously no longer reproduce all the original data exactly - if you need the data to be reproduced exactly, have a look at interpolation instead.
In the simplest yet still common form of regression we would like to fit a line
\(y : x \mapsto a + b x\) to a set of points \((x_j,y_j)\), where \(x_j\) and \(y_j\) are scalars.
Assuming we have two double arrays for x and y, we can use Fit.Line
to evaluate the \(a\) and \(b\)
parameters of the least squares fit:
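In C#, a minimal sketch could look like this (the array names and the tuple deconstruction are just illustrative):

```csharp
using MathNet.Numerics;

double[] xdata = { 10.0, 20.0, 30.0 };
double[] ydata = { 15.0, 20.0, 25.0 };

// Fit.Line returns the intercept a and slope b of the least squares line
var (a, b) = Fit.Line(xdata, ydata);
// a ≈ 10, b ≈ 0.5 for this data
```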
Or in F#:
```fsharp
let a, b = Fit.Line ([|10.0;20.0;30.0|], [|15.0;20.0;25.0|])
```
How well do these parameters fit the data? The data points happen to be positioned exactly on a line. Indeed, the coefficient of determination confirms the perfect fit:
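A sketch using GoodnessOfFit.RSquared, which compares the values predicted by the fitted line with the observed values (reusing xdata, ydata, a and b from above):

```csharp
using System.Linq;

double rSquared = GoodnessOfFit.RSquared(
    xdata.Select(x => a + b * x),  // values predicted by the model
    ydata);                        // observed values
// rSquared ≈ 1.0, since the points lie exactly on the line
```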
In practice, a line is often not an adequate model. But if we can choose a model that is linear, we can leverage the power of linear algebra; otherwise we have to resort to iterative methods (see Nonlinear Optimization).
A linear model can be described as a linear combination of \(N\) arbitrary but known functions \(f_i(x)\), scaled by the model parameters \(p_i\). Note that none of the functions \(f_i\) depends on any of the \(p_i\) parameters.
\[y : x \mapsto p_1 f_1(x) + p_2 f_2(x) + \cdots + p_N f_N(x)\]
If we have \(M\) data points \((x_j,y_j)\), then we can write the regression problem as an overdetermined system of \(M\) equations:
\[\begin{eqnarray} y_1 &=& p_1 f_1(x_1) + p_2 f_2(x_1) + \cdots + p_N f_N(x_1) \\ y_2 &=& p_1 f_1(x_2) + p_2 f_2(x_2) + \cdots + p_N f_N(x_2) \\ &\vdots& \\ y_M &=& p_1 f_1(x_M) + p_2 f_2(x_M) + \cdots + p_N f_N(x_M) \end{eqnarray}\]
Or in matrix notation with the predictor matrix \(X\) and the response \(y\):
\[\begin{eqnarray} \mathbf y &=& \mathbf X \mathbf p \\ \begin{bmatrix}y_1\\y_2\\ \vdots \\y_M\end{bmatrix} &=& \begin{bmatrix}f_1(x_1) & f_2(x_1) & \cdots & f_N(x_1)\\f_1(x_2) & f_2(x_2) & \cdots & f_N(x_2)\\ \vdots & \vdots & \ddots & \vdots\\f_1(x_M) & f_2(x_M) & \cdots & f_N(x_M)\end{bmatrix} \begin{bmatrix}p_1\\p_2\\ \vdots \\p_N\end{bmatrix} \end{eqnarray}\]
Provided the dataset is small enough, this can be solved efficiently by transforming it to the normal equations \(\mathbf{X}^T\mathbf y = \mathbf{X}^T\mathbf X \mathbf p\) and applying a Cholesky decomposition (do not use matrix inversion!).
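As a sketch, here is the line model from above written as a linear combination of \(f_1(x) = 1\) and \(f_2(x) = x\) and solved through the normal equations, using Math.NET's matrix and vector builders:

```csharp
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearRegression;

// Predictor matrix X, built column by column: f1(x) = 1, f2(x) = x
Matrix<double> X = Matrix<double>.Build.DenseOfColumnArrays(
    new double[] { 1, 1, 1 },
    new double[] { 10, 20, 30 });
Vector<double> y = Vector<double>.Build.Dense(new double[] { 15, 20, 25 });

// Solves X^T y = X^T X p
Vector<double> p = MultipleRegression.NormalEquations(X, y);
// p[0] ≈ 10 (intercept), p[1] ≈ 0.5 (slope)
```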
Using the normal equations is comparatively fast, as it dramatically reduces the size of the linear algebra problem to be solved, but this comes at the cost of reduced precision. If you need more precision, try using
MultipleRegression.QR
or MultipleRegression.Svd
instead, with the same arguments.
To fit the data to a polynomial, we can choose the following linear model with \(f_i(x) := x^i\):
\[y : x \mapsto p_0 + p_1 x + p_2 x^2 + \cdots + p_N x^N\]
The predictor matrix of this model is the Vandermonde matrix.
There is a special function in the Fit
class for regressions to a polynomial,
but note that regression to high order polynomials is numerically problematic.
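For example, to fit a cubic polynomial with Fit.Polynomial (the data here is made up for illustration):

```csharp
double[] x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
double[] y = { 1.8, 4.3, 9.1, 15.8, 25.2 };

// Returns the coefficients p[0]..p[3] of p0 + p1*x + p2*x^2 + p3*x^3
double[] p = Fit.Polynomial(x, y, 3);
```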
The \(x\) in the linear model can also be a vector \(\mathbf x = [x^{(1)}\; x^{(2)} \cdots x^{(k)}]\) and the arbitrary functions \(f_i(\mathbf x)\) can accept vectors instead of scalars.
If we use \(f_i(\mathbf x) := x^{(i)}\) and add an intercept term \(f_0(\mathbf x) := 1\), we end up with the simplest form of ordinary multiple regression:
\[y : \mathbf x \mapsto p_0 + p_1 x^{(1)} + p_2 x^{(2)} + \cdots + p_N x^{(N)}\]
For example, for the data points \((\mathbf{x}_j = [x^{(1)}_j\; x^{(2)}_j], y_j)\) with values
([1,4],15)
, ([2,5],20)
and ([3,2],10)
we can evaluate the best fitting parameters with:
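A sketch with Fit.MultiDim, requesting an intercept term \(p_0\):

```csharp
double[][] xdata = {
    new[] { 1.0, 4.0 },
    new[] { 2.0, 5.0 },
    new[] { 3.0, 2.0 } };
double[] ydata = { 15.0, 20.0, 10.0 };

// Returns [p0, p1, p2] of the model y = p0 + p1*x1 + p2*x2
double[] p = Fit.MultiDim(xdata, ydata, intercept: true);
```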
The Fit.MultiDim
routine uses normal equations, but you can always choose to explicitly use e.g.
the QR decomposition for more precision by using the MultipleRegression
class directly:
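For example, with the same data points as above (a sketch using the array-based overload with an intercept flag):

```csharp
using MathNet.Numerics.LinearRegression;

double[] p = MultipleRegression.QR(
    new[] { new[] { 1.0, 4.0 }, new[] { 2.0, 5.0 }, new[] { 3.0, 2.0 } },
    new[] { 15.0, 20.0, 10.0 },
    intercept: true);
```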
In multiple regression, the functions \(f_i(\mathbf x)\) can also operate on the whole vector or mix its components arbitrarily and apply any functions on them, provided they are defined at all the data points. For example, let's have a look at the following complicated but still linear model in two dimensions:
\[z : (x, y) \mapsto p_0 + p_1 \mathrm{tanh}(x) + p_2 \psi(x y) + p_3 x^y\]
Since we map \((x,y)\) to \(z\), we need to organize the data points into two arrays:
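For example, with a few made-up observations, purely to show the shape of the arrays:

```csharp
// Each row of xy holds one (x, y) predictor pair; z holds the matching responses
double[][] xy = {
    new[] { 1.0, 2.0 },
    new[] { 2.0, 0.5 },
    new[] { 3.0, 1.5 },
    new[] { 4.0, 1.0 },
    new[] { 5.0, 2.5 } };
double[] z = { 4.2, 3.1, 6.4, 5.0, 9.7 };
```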
Then we can call Fit.LinearMultiDim with our model, which will return an array with the 4 best-fitting parameters \(p_0, p_1, p_2, p_3\):
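Treating \(\psi\) as the digamma function (SpecialFunctions.DiGamma), a sketch of the call could look like this:

```csharp
double[] p = Fit.LinearMultiDim(xy, z,
    d => 1.0,                                    // p0: constant term
    d => Math.Tanh(d[0]),                        // p1: tanh(x)
    d => SpecialFunctions.DiGamma(d[0] * d[1]),  // p2: psi(x*y)
    d => Math.Pow(d[0], d[1]));                  // p3: x^y
```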
Let's say we have the following model:
\[y : x \mapsto a + b \ln x\]
For this case we can use the Fit.LinearCombination
function:
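A sketch with made-up data, passing \(x \mapsto 1\) for the \(a\) term and \(x \mapsto \ln x\) for the \(b\) term:

```csharp
double[] x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
double[] y = { 2.1, 3.5, 4.3, 4.9, 5.4 };

// Returns [a, b] of the model y = a*1 + b*ln(x)
double[] p = Fit.LinearCombination(x, y, t => 1.0, t => Math.Log(t));
```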
In order to evaluate the resulting model at specific data points we can manually apply
the values of p to the model function, or we can use an alternative function with the Func
suffix that returns a function instead of the model parameters. The returned function
can then be used to evaluate the parametrized model:
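For the logarithmic model above, this would be Fit.LinearCombinationFunc (assuming the Func-suffixed variant with the same arguments):

```csharp
// Returns a function x => a + b*ln(x) with the fitted parameters baked in
Func<double, double> f = Fit.LinearCombinationFunc(x, y, t => 1.0, t => Math.Log(t));
double estimate = f(2.5);
```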
Sometimes it is possible to transform a non-linear model into a linear one. For example, the following power function
\[z : (x, y) \mapsto u x^v y^w\]
can be transformed into the following linear model with \(\hat{z} = \ln z\) and \(t = \ln u\)
\[\hat{z} : (x, y) \mapsto t + v \ln x + w \ln y\]
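A sketch of fitting the transformed model on made-up data; note that this minimizes the squared error of \(\ln z\) rather than of \(z\) itself:

```csharp
using System.Linq;

double[][] xy = {
    new[] { 1.0, 1.0 },
    new[] { 2.0, 1.0 },
    new[] { 2.0, 2.0 },
    new[] { 3.0, 2.0 },
    new[] { 4.0, 3.0 } };
double[] z = { 2.0, 4.1, 8.3, 13.9, 34.8 };

// Fit z-hat = t + v*ln(x) + w*ln(y) against the log-transformed responses
double[] p = Fit.LinearMultiDim(xy, z.Select(Math.Log).ToArray(),
    d => 1.0,
    d => Math.Log(d[0]),
    d => Math.Log(d[1]));

double u = Math.Exp(p[0]);  // back-transform: t = ln(u)
double v = p[1], w = p[2];
```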
Sometimes the regression error can be reduced by damping the influence of specific data points. We can achieve this by introducing a weight matrix \(W\) into the normal equations \(\mathbf{X}^T\mathbf{y} = \mathbf{X}^T\mathbf{X}\mathbf{p}\). Such weight matrices are often diagonal, with a separate weight for each data point on the diagonal.
\[\mathbf{X}^T\mathbf{W}\mathbf{y} = \mathbf{X}^T\mathbf{W}\mathbf{X}\mathbf{p}\]
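Math.NET exposes this through the WeightedRegression class; below is a sketch with a diagonal weight matrix (the data and weights are made up, with the middle point deliberately down-weighted):

```csharp
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearRegression;

// Line model f1(x)=1, f2(x)=x for three points
Matrix<double> X = Matrix<double>.Build.DenseOfColumnArrays(
    new double[] { 1, 1, 1 },
    new double[] { 10, 20, 30 });
Vector<double> y = Vector<double>.Build.Dense(new double[] { 15, 23, 25 });

// Diagonal weight matrix: the middle point gets only 10% weight
Matrix<double> W = Matrix<double>.Build.DenseOfDiagonalArray(
    new double[] { 1.0, 0.1, 1.0 });

// Solves X^T W y = X^T W X p
Vector<double> p = WeightedRegression.Weighted(X, y, W);
```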
Weighted regression becomes especially interesting if we adapt the weights to the point of interest, e.g. by damping the influence of all data points far away from it. Unfortunately, the model parameters then depend on the point of interest \(t\).
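One way to sketch such a locally weighted fit is to recompute the weights for every point of interest \(t\), e.g. with a Gaussian kernel, and solve the weighted normal equations again. The helper below is hand-rolled for illustration; the kernel and bandwidth are arbitrary choices:

```csharp
using System.Linq;
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearRegression;

// Fit a line locally around t, down-weighting points far away from t,
// and evaluate the resulting local model at t itself.
double EvaluateLocalLine(double[] xs, double[] ys, double t, double bandwidth)
{
    Matrix<double> X = Matrix<double>.Build.DenseOfColumnArrays(
        xs.Select(_ => 1.0).ToArray(), xs);
    Vector<double> y = Vector<double>.Build.Dense(ys);

    // Gaussian weights centered at the point of interest t
    double[] w = xs.Select(x => Math.Exp(-Math.Pow((x - t) / bandwidth, 2))).ToArray();
    Matrix<double> W = Matrix<double>.Build.DenseOfDiagonalArray(w);

    Vector<double> p = WeightedRegression.Weighted(X, y, W);
    return p[0] + p[1] * t;
}
```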