We need to reference Math.NET Numerics and open the statistics namespace:

The primary class for statistical analysis is Statistics, which provides common descriptive statistics as static extension methods to IEnumerable<double> sequences.
However, various statistics can be computed much more efficiently if the data source has known properties or structure. That is why the classes ArrayStatistics, SortedArrayStatistics and StreamingStatistics provide specialized static implementations. Some of the ArrayStatistics routines carry the Inplace suffix, indicating that they reorder the input array slightly towards being sorted during execution, without fully sorting it, which could be expensive.
Another alternative, in case you need to gather a whole set of statistical characteristics in one pass, is provided by the DescriptiveStatistics class:

The minimum and maximum values of a sample set can be evaluated with the Minimum and Maximum functions of all four classes: Statistics, ArrayStatistics, SortedArrayStatistics and StreamingStatistics. The one in SortedArrayStatistics is the fastest, with constant time complexity, but expects the array to be sorted ascendingly.
Both min and max are directly affected by outliers and are therefore not robust statistics at all. For a more robust alternative, consider using Quantiles instead.

The Mean functions evaluate the arithmetic mean, or average, of the provided samples. In statistics, the sample mean is a measure of central tendency and estimates the expected value of the distribution. The mean is affected by outliers, so if you need a more robust estimate, consider using the Median instead.
Statistics.Mean(data)
StreamingStatistics.Mean(stream)
ArrayStatistics.Mean(data)
\[\overline{x} = \frac{1}{N}\sum_{i=1}^N x_i\]
let whiteNoise = Generate.Normal(1000, mean=10.0, standardDeviation=2.0)
val whiteNoise : float [] = [12.90021939; 9.631515037; 7.810008046; 14.13301053; ...]
Statistics.Mean whiteNoise
val it : float = 10.02162347
let wave = Generate.Sinusoidal(1000, samplingRate=100., frequency=5., amplitude=0.5)
Statistics.Mean wave
val it : float = 4.133520783e-17
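The near-zero mean of the wave follows directly from the definition: a sinusoid sampled over whole periods averages out. A minimal Python sketch, using only the standard library and independent of Math.NET, illustrates this. The way the wave is generated below is an assumption meant to mimic the parameters passed to Generate.Sinusoidal, not that function's actual implementation.

```python
import math

def mean(samples):
    """Arithmetic mean: sum of the samples divided by their count."""
    return math.fsum(samples) / len(samples)

# Assumed sampling scheme: 1000 samples at 100 Hz of a 5 Hz sine with
# amplitude 0.5, which covers exactly 50 full periods of 20 samples each.
wave = [0.5 * math.sin(2.0 * math.pi * 5.0 * i / 100.0) for i in range(1000)]

print(mean([1.0, 2.0, 3.0, 6.0]))  # 3.0
print(abs(mean(wave)) < 1e-12)     # True: positive and negative halves cancel
```

Only floating-point rounding keeps the wave's mean from being exactly zero, which is why the library reports a value on the order of 1e-17 rather than 0.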
Variance \(\sigma^2\) and the Standard Deviation \(\sigma\) are measures of how far the samples are spread out.
If the whole population is available, the functions with the Population prefix will evaluate the respective measures with an \(N\) normalizer for a population of size \(N\).
Statistics.PopulationVariance(population)
Statistics.PopulationStandardDeviation(population)
\[\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2\]
On the other hand, if only a sample of the full population is available, the functions without the Population prefix will estimate unbiased population measures by applying Bessel's correction with an \(N-1\) normalizer to a sample set of size \(N\).
Statistics.Variance(samples)
Statistics.StandardDeviation(samples)
\[s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})^2\]
Statistics.Variance whiteNoise
val it : float = 3.819436094
Statistics.StandardDeviation whiteNoise
val it : float = 1.954337764
Statistics.Variance wave
val it : float = 0.1251251251
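The difference between the two normalizers is easiest to see on a small data set. The following Python sketch (standard library only; the function names are illustrative and not part of Math.NET) implements both formulas literally.

```python
import math

def population_variance(xs):
    """Variance with an N normalizer (the whole population is available)."""
    mu = math.fsum(xs) / len(xs)
    return math.fsum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Variance with Bessel's correction, i.e. an N - 1 normalizer."""
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # 2.0
print(sample_variance(data))                 # 32/7, about 4.571
```

The same relation explains the wave result above: a sine with amplitude 0.5 has a population variance of 0.125, and scaling by 1000/999 for the sample estimate gives approximately 0.1251251.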
Since mean and variance are often needed together, there are routines that evaluate both in a single pass:
Statistics.MeanVariance(samples)
ArrayStatistics.MeanVariance(samples)
StreamingStatistics.MeanVariance(samples)
Statistics.MeanVariance whiteNoise
val it : float * float = (10.02162347, 3.819436094)
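One well-known way to obtain both values in a single pass is Welford's recurrence, sketched below in Python. This is an illustration of the single-pass idea only; whether MeanVariance uses this exact recurrence is an assumption, and the function name is hypothetical.

```python
def mean_variance(samples):
    """Welford's single-pass update; returns (mean, sample variance)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        # The sample variance is undefined for fewer than two samples.
        return (mean if count else float("nan"), float("nan"))
    return (mean, m2 / (count - 1))

# Works on any iterable, including a generator that can only be read once.
m, v = mean_variance(x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(m, v)
```

Because the data is never revisited, this approach also suits streams that are too large to hold in memory.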
The sample covariance is an estimate of the Covariance, a measure of how much two random variables change together. As with the variance above, there are two versions, so that Bessel's correction for bias can be applied in the case of sample data.
Statistics.Covariance(samples1, samples2)
\[q = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})\]
Statistics.PopulationCovariance(population1, population2)
\[q = \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)\]
Statistics.Covariance(whiteNoise, whiteNoise)
val it : float = 3.819436094
Statistics.Covariance(whiteNoise, wave)
val it : float = 0.04397985084
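As the first result above shows, the covariance of a sample set with itself is its variance. A short Python sketch of both formulas (standard library only; the function and its population flag are illustrative, not Math.NET's API) makes this easy to check by hand.

```python
import math

def covariance(xs, ys, population=False):
    """Sample covariance (N - 1 normalizer) or population covariance (N)."""
    if len(xs) != len(ys):
        raise ValueError("both sequences must have the same length")
    n = len(xs)
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    total = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n if population else n - 1)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(covariance(xs, xs))                   # 5/3, the sample variance of xs
print(covariance(xs, ys))                   # 10/3, twice as large since ys = 2 * xs
print(covariance(xs, ys, population=True))  # 2.5
```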
The k-th order statistic of a sample set is the k-th smallest value. Note that, as an exception to most of Math.NET Numerics, the order k is one-based, meaning the smallest value is the order statistic of order 1 (there is no order 0).
Statistics.OrderStatistic(data, order)
SortedArrayStatistics.OrderStatistic(data, order)
If the samples are sorted ascendingly, this is trivial and can be evaluated in constant time, which is what the SortedArrayStatistics implementation does.
If you have the samples in an array which is not (guaranteed to be) sorted, and it is acceptable for the array to become incrementally sorted over multiple calls, you can also use the following in-place implementation. It is usually faster than fully sorting the array, unless you need to compute it for more than a handful of orders.
ArrayStatistics.OrderStatisticInplace(data, order)
For convenience there's also an option that returns a function Func<int, double>, mapping from order to the resulting order statistic. Internally it sorts a copy of the provided data and then on each invocation uses efficient sorted algorithms:
Statistics.OrderStatisticFunc(data)
Such Inplace and Func variants are a common pattern throughout the Statistics class and also the rest of the library.
Statistics.OrderStatistic(whiteNoise, 1)
val it : float = 3.633070184
Statistics.OrderStatistic(whiteNoise, 1000)
val it : float = 16.65183566
let os = Statistics.orderStatisticFunc whiteNoise
os 250
val it : float = 8.645491746
os 500
val it : float = 10.11872428
os 750
val it : float = 11.33170746
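The one-based convention and the sort-once idea behind the Func variant can be sketched in a few lines of Python. The names are illustrative, and returning NaN for an out-of-range order is an assumption about the desired behavior, not a statement about the library.

```python
def order_statistic(data, order):
    """k-th smallest value, with a one-based order k."""
    if not 1 <= order <= len(data):
        return float("nan")  # assumption: out-of-range orders yield NaN
    return sorted(data)[order - 1]

def order_statistic_func(data):
    """Sort a copy once, then answer each query in constant time."""
    ordered = sorted(data)
    def lookup(order):
        if not 1 <= order <= len(ordered):
            return float("nan")
        return ordered[order - 1]
    return lookup

data = [13.0, 14.0, 11.0, 12.0, 13.0]
print(order_statistic(data, 1))       # 11.0, the minimum
print(order_statistic(data, 5))       # 14.0, the maximum
ostat = order_statistic_func(data)
print([ostat(k) for k in (2, 3, 4)])  # [12.0, 13.0, 13.0]
```

The closure pays the sorting cost once, which is the better trade-off when many orders are queried on the same data.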
The median is a robust indicator of central tendency and much less affected by outliers than the sample mean. It is estimated by the value exactly in the middle of the sorted set of samples, thus separating the higher half of the data from the lower half.
Statistics.Median(data)
SortedArrayStatistics.Median(data)
ArrayStatistics.MedianInplace(data)
The median is only unique if the sample size is odd. This implementation internally uses the default quantile definition, which is equivalent to mode 8 in R and is approximately median-unbiased regardless of the sample distribution. If you need another convention, use QuantileCustom instead, see below for details.
Statistics.Median whiteNoise
val it : float = 10.11872428
Statistics.Median wave
val it : float = 2.452600839e-16
Quartiles group the ascendingly sorted data into four equal groups, where each group represents a quarter of the data. The lower quartile is estimated by the middle number between the first two groups and the upper quartile by the middle number between the remaining two groups. The middle number between the two middle groups estimates the median as discussed above.
Statistics.LowerQuartile(data)
Statistics.UpperQuartile(data)
SortedArrayStatistics.LowerQuartile(data)
SortedArrayStatistics.UpperQuartile(data)
ArrayStatistics.LowerQuartileInplace(data)
ArrayStatistics.UpperQuartileInplace(data)
Statistics.LowerQuartile whiteNoise
val it : float = 8.645491746
Statistics.UpperQuartile whiteNoise
val it : float = 11.33213732
Using that data we can provide a useful set of indicators usually named the five-number summary, which consists of the minimum value, the lower quartile, the median, the upper quartile and the maximum value. All these values can be visualized in the popular box plot diagrams.
Statistics.FiveNumberSummary(data)
SortedArrayStatistics.FiveNumberSummary(data)
ArrayStatistics.FiveNumberSummaryInplace(data)
Statistics.FiveNumberSummary whiteNoise
val it : float [] = [3.633070184; 8.645937823; 10.12165054; 11.33213732; 16.65183566]
Statistics.FiveNumberSummary wave
val it : float [] = [-0.5; -0.3584185509; 2.452600839e-16; 0.3584185509; 0.5]
The difference between the upper and the lower quartile is called the interquartile range (IQR) and is a robust indicator of spread. In box plots the IQR is the total height of the box.
Statistics.InterquartileRange(data)
SortedArrayStatistics.InterquartileRange(data)
ArrayStatistics.InterquartileRangeInplace(data)
Just like median, quartiles use the default R8 quantile definition internally.
Statistics.InterquartileRange whiteNoise
val it : float = 2.686199498
Percentiles extend the concept further by grouping the sorted values into 100 equal groups and looking at the 101 places (0,1,..,100) between and around them. The 0-percentile represents the minimum value, 25 the first quartile, 50 the median, 75 the upper quartile and 100 the maximum value.
Statistics.Percentile(data, p)
Statistics.PercentileFunc(data)
SortedArrayStatistics.Percentile(data, p)
ArrayStatistics.PercentileInplace(data, p)
Just like median, percentiles use the default R8 quantile definition internally.
Statistics.Percentile(whiteNoise, 5)
val it : float = 6.693373507
Statistics.Percentile(whiteNoise, 98)
val it : float = 13.97580653
Instead of grouping into 4 or 100 boxes, quantiles generalize the concept to an infinite number of boxes and thus to arbitrary real numbers \(\tau\) between 0.0 and 1.0, where 0.0 represents the minimum value, 0.5 the median and 1.0 the maximum value. Quantiles are closely related to the inverse cumulative distribution function of the sample distribution.
Statistics.Quantile(data, tau)
Statistics.QuantileFunc(data)
SortedArrayStatistics.Quantile(data, tau)
ArrayStatistics.QuantileInplace(data, tau)
Statistics.Quantile(whiteNoise, 0.98)
val it : float = 13.97580653
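Median, quartiles and percentiles are all special cases of a quantile call with \(\tau\) set to 0.5, 0.25 or 0.75, and p/100 respectively. The Python sketch below follows the published R type 8 definition, which interpolates linearly at the one-based position \((N + 1/3)\tau + 1/3\). It is a stand-alone illustration with a hypothetical function name, not Math.NET's code.

```python
import math

def quantile_r8(data, tau):
    """Quantile estimate following R's type 8 definition."""
    xs = sorted(data)
    n = len(xs)
    if n == 0 or not 0.0 <= tau <= 1.0:
        return float("nan")
    # Fractional one-based position, clamped to the valid range [1, n].
    h = min(max((n + 1.0 / 3.0) * tau + 1.0 / 3.0, 1.0), float(n))
    lower = math.floor(h)
    upper = min(lower + 1, n)
    weight = h - lower
    # Linear interpolation between the two neighbouring order statistics.
    return xs[lower - 1] + weight * (xs[upper - 1] - xs[lower - 1])

data = [5.0, 1.0, 4.0, 2.0, 3.0]
print(quantile_r8(data, 0.0))             # 1.0, the minimum
print(round(quantile_r8(data, 0.5), 6))   # 3.0, the median
print(quantile_r8(data, 1.0))             # 5.0, the maximum
print(round(quantile_r8(data, 0.25), 6))  # 1.666667, the lower quartile
```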
Remember that all these descriptive statistics do not compute but merely estimate statistical indicators of the value distribution. In the case of quantiles, there is usually not a single number between the two groups specified by \(\tau\). There are multiple ways to deal with this: the R project supports 9 modes and Mathematica and SciPy have their own way to parametrize the behavior.
The QuantileCustom functions support all 9 modes from the R project, which includes the one used by Microsoft Excel, and also the 4-parameter variant of Mathematica:
Statistics.QuantileCustom(data, tau, definition)
Statistics.QuantileCustomFunc(data, definition)
SortedArrayStatistics.QuantileCustom(data, tau, a, b, c, d)
SortedArrayStatistics.QuantileCustom(data, tau, definition)
ArrayStatistics.QuantileCustomInplace(data, tau, a, b, c, d)
ArrayStatistics.QuantileCustomInplace(data, tau, definition)
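To show how much the choice of definition matters, the following Python sketch implements a 4-parameter quantile as Mathematica documents it: with \(x = a + (N + b)\tau\), the result is \(s_{\lfloor x \rfloor} + (s_{\lceil x \rceil} - s_{\lfloor x \rfloor})(c + d \cdot \mathrm{frac}(x))\), where \(s_k\) is the k-th order statistic. That Math.NET interprets a, b, c and d in exactly this way is an assumption, and the parameter table lists only three of the R types.

```python
import math

# (a, b, c, d) parameter sets for a few R types in Mathematica's convention.
DEFINITIONS = {
    "R1": (0.0, 0.0, 1.0, 0.0),              # inverse empirical CDF
    "R7": (1.0, -1.0, 0.0, 1.0),             # the default in R itself
    "R8": (1.0 / 3.0, 1.0 / 3.0, 0.0, 1.0),  # approximately median-unbiased
}

def quantile_custom(data, tau, a, b, c, d):
    """4-parameter quantile; sorts a copy and clamps indices to 1..n."""
    xs = sorted(data)
    n = len(xs)
    x = a + (n + b) * tau
    lo = min(max(math.floor(x), 1), n)
    hi = min(max(math.ceil(x), 1), n)
    frac = x - math.floor(x)
    return xs[lo - 1] + (xs[hi - 1] - xs[lo - 1]) * (c + d * frac)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
for name, params in DEFINITIONS.items():
    # The lower quartile differs between definitions on the same data.
    print(name, round(quantile_custom(data, 0.25, *params), 6))
```

On this five-element sample the lower quartile comes out as 2 under R1 and R7 but about 1.667 under R8, which is why the convention should be stated whenever quantiles are reported.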
The QuantileDefinition enumeration has the following options:
Rank statistics are the counterpart to order statistics. The Ranks function evaluates the rank of each sample and returns them as an array of doubles. The return type is double instead of int in order to deal with ties, if one of the values appears multiple times.
Similar to QuantileDefinition, the RankDefinition enumeration controls how ties should be handled:
Statistics.Ranks(data, definition)
SortedArrayStatistics.Ranks(data, definition)
ArrayStatistics.RanksInplace(data, definition)
Statistics.Ranks(whiteNoise)
val it : float [] = [634.0; 736.0; 405.0; 395.0; 197.0; 167.0; 722.0; 44.0; ...]
Statistics.Ranks([ 13.0; 14.0; 11.0; 12.0; 13.0 ], RankDefinition.Average)
val it : float [] = [3.5; 5.0; 1.0; 2.0; 3.5]
Statistics.Ranks([ 13.0; 14.0; 11.0; 12.0; 13.0 ], RankDefinition.Sports)
val it : float [] = [3.0; 5.0; 1.0; 2.0; 3.0]
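The two tie-handling rules used in the examples above can be reproduced with a short Python sketch. The string arguments are illustrative stand-ins for the RankDefinition values, and only these two definitions are covered.

```python
def ranks(data, definition="average"):
    """One-based ranks; ties are resolved per the chosen definition."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    result = [0.0] * len(data)
    start = 0
    while start < len(order):
        # Find the run of equal values occupying sorted positions start..end-1.
        end = start + 1
        while end < len(order) and data[order[end]] == data[order[start]]:
            end += 1
        first, last = start + 1, end  # one-based ranks spanned by the tie group
        if definition == "average":
            shared = (first + last) / 2.0
        elif definition == "sports":  # every tied value gets the smallest rank
            shared = float(first)
        else:
            raise ValueError("unsupported definition")
        for position in range(start, end):
            result[order[position]] = shared
        start = end
    return result

data = [13.0, 14.0, 11.0, 12.0, 13.0]
print(ranks(data, "average"))  # [3.5, 5.0, 1.0, 2.0, 3.5]
print(ranks(data, "sports"))   # [3.0, 5.0, 1.0, 2.0, 3.0]
```

The average rule yields the fractional rank 3.5, which is why the ranks are returned as doubles rather than integers.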
QuantileRank is the counterpart of the Quantile function: it estimates \(\tau\) of the provided \(\tau\)-quantile value \(x\) from the provided samples. The \(\tau\)-quantile is the data value where the cumulative distribution function crosses \(\tau\).
Statistics.QuantileRank(data, x, definition)
Statistics.QuantileRankFunc(data, definition)
SortedArrayStatistics.QuantileRank(data, x, definition)
Statistics.QuantileRank(whiteNoise, 13.0)
val it : float = 0.9370045563
Statistics.QuantileRank(whiteNoise, 6.7, RankDefinition.Average)
val it : float = 0.04960610389
Statistics.EmpiricalCDF(data, x)
Statistics.EmpiricalCDFFunc(data)
Statistics.EmpiricalInvCDF(data, tau)
Statistics.EmpiricalInvCDFFunc(data)
SortedArrayStatistics.EmpiricalCDF(data, x)
let ecdf = Statistics.EmpiricalCDFFunc whiteNoise
Generate.LinearSpacedMap(20, start=3.0, stop=17.0, map=ecdf)
val it : float [] =
[0.0; 0.001; 0.002; 0.005; 0.022; 0.05; 0.094; 0.172; 0.278; 0.423; 0.555;
0.705; 0.843; 0.921; 0.944; 0.983; 0.992; 0.997; 0.999; 1.0]
let eicdf = Statistics.empiricalInvCDFFunc whiteNoise
[ for tau in 0.0..0.05..1.0 -> eicdf tau ]
val it : float [] =
[3.633070184; 6.682142043; 7.520000817; 8.040513497; 8.347587493;
8.645491746; 9.02681611; 9.298987151; 9.522627142; 9.819352699; 10.11872428;
10.35991046; 10.57530906; 10.8259542; 11.08605473; 11.33170746; 11.54356436;
11.90973541; 12.4294346; 13.36889423; 16.65183566]
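The empirical CDF is a step function that reports the fraction of samples not exceeding x, and its inverse returns the smallest sample at which that fraction reaches \(\tau\). The Python sketch below (standard library only, illustrative names) implements this reading. It is consistent with the session above, where the inverse at 0.25, 0.5 and 0.75 returns the same values as the order statistics 250, 500 and 750, but it remains an assumption about the library's exact convention.

```python
import bisect
import math

def empirical_cdf_func(data):
    """Return F with F(x) = fraction of samples that are <= x."""
    xs = sorted(data)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)

def empirical_inv_cdf_func(data):
    """Return the map from tau to the smallest sample whose CDF reaches tau."""
    xs = sorted(data)
    n = len(xs)
    def inverse(tau):
        index = min(max(math.ceil(n * tau), 1), n)  # one-based order
        return xs[index - 1]
    return inverse

data = [3.0, 1.0, 4.0, 2.0]
ecdf = empirical_cdf_func(data)
eicdf = empirical_inv_cdf_func(data)
print([ecdf(x) for x in (0.5, 1.0, 2.5, 4.0)])    # [0.0, 0.25, 0.5, 1.0]
print([eicdf(t) for t in (0.0, 0.25, 0.5, 1.0)])  # [1.0, 1.0, 2.0, 4.0]
```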
A histogram can be computed using the Histogram class. Its constructor takes the samples enumerable, the number of buckets to create, plus optionally the range (minimum, maximum) of the sample data if available.
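Conceptually, a histogram with equal-width buckets only needs the sample range and a bucket count. The Python sketch below shows that idea. The boundary conventions (inclusive lower edge, samples outside the range ignored) are assumptions and may differ from those of the Histogram class.

```python
def histogram(samples, bucket_count, lower=None, upper=None):
    """Count samples in bucket_count equal-width buckets over [lower, upper]."""
    samples = list(samples)
    lower = min(samples) if lower is None else lower
    upper = max(samples) if upper is None else upper
    if bucket_count < 1 or upper <= lower:
        raise ValueError("need at least one bucket and a non-empty range")
    width = (upper - lower) / bucket_count
    counts = [0] * bucket_count
    for x in samples:
        if x < lower or x > upper:
            continue  # assumption: values outside the range are ignored
        # The maximum value falls into the last bucket rather than past it.
        index = min(int((x - lower) / width), bucket_count - 1)
        counts[index] += 1
    return counts

print(histogram([1.0, 2.0, 2.5, 3.0, 9.0, 10.0], 3))            # [4, 0, 2]
print(histogram([1.0, 2.0, 2.5, 3.0, 9.0, 10.0], 2, 0.0, 4.0))  # [1, 3]
```

Supplying the range explicitly, as in the second call, avoids a separate pass over the data to find the minimum and maximum.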

The Correlation class supports computing Pearson's product-moment and Spearman's rank correlation coefficients, as well as their correlation matrix for a set of vectors.
Code Sample: Computing the correlation coefficient of 1000 samples of f(x) = 2x and g(x) = x^2:
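As a library-independent illustration of that computation, here is a minimal Python sketch (standard library only). The sample grid x = 0, 1, ..., 999 is an assumption, and the functions are written from the textbook definitions rather than taken from the Correlation class.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    cov = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = math.fsum((x - mean_x) ** 2 for x in xs)
    var_y = math.fsum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """One-based ranks where tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + 1
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        for position in range(start, end):
            ranks[order[position]] = (start + 1 + end) / 2.0
        start = end
    return ranks

def spearman(xs, ys):
    """Spearman's coefficient: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [float(i) for i in range(1000)]  # assumed sample grid
f = [2.0 * x for x in xs]
g = [x * x for x in xs]
print(pearson(f, g))   # below 1: the relation is monotonic but not linear
print(spearman(f, g))  # 1 up to rounding: both series are ordered identically
```

The contrast between the two results is the point of the example: Spearman's coefficient only looks at the ordering, so any strictly increasing relationship scores 1, while Pearson's coefficient penalizes the curvature of x^2.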
