Likely the most common file format for tabular data, delimited files such as CSV store data as text, with one line per row and the values within each row separated by a comma. Virtually all software that deals with tabular data supports such text files.
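To make the row and value structure concrete, the following minimal Python sketch splits comma-separated text into rows of numbers. Python is used only for illustration and has nothing to do with the Math.NET Numerics API. The sketch assumes purely numeric fields without quoting or escaping, and both the helper name parse_csv_numbers and the sample values are made up.

```python
from typing import List


def parse_csv_numbers(text: str) -> List[List[float]]:
    """Parse comma-separated numeric text into a list of rows.

    Assumes one row per line, a comma between values, and no quoted
    or escaped fields (real CSV files may need all of these).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([float(value) for value in line.split(",")])
    return rows


sample = "1.0,2.5,3.0\n4.0,5.5,6.0\n"
matrix = parse_csv_numbers(sample)
print(matrix)  # [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]]
```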
Unfortunately, there is no universal standard for which character is used as the separator or for how individual values are formatted and escaped. CSV files traditionally use a comma as the separator, but this causes problems in locales such as Germany, where the comma is used as the decimal point in numbers. The tabulator (tab character) is a useful alternative; tab-separated files usually carry the TSV extension instead of CSV. Other separators such as semicolons or colons are common as well.
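To see why the comma clashes with a decimal comma, consider the values 1.5 and 2.25. Written German-style as 1,5 and 2,25 with a comma separator, they become the line 1,5,2,25, which cannot be split unambiguously. The Python sketch below uses a hypothetical helper, parse_delimited, which is not part of Math.NET Numerics. It parses the same matrix from a comma-separated, a semicolon-separated and a tab-separated dialect. Splitting is left to Python's standard csv module, which also copes with quoted fields such as "1,5".

```python
import csv
import io


def parse_delimited(text, delimiter=",", decimal="."):
    """Parse delimited numeric text with a configurable separator and decimal mark.

    csv.reader handles the splitting and any quoted fields; each value is
    then converted to float after normalizing the decimal mark to '.'.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        [float(value.replace(decimal, ".")) for value in row]
        for row in reader
        if row  # csv.reader yields [] for blank lines
    ]


# The same 2x2 matrix in three common dialects.
us_csv = "1.5,2.25\n3.0,4.0\n"
german_csv = "1,5;2,25\n3,0;4,0\n"
tsv = "1.5\t2.25\n3.0\t4.0\n"

print(parse_delimited(us_csv))                                  # [[1.5, 2.25], [3.0, 4.0]]
print(parse_delimited(german_csv, delimiter=";", decimal=","))  # [[1.5, 2.25], [3.0, 4.0]]
print(parse_delimited(tsv, delimiter="\t"))                     # [[1.5, 2.25], [3.0, 4.0]]

# Quoting lets a comma-separated file contain decimal commas.
print(parse_delimited('"1,5","2,25"\n', decimal=","))           # [[1.5, 2.25]]
```

Quoting is one way writers escape a separator that occurs inside a value. It only works if the reader and the writer agree on the convention.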
Math.NET Numerics provides basic support for delimited files through the MathNet.Numerics.Data.Text package. This package is available on NuGet as a separate package and is not included in the main distribution.
The DelimitedReader class provides static functions to read a matrix from a file or string in delimited form.
It can read from:
All of these functions expect the element type of the resulting matrix as a generic type argument. Only Double, Single, Complex and Complex32 are supported.
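Python has no generic type arguments, but choosing the element type up front can be sketched by passing a conversion function to the reader. In the hypothetical read_matrix below, float stands in for Double and Python's built-in complex stands in for Complex. The whitelist mimics the restriction to a few supported types. The text format for complex values (1+2j) is Python's own convention and is not necessarily what the DelimitedReader expects.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Hypothetical whitelist, mirroring the restriction to a few numeric element types.
SUPPORTED_TYPES = (float, complex)


def read_matrix(text: str, element_type: Callable[[str], T], delimiter: str = ",") -> List[List[T]]:
    """Parse delimited text, converting every value with element_type."""
    if element_type not in SUPPORTED_TYPES:
        raise TypeError(f"unsupported element type: {element_type!r}")
    return [
        [element_type(value.strip()) for value in line.split(delimiter)]
        for line in text.splitlines()
        if line.strip()
    ]


print(read_matrix("1,2\n3,4", float))         # [[1.0, 2.0], [3.0, 4.0]]
print(read_matrix("1+2j,3\n0,2j", complex))   # [[(1+2j), (3+0j)], [0j, 2j]]
```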
Unfortunately, the lack of a standard means that the parsing logic needs to be parameterized accordingly. A file could be profiled automatically to detect the correct parameters, but for simplicity the Read functions expect them explicitly as optional arguments. The delimiter, for example, defaults to \s (white space).
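A minimal Python sketch of such an explicitly parameterized reader follows. It treats the delimiter as a regular expression that defaults to \s+ (one or more white-space characters). This is a slight variation of the \s default above, so that runs of spaces count as a single separator. It also adds a has_headers flag that returns the first line as column names instead of parsing it as numbers. The function and parameter names are illustrative assumptions and are not the Math.NET Numerics API.

```python
import re


def read_delimited(text, delimiter=r"\s+", has_headers=False):
    """Read a numeric matrix from delimited text.

    delimiter:   regular expression separating values on a line
                 (default: runs of white space)
    has_headers: if True, the first non-empty line holds column names
                 and is returned separately instead of being parsed
    """
    lines = [line for line in text.splitlines() if line.strip()]
    headers = None
    if has_headers and lines:
        headers = re.split(delimiter, lines[0].strip())
        lines = lines[1:]
    rows = [[float(v) for v in re.split(delimiter, line.strip())] for line in lines]
    return headers, rows


# White-space separated values, aligned with varying numbers of spaces.
print(read_delimited("1  2   3\n4  5   6"))
# (None, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Semicolon-separated text with a header line.
print(read_delimited("a;b\n1;2\n3;4", delimiter=";", has_headers=True))
# (['a', 'b'], [[1.0, 2.0], [3.0, 4.0]])
```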
The counterpart of the reader is the DelimitedWriter class, which serializes a matrix to a delimited text file, a stream or a TextWriter. The static Write functions accept optional arguments that control the output format; the delimiter, for example, defaults to \t (tabulator).
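The writing side can be sketched the same way. The hypothetical write_delimited below uses a tab as its default delimiter, as described above. It also adds two options that are assumptions of this sketch: optional column headers written as the first line, and a Python format specification applied to every value. Because it writes to any text stream, it works with an io.StringIO buffer or with a file opened via open(path, "w", newline="").

```python
import io


def write_delimited(stream, rows, delimiter="\t", column_headers=None, number_format=None):
    """Write a numeric matrix as delimited text to a text stream.

    delimiter:      string placed between values on a line (default: tab)
    column_headers: optional list of names written as the first line
    number_format:  optional Python format spec for every value;
                    None writes str(value), which round-trips floats exactly
    """
    if column_headers is not None:
        stream.write(delimiter.join(column_headers) + "\n")
    for row in rows:
        cells = (str(v) if number_format is None else format(v, number_format) for v in row)
        stream.write(delimiter.join(cells) + "\n")


buffer = io.StringIO()
write_delimited(buffer, [[1.0, 2.5], [3.25, 4.0]], column_headers=["x", "y"])
print(repr(buffer.getvalue()))  # 'x\ty\n1.0\t2.5\n3.25\t4.0\n'

buffer = io.StringIO()
write_delimited(buffer, [[1.0, 2.5]], delimiter=",", number_format=".2f")
print(buffer.getvalue(), end="")  # 1.00,2.50
```

By default each value is written with str(), which produces Python's shortest representation that reads back to the same float. A fixed format such as .2f is easier to read but loses precision.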
The data extension packages also offer other ways to serialize a matrix to a binary stream or file. Among others: