What's New in Math.NET Numerics 2.5 | Christoph Rüegg

What's New in Math.NET Numerics 2.5

Math.NET Numerics v2.5, released in April 2013, is focused on statistics and linear algebra. As usual you'll find a full listing of all changes in the release notes. However, I'd like to take the chance to highlight some important changes, show some code samples and explain the reasoning behind the changes.

Please let me know if these "What's New" articles are useful in this format and whether I should continue to put the together for future releases.

Statistics

Order Statistics & Quantiles

Previously our order statistics and quantile functions were quite limited. With this release we finally have almost complete quantile support:

  • OrderStatistic
  • Median
  • LowerQuartile
  • UpperQuartile
  • InterquartileRange
  • FiveNumberSummary
  • Percentile
  • Quantile
  • QuantileCustom

All of them are implemented on top of the quantile function. We always default to approximately median-unbiased quantiles, usually denoted as type R-8, which do not assume samples to be normally distributed. If you need compatibility with another implementation, you can use QuantileCustom which accepts either a QuantileDefinition enum (we support all 9 R-types, SAS 1-5, Excel, Nist, Hydrology, etc.) or a 4-parameter definition as in Mathematica.

For the empirical inverse cummulative distribution, which is essentially an R1-type quantile, you can use the new Statistics.InverseCDF function.

More efficient ways to compute statistics

Previously there were two ways to estimate some statistics from a sample set: The Statistics class provided static extension methods to evaluate a single statistic from an enumerable, and DescriptiveStatistics to compute a whole set of standard statistics at once. This was unsatisfactory since it was not very efficient: the DescriptiveStatistics way actually required more than one pass internally (mostly because of the median) and it was not leveraging the fact that the sample set may already be sorted.

To fix the first issue, we've marked DescriptiveStatistics.Median as obsolete and will remove it in v3. Until then, the median computation is delayed until requested the first time. In normal cases where Median is not used it now only requires a single pass.

The second issue we attacked by introducing three new classes to compute a single statistic directly from the best fitting sample data format:

  • ArrayStatistics operates on arrays which are not assumed to be sorted.
  • SortedArrayStatistics operates on arrays which must be sorted in ascending order.
  • StreamingStatistics operates on a stream in a single pass, without keeping the full data in memory at any time. Can thus be used to stream over data larger than system memory.

ArrayStatistics implements Minimum, Maximum, Mean, Variance, StandardDeviation, PopulationVariance and PopulationStandardDeviation. In addition it implements all the order statistics/quantile functions mentioned above, but in an inplace way that reorders the data array (partial sorting) and because of that is marked with an Inplace-suffix to indicate the side effect. These inplace functions get slightly faster when calling them repeatedly, but will always be slower than the sorted array statistics. Nevertheless, since the sorting itself is quite expensive, all in all the (non-sorted) array statistics are still faster in practice if only few calls are needed.

Example: We want to compute the IQR of {3,1,2,4}.

1: 
2: 
var data = new double[] { 3.0, 1.0, 2.0, 4.0 };
ArrayStatistics.InterquartileRangeInplace(data); // iqr = 2.16666666666667

This is equivalent to executing IQR(c(3,1,2,4), type=8) in R, with quantile definition type R-8.

SortedArrayStatistics expects data to be sorted in ascending order and implements Minimum, Maximum, and all the order statistics/quantile functions mentioned above. It leverages the ordering for very fast (constant time) order statistics. There's also no need to reorder the data, so other than ArrayStatistics, this class never modifies the provided array and has no side effect. It does not re-implement any operations that cannot leverage the ordering, like Mean or Variance, so use the implementation from ArrayStatistics instead if needed.

1: 
2: 
var data = new[] { 1.0, 2.0, 3.0, 4.0 };
var iqr = SortedArrayStatistics.InterquartileRange(data); // iqr = 2.16666666666667

StreamingStatistics estimates statistics in a single pass without memorization and implements Minimum, Maximum, Mean, Variance, StandardDeviation, PopulationVariance and PopulationStandardDeviation. It does not implement any order statistics, since they require sorting and are thus not computable in a single pass without keeping the data in memory. No function of this class has any side effects on the data.

The Statistics class has been updated to leverage these new implementations internally, and implements all of the statistics mentioned above as extension methods on enumerables. No function of this class has any side effects on the data.

1: 
var iqr = new[] { 3.0, 1.0, 2.0, 4.0 }.InterquartileRange(); // iqr = 2.16666666666667

Note that this is generally slower than ArrayStatistics because it requires to copy the array to make sure there are no side effects, and much slower than SortedArrayStatistics which would have constant time (assuming we sorted it manually first).

Repeated Evaluation and Precomputed Functions

Most of the quantile functions accept a tau-parameter. Often you need to evaluate that function with not one but a whole range of values for that parameter, say for plotting. In such scenarios it is advantageous to first sort the data and then use the SortedArrayStatistics functions with constant time complexity. For convenience we also provide alternative implementations with the Func-suffix in the Statistics class that do exactly that: instead of accepting a tau-parameter themselves, they return a function that accepts tau:

1: 
2: 
3: 
var icdf = new[] { 3.0, 1.0, 2.0, 4.0 }.InverseCDFFunc();
var a = icdf(0.3); // 2
var b = new[] { 0.0, 0.1, 0.5, 0.9, 1.0 }.Select(icdf).ToArray(); // 1,1,2,4,4

Linear Algebra

There have been quite a few bug fixes and performance improvements around linear algebra. See the release notes for details.

Matrix and Vector String Formatting

Previously the ToString method used to render the whole matrix to a string. When working with very large data sets this can be an expensive operation both on CPU and memory usage. It also makes it a pain to work with interactive consoles or REPL environments like F# Interactive that write the ToString of the resulting object to the console output.

Starting from v2.5, ToString methods no longer render the whole structure to a string for large data. Instead, ToString now only renders an excerpt of the data, together with a line about dimension, type and in case of sparse data a sparseness indicator. The intention is to give a good idea about the data in a visually useful way:

1: 
DenseMatrix.CreateRandom(60,80,Normal.WithMeanVariance(2.0,1.0)).ToString();

generates the following multi-line string:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
DenseMatrix 60x80-Double
 1.68665      1.24322      3.36594      2.07444      4.13008 ...      3.01076
 1.10888       2.8856      2.31662      3.94124      3.56711 ...    -0.216804
0.843804      2.67243        1.097      2.34063     0.875953 ...        1.808
 3.87044      2.69509      2.79642    -0.354365      2.45302 ...      2.79665
 2.05722      3.39823      2.56256      1.88849      1.75259 ...       3.8987
  1.9874      2.97047      1.40584      1.97734      2.37733 ...        2.875
 2.06503      1.15681      3.85957      2.84836      1.25326 ...    -0.108938
     ...          ...          ...          ...          ... ...          ...
 4.39245      1.32734      2.83637      1.78257      2.44356 ...      2.58935

The output is not perfect yet, as we'd ideally align the decimal point and automatically choose the right width for each column. Hopefully we can fix that in a future version. How much data is shown by default can be adjusted in the Control class:

1: 
2: 
Control.MaxToStringColumns = 6;
Control.MaxToStringRows = 8;

Or you can use an override or alternative:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
Matrix.ToString(maxRows, maxColumns, formatProvider)

// Just the top info line
Matrix.ToTypeString()

// Just the matrix data without the top info line
Matrix.ToMatrixString(maxRows, maxColumns, formatProvider)
Matrix.ToMatrixString(maxRows, maxColumns, padding, format, formatProvider)

Note that ToString has never been intended to serialize a matrix to a string in order to parse it back later. Please use one of our data libraries instead, e.g. the MathNet.Numerics.Data.Text package.

Creating a Matrix or Vector

Constructing a matrix or vector has become more consistent: Except for obsolete members which will be removed in v3, all constructors now directly use the provided data structure by reference, without any copying. This means that there are only constructors left that accept the actual inner data structure format. Usually you'd use the new static functions instead, which always either create a copy or construct the inner data structure directly from the provided data, without keeping a reference to it.

Some examples below. The C# way usually works in F# just the same as well, but we provide more idiomatic alternatives for most of them:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
// Directly from an array in the internal column-major format (no copying)
C#: new DenseMatrix(2, 3, new[] { 1.0, 2.0, 10.0, 20.0, 100.0, 300.0 })
F#: DenseMatrix.raw 2 3 [| 1.0; 2.0; 10.0; 20.0; 100.0; 300.0 |]

// All-zero 3x4 (3 rows, 4 columns) matrix
C#: new DenseMatrix(3, 4)
F#: DenseMatrix(3, 4)
F#: DenseMatrix.zeroCreate 3 4

// 3x4, all cells the same fixed value
C#: DenseMatrix.Create(3, 4, (r, c) => 20.5)  // note: better way is planned
F#: DenseMatrix.create 3 4 20.5

// 3x4, random
C#: DenseMatrix.CreateRandom(3, 4, Normal.WithMeanVariance(2.0, 0.5));
F#: DenseMatrix.randomCreate 3 4 (Normal.WithMeanVariance(2.0, 0.5))

// 3x4, using an initializer function
C#: DenseMatrix.Create(3, 4, (r, c) => r/100.0 + j)
F#: DenseMatrix.init 3 4 (fun r c -> float r/100.0 + float c)

// From enumerables of enumerables (as rows or columns)
C#: DenseMatrix.OfColumns(2,3, new[] { new[] { 1.0, 4.0 }, new[] { 2.0, 5.0 }, new[] { 3.0, 6.0 } })
F#: DenseMatrix.ofRows 2 3 [{ 1.0 .. 3.0 }; { 4.0 .. 6.0 }]

// From F# lists of lists
F#: DenseMatrix.ofRowsList 2 3 [[1.0 .. 3.0]; [4.0 .. 6.0]]
F#: matrix [[1.0; 2.0; 3.0]
            [4.0; 5.0; 6.0]]

// By calling a function to construct each row (or analog for columns)
F#: DenseMatrix.initRow 10 10 (fun r -> vector [float r .. float (r+9)])

All of these work the same way also for sparse matrices, and similarly for vectors. Useful for sparse data is another way that accepts a list or sequence of indexed row column value tuples, where all other cells are assumed to be zero:

1: 
2: 
3: 
F#: SparseMatrix.ofListi 200 100 [(4,3,20.0); (18,9,3.0); (2,1,99.9)]
F#: DenseMatrix.ofSeqi 10 10 (seq {for i in 0..9 -> (i,i,float(i*i))})
C#: SparseMatrix.OfIndexed(10,10,Enumerable.Range(0,10).Select(i => Tuple.Create(i,i,(double)(i*i))))

Inplace Map

The F# modules have supported (outplace) combinators like map for quite some time. We've now implemented inplace map directly in the storage classes so it can operate efficiently also on sparse data, and is accessible from C# code without using the F# extensions. The F# map-related functions have been updated to leverage these routines:

1: 
2: 
3: 
4: 
matrix |> Matrix.map (fun x -> 2.0*x)
matrix |> Matrix.mapnz (fun x -> 2.0*x) // non-zero: may skip zero values, useful if sparse
matrix |> Matrix.mapi (fun i j x -> x + float i - float j) // indexed with row, column
matrix |> Matrix.mapinz (fun i j x -> x + float i - float j) // indexed, non-zero

Or the equivalent inplace versions which return unit:

1: 
2: 
3: 
4: 
matrix |> Matrix.mapInPlace (fun x -> 2.0*x)
matrix |> Matrix.mapnzInPlace (fun x -> 2.0*x)
matrix |> Matrix.mapiInPlace (fun i j x -> x + float i - float j)
matrix |> Matrix.mapinzInPlace (fun i j x -> x + float i - float j)

In C#:

1: 
2: 
matrix.MapInplace(x => 2*x);
matrix.MapIndexedInplace((i,j,x) => x+i-j, forceMapZeros:true);

F# Slice Setters

Speaking about F#, we've supported the slice getter syntax for a while as a nice way to get a sub-matrix. For example, to get the bottom right 2x2 sub-matrix of m we can do:

1: 
2: 
3: 
let m = DenseMatrix.init 3 4 (fun i j -> float (10 * i + j))
m |> printfn "%A"
m.[1..2,2..3] |> printfn "%A"

The same syntax now also works for setters, e.g. to overwrite the very same bottom right corner we can write:

1: 
2: 
3: 
m.[1..2,2..3] <- matrix [[0.1; 0.2]
                         [0.3; 0.4]]
printfn "%A" m

The 3 printfn statements generate the following output:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
DenseMatrix 3x4-Double
           0            1            2            3
          10           11           12           13
          20           21           22           23
DenseMatrix 2x2-Double
          12           13
          22           23
DenseMatrix 3x4-Double
           0            1            2            3
          10           11          0.1          0.2
          20           21          0.3          0.4
Multiple items
val double : value:'T -> double (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.double

--------------------
type double = System.Double

Full name: Microsoft.FSharp.Core.double
namespace Microsoft.FSharp.Control
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = System.Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
Multiple items
val seq : sequence:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Core.Operators.seq

--------------------
type seq<'T> = System.Collections.Generic.IEnumerable<'T>

Full name: Microsoft.FSharp.Collections.seq<_>
val printfn : format:Printf.TextWriterFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn