Towards Math.NET Numerics Version 3 | Christoph Rüegg

Math.NET Numerics is well on its way towards the next major release, v3.0. A first preview alpha has already been pushed to the NuGet gallery, even though there is still a lot to do. If you'd like to understand a bit better where we currently are, where we're heading, and why, read on.

Why a new major release?

We apply the principles of semantic versioning (with the three-part version format major.minor.patch). This means we must not break any part of the library's public surface, which in our case is almost everything, in minor releases. It ensures you can upgrade between minor releases without second thoughts and without breaking any of your code.

Nevertheless, sometimes there really is a good reason to change the design: it may be way too complicated to use, be inconsistent, lead to bad performance, or simply not have been very well thought out. Or we may have learned how to do it in a much better way. You may have noticed that some members have been declared obsolete over the last couple of minor releases, with suggestions on what to use instead, even though the old implementation was kept intact. Over time all that old code became a pain to maintain, and using the library was much more complicated than needed. So I decided it was time to finally fix most of these issues and clean up.

We do move some cheese around in this release, and your code will break on a few occasions. But in all cases the fix should be easy, if not trivial. Once v3 is out, we will again be bound by semantic versioning to keep the library stable across all future minor releases, and thus likely for years to come. We may also keep providing patches for the old v2 branch for a while if needed. Nevertheless, I strongly recommend upgrading to v3 once it is available.

Feedback is welcome

A first preview (v3.0.0-alpha1) has already been published to NuGet, and I plan to do at least two more preview releases before we reach the actual v3.0 release. Please do have a look at it and give feedback: now is a unique opportunity to shape the breaking changes.

Overview of what has been done so far

  • Namespace simplifications.
  • A more functional design where appropriate, making sure everything works well and feels native in both C# and F#.
  • Common short names where they are well known, instead of very long full names (trigonometry).
  • Linear algebra: using the generic types is now the recommended way, and we made sure it works well. The IO classes for matrix/vector serialization have become separate packages. Major refactoring of the iterative solvers. Filled in some missing pieces, various simplifications, and lots of other changes.
  • Distributions: major cleanup. Direct exposure of distribution functions (PDF, CDF, etc.). Parameter estimation.
  • New distance functions.

Overview of what is planned

  • The iterative solvers need more work. I'd also like to design them so that they can be iterated manually, in a simple way.
  • Integral transforms (FFT, etc.) need major refactoring, backed by native providers where possible.
  • Consider bringing back filtering (FIR, IIR, moving average, etc.).
  • The current QR-decomposition-based curve fitting is inefficient for large data sets, but fixing it is actually not very complicated.
  • Investigate and fix an inconsistency in the Precision class.
  • Drop redundant null checks.

Details on what's new in version 3 so far

Dropping .Algorithms Namespaces

Did you ever have to open 10 different Math.NET Numerics namespaces to get everything you need? This should get somewhat better in v3. The static facades for simple cases, like Integrate, Interpolate, Fit or FindRoots, have been moved directly into the root namespace MathNet.Numerics. All the algorithm namespaces (for advanced uses) of the form MathNet.Numerics.X.Algorithms are now simply MathNet.Numerics.X.

Interpolation

In addition to the simplified namespaces, the last Differentiate overload, which returns the interpolated value together with the first and second derivatives at some point x, has been simplified. Instead of two out-parameters in an unexpected order, it now returns a tuple with a reasonable ordering.
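The following Python sketch is not Math.NET code. It only illustrates the shape of such a return value. Purely for illustration, it assumes that the interpolant is a polynomial given by its coefficients, and the function name `differentiate_all` is hypothetical. What matters is the return type: a single tuple ordered as (value, first derivative, second derivative), which the caller unpacks in one line.

```python
def differentiate_all(coeffs, x):
    """Evaluate p(x), p'(x) and p''(x) for p = coeffs[0] + coeffs[1]*x + ...

    Hypothetical helper: returns one tuple (value, first, second) instead of
    using output parameters, so the ordering is explicit at the call site.
    """
    value = d1 = d2 = 0.0
    # Horner's scheme, propagating both derivatives alongside the value.
    # Update order matters: d2 needs the old d1, d1 needs the old value.
    for c in reversed(coeffs):
        d2 = d2 * x + 2.0 * d1
        d1 = d1 * x + value
        value = value * x + c
    return value, d1, d2
```

For p(x) = 1 + 2x + 3x², `value, slope, curvature = differentiate_all([1.0, 2.0, 3.0], 2.0)` yields 17.0, 14.0 and 6.0.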

Integration

The design of the double-exponential transformation was rather awkward. It has been simplified to a static class and is now much simpler to use explicitly.
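To show what a double-exponential transformation does, here is a minimal Python sketch of the tanh-sinh variant, written as a single stateless function. It is written from scratch and is not Math.NET's implementation. The step size `h` and truncation point `t_max` are assumed fixed values chosen for illustration, with no adaptive refinement.

```python
import math


def integrate_tanh_sinh(f, a, b, h=1.0 / 16, t_max=3.0):
    """Approximate the integral of f over [a, b] with tanh-sinh quadrature.

    The substitution x = tanh(pi/2 * sinh(t)) maps t in (-inf, inf) onto
    x in (-1, 1). The transformed integrand decays double exponentially,
    so a plain trapezoidal rule in t, truncated at |t| <= t_max, converges
    very quickly for smooth f.
    """
    half_width = 0.5 * (b - a)
    center = 0.5 * (b + a)
    n = int(t_max / h)
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        x = math.tanh(u)
        # dx/dt of the substitution.
        weight = 0.5 * math.pi * math.cosh(t) / math.cosh(u) ** 2
        total += weight * f(center + half_width * x)
    return h * half_width * total
```

For example, `integrate_tanh_sinh(math.exp, 0.0, 1.0)` agrees with e - 1 to well below 1e-10. A production version would also choose `h` adaptively and guard against evaluating exactly at the endpoints.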

Probability Distributions

Although it was always possible to assign a custom random source (RNG) to a distribution for random number sampling, it was somewhat complicated and required two steps. Now all distribution constructors have an overload accepting a custom random source directly at construction, in a single step.
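The one-step pattern looks roughly like the following Python sketch. The `Normal` class and its `random_source` parameter are hypothetical stand-ins, not Math.NET's API. An optional constructor argument plays the role of the extra C# overload. Sampling uses the Box-Muller transform, chosen here only for brevity.

```python
import math
import random


class Normal:
    """Hypothetical normal distribution taking its random source at construction."""

    def __init__(self, mean=0.0, stddev=1.0, random_source=None):
        if stddev <= 0.0:
            raise ValueError("stddev must be positive")
        self.mean = mean
        self.stddev = stddev
        # Single step: use the supplied generator, or fall back to a fresh one.
        self.random_source = random_source if random_source is not None else random.Random()

    def sample(self):
        # Box-Muller transform driven by this distribution's own random source.
        u1 = 1.0 - self.random_source.random()  # in (0, 1], avoids log(0)
        u2 = self.random_source.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return self.mean + self.stddev * z
```

Passing `random.Random(42)` at construction makes a sampling run reproducible without any separate assignment step.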

A few distributions now support maximum-likelihood parameter estimation and most distributions implement an inverse cumulative distribution function. Distribution functions like PDF, CDF and InvCDF are now exposed directly as static functions.
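For the normal distribution, the functions in question can be sketched as plain static functions in Python. The names and the (mean, stddev, x) parameter order are illustrative assumptions, not Math.NET's signatures. The inverse CDF uses simple bisection for clarity rather than a fast rational approximation. The maximum-likelihood estimate divides by n, not n - 1.

```python
import math


def normal_pdf(mean, stddev, x):
    z = (x - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))


def normal_cdf(mean, stddev, x):
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))


def normal_inv_cdf(mean, stddev, p):
    """Invert the CDF by bisection: slow, but simple and robust."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = mean - 40.0 * stddev, mean + 40.0 * stddev
    # A fixed iteration count avoids looping forever once the bracket
    # reaches floating-point resolution.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mean, stddev, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_estimate(samples):
    """Maximum-likelihood estimate (mean, stddev) from the samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(variance)
```

For example, `normal_inv_cdf(0.0, 1.0, 0.975)` returns approximately 1.959964, the familiar two-sided 95% quantile.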

The inline documentation and parameter naming have been improved significantly. ChiSquare became ChiSquared, and the IDistribution interface became IUnivariateDistribution. In F#, a new Sample module makes random sampling simpler and more composable.

New Distance functions

There are now standard routines for evaluating the Euclidean, Manhattan and Chebyshev distances between arrays or vectors. They also cover the common Sum of Absolute Differences (SAD), Mean Absolute Error (MAE), Sum of Squared Differences (SSD) and Mean Squared Error (MSE) metrics, as well as the Hamming distance. They leverage providers where appropriate.
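The definitions behind these metrics are short. The following Python sketch spells them out on plain lists. The function names are illustrative, not Math.NET's. Note how they relate: Euclidean is the square root of SSD, Manhattan equals SAD, and MAE and MSE are SAD and SSD divided by the length.

```python
import math


def _diffs(a, b):
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x - y for x, y in zip(a, b)]


def sad(a, b):
    return sum(abs(d) for d in _diffs(a, b))


def ssd(a, b):
    return sum(d * d for d in _diffs(a, b))


def mae(a, b):
    return sad(a, b) / len(a)


def mse(a, b):
    return ssd(a, b) / len(a)


def euclidean(a, b):
    return math.sqrt(ssd(a, b))


def manhattan(a, b):
    return sad(a, b)


def chebyshev(a, b):
    return max(abs(d) for d in _diffs(a, b))


def hamming(a, b):
    # Number of positions at which the entries differ.
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)
```

With a = [1, 2, 3] and b = [4, 6, 3], the differences are [-3, -4, 0]. That gives a Euclidean distance of 5, a Manhattan distance of 7, a Chebyshev distance of 4 and a Hamming distance of 2.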

Fewer null checks and ArgumentNullExceptions

Likely as a side effect of my exposure to functional programming over the last year, I no longer buy the argument that in C# every routine must explicitly check all of its arguments for null. I've already dropped a few of these checks, but there are still more than 2000 places where Math.NET Numerics throws an ArgumentNullException, and most of them will likely go. There is one case where it does make sense to keep them, though. When a routine accepts an argument but does not use it immediately, and therefore does not cause an immediate NullReferenceException, a null reference sneaking in could be hard to debug, so we'll keep the check there. Such cases are quite rare given the nature of the library.
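The distinction can be illustrated in Python, with None standing in for null. Both names below are hypothetical. `l1_norm` uses its argument right away, so passing None already fails at the call site and a check would add nothing. `IterationMonitor` only stores a callback that runs much later, so it checks eagerly.

```python
def l1_norm(values):
    # Used immediately: passing None fails right here with a TypeError,
    # so an explicit check would add nothing.
    return sum(abs(v) for v in values)


class IterationMonitor:
    """Hypothetical monitor that stores a callback and invokes it later."""

    def __init__(self, on_step):
        # Deferred use: without this check, a None would only surface on
        # the first call to step(), far away from the code that passed it.
        if on_step is None:
            raise ValueError("on_step must not be None")
        self._on_step = on_step

    def step(self, k, residual):
        self._on_step(k, residual)
```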

IO Library

The IO library that used to be distributed as part of the core package is now a set of separate NuGet packages, e.g. MathNet.Numerics.Data.Text, and lives in a separate repository.

Favoring generic linear algebra types

Since the generic namespace was needed all the time anyway, and the recommended happy path is now to always use the generic types, everything from the .Generic namespace has been moved one level up. From now on you usually only need to open two namespaces when working with linear algebra, even if you need factorizations. For example, when working with double, you'd open MathNet.Numerics.LinearAlgebra and MathNet.Numerics.LinearAlgebra.Double.

Since typing is stronger in F#, all the init/create functions in the F# module now directly return generic types so you don't have to upcast manually all the time. Most routines have been generalized to work on generic types.

For cases where you want to implement generic algorithms but also need to create new dense or sparse matrices or vectors, a new generic builder has been added. This should rarely be needed in user code, though.

Missing scalar-matrix routines

A few missing scalar-matrix routines have been added, such as adding a scalar to a matrix, subtracting a scalar from a matrix, or dividing a scalar by a matrix. They are backed by providers where possible. There is now also a modulus routine.
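The following Python sketch uses nested lists to show the point-wise meaning of these operations. In particular, "dividing a scalar by a matrix" divides the scalar by each entry. It does not multiply by an inverse matrix. All names are hypothetical. For the modulus, this sketch uses Python's floored `%`, where the result takes the sign of the divisor. Whether the library uses that convention or a truncated remainder is an assumption here.

```python
def add_scalar(m, s):
    return [[x + s for x in row] for row in m]


def subtract_scalar(m, s):
    return [[x - s for x in row] for row in m]


def scalar_divide_by(s, m):
    """s divided by each entry of m (point-wise, not an inverse matrix)."""
    return [[s / x for x in row] for row in m]


def modulus(m, divisor):
    # Python's % on floats is floored: the result has the sign of the divisor.
    return [[x % divisor for x in row] for row in m]
```

For m = [[1, 2], [4, -1]], `scalar_divide_by(8.0, m)` gives [[8, 4], [2, -8]] and `modulus(m, 3.0)` gives [[1, 2], [1, 2]].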

Point-wise infix operators where supported (F#)

We've added the point-wise .*, ./ and .% operators for matrices and vectors to the core library. This is not supported in all .NET languages yet, but works fine in F#, although without currying support. In the other languages you can of course continue to use the normal methods as before.
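The difference between the point-wise operators and the ordinary matrix product is easy to miss. The following Python sketch on nested lists spells it out. The function names are hypothetical and only mirror the `.*` and `./` operators described above.

```python
def _check_same_shape(a, b):
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("point-wise operations need matrices of the same shape")


def pointwise_multiply(a, b):
    # Corresponds to a .* b: multiplies matching entries.
    _check_same_shape(a, b)
    return [[x * y for x, y in zip(r, s)] for r, s in zip(a, b)]


def pointwise_divide(a, b):
    # Corresponds to a ./ b: divides matching entries.
    _check_same_shape(a, b)
    return [[x / y for x, y in zip(r, s)] for r, s in zip(a, b)]


def matmul(a, b):
    # Ordinary matrix product, for comparison: rows of a times columns of b.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must agree")
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
```

For a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]], the point-wise product is [[5, 12], [21, 32]], whereas the matrix product is [[19, 22], [43, 50]].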

Factorization and Iterative Solvers

Previously, matrix factorizations were only accessible through extension methods or explicit creation, which did not work very well with the generic types. The generic matrix type now provides methods to create them directly. As a result, the concrete implementations have been made internal, since there is no longer any need to access them directly.

The QR factorization is now thin by default, and factorizations no longer clone their results for no practical reason.
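"Thin" means that for an m×n matrix with m ≥ n, Q is m×n with orthonormal columns and R is n×n, instead of a full m×m Q. The following Python sketch computes a thin QR with modified Gram-Schmidt, chosen for brevity. Math.NET does not necessarily compute it this way, and the sketch assumes full column rank.

```python
import math


def thin_qr(a):
    """Thin QR of an m x n matrix (m >= n, full column rank).

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular)
    such that A = Q R, using modified Gram-Schmidt.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("expected at least as many rows as columns")
    # v[j] holds column j of A, orthogonalized in place as we go.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in v[j]))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        r[j][j] = norm
        qj = [x / norm for x in v[j]]
        q_cols.append(qj)
        # Remove the qj component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    # Reassemble Q row by row from its columns.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

For a 3×2 input, this returns a 3×2 Q and a 2×2 R. A full factorization would instead return a 3×3 Q and a 3×2 R.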

The iterative solver design has been significantly simplified: it is now generic, shared where possible, and accepts generic types everywhere. The namespaces are now much flatter, as the very detailed structure did not add any value but meant you had to open a dozen namespaces.
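As a rough illustration of a small, shared solver loop that can also be stepped through by hand, here is a conjugate gradient sketch written as a Python generator. It is not the Math.NET solver API. It assumes a symmetric positive-definite matrix given as nested lists, and it leaves the stopping criterion to the caller.

```python
import math


def conjugate_gradient(a, b, x0=None):
    """Yield (x, residual_norm) before the first step and after each CG step.

    Assumes a is symmetric positive-definite. The caller decides when to
    stop, either by looping with a tolerance or by calling next() manually.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n

    def matvec(vec):
        return [sum(a[i][j] * vec[j] for j in range(n)) for i in range(n)]

    def dot(u, w):
        return sum(ui * wi for ui, wi in zip(u, w))

    ax = matvec(x)
    r = [b[i] - ax[i] for i in range(n)]
    p = list(r)
    rr = dot(r, r)
    while True:
        yield x, math.sqrt(rr)
        if rr == 0.0:
            return  # exact solution reached
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        rr_new = dot(r, r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
```

Usage: loop with `for k, (x, res) in enumerate(conjugate_gradient(a, b))` and break once `res` drops below a tolerance, or call `next()` on the generator to advance one iteration at a time.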

Misc linear algebra improvements

  • Vectors now have a ConjugateDotProduct routine in addition to DotProduct.
  • Vectors now explicitly provide proper L1, L2 and infinity norms.
  • Matrices and vectors now have consistent enumerators, including a variant that skips zeros (useful for sparse storage).
  • Matrix and vector creation routines have been simplified and usually no longer require explicit dimensions. There are new variants to create diagonal matrices, or matrices where all entries have the same value.
  • Matrices and vectors expose whether their storage is dense through a new IsDense property.
  • Providers have been moved to a Providers namespace and are fully generic again.
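Some of these vector routines are easy to confuse, so here is a short Python sketch of their definitions on complex-valued lists. The names are illustrative only. This sketch conjugates the first argument of the conjugate dot product. That is one common convention, and which argument Math.NET conjugates is an assumption here. The enumerator that skips zeros is modeled as a generator.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_dot(u, v):
    # sum(conj(u_i) * v_i); conjugate_dot(v, v) is the squared L2 norm.
    return sum(a.conjugate() * b for a, b in zip(u, v))


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def infinity_norm(v):
    return max(abs(x) for x in v)


def enumerate_nonzero(v):
    # Like enumerate(), but skips zero entries (cheap for sparse data).
    for i, x in enumerate(v):
        if x != 0:
            yield i, x
```

For v = [3+4j, 0, -12], the L1, L2 and infinity norms are 17, 13 and 12. The conjugate dot product of v with itself is 169, whereas the plain dot product is 137+24j.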

Misc

  • More robust complex Asin/Acos for large real numbers.
  • Trig functions: common short names instead of very long names.
  • Complex: common short names for Exp, Ln, Log10, Log.
  • Statistics: new single-pass MeanVariance method (since the two are often needed together).
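A numerically stable way to get both statistics in one pass is Welford's algorithm. The following Python sketch shows it. The name `mean_variance` is illustrative. Returning the unbiased sample variance (dividing by n - 1) is an assumption about what such a routine would return. Because the data is read only once, it works on a generator as well as on a list.

```python
def mean_variance(samples):
    """Single-pass mean and unbiased sample variance (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        # Uses the old and the updated mean, which keeps this stable.
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two samples")
    return mean, m2 / (n - 1)
```

For [2, 4, 4, 4, 5, 5, 7, 9], this returns a mean of 5 and a sample variance of 32/7.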