Christoph Rüegg

Math.NET, distributed computing and how an electrical engineer sees the world of complex software

Math.NET Numerics With Native Linear Algebra

Linear algebra is one of those areas where performance can be essential, but also one where native optimizations can make a huge difference. That’s why in Math.NET Numerics we implemented linear algebra on top of a provider abstraction where providers can be exchanged.

Out of the box Math.NET Numerics only includes a fully managed provider which is supported on all platforms, but unfortunately is also rather slow. This doesn’t matter much for most problems, but if you’re working with very large dense matrices it can be a deal breaker. That’s why we’ve added some helper projects you can use to compile your own native provider, but that is still quite involved and requires some experience around C or C++. Not any more, kudos to @marcuscuda!

Starting with Math.NET Numerics v2.4 we distribute native providers as NuGet packages, beginning with one based on Intel MKL. Enabling native algorithms becomes almost as simple as adding a NuGet package to your project.

Git Howto: Mirror a GitHub Repo Without Pull Refs

GitHub recently started publishing all pull requests as special git refs. This is awesome, since it makes it trivial to check them out and work with them from your local repository, without having to add the submitter’s repo as a remote all the time. It is also nicely done in that it does not affect normal clones in any way - unless you actually want to fetch them.

However, there is one case where it may have an undesired side effect: mirrors. For example, I routinely mirror the Math.NET Numerics mainline repository to a couple of other places, including Codeplex, Gitorious and Google Code. I want a mirror to exactly mirror the source repository, adding all new branches and tags automatically, but also removing those that have been deleted in the source. Git has excellent support for such exact mirroring. Unfortunately this mirroring mechanism includes all the pull refs as well, which may not be what you want. In Math.NET Numerics, some pull requests are actually based on an old (long removed) branch that included some corrupt objects. So in this case, including them in the mirror not only doubles the repository size, it also causes a corrupt git file system.

Luckily there is an easy way to skip them in the mirror, but to do that we must understand how git refs actually work:
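One way to get such a mirror is sketched here under assumptions, and it is not necessarily the exact recipe of the full post. A mirror clone fetches with the catch-all refspec `+refs/*:refs/*`, which also matches GitHub’s `refs/pull/*`. Restricting the fetch refspecs to branches and tags leaves the pull refs out. The function name `mirror_without_pull_refs` and its two arguments are hypothetical.

```shell
# Hypothetical helper: mirror only the branches and tags of SRC into a new
# bare repository DEST. SRC should be a URL or an absolute path.
mirror_without_pull_refs() {
    src=$1
    dest=$2

    # Start from an empty bare repository instead of "git clone --mirror",
    # so that refs/pull/* are never fetched in the first place.
    git init --quiet --bare "$dest"
    git -C "$dest" remote add origin "$src"

    # Replace the default refspec with explicit ones for branches and tags.
    git -C "$dest" config remote.origin.fetch '+refs/heads/*:refs/heads/*'
    git -C "$dest" config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'

    # Fetch everything that matches and drop refs deleted in the source.
    git -C "$dest" fetch --quiet --prune origin
}
```

Running `git fetch --prune origin` in the mirror again later keeps it in sync, and `git push --mirror` from there should then only publish the branches and tags that were fetched.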

Linear Regression With Math.NET Numerics

Likely the most requested feature for Math.NET Numerics is support for some form of regression, or fitting data to a curve. I’ll show in this article how you can easily compute regressions manually using Math.NET, until we support it out of the box. We already have broad interpolation support, but interpolation is about fitting some curve exactly through a given set of data points and therefore an entirely different problem.

For a regression there are usually many more data points available than curve parameters, so we want to find the parameters that produce the lowest errors on the provided data points, according to some error metric.
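As one common choice of error metric (an assumption here, since the text leaves the metric open), ordinary least squares minimizes the sum of squared residuals. For a model that is linear in its parameters, and with the symbols X, y and p introduced only for this illustration, this reads:

```latex
\hat{p} \;=\; \arg\min_{p} \sum_{i=1}^{n} \bigl( y_i - x_i^{\mathsf T} p \bigr)^2
        \;=\; \arg\min_{p} \, \lVert y - X p \rVert_2^2 ,
\qquad
X^{\mathsf T} X \, \hat{p} \;=\; X^{\mathsf T} y .
```

Here row i of X holds the predictor values x_i of data point i, y collects the observed values and p the curve parameters. The second equation is the set of normal equations; it has a unique solution whenever X has full column rank.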

Lokad.Cloud Architecture Refresh

In a recent post about new deployment and versioning approaches in Lokad.Cloud I mentioned that I’m also heavily refactoring the old cloud service framework and runtime. That refactoring was long overdue but also required to support these new approaches effectively.

In essence, developing Cloud Services still works as before. There is a framework library (Lokad.Cloud.Services.Framework) that provides base classes for a small set of service types that you can derive from. The following figure shows the dependencies of all involved components:

Cleaning Up After Migrating From Hg to Git

There is a lot of guidance out there on how to migrate from Mercurial to Git, but it often leaves you with a repository in a bad state. Even more so if it originally was a Subversion repository, then migrated to Mercurial and now finally to Git.

The Lokad.Cloud repository was such a case. The committers and authors in the commit history were a complete mess, but that’s not that much of an issue in practice. Worse is the fact that most text files were stored with CRLF line endings instead of LF internally. Git supports platform-native checkouts (CRLF on Windows, LF on Linux) quite nicely, but it only works well if text files are normalized to LF internally when committed. I strongly recommend doing that, as it will save you from a lot of trouble later on. Luckily it is also the default behavior for new repositories.
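A minimal sketch of such a normalization follows. It assumes Git 2.16 or later (for `git add --renormalize`) and a clean working tree, and it is not necessarily the procedure the full post describes. The function name `normalize_line_endings` and the commit message are illustrative.

```shell
# Hypothetical helper: run inside the repository that should be normalized.
normalize_line_endings() {
    # Let Git detect text files and store them with LF when committed.
    printf '* text=auto\n' >> .gitattributes
    git add .gitattributes

    # Re-stage all tracked files so the new conversion rule is applied to
    # content that was committed with CRLF earlier.
    git add --renormalize .

    git commit --quiet -m "Normalize line endings to LF"
}
```

Afterwards `git ls-files --eol` shows `i/lf` for text files whose committed content is normalized. This only fixes the tip of the branch; older commits keep their CRLF content unless the history is rewritten.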

Lokad.Cloud Application Deployment and Versioning Refresh

Disclaimer: I’m a major contributor to the Lokad.Cloud open source project. Lokad.Cloud is a framework for distributed computing in Windows Azure, plus a set of independent toolkits like Lokad.Cloud.Storage for simpler and more reliable cloud storage access and Lokad.Cloud.Provisioning for dynamic worker auto-scaling. We use Lokad.Cloud at Lokad to deal with our massive and rapidly changing computation demands.

At the very beginning of the Lokad.Cloud project we decided to not rely on the Windows Azure management tools to deploy new versions of our application. Instead we implemented a dynamic worker role - initially deployed once using the Windows Azure tools - that provides a runtime environment that can load and unload applications on demand, without even recycling the Azure virtual machine. The applications are isolated in a separate AppDomain so we can unload them safely, plus for sandboxing.

Content-Based Storage in the Cloud

One derivative of the NoSQL movement that rediscovers non-relational storage approaches lately is a content-based value store. Such a store is similar to a Key-Value store but uses a cryptographic hash of the value as key.

An SHA-1 hash of the value is good enough to identify it

The SHA-1 hash function is deterministic, meaning that for every value there’s exactly one key that can be computed using SHA-1, hence value implies key. We can always compute the unique key of a value.
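To make “value implies key” concrete, here is a small sketch that uses a local directory as a stand-in for the store. The helper `cas_put` and the directory layout are assumptions for illustration only, not part of any Lokad.Cloud API.

```shell
# Hypothetical helper: store FILE in directory STORE under the SHA-1 hash of
# its content and print the resulting key.
cas_put() {
    store=$1
    file=$2

    # The key is derived from the content alone (sha1sum is part of GNU coreutils).
    key=$(sha1sum < "$file" | cut -d ' ' -f 1)

    mkdir -p "$store"
    # Identical content always maps to the same key, so an entry that already
    # exists never needs to be written again.
    [ -e "$store/$key" ] || cp "$file" "$store/$key"

    printf '%s\n' "$key"
}
```

Storing two files with identical content yields the same key and a single entry in the store, which is where the deduplication of such stores comes from.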

How to Create 2048bit Certificate CSRs for Dell’s iDRAC6

In case you happen to manage a recent Dell server with a dedicated iDRAC remote management card and you’d like to secure it by using your own certificate, you’ll have to request a certificate based on a CSR created directly in the iDRAC web interface.

Unfortunately these CSRs have only 1024 bit keys, which get refused by some public certificate authorities like StartCom (for security reasons they require at least 2048 bits). You can’t choose the bit length in the iDRAC web interface, but luckily there is another way to make it generate 2048 or 4096 bit long keys for the CSR using racadm from Dell’s System Management Tools:

Git Howto: Revert a Commit Already Pushed to a Remote Repository

So you’ve just pushed your local branch to a remote branch, but then realized that one of the commits should not be there, or that there was some unacceptable typo in it. No problem, you can fix it. But you should do it rather fast before anyone fetches the bad commits, or you won’t be very popular with them for a while ;)

First, two alternatives that will keep the history intact:

Alternative: Correct the mistake in a new commit

Simply remove or fix the bad file in a new commit and push it to the remote repository. This is the most natural way to fix an error, always safe and totally non-destructive, and how you should do it 99% of the time. The bad commit remains there and accessible, but this is usually not a big deal, unless the file contains sensitive information.

Alternative: Revert the full commit

Sometimes you may want to undo a whole commit with all changes. Instead of going through all the changes manually, you can simply tell git to revert a commit, which does not even have to be the last one. Reverting a commit means creating a new commit that undoes all changes that were made in the bad commit. Just like above, the bad commit remains there, but it no longer affects the current master and any future commits on top of it.

$ git revert dd61ab32

About History Rewriting

People generally avoid history rewriting, for a good reason: it will fundamentally diverge your repository from anyone who cloned or forked it. People cannot just pull your rewritten history as usual. If they have local changes, they have to do some work to get in sync again; work that requires a bit more knowledge of how Git works to do it properly.

However, sometimes you do want to rewrite the history. Be it because of leaked sensitive information, to get rid of some very large files that should not have been there in the first place, or just because you want a clean history (I certainly do).

Lost in Math.NET Codenames?

Math.NET Numerics? Iridium? dnAnalytics? Yttrium? Huh? …sounds familiar?

It looks like some of you got lost in all the Math.NET subprojects and codenames. Math.NET evolved over time, with projects splitting into separate new projects, the introduction of codenames and new projects replacing older ones with a slightly different focus and approach. Unfortunately this led to a mess (sorry for that!), so I’m trying to shed some light on it with the following small chart, depicting the Math.NET Project history: