Iridium Performance Analysis

I have started doing some very basic performance analysis of the more interesting parts of the Math.NET Iridium (Numerics) library, motivated by a forum post.

Machine Configuration

I tested it on the following computers:

A: Desktop, Intel P4, 3.4 GHz, XP SP2
B: Notebook, Intel Core 2 Duo, 2 GHz, XP MCE SP2
C: Desktop, Intel Core 2 Duo, 2.?? GHz, XP SP2

On all three machines lots of other processes are running, so I start the test application with high priority.

Test Strategy

The approach is quite simple (for a start): I run several problem sets. For each set I first run it twice as a warm-up (to let the JIT compiler do its work), then start a stopwatch, run it five times, stop the stopwatch and divide the elapsed time by five to get the average time per run.
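
The strategy can be sketched in a few lines. This is only an illustration in Python, not the actual .NET test code; the function name `average_runtime` and the stand-in workload are assumptions made for the example. Note that in CPython the warm-up runs do not trigger a JIT compiler as they would on .NET, but they still warm up caches and lazy initialization.

```python
import time


def average_runtime(workload, warmup_runs=2, timed_runs=5):
    """Return the average wall-clock time of one workload() call, in seconds."""
    # Warm-up runs are executed but not measured.
    for _ in range(warmup_runs):
        workload()

    # One start/stop pair around all timed runs, then divide by the run count.
    start = time.perf_counter()
    for _ in range(timed_runs):
        workload()
    elapsed = time.perf_counter() - start
    return elapsed / timed_runs


# Hypothetical stand-in workload; it has nothing to do with Iridium itself.
def sum_of_squares():
    return sum(i * i for i in range(100_000))


print(f"{average_runtime(sum_of_squares) * 1000:.3f} ms per run")
```

Timing all five runs with a single stopwatch keeps the overhead of the timer itself out of the result, at the price of hiding how much the individual runs differ.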

Case 1: Fourier Transform

I tested the forward Fourier transform on a real sample set of doubles with seven different sample set lengths:

Set Length:    A:         B:         C:
1024           0 ms       0 ms       0 ms
4096           0 ms       0 ms       0 ms
16384          1 ms       1 ms       1 ms
65536          6 ms       5 ms       5 ms
262144         33 ms      28 ms      30 ms
1048576        503 ms     578 ms     507 ms
2097152        1339 ms    1375 ms    1032 ms

Note that on B and C the algorithm only used one of the two cores. Looks like we need to tweak that a bit, since multi-core CPUs are standard these days.
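
For readers who want to see what is being timed here: all tested set lengths are powers of two, which is the case a textbook radix-2 Cooley-Tukey FFT handles. The following is a minimal Python sketch of that textbook algorithm, not Iridium's actual implementation. The name `fft_radix2` is made up for the example, and the sign and scaling convention used (negative exponent, no normalization) is an assumption, since conventions differ between libraries.

```python
import cmath


def fft_radix2(samples):
    """Forward DFT (unnormalized) of a sequence whose length is a power of two."""
    n = len(samples)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]

    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(samples[0::2])
    odd = fft_radix2(samples[1::2])

    # Combine the two half-size spectra with the twiddle factors.
    half = n // 2
    spectrum = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + half] = even[k] - twiddle
    return spectrum


# Real-valued input, as in the test above; a constant signal has only a DC component.
print(fft_radix2([1.0] * 8))
```

Each level of the recursion does work proportional to n and there are log2(n) levels, which gives the usual O(n log n) cost.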

Case 2: Solving a linear equation system

I tested solving a square real linear equation system. Internally, the LU decomposition algorithm is used for such square systems. A size of 1000x1000 means that we solve for 1000 unknowns:

Unknowns:    Matrix Elements:    A:         B:         C:
100          10000               2 ms       2 ms       2 ms
200          40000               18 ms      18 ms      17 ms
400          160000              129 ms     129 ms     121 ms
600          360000              415 ms     418 ms     396 ms
800          640000              1056 ms    968 ms     981 ms
1000         1000000             2041 ms    1911 ms    1901 ms
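
As a quick sanity check of these numbers: LU decomposition of an n x n matrix costs roughly (2/3) n^3 floating point operations, so doubling the number of unknowns should multiply the run time by about 8. The small Python sketch below compares that cubic prediction with the values measured on machine A. The helper name `growth_ratios` is made up for the example, and the cubic model deliberately ignores cache effects and timer resolution.

```python
# Machine A timings from the table above: unknowns -> milliseconds.
timings_a = {100: 2, 200: 18, 400: 129, 600: 415, 800: 1056, 1000: 2041}


def growth_ratios(timings_ms):
    """Return (small, large, predicted, measured) for neighbouring problem sizes."""
    sizes = sorted(timings_ms)
    rows = []
    for small, large in zip(sizes, sizes[1:]):
        predicted = (large / small) ** 3  # cubic cost model
        measured = timings_ms[large] / timings_ms[small]
        rows.append((small, large, predicted, measured))
    return rows


for small, large, predicted, measured in growth_ratios(timings_a):
    print(f"{small:>4} -> {large:<4}  predicted x{predicted:.2f}  measured x{measured:.2f}")
```

The measured factors stay reasonably close to the prediction, for example about 7.2 against 8 when going from 200 to 400 unknowns, and about 1.93 against 1.95 from 800 to 1000. The first step (2 ms to 18 ms) is less meaningful because of the 1 ms resolution of the small values.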

Again, only one of the two cores on machines B and C was used.
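
To illustrate what solving such a system via LU decomposition involves, here is a minimal Python sketch using the textbook algorithm with partial pivoting. It is not Iridium's code: the function name `lu_solve` and the plain list-of-lists matrix representation are assumptions made for the example, and a real library would be far more careful about memory layout and numerical edge cases.

```python
def lu_solve(a, b):
    """Solve a x = b for a square matrix a via LU decomposition with partial pivoting."""
    n = len(a)
    lu = [list(row) for row in a]  # work on a copy; L and U are stored in place
    x = list(b)

    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0:
            raise ValueError("matrix is singular")
        if pivot != col:
            lu[col], lu[pivot] = lu[pivot], lu[col]
            x[col], x[pivot] = x[pivot], x[col]

        # Eliminate below the pivot; the multipliers form L (unit diagonal).
        for row in range(col + 1, n):
            factor = lu[row][col] / lu[col][col]
            lu[row][col] = factor
            for k in range(col + 1, n):
                lu[row][k] -= factor * lu[col][k]

    # Forward substitution with L, then back substitution with U.
    for row in range(n):
        for k in range(row):
            x[row] -= lu[row][k] * x[k]
    for row in reversed(range(n)):
        for k in range(row + 1, n):
            x[row] -= lu[row][k] * x[k]
        x[row] /= lu[row][row]
    return x


# 2x + y = 3 and x + 3y = 5
print(lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The three nested loops of the elimination step are where the cubic cost comes from; the two substitution passes are only quadratic.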

Conclusion

I think the performance is quite acceptable for now. Of course it is far from that of high-performance libraries like the Intel MKL (BLAS, LAPACK etc., which take advantage of all the specialized features of Intel CPUs), but in the end we target a different kind of developer and application anyway. If you really want to get everything out of your machine, you will hardly do that in a managed programming environment like .NET. What we offer instead is a very easy-to-use infrastructure that is completely managed (no unsafe wrapper), runs on any platform that runs .NET (including your PDA), and is still fast enough for most cases.

After all, only 0.006 seconds for a Fourier transform of 65536 samples is not that bad, and neither is 0.018 seconds to solve a linear equation system with 200 unknowns.

Update: Added machine C. The differences between the values measured on A, B and C don't make much sense. I think I need a better testbed and a larger sample size, or there are simply other factors that play an important role (e.g. memory alignment and the resulting cache behavior).
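
One possible shape for such a better testbed is to time every run individually and look at the spread instead of a single average. The Python sketch below is only an illustration of that idea under my own assumptions: the names `timing_samples` and `summarize`, the default of 30 timed runs, and the stand-in workload are all made up for the example.

```python
import statistics
import time


def timing_samples(workload, warmup_runs=2, timed_runs=30):
    """Return a list with the wall-clock time of each timed run, in seconds."""
    for _ in range(warmup_runs):
        workload()
    samples = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Summarize timings; the median and minimum are less sensitive to outliers."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


# Hypothetical stand-in workload, not an Iridium call.
print(summarize(timing_samples(lambda: sorted(range(50_000, 0, -1)))))
```

If the standard deviation is large compared to the differences between machines, those differences are probably just noise from other processes.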