I started performing some very basic performance analysis of interesting parts of the Math.NET Iridium (Numerics) Library, motivated by a forum post.
I tested it on the following computers:
- A: Desktop, Intel P4, 3.4 GHz, XP SP2
- B: Notebook, Intel Core 2 Duo, 2 GHz, XP MCE SP2
- C: Desktop, Intel Core 2 Duo, 2.?? GHz, XP SP2
On all three machines there are lots of other processes running, hence I start the app with high priority.
Quite simple (for a start): I run different problem sets. For each set I first run it twice (warm-up, to let the JIT compiler do its work), then start a stopwatch, run it five times, stop the stopwatch and divide the elapsed time by five to get an average.
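In C# this measurement loop can be sketched roughly as follows. `MeasureAverageMilliseconds` is a hypothetical helper name, not part of Iridium; it just captures the warm-up/average scheme described above:

```csharp
using System;
using System.Diagnostics;

static class Benchmark
{
    static double MeasureAverageMilliseconds(Action run)
    {
        // Run at high priority to reduce interference from other processes.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

        // Warm up twice, so the JIT compiler has already done its work
        // before we start measuring.
        run();
        run();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 5; i++)
        {
            run();
        }
        stopwatch.Stop();

        // Average over the five timed runs.
        return stopwatch.Elapsed.TotalMilliseconds / 5.0;
    }
}
```

`Stopwatch` is preferable to `DateTime.Now` here because it uses the high-resolution performance counter where available.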
Case 1: Fourier Transform
I tested the forward Fourier transform on real-valued sample sets of doubles, at seven different sample set lengths:
| Samples | A | B | C |
|---|---|---|---|
| 1024 | 0 ms | 0 ms | 0 ms |
| 4096 | 0 ms | 0 ms | 0 ms |
| 16384 | 1 ms | 1 ms | 1 ms |
| 65536 | 6 ms | 5 ms | 5 ms |
| 262144 | 33 ms | 28 ms | 30 ms |
| 1048576 | 503 ms | 578 ms | 507 ms |
| 2097152 | 1339 ms | 1375 ms | 1032 ms |
Note that on B and C the algorithm used only one of the two cores. Looks like we need to tweak that a bit, since multi-core CPUs are standard these days.
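For reference, calling the forward transform looks roughly like this. This is a sketch from memory of the Iridium API; the exact class and method names (`RealFourierTransformation`, `TransformForward`) may differ in your version, so check the library reference:

```csharp
using MathNet.Numerics.Transformations;

double[] samples = new double[65536];
// Fill the sample set with some test signal.
for (int i = 0; i < samples.Length; i++)
{
    samples[i] = Math.Sin(2.0 * Math.PI * i / 64.0);
}

// Forward transform of a real-valued sample set; the result is
// returned as separate real and imaginary parts.
var fft = new RealFourierTransformation();
double[] real, imaginary;
fft.TransformForward(samples, out real, out imaginary);
```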
Case 2: Solving a linear equation system
I tested solving square real linear equation systems. Internally, an LU decomposition is used for such square systems. A size of 1000x1000 means that we solve for 1000 unknowns:
| Unknowns | Matrix entries | A | B | C |
|---|---|---|---|---|
| 100 | 10000 | 2 ms | 2 ms | 2 ms |
| 200 | 40000 | 18 ms | 18 ms | 17 ms |
| 400 | 160000 | 129 ms | 129 ms | 121 ms |
| 600 | 360000 | 415 ms | 418 ms | 396 ms |
| 800 | 640000 | 1056 ms | 968 ms | 981 ms |
| 1000 | 1000000 | 2041 ms | 1911 ms | 1901 ms |
Again, only one of the two cores on machines B and C was used.
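Solving such a system is a one-liner on the matrix type. Again a sketch under the assumption that Iridium's `Matrix` follows the JAMA-style API (constructor from `double[][]`, `Solve` picking LU for square systems); a tiny 2x2 system stands in for the large ones benchmarked above:

```csharp
using MathNet.Numerics.LinearAlgebra;

// Coefficient matrix A and right-hand side b of A*x = b.
Matrix a = new Matrix(new double[][] {
    new double[] { 4.0, 1.0 },
    new double[] { 2.0, 3.0 }
});
Matrix b = new Matrix(new double[][] {
    new double[] { 9.0 },
    new double[] { 7.0 }
});

// For a square A this is backed by an LU decomposition.
Matrix x = a.Solve(b); // expected solution: x = (2, 1)
```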
I think the performance is quite acceptable for now. Of course it's far from high-performance libraries like the Intel MKL (BLAS, LAPACK etc., which take advantage of all the specialized features of Intel CPUs), but in the end we target a different kind of developer/application anyway. If you really want to squeeze everything out of your machine, you'll hardly do that in a managed environment like .NET. What we offer instead is a very easy-to-use infrastructure that is completely managed (no unsafe wrappers), runs on any platform that runs .NET (including your PDA), and is still fast enough for most cases.
After all, 0.006 seconds for a Fourier transform of 65536 samples is not that bad, nor is 0.018 seconds to solve a linear equation system with 200 unknowns.
Update: Added machine C. The differences between the measured values on A, B and C don't make much sense; I think I need a better testbed and a larger sample size. Or there are simply other factors that play an important role (e.g. memory alignment, and hence cache behavior).