<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Christoph Rüegg]]></title>
  <link href="http://christoph.ruegg.name/blog/atom.xml" rel="self"/>
  <link href="http://christoph.ruegg.name/"/>
  <updated>2012-01-13T14:42:52+01:00</updated>
  <id>http://christoph.ruegg.name/</id>
  <author>
    <name><![CDATA[Christoph Rüegg]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Lokad.Cloud Architecture Refresh]]></title>
    <link href="http://christoph.ruegg.name/blog/2011/8/4/lokadcloud-architecture-refresh.html"/>
    <updated>2011-08-04T13:29:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2011/8/4/lokadcloud-architecture-refresh</id>
    <category term="Azure" />
    <category term="Cloud Computing" />
    <category term="Lokad" />
    <category term="Lokad.AppHost" />
    <category term="Lokad.Cloud" />
    
    <content type="html"><![CDATA[<p>In a recent post about <a href="http://christoph.ruegg.name/blog/2011/7/15/lokadcloud-application-deployment-and-versioning-refresh.html">new deployment and versioning approaches</a> in <a href="http://lokad.github.com/lokad-cloud/">Lokad.Cloud</a> I mentioned that I&#8217;m also heavily refactoring the old cloud service framework and runtime. That refactoring was long overdue but is also required to support these new approaches effectively.</p>

<p>In essence, developing Cloud Services still works as before. There is a framework library (Lokad.Cloud.Services.Framework) that provides base classes for a small set of service types that you can derive from. The following figure shows the dependencies of all involved components:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/A.png"></p>

<!--more-->


<p>Previously the framework also contained the complete runtime with AppDomain isolation and all. This is no longer the case (since we want to use the new deployment approach). Instead, the framework now comes with service runners: lightweight classes that take already created service instances plus their settings and run cloud services directly in the current thread, without any isolation. This comes in handy for easier debugging and testing. In simple scenarios it might even be good enough for production or integration into another system.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/B.png"></p>

<p>However, in production scenarios you often do want proper isolation, more robustness and some deployment story. That&#8217;s where the new AppHost comes in.</p>

<h2>Introducing the Lokad.Cloud AppHost</h2>

<p>The <a href="http://christoph.ruegg.name/blog/2011/7/15/lokadcloud-application-deployment-and-versioning-refresh.html">new deployment approach</a> is currently implemented in a prototype, Lokad.Cloud AppHost. I&#8217;ll introduce the AppHost in more detail in a later post. Important for now is that it comes with two assemblies, AppHost and AppHost.Framework. AppHost.Framework is essentially a set of contracts, while AppHost implements the actual runtime environment. Both are quite small and simple. The typical architecture anticipated in the prototype is as follows:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/C.png"></p>

<ul>
<li><p><strong>Your Context</strong><br/>
Represents the whole environment where the AppHost is executed to the AppHost itself. This is why AppHost has no dependencies at all (except SharpZipLib, but that will likely be dropped soon). Thanks to this abstraction, AppHost is completely neutral to where and how it is executed. The context also provides a deployment reader and thus decides where and how application deployments are stored.</p></li>
<li><p><strong>Your Worker Process</strong><br/>
This would be the process where the whole application is executing, e.g. a Windows Azure WorkerRole, a Windows Service or even some CLI application. The worker builds the &#8220;host context&#8221;, creates an AppHost Host instance using said context and then starts and stops the host on demand.</p></li>
<li><p><strong>Your Entry Point</strong><br/>
The entry point of your application that is hosted using the AppHost. The entry point class type is chosen in the deployment itself, and automatically created in one or more runtime cells (again as specified in the deployment), isolated by AppDomain and in its own thread.</p></li>
</ul>


<p>Note that this figure does not mention Lokad.Cloud Services, Storage or Provisioning at all. Indeed, AppHost could be used to host all kinds of applications (e.g. even some business application based on <a href="http://lokad.github.com/lokad-cqrs/">Lokad.CQRS</a>).</p>

<h2>Hosting Lokad.Cloud Services in AppHost</h2>

<p>That&#8217;s all nice and well, but the primary scenario is to run Lokad.Cloud Services. One of the design targets of Cloud Services has always been simple usage, achieved in part by tight integration of our storage and provisioning toolkits into the services framework (opinionated on infrastructure). Luckily this gives us the opportunity to provide complete AppHost Context and EntryPoint implementations. The complete services solution now looks like this:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/D2.png"></p>

<p>Note that the services framework no longer depends on Provisioning, and does not depend on any AppHost infrastructure at all. Neither AppHost nor Provisioning thus leak into your cloud services implementations. The separation between AppContext and AppEntryPoint also reflects that they run in different places: AppContext is used directly in the host process, while AppEntryPoints run in the isolated runtime cell AppDomains. This becomes clear when we visualize the complete solution:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/E2.png"></p>

<p>This looks quite complicated and like a lot of infrastructure just to support that little yellow box on the top right. But this is somewhat misleading, as all the components are very focused and most of them small and independent.</p>

<p>A closer look at what is actually deployed on the worker (e.g. your Azure WorkerRole) reveals that there is really nothing more than the AppContext, which opinionates the AppHost towards Lokad.Cloud Provisioning and Storage, plus the code that connects this context with the AppHost and runs it in the worker process:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/F.png"></p>

<p>Similarly, the actual (versioned) application deployments need to contain only the assemblies shown in the following figure. Obviously there are your cloud services, but also the entry point and the service framework:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/lokadcloud-architecture-refresh/G2.png"></p>

<p>All these parts are thus movable and &#8220;replaceable&#8221; per deployment. This brings up some nice opportunities, as you can patch and replace any of these assemblies in specific deployments without worrying about compatibility with the worker process (this used to be an issue in the past). You can change the scheduling, add new cloud service types or even replace the framework and entry point completely with your own code. Technically there&#8217;s also no need to keep it separated into three assemblies, but the isolated EntryPoint helps keep some dependencies like AppHost out of your cloud services.</p>

<h2>Overengineered?</h2>

<p>I claim it is not. If you do want all of these:</p>

<ul>
<li><p><strong>Storage:</strong><br/>
Robust storage (especially important for remote cloud storage) that is very easy to use</p></li>
<li><p><strong>Provisioning:</strong><br/>
Automatically scale your worker instances (cloud scenario) based on demand</p></li>
<li><p><strong>Deployments:</strong><br/>
Easily switch between deployments, fast, versioned including settings, in Git style.</p></li>
<li><p><strong>Runtime:</strong><br/>
Robust multi-cell cloud application hosting, self-healing to some degree.</p></li>
<li><p><strong>Cloud Services:</strong><br/>
Compute agents that are easy to implement.</p></li>
</ul>


<p>then you do need all these components. You can either have them all in one huge monolithic assembly with your logic depending on all of them, or you can isolate them logically, keep them simple and focused and avoid unnecessary dependencies, as suggested in the presented architecture.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Cleaning up after migrating from Hg to Git]]></title>
    <link href="http://christoph.ruegg.name/blog/2011/7/30/cleaning-up-after-migrating-from-hg-to-git.html"/>
    <updated>2011-07-30T11:35:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2011/7/30/cleaning-up-after-migrating-from-hg-to-git</id>
    <category term="Git" />
    
    <content type="html"><![CDATA[<p>There is a lot of guidance out there on how to migrate from Mercurial to Git, but it often leaves you with a repository in a bad state. Even more so if it originally was a Subversion repository, then migrated to Mercurial and now finally to Git.</p>

<p>The <a href="https://github.com/Lokad/lokad-cloud">Lokad.Cloud</a> repository was such a case. The committers and authors in the commit history were a complete mess, but that&#8217;s not that much of an issue in practice. Worse is the fact that most text files were stored with CRLF line endings instead of LF internally. Git supports platform-native checkouts (CRLF on Windows, LF on Linux) quite nicely, but it only works well if text files are normalized to LF internally when committed. I strongly recommend doing that, as it will save you from a lot of trouble later on. Luckily it is also the default behavior for new repositories.</p>

<!--more-->


<h2>Migration: Fast-Export to Git</h2>

<p>This is the usual procedure that properly converts branches and tags to the git equivalents:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git clone git://repo.or.cz/fast-export.git
</span><span class='line'>mkdir git_repo &amp;&amp; cd git_repo
</span><span class='line'>git init
</span><span class='line'>/path/to/hg-fast-export.sh -r /path/to/mercurial_repo
</span><span class='line'>git checkout HEAD</span></code></pre></td></tr></table></div></figure>


<h2>Normalize the whole history to LF line-endings</h2>

<p>This step is only needed if all or some of the commits have been using non-LF line endings internally. If the repo once was in Subversion on Windows this most certainly is the case, but not necessarily for pure Mercurial repositories. To find out whether this is an issue, remove your git index and then reset. If a lot of files are now listed as modified, you had better fix it as described here; if not, you can skip this step.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>rm .git/index
</span><span class='line'>git reset</span></code></pre></td></tr></table></div></figure>


<p><em>I recommend doing this step on Linux as it didn&#8217;t work well for me on Windows.</em></p>

<p>First we need to turn off any automated git end-of-line handling. Unfortunately this is controlled in multiple places (for historical reasons). First there is the core.autocrlf config we need to turn off:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git config core.autocrlf false</span></code></pre></td></tr></table></div></figure>


<p>Then we need to get rid of all the .gitattributes files in your repository in case they specify any automatic eol handling. This is not necessary in most cases, yet the repository I was dealing with used to be a hybrid git/mercurial repo some time ago and thus already had a .gitattributes file. If there is one, delete it and commit. Afterwards your current working directory should be clean, since git no longer wants to fix your line endings on any touched text files.</p>

<p>But to make sure the .gitattributes files in previous commits don&#8217;t mess with us, we need to drop them in all commits (single line):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git filter-branch --prune-empty --index-filter \
</span><span class='line'>   'git rm --cached --ignore-unmatch .gitattributes' -- --all</span></code></pre></td></tr></table></div></figure>


<p>After that we can finally convert all the text files to LF line endings, with another history rewrite (single line):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git filter-branch -f --prune-empty --tree-filter \
</span><span class='line'>   'git ls-files -z | xargs -0 dos2unix --skipbin' -- --all</span></code></pre></td></tr></table></div></figure>


<p>For every commit, this converts all non-binary files to LF line endings using dos2unix. In my case there are some paths with spaces in them (don&#8217;t ask..), so I switched over to NUL-character separation using the <code>-z</code> and <code>-0</code> options.</p>
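<p>To make the per-file conversion concrete, here is a minimal Python sketch of what such a normalization step does conceptually. It is not how dos2unix is implemented; the function names are hypothetical, and treating any content with a NUL byte as binary is an assumed heuristic chosen for illustration.</p>

```python
def looks_binary(data):
    """Assumed heuristic: content containing a NUL byte is treated as binary."""
    return b"\x00" in data


def normalize_to_lf(data):
    """Convert CRLF line endings to LF, leaving binary content untouched."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    print(normalize_to_lf(b"first line\r\nsecond line\r\n"))
```

<p>Running the conversion a second time changes nothing, which is why the rewritten history stays stable once every commit has been normalized.</p>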

<p>To ensure the normalization is enforced in future commits (especially from people forking your repository and then sending you pull requests), create a new .gitattributes file containing at least something like <code>* text=auto</code>. The config option core.autocrlf, however, is not only local to your clone but also deprecated. You can remove it completely using</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git config --unset core.autocrlf</span></code></pre></td></tr></table></div></figure>


<h2>Clean up committers and authors</h2>

<p>You can get a quick overview of how messy the authors are using</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git shortlog -se</span></code></pre></td></tr></table></div></figure>


<p>Luckily, fixing them is not that difficult, with yet another history rewrite:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git filter-branch -f --env-filter '
</span><span class='line'>if [ "$GIT_COMMITTER_NAME" = "bad user name" ]
</span><span class='line'>then
</span><span class='line'>export GIT_COMMITTER_NAME="correct user name"
</span><span class='line'>export GIT_COMMITTER_EMAIL="correct email address"
</span><span class='line'>fi
</span><span class='line'>if [ "$GIT_AUTHOR_NAME" = "bad user name" ]
</span><span class='line'>then export GIT_AUTHOR_NAME="correct user name"
</span><span class='line'>export GIT_AUTHOR_EMAIL="correct email address"
</span><span class='line'>fi
</span><span class='line'>' -- --all</span></code></pre></td></tr></table></div></figure>


<h2>Housekeeping</h2>

<p>After all these rewrites it would be a good time to do some git maintenance, i.e.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git fsck --full</span></code></pre></td></tr></table></div></figure>


<p>to check and verify your repository, drop no longer used blobs with</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git prune</span></code></pre></td></tr></table></div></figure>


<p>and then clean up and optimize your local repository using</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git gc --aggressive</span></code></pre></td></tr></table></div></figure>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Lokad.Cloud Application Deployment and Versioning Refresh]]></title>
    <link href="http://christoph.ruegg.name/blog/2011/7/15/lokadcloud-application-deployment-and-versioning-refresh.html"/>
    <updated>2011-07-15T17:10:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2011/7/15/lokadcloud-application-deployment-and-versioning-refresh</id>
    <category term="Azure" />
    <category term="Cloud Computing" />
    <category term="Lokad" />
    <category term="Lokad.AppHost" />
    <category term="Lokad.Cloud" />
    
    <content type="html"><![CDATA[<p><em>Disclaimer: I&#8217;m a major contributor to the <a href="https://code.google.com/p/lokad-cloud/">Lokad.Cloud</a> open-source project. Lokad.Cloud is a framework for distributed computing in Windows Azure, plus a set of independent toolkits like Lokad.Cloud.Storage for simpler and more reliable cloud storage access and Lokad.Cloud.Provisioning for dynamic worker auto-scaling. We use Lokad.Cloud at Lokad to deal with our massive and rapidly changing computation demands.</em></p>

<p>At the very beginning of the <a href="https://code.google.com/p/lokad-cloud/">Lokad.Cloud</a> project we decided not to rely on the Windows Azure management tools to deploy new versions of our application. Instead we implemented a dynamic worker role - initially deployed once using the Windows Azure tools - that provides a <strong>runtime environment that can load and unload applications on demand</strong>, without even recycling the Azure virtual machine. The applications are isolated in a separate AppDomain so we can unload them safely, and also for sandboxing.</p>

<!--more-->


<p>This approach worked out nicely for us, both for apps that don&#8217;t change for months and apps where we often have multiple redeployments per hour. Yet over the last year we gained a lot of experience and discovered some dark spots in the current stable release. Most of them were &#8220;good enough&#8221; back then, but are starting to get in the way:</p>

<ul>
<li><p><strong>Non-atomic deployments</strong><br/>
The deployment mainly consisted of an assembly blob (zip file containing all the assemblies), a config blob (IoC configuration to add application-specific registrations and config like additional connection strings) plus service settings distributed over multiple blobs. Since workers automatically discover new deployments themselves, deployments are not atomic. A worker could see new assemblies but an old and possibly incompatible IoC configuration, causing undefined behavior until it catches up.</p></li>
<li><p><strong>Lots of blobs to poll for changes</strong><br/>
By design, all workers are completely self-contained and self-healing. A worker is never &#8220;contacted&#8221; in any way from outside, except via cloud storage (queues, blobs, tables). Since the deployments are non-atomic and in particular service settings are spread over multiple blobs, they all have to be polled for changes repeatedly on every worker. Polling for changed ETags is not expensive but causes latency and can add up if there are lots of services and worker instances. Even worse, some service settings also contained state (for simplicity) and thus changed quite often.</p></li>
<li><p><strong>Replacing an application is easy, getting back not so</strong><br/>
It is very easy to redeploy, but there&#8217;s no way to get back to the previous state unless you have a backup ready or can rebuild it from sources. Upgrading becomes much safer if it is easy to get back, reducing the burden of shorter deployment cycles.</p></li>
<li><p><strong>Growing demand for stronger runtime</strong><br/>
E.g. to support multiple runtime &#8220;cores&#8221; or &#8220;cells&#8221; on each worker with independent scheduling and customizable assignment/affinity.</p></li>
</ul>


<p>Hence I started refactoring the Lokad.Cloud service framework recently, including reworking the handling of app deployments (not released yet):</p>

<h3>Concentrating service settings into a single blob</h3>

<p>Cloud services have been refactored so that they no longer have to manage their own settings. Instead all settings are now stored in a single blob. Settings include parameters like whether a service is disabled or the trigger interval for scheduled services. Settings generally change rarely (e.g. manually through the management console), so the new settings blob also changes only rarely and conflicts are no issue (they can be handled with optimistic concurrency). This drastically reduces the number of blobs to poll to three, cutting a lot of unnecessary storage I/O and thus latency.</p>

<h3>Separating deployments from currently active deployment</h3>

<p>Previously there were just three blobs (assemblies, config and settings) for the currently active deployment. Changes were almost immediately applied on all workers.</p>

<p>From now on we can have multiple deployments exist in parallel. The &#8220;currently active deployment&#8221; is given by a pointer to the chosen deployment. Deployments are now read-only. If we want to change anything in a deployment (e.g. change some settings) we essentially create a new deployment and update the active deployment pointer to point to that new deployment instead. Let&#8217;s call that new pointer <strong>HEAD</strong>.</p>

<h3>Only one blob to poll</h3>

<p>Since deployments are read-only, the only blob we have to poll is HEAD. Since no other polling is needed, we can easily poll more often to get much more reactive workers. If we poll once every 15 seconds, we get transaction costs of around US$ 0.17 per month per active worker plus maybe US$ 0.10 for bandwidth (note that most of the time the HTTP packets will only have headers, no body payload). This is negligible compared to the worker instance cost.</p>
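<p>The transaction figure can be reproduced with a bit of arithmetic. The sketch below assumes a 30-day month and a storage price of US$ 0.01 per 10,000 transactions; that price is an assumption made here to match the estimate, so check current pricing before relying on it.</p>

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60    # assuming a 30-day month
PRICE_PER_10K_TRANSACTIONS_USD = 0.01    # assumed storage transaction price


def polls_per_month(interval_seconds):
    """Number of HEAD polls a single worker issues per month."""
    return SECONDS_PER_MONTH / interval_seconds


def monthly_polling_cost(interval_seconds):
    """Transaction cost in US$ for one worker polling HEAD at the given interval."""
    return polls_per_month(interval_seconds) / 10_000 * PRICE_PER_10K_TRANSACTIONS_USD


if __name__ == "__main__":
    print(int(polls_per_month(15)))               # 172800 polls
    print(round(monthly_polling_cost(15), 2))     # 0.17 US$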

<h3>Content-based storage for deployments</h3>

<p>I&#8217;ve introduced <a href="http://christoph.ruegg.name/blog/2010/7/21/content-based-storage-in-the-cloud.html">content-based storage in a previous blog post</a>. The general idea is to identify data by its hash (often SHA-1 or SHA-256), resulting in automatic deduplication and verifiable referential consistency. It is an ideal concept for versioning, which is why it is also broadly used in the popular git distributed version control system. It is also an ideal approach for managing our deployments. Like this:</p>

<p><img class="pureimage" src="http://christoph.ruegg.name/images/lokadcloud-application-deployment-and-versioning-refresh/blobs.png"></p>

<p>Think of assemblies, config and settings as <em>files</em>, deployments as <em>commits</em> (each pointing to one assemblies, one config and one settings blob, shared if equal), and HEAD as <em>head</em> just like in git. All the arrows include the full hash of the target (as part of their name, shortened in the diagram).</p>

<p>The Index is just a redundant list of all deployments for easier management, so we don&#8217;t have to iterate through all available deployments all the time. In a similar way a History blob could be interesting to track the last few deployments and when they were deployed. Note that HEAD and Index (and History) are the only mutable blobs; all the others are read-only, although they can be garbage collected.</p>

<p>The arrows between deployments will likely be dropped, as they don&#8217;t seem to provide any value in practice.</p>
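<p>The blob layout described above can be sketched in a few lines of Python. This is an in-memory stand-in for blob storage, not the Lokad.Cloud implementation: the class, its method names and the JSON manifest format are all hypothetical and only illustrate hash-named read-only blobs plus a single mutable HEAD.</p>

```python
import hashlib
import json


class DeploymentStore:
    """Hypothetical in-memory stand-in for content-based blob storage."""

    def __init__(self):
        self.blobs = {}    # read-only blobs, keyed by the SHA-1 of their content
        self.head = None   # the only mutable pointer in this sketch

    def put(self, content):
        """Store bytes under their SHA-1 hex digest and return that key."""
        key = hashlib.sha1(content).hexdigest()
        self.blobs.setdefault(key, content)   # identical content is stored once
        return key

    def create_deployment(self, assemblies, config, settings):
        """Store the three parts, then a manifest referencing them by hash."""
        manifest = {
            "assemblies": self.put(assemblies),
            "config": self.put(config),
            "settings": self.put(settings),
        }
        return self.put(json.dumps(manifest, sort_keys=True).encode("utf-8"))

    def activate(self, deployment_key):
        """Point HEAD at an existing deployment."""
        if deployment_key not in self.blobs:
            raise KeyError(deployment_key)
        self.head = deployment_key


store = DeploymentStore()
v1 = store.create_deployment(b"app.zip v1", b"ioc config", b"interval=60")
v2 = store.create_deployment(b"app.zip v1", b"ioc config", b"interval=30")
store.activate(v2)
store.activate(v1)   # going back is just moving HEAD again
```

<p>In this example the two deployments share their assemblies and config blobs, so only a new settings blob and a new manifest are added for the second one.</p>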

<h3>Prepare deployment, then activate atomically</h3>

<p>Both the creation of a deployment (based on assemblies, config and settings) and actually activating it (by changing HEAD to point to it) are now atomic operations. They still can happen at different times though, so you can prepare one or more deployments but activate them much later, if at all. The applications can be completely unrelated, so you can use this mechanism to quickly switch between different applications.</p>

<p>Note that it still takes a while until all workers have detected the change, so there will be a phase where different workers (on different VMs and servers) may have different applications running. There are ways to deal with this if it is an issue, see below.</p>

<h3>Get back to the previous version</h3>

<p>&#8230; is as trivial as looking up the previous version in the History or Index and changing HEAD to point to it.</p>

<h3>Changing service settings</h3>

<p>If you change some settings in the web console, for example to disable a service or change a trigger interval, a new settings blob will be created, then a new deployment referring to the new settings, and in the end HEAD will be changed to point to the new deployment. If you decide to change it back, the matching settings blob and deployment will already exist (with the same hash), so in effect only the HEAD blob will be changed back to point to the previous deployment (plus the History updated if available). Note that you won&#8217;t see much of that in practice as the management classes will handle it automatically.</p>

<h3>Handling changes in the runtime</h3>

<p>If the runtime detects a changed HEAD it will immediately load the deployment blob. Since it knows its current deployment and since the blobs are named after their content hash and read-only, it can simply compare the names to detect which blobs have changed. If either assemblies or config has changed, the runtime will have to restart all the processes, but if only the settings changed then it&#8217;s usually enough to just adapt the scheduling appropriately. Settings changes therefore still have a rather small impact in practice, despite switching to a completely different deployment in the storage.</p>
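<p>That decision boils down to comparing hash names. A minimal sketch, assuming each deployment is represented as a dictionary from part name to content hash (the function name, the dictionary layout and the returned labels are hypothetical):</p>

```python
PARTS = ("assemblies", "config", "settings")


def plan_reaction(current, new):
    """Decide how a worker reacts when HEAD moves from `current` to `new`.

    Both arguments map a part name to the content hash of that part.
    """
    changed = {part for part in PARTS if current[part] != new[part]}
    if not changed:
        return "nothing"
    if changed == {"settings"}:
        return "reschedule"   # adapt the scheduling, keep processes running
    return "restart"          # assemblies or config changed


if __name__ == "__main__":
    old = {"assemblies": "a1", "config": "c1", "settings": "s1"}
    print(plan_reaction(old, dict(old, settings="s2")))
```

<p>Because the blobs are read-only, equal names imply equal content, so no blob has to be downloaded just to find out what changed.</p>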

<h3>Forcing a deployment from an application</h3>

<p>Sometimes you need to ensure that a message is processed by a specific deployment. For example, we sometimes deploy a new version and then want to do some computation on exactly that version. To achieve that we could either wait, or include the deployment hash in the message and make the application force the runtime to load exactly that deployment if it isn&#8217;t matching already. For that and similar purposes I suggest providing some way for services to send commands to the local runtime, like commands to enforce loading of the head or some specific deployment as soon as possible.</p>

<h3>Multi-Head Scenario</h3>

<p>This is unrelated to deployments, but the new runtime will also support multiple processes (&#8220;<em>cells</em>&#8221;) in parallel, isolated in separate AppDomains and threads and with independent scheduling. Service settings contain a new cell affinity parameter to control in which cells a service should be executed. This can be useful e.g. to create a cell for low-latency queue services or to avoid blocking when some services can have long processing times but there are only a few worker instances available.</p>

<p>Now technically it would be possible to load different applications or versions in different cells at the same time, with separate HEADs for each cell. This would bring the service interleaving approach to a new level. Not sure how useful it would be in practice though (plus it would require some work to decide which app can choose the number of worker instances), so I won&#8217;t follow that idea any further for now.</p>

<h2>Too complicated?</h2>

<p>This all seems very complicated just to do deployments. <strong>How could we simplify it but still satisfy our requirements?</strong></p>

<ul>
<li><p><strong>Single blob only</strong>:<br/>
Store everything (assemblies, config, settings) in a single blob and give the currently active blob a special name (like &#8220;current&#8221; or still &#8220;HEAD&#8221;). Way simpler. <em>Disadvantage</em>: the whole blob can get large and has to be touched by every single settings change. Unlikely to be an issue in most deployments though.</p></li>
<li><p><strong>Drop the hashing</strong>:<br/>
SHA is built in, so this is not really a big simplification. <em>Disadvantage</em>: we&#8217;d lose deduplication.</p></li>
<li><p><strong>Toggle instead of versioning</strong>:<br/>
Just provide two versions of each blob, which can be switched on demand (similar to Azure staging vs. production deployments). The staging blobs would be edited and when done switched somehow in an atomic way (e.g. HEAD blob pointing to version again). This may simplify management remarkably.</p></li>
<li><p><strong>Outsource the versioning</strong>:<br/>
Use a proven version control system instead, like git. Technically even Subversion would work (I&#8217;ve tested Subversion on Azure in the past and it worked fine; for git we could even use one of the native git libraries). <em>Disadvantage</em>: checking a remote repository for changes is more expensive than a simple Azure blob storage ETag check (git beats svn here). In the worst case we could work around that by introducing a HEAD blob in Azure storage again, containing the current head revision/hash. We would then update that blob after every commit and poll it from the workers at a much higher frequency. <em>Advantage</em>: Much more robust versioning, we could drop the zip files (simply version the assemblies directly), and we&#8217;d get push deployment for free.</p></li>
</ul>


<p><em>Personally I like that last alternative using git the most.</em></p>

<h2>Feedback</h2>

<p>What do you think? Too complicated? Overengineered? Should I use a real git repository instead? Let me know. Thanks!</p>

<h2>(Migrated Comments)</h2>

<h4>Rinat Abdullin, July 23, 2011</h4>

<p>Once again, that&#8217;s a fine post, just like the previous one on hashing. I loved rereading it.</p>

<p>Just a few thoughts.</p>

<ol>
<li>How hard would be to use Lokad.Cloud cell management with AppDomains without using actual services and message dispatch?</li>
<li>I think that given versioning sandbox/production toggle is just an overkill (it duplicates the logic). swaps are just a way to shift focus from one version to another, while keeping the ability to roll-back.</li>
<li>Hashing and separated blobs, as I believe, are a must for a simple implementation. They allow to keep stuff simple and decoupled. Besides, the complexity could be reduced by the tooling.</li>
<li>Git (versus self-implemented versioning) is just a way to deliver changes in my opinion (besides, versioned settings). So it should not be that different for blob or git storage (we just poll for changes and pull the version specified by head). I&#8217;m wondering how well would private github repo work here&#8230;</li>
</ol>


<p>In case of blob, head is stored in blob storage (human-editable JSON), pointing to the deployment blobs.<br/>
In case of full git, head is in git.<br/>
In mixed scenario, head is stored in blob storage, pointing to the git version/url.</p>

<p>Sorry for pushing that much of my rambling here :)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Content-Based Storage in the Cloud]]></title>
    <link href="http://christoph.ruegg.name/blog/2010/7/21/content-based-storage-in-the-cloud.html"/>
    <updated>2010-07-21T12:19:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2010/7/21/content-based-storage-in-the-cloud</id>
    <category term="Azure" />
    <category term="Cloud Computing" />
    <category term="Storage" />
    
    <content type="html"><![CDATA[<p>One offshoot of the NoSQL movement, which has lately been rediscovering non-relational storage approaches, is the content-based value store. Such a store is similar to a key-value store but uses a cryptographic hash of the value as its key.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/hashing.png"></p>

<h3>An SHA-1 hash of the value is good enough to identify it</h3>

<p>The <a href="http://en.wikipedia.org/wiki/SHA-1">SHA-1 hash function</a> is deterministic, meaning that for every value there&#8217;s exactly one key that can be computed using SHA-1, hence <strong>value implies key</strong>. We can always compute the unique key of a value.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/sha1unique.png"></p>
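<p>As a minimal sketch of &#8220;value implies key&#8221;, the key can be derived with nothing but a standard SHA-1 implementation. The helper name <code>content_key</code> is made up for this illustration:</p>

<pre><code>import hashlib

def content_key(value):
    """Return the 40-character hex SHA-1 digest of the given bytes."""
    return hashlib.sha1(value).hexdigest()

key = content_key(b"hello world")
print(key)
# The same value always yields the same key.
print(content_key(b"hello world") == key)  # prints True
</code></pre>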

<!--more-->


<p>The probability of an SHA-1 hash collision is <a href="http://en.wikipedia.org/wiki/Birthday_attack">extremely low</a>. The <a href="http://progit.org/book/ch6-1.html">most cited numbers</a> show that you&#8217;d need about 10<sup>24</sup> values in order to cause a 50% chance of a collision. Even with a whopping 10<sup>18</sup> distinct values the likelihood of at least one collision is still far below 10<sup>-9</sup> (roughly 3&#215;10<sup>-13</sup> by the birthday approximation). Hence, a <strong>key refers to a single value</strong> with extremely high probability. While the SHA-1 function is not strictly injective, it is close enough to injective for almost all practical applications.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/sha1injective.png"></p>
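<p>These orders of magnitude can be checked with the usual birthday approximation, p = 1 - exp(-n&#178; / 2N) for N = 2<sup>160</sup> possible keys. The following is only a rough sketch assuming ideally distributed hashes. For 10<sup>18</sup> values it yields a probability in the region of 10<sup>-13</sup>, comfortably below the bound quoted above, and for 2<sup>80</sup> (about 1.2&#215;10<sup>24</sup>) values it yields roughly 39%:</p>

<pre><code>import math

def collision_probability(n, bits=160):
    """Birthday approximation 1 - exp(-n*n / (2*N)) for N = 2**bits keys."""
    space = 2 ** bits
    # expm1 keeps precision for the very small probabilities involved.
    return -math.expm1(-(n * n) / (2 * space))

print(collision_probability(10 ** 18))
print(collision_probability(2 ** 80))
</code></pre>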

<p>Note that this is different from common non-distributed <a href="http://en.wikipedia.org/wiki/Hash_table">Hash Tables</a> where a very short hash function is used to directly jump fast to an inner data structure (bucket) containing all items sharing the same hash. The motivation for hashing in such hash tables is to be able to directly compute the position where an item is stored, avoiding long linear or binary searches.</p>

<h3>Verifiable Consistency</h3>

<p>A nice side effect of using SHA-1 is that, given the key, the value retrieved from the storage can be verified (to detect data corruption or tampering) simply by recomputing its SHA-1 hash and comparing it to the key. You can even digitally sign a key and thereby implicitly sign its value and all values referred to by it.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/corruption.png"></p>
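<p>The check itself is a one-liner. A small sketch, with <code>verify</code> being a hypothetical helper name:</p>

<pre><code>import hashlib

def verify(key, value):
    """Return True if the value still hashes to the key it was stored under."""
    return hashlib.sha1(value).hexdigest() == key

original = b"some stored value"
key = hashlib.sha1(original).hexdigest()

print(verify(key, original))              # prints True
print(verify(key, b"some st0red value"))  # prints False
</code></pre>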

<h3>Keys are uniformly distributed</h3>

<p>When using the common hex string format, the 160-bit keys are always 40 characters long and look something like this:</p>

<pre><code>d921970aadf03b3cf0e71becdaab3147ba71cdef
</code></pre>

<p>We can safely treat them as if their characters were <strong>distributed uniformly</strong> (0-9, a-f). This brings some advantages especially when used in a distributed or cloud-like scenario, as simple prefix ranges (like 0-3, 4-7, 8-b, c-f) can be used for partitioning and distributed processing.</p>
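<p>To illustrate the prefix ranges mentioned above, here is a sketch that maps a key to one of four partitions by its first hex digit. The partition layout and the name <code>partition_of</code> are just assumptions for this example:</p>

<pre><code># Four prefix ranges: 0-3, 4-7, 8-b, c-f
PARTITIONS = ["0123", "4567", "89ab", "cdef"]

def partition_of(key):
    """Return the index of the prefix range containing the key's first hex digit."""
    first = key[0].lower()
    for index, digits in enumerate(PARTITIONS):
        if first in digits:
            return index
    raise ValueError("not a hex key: " + key)

print(partition_of("d921970aadf03b3cf0e71becdaab3147ba71cdef"))  # prints 3
</code></pre>

<p>Since the digits are distributed uniformly, each of the four partitions can be expected to receive about a quarter of all keys.</p>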

<p>On the other hand this means that you can&#8217;t use other indexed keys or ordering out of the box without further logic or storage on top of it.</p>

<h3>The value of a key is fixed and can&#8217;t be changed</h3>

<p>The value associated with a key can <strong>never change</strong>. If you store an updated value, you&#8217;ll get a new key for it and have to update the reference to point to this new key. This has severe consequences for where this storage scheme can be used efficiently. For example, a typical relational data model with cyclic relations wouldn&#8217;t fit such a content-based data store at all.</p>

<p>However, in practice in a cloud-like application this is often not that much of an issue. Even more so as soon as you realize that the existence of read-only stale yet still consistent data is not an issue either (see <em>CQRS</em>).</p>

<p>Again this fits very well with distributed and cloud computing, as it becomes <strong>trivial to aggressively cache values locally</strong>. If a value is found in the local cache it is guaranteed to be up to date (since values can&#8217;t change), so you don&#8217;t even have to check for timestamps or whether it has been changed remotely. Since in Azure the instances come with a lot of local storage, a simple <a href="http://en.wikipedia.org/wiki/Cache_algorithms">LRU cache</a> for a few GB can save you a lot of downloads and roundtrips if you use only a relatively small number of instances or have managed to create some weak affinity between jobs and Azure instances.</p>
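<p>A read-through cache along these lines could look roughly like the following sketch. It assumes least-recently-used eviction and, for simplicity, limits the cache by number of values instead of bytes. <code>CachedStore</code> and the dictionary standing in for the remote blob storage are made up for this illustration:</p>

<pre><code>from collections import OrderedDict

class CachedStore:
    """Read-through cache in front of a slower remote content store."""

    def __init__(self, remote, capacity):
        self.remote = remote        # any mapping of key to value bytes
        self.capacity = capacity    # maximum number of cached values
        self.cache = OrderedDict()
        self.downloads = 0

    def get(self, key):
        if key in self.cache:
            # Values never change, so a hit needs no freshness check.
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.remote[key]    # stands in for a blob download
        self.downloads += 1
        self.cache[key] = value
        if len(self.cache) == self.capacity + 1:
            self.cache.popitem(last=False)  # evict the least recently used value
        return value

remote = {"k1": b"a", "k2": b"b"}
store = CachedStore(remote, capacity=1)
store.get("k1")
store.get("k1")
print(store.downloads)  # prints 1
</code></pre>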

<h4>Example: Large Queue Messages</h4>

<p>Azure Queues have content size limitations, which is why <a href="http://code.google.com/p/lokad-cloud/">Lokad.Cloud</a> implements logic to let messages transparently overflow to blob storage. To do that it needs a way to store a value in a blob that it can retrieve later by some identifier. This identifier is then packed into the actual message. There&#8217;s no need to access the blob in any other way, so it&#8217;s a perfect candidate for a content-based value store.</p>

<p>In my experience, in real life the probability that a message is processed on the same worker that originally put it there is often high or at least not negligible. In all these cases, a cached content-based value store would save you from having to download these blobs at all, while still working correctly otherwise.</p>

<h3>Implicit Value-Deduplication</h3>

<p>Since the same value leads to the same key, trying to store the same value twice means you get the same storage location and the value gets stored only once. The second attempt can even be aborted early by provoking a precondition violation, or skipped completely if it is already in the local cache (depending on the deletion plan).</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/deduplication.png"></p>
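<p>In code, deduplication falls out of the store operation itself. This is a minimal in-memory sketch; <code>ContentStore</code> is a made-up name, and against real blob storage the existence check would rather be a conditional upload:</p>

<pre><code>import hashlib

class ContentStore:
    """In-memory content-based store that counts actual writes."""

    def __init__(self):
        self.blobs = {}
        self.writes = 0

    def put(self, value):
        key = hashlib.sha1(value).hexdigest()
        if key not in self.blobs:
            # Only the first occurrence of a value is written.
            self.blobs[key] = value
            self.writes += 1
        return key

store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
print(first == second, store.writes)  # prints True 1
</code></pre>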

<h4>Example: Daily Backup Snapshots</h4>

<p>I recently wrote a small service that periodically takes full snapshots of all tables and blobs of a set of Azure storage accounts to a separate account, keeps the last N snapshots each and removes the rest. Often only a small subset of blobs or table entities actually change in a day. Had I used content-based storage, I could have saved a lot of storage (and thus cost) by deduplication without having to implement complicated incremental or differential backups. Taking a snapshot would likely also have taken less time thanks to some saved uploads.</p>

<h3>Trivial Distribution and Replication</h3>

<p>Unlike classical relational databases and key-value stores, replication and distribution of data in such a content-based value store is trivial since <strong>there can&#8217;t be any conflicts</strong>. This is why the caching mentioned above works so well. Replication simply means copying the values of all missing keys over to the target. A consequence of this is that for some scenarios there&#8217;s no technical need for a single master database. A peer can synchronize with any other peer, resulting in full peer-to-peer support. <a href="http://en.wikipedia.org/wiki/Distributed_hash_table">Distributed hash tables</a> (DHT) as used by most file sharing solutions including BitTorrent work similarly and turn out to be very efficient.</p>
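<p>A sketch of such a conflict-free replication, with plain dictionaries standing in for two peers (the function name <code>replicate</code> is hypothetical):</p>

<pre><code>def replicate(source, target):
    """Copy every value whose key is missing on the target; return the count."""
    missing = source.keys() - target.keys()
    for key in missing:
        target[key] = source[key]
    return len(missing)

peer_a = {"k1": b"one", "k2": b"two"}
peer_b = {"k2": b"two", "k3": b"three"}

# Synchronizing in both directions needs no conflict resolution at all.
replicate(peer_a, peer_b)
replicate(peer_b, peer_a)
print(peer_a == peer_b)  # prints True
</code></pre>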

<h3>History Consistency and Versioning</h3>

<p>Since values can&#8217;t change, they <strong>remain consistent with each other even when they become stale</strong>. That&#8217;s why this approach is used by most of the popular distributed version control systems like <a href="http://www.git-scm.com/">Git</a> and <a href="http://mercurial.selenic.com/">Mercurial</a> as well.</p>

<p>The Git object model is nicely described in the <a href="http://book.git-scm.com/1_the_git_object_model.html">git community book</a> (the following two images are taken from there). In essence, all objects are stored just as described here. In addition to data blobs (i.e. source code files) there are also tree objects representing a folder simply by listing all the SHA-1 keys of its child elements, again stored by its hash:</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/git-object-blob-tree.png"></p>

<p>If a file changes in git in a new revision, it will get a new hash. The folder/tree containing it will update that hash in its list, and in turn will itself get a new hash. Both the old and the new version are therefore still available completely and consistently simply by referring to the hash of the respective version of the tree.</p>

<p><img class="purecenterimage" src="http://christoph.ruegg.name/images/content-based-storage-in-the-cloud/git-objects-example.png"></p>
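<p>The principle can be sketched in a few lines. Note that this is a simplified illustration and not git&#8217;s actual object format: a tree here is just a text listing of names and keys, and the helper names are made up:</p>

<pre><code>import hashlib

store = {}

def put(data):
    """Store bytes under their SHA-1 key and return the key."""
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key

def put_tree(entries):
    """Store a folder as sorted 'name key' lines and return the tree's key."""
    lines = [name + " " + key for name, key in sorted(entries.items())]
    return put("\n".join(lines).encode("utf-8"))

readme_v1 = put(b"hello")
main = put(b"print('hi')")
tree_v1 = put_tree({"README": readme_v1, "main.py": main})

# Changing one file yields a new blob key and therefore a new tree key.
readme_v2 = put(b"hello, world")
tree_v2 = put_tree({"README": readme_v2, "main.py": main})

print(tree_v1 != tree_v2)  # prints True
print(len(store))          # prints 5
</code></pre>

<p>The unchanged file is stored only once and shared by both trees, while the old tree remains fully readable under its old key.</p>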

<p>Historical consistency can be useful for all kinds of applications. Note that this approach persists snapshots of values and content, not how they are changed. This is thus a dual counterpart to concepts like <em>event sourcing</em> where only the actions causing changes of the values are persisted but not the actual values.</p>

<h3>Append-Only Storage or Value Scavenging</h3>

<p>Unless you need an append-only storage, you need to be careful about deleting values in such a system. Since there is implicit deduplication, you can&#8217;t just delete what you&#8217;ve just inserted since the same value could also be used in other places. There are several ways to approach this, depending on your scenario:</p>

<ul>
<li><p><strong>Garbage Collection:</strong> If there is a hierarchy where all values are referenced by another value, you can follow the tree from time to time and then remove all values you haven&#8217;t seen. This is used by all the distributed version control systems. Be careful about race conditions though.</p></li>
<li><p><strong>Reference Tracking:</strong> Use metadata to list all keys or items referring to a value. If you remove the last reference, remove the value as well. This can be combined with garbage collection. You can also use reference counters, but they are difficult to handle correctly in an unreliable world like a cloud environment where instantaneous VM shutdowns without prior notice are to be expected.</p></li>
<li><p><strong>Time-Based:</strong> You &#8220;touch&#8221; a value (update a timestamp in the metadata) whenever it is used, and from time to time remove all items that haven&#8217;t been used for a while. Note that this causes a lot of round trips (although they could be performed asynchronously in the background).</p></li>
<li><p><strong>Limited Lifetime:</strong> Sometimes it&#8217;s good enough to just define that a value can safely be removed after a day or a month.</p></li>
</ul>
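<p>The garbage collection variant from the list above is essentially a mark-and-sweep over the reference graph. The following sketch assumes a hypothetical <code>references</code> mapping from a key to the keys it refers to, and deliberately ignores the race conditions mentioned above:</p>

<pre><code>def collect_garbage(store, references, roots):
    """Delete every key that is not reachable from the given root keys."""
    reachable = set()
    pending = list(roots)
    while pending:
        key = pending.pop()
        if key in reachable:
            continue
        reachable.add(key)
        pending.extend(references.get(key, []))
    garbage = set(store) - reachable
    for key in garbage:
        del store[key]
    return garbage

store = {"root": b"r", "a": b"1", "b": b"2", "orphan": b"x"}
references = {"root": ["a", "b"]}
removed = collect_garbage(store, references, roots=["root"])
print(sorted(removed))  # prints ['orphan']
</code></pre>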

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How to create 2048bit Certificate CSRs for Dell's iDRAC6]]></title>
    <link href="http://christoph.ruegg.name/blog/2010/7/20/how-to-create-2048bit-certificate-csrs-for-dells-idrac6.html"/>
    <updated>2010-07-20T11:21:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2010/7/20/how-to-create-2048bit-certificate-csrs-for-dells-idrac6</id>
    <category term="Server Management" />
    
    <content type="html"><![CDATA[<p>In case you happen to manage a recent Dell server with a dedicated iDRAC remote management card and you&#8217;d like to secure it by using your own certificate, you&#8217;ll have to request a certificate based on a CSR created directly in the iDRAC web interface.</p>

<p>Unfortunately these CSRs have only 1024 bit keys, which get refused by some public certificate authorities like <a href="http://www.startssl.com">StartCom</a> (for security reasons they require at least 2048 bits). You can&#8217;t choose the bit length in the iDRAC web interface, but luckily there is another way to make it generate 2048 or 4096 bit long keys for the CSR using racadm from Dell&#8217;s System Management Tools:</p>

<!--more-->


<p>View the current configuration (all on 1 line):</p>

<pre><code>racadm.exe -r [iDRAC IP] -u [user] -p [password]
getconfig -g cfgRacSecurity
</code></pre>

<p>Change the key length to 2048 bits (all on 1 line):</p>

<pre><code>racadm.exe -r [iDRAC IP] -u [user] -p [password]
config -g cfgRacSecurity -o cfgRacSecCsrKeySize 2048
</code></pre>

<h2>(Migrated Comments)</h2>

<h4>Dan Orum, September 7, 2010</h4>

<p>If you are using the Express version of the iDRAC card, you can&#8217;t use the racadm.exe utility with an IP address remotely. Instead, you need to run the utility on the local server without specifying the -r parameter.</p>

<h4>Christoph Ruegg, September 11, 2010</h4>

<p>Indeed, thanks for the hint!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[git howto: revert a commit already pushed to a remote repository]]></title>
    <link href="http://christoph.ruegg.name/blog/2010/5/5/git-howto-revert-a-commit-already-pushed-to-a-remote-reposit.html"/>
    <updated>2010-05-05T23:22:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2010/5/5/git-howto-revert-a-commit-already-pushed-to-a-remote-reposit</id>
    <category term="Git" />
    
    <content type="html"><![CDATA[<p>So you&#8217;ve just pushed your local branch to a remote branch, but then realized that one of the commits should not be there, or that there was some typo in it. No problem, you can fix it. But you should do it rather fast <em>before anyone fetches the bad commits</em>, or you won&#8217;t be very popular with them for a while ;)</p>

<h2>Case 1: Revert the last commit</h2>

<p>This is the easiest case. Let&#8217;s say we have a remote <em>mathnet</em> with branch <em>master</em> that currently points to commit <em>dd61ab32</em>. We want to remove the top commit. Translated to git terminology, we want to force the <em>master</em> branch of the <em>mathnet</em> remote repository to the parent of <em>dd61ab32</em>:</p>

<pre><code>$ git push mathnet +dd61ab32^:master
</code></pre>

<!--more-->


<p>Where git interprets <code>x^</code> as the parent of x and <code>+</code> as a forced non-fastforward push. If you have the master branch checked out locally, you can also do it in two simpler steps: First reset the branch to the parent of the current commit, then force-push it to the remote.</p>

<pre><code>$ git reset HEAD^ --hard
$ git push mathnet -f
</code></pre>

<h2>Case 2: Revert the second last commit</h2>

<p>Let&#8217;s say the bad commit <em>dd61ab32</em> is not the top commit, but a slightly older one, e.g. the second last one. We want to remove it, but keep all commits that followed it. In other words, we want to rewrite the history and force the result back to <em>mathnet/master</em>. The easiest way to rewrite history is to do an interactive rebase down to the parent of the offending commit:</p>

<pre><code>$ git rebase -i dd61ab32^
</code></pre>

<p>This will open an editor and show a list of all commits since the commit we want to get rid of:</p>

<pre><code>pick dd61ab32
pick dsadhj278
...
</code></pre>

<p>Simply remove the line with the offending commit, likely that will be the first line (vi: delete current line = <code>dd</code>). Save and close the editor (vi: press <code>:wq</code> and return). Resolve any conflicts if there are any, and your local branch should be fixed. Force it to the remote and you&#8217;re done:</p>

<pre><code>$ git push mathnet -f
</code></pre>

<h2>Case 3: Fix a typo in one of the commits</h2>

<p>This works almost exactly the same way as case 2, but instead of removing the line with the bad commit, simply replace its <code>pick</code> with <code>edit</code> and save/exit. Rebase will then stop at that commit, put the changes into the index and then let you change it as you like. Commit the change and continue the rebase (git will tell you how to keep the commit message and author if you want). Then push the changes as described above. The same way you can even split commits into smaller ones, or merge commits together.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Lost in Math.NET Codenames?]]></title>
    <link href="http://christoph.ruegg.name/blog/2010/4/26/lost-in-mathnet-codenames.html"/>
    <updated>2010-04-26T20:27:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2010/4/26/lost-in-mathnet-codenames</id>
    <category term="Math.NET" />
    <category term="Math.NET Numerics" />
    
    <content type="html"><![CDATA[<p>Math.NET Numerics? Iridium? dnAnalytics? Yttrium? Huh? &#8230;sounds familiar?</p>

<p>It looks like some of you got lost in all the Math.NET subprojects and codenames. Math.NET evolved over time, with projects splitting into separate new projects, the introduction of codenames and new projects replacing older ones with a slightly different focus and approach. Unfortunately this led to a mess (sorry for that!), so I&#8217;m trying to shed some light on it with the following small chart, depicting the Math.NET project history:</p>

<p><img class="pureimage" src="http://christoph.ruegg.name/images/lost-in-mathnet-codenames/history-small.png"></p>

<!--more-->


<p>It all started with <em>MathLib</em>, which was a very verbose object-oriented computer algebra approach with all kinds of numeric routines to back the symbolics, including basic linear algebra. At the same time <em>dnAnalytics</em> was founded independently and unrelated to Math.NET, focusing entirely on numerics and statistics, leveraging highly optimized native libraries for better performance.</p>

<p>Soon it became obvious that it would make sense to refactor out the numerical aspects of <em>MathLib</em> to a separate project and to develop it independently, so <em>Numerics</em> was born, as well as several other non-numeric subprojects. <em>Numerics</em> became <em>Iridium</em>, and in 2009 <em>Iridium</em> and <em>dnAnalytics</em> finally decided to join forces and work together on the new <em>Math.NET Numerics</em> project, replacing both <em>Iridium</em> and <em>dnAnalytics</em> and entirely unrelated to the early <em>Numerics 0.1-0.4</em> back in 2004.</p>

<p>Mostly thanks to Marcus and Jurgen, Math.NET Numerics is very much alive and active. Check out our <a href="http://github.com/mathnet/mathnet-numerics/">source code repository</a> and <a href="http://mathnetnumerics.codeplex.com/Thread/List.aspx">forums</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Connect from Azure to an SQL Server Named Instance]]></title>
    <link href="http://christoph.ruegg.name/blog/2009/12/23/connect-from-azure-to-an-sql-server-named-instance.html"/>
    <updated>2009-12-23T17:52:00+01:00</updated>
    <id>http://christoph.ruegg.name/blog/2009/12/23/connect-from-azure-to-an-sql-server-named-instance</id>
    <category term="Azure" />
    <category term="Cloud Computing" />
    <category term="Networking" />
    <category term="Storage" />
    
    <content type="html"><![CDATA[<p>In some situations you can&#8217;t or don&#8217;t want to move all your data completely to the cloud. Be it to connect to your existing infrastructure, a company policy, to remain multi-tenant or simply when migrating slowly step by step. Common to these cases is often the requirement to synchronize with or connect from Azure to some local or offsite SQL Server database. For synchronization you may want to try the <a href="http://msdn.microsoft.com/en-us/sync/default.aspx">Microsoft Sync Framework</a>. This post is about the other option: connecting to an external named SQL Server instance.</p>

<h2>Connecting to Named SQL Server Instances</h2>

<p>In addition to its own storage options like SQL Azure and Azure Table Storage, Azure also allows you to connect to <strong>external SQL Servers</strong> over TCP/IP. However, there&#8217;s a pitfall right now when using named SQL Server instances:</p>

<!--more-->


<blockquote><p>  <strong>System.Data.SqlClient.SqlException</strong>:<br/>
  A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.<br/>
  (provider: SQL Network Interfaces, error: 26 - <strong>Error Locating Server/Instance Specified</strong>)<br/>
  <strong><a href="http://stackoverflow.com/questions/1952904/connecting-to-remote-sql-server-2008-from-windows-azure">source</a></strong></p></blockquote>

<p>Provided your connection string is correct, this is likely to be an issue with how the SQL Server client locates your named instance.</p>

<h2>Instance Resolution using SQL Server Browser</h2>

<p>Since SQL Server 2005, the <a href="http://msdn.microsoft.com/en-us/library/ms181087.aspx">SQL Server Browser service</a> is responsible for enumerating available instances on a machine and for resolving instance names to the actual named pipe or TCP port (for SQL Server 2000 it was the SQL Server Resolution Protocol).</p>

<p>In order to resolve the TCP port of a named instance, the client sends a UDP datagram to port 1434, to which the SQL Server Browser replies with another datagram listing the instance endpoint that the client then connects to. Thanks to this mechanism it is no longer required to have the server listen on the standard SQL Server TCP port 1433, so it can fully support multiple (named) instances. In fact, the default for named instances is to use a dynamic random TCP port.</p>
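<p>To give an idea of what the client gets back, here is a sketch that parses the text part of such a reply. It assumes the semicolon-separated name/value layout described in the public protocol documentation; the sample payload, host name and port are made up, and the short binary header of a real datagram is left out:</p>

<pre><code>def parse_browser_reply(payload):
    """Turn 'Name;Value;Name;Value;...' into a dict of instance properties."""
    fields = payload.strip(";").split(";")
    return dict(zip(fields[0::2], fields[1::2]))

# Hypothetical reply for a named instance listening on a dynamic port.
sample = "ServerName;DBHOST;InstanceName;SQLEXPRESS;IsClustered;No;Version;10.0.1600.22;tcp;49172;;"
info = parse_browser_reply(sample)
print(info["InstanceName"], info["tcp"])  # prints SQLEXPRESS 49172
</code></pre>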

<h2>Azure vs. SQL Server Browser</h2>

<p>When connecting from Azure this resolution mechanism fails, simply because the UDP datagrams never reach their target (this may change in the future). So there&#8217;s no way the client can find the actual (probably random) TCP port to connect to, and it will throw the SqlException cited above.</p>

<h2>Solution</h2>

<p>To work around this issue, you can <strong><a href="http://msdn.microsoft.com/en-us/library/ms177440.aspx">configure your named instance to listen on a static TCP port</a></strong> instead of randomly selecting a new dynamic one on every restart (<a href="http://support.microsoft.com/kb/823938">related kb</a>). You can then specify this static port directly in the connection string in your Azure worker role:</p>

<pre><code>Data Source={domain/ip},{port};Network Library=DBMSSOCN;
Initial Catalog={dbname};User ID={user};Password={pw}
</code></pre>

<p>Note that in this case there&#8217;s no need to specify the name of the instance in the connection string. The network library parameter tells the client to use TCP/IP instead of e.g. Named Pipes.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Azure: Cloud Service Models]]></title>
    <link href="http://christoph.ruegg.name/blog/2009/12/15/azure-cloud-service-models.html"/>
    <updated>2009-12-15T13:26:00+01:00</updated>
    <id>http://christoph.ruegg.name/blog/2009/12/15/azure-cloud-service-models</id>
    <category term="Azure" />
    <category term="Cloud Computing" />
    <category term="Lokad" />
    <category term="Lokad.Cloud" />
    
    <content type="html"><![CDATA[<p>Since I joined <a href="http://www.lokad.com/">Lokad</a> this September I finally had the chance to dive into cloud computing. We chose <a href="http://www.microsoft.com/windowsazure/">Windows Azure</a> as platform for our very computation intensive business, and built a neutral opensource framework on top of it: <a href="http://code.google.com/p/lokad-cloud/">Lokad.Cloud</a>.</p>

<h2>Cloud Services</h2>

<p>Lokad.Cloud is described as a .NET object-to-cloud persistence mapper, but it&#8217;s actually much more. This post shall concentrate on one aspect only: its notion of <a href="http://code.google.com/p/lokad-cloud/wiki/ScalableCloudServices">Cloud Services</a> as horizontally scalable workers.</p>

<p>In essence, cloud services are managed and executed as follows:</p>

<!--more-->


<ol>
<li><p>The Lokad.Cloud management infrastructure (for now essentially a web role) allows you to upload one or more assemblies containing a set of cloud services and optionally some configuration file.</p></li>
<li><p>Every Azure worker role instance loads all these services in an isolated AppDomain.</p></li>
<li><p>Each Azure worker then executes these services one at a time according to some scheduling algorithm and execution policy.</p></li>
</ol>


<p>We provide specialized base classes to simplify implementing services processing items from a shared queue or for services which are to be called in regular intervals.</p>

<p>We treat all Azure workers as equal and therefore execute every cloud service on each Azure worker from time to time. In other words, we map all cloud services to all Azure workers, forming a complete bipartite graph between cloud services and Azure workers as shown in the following figure.</p>

<p><img class="center" src="http://christoph.ruegg.name/images/azure-cloud-service-models/CloudServicesToAzureWorkers.png"></p>

<p>This is a fundamental concept that yields a very simple design with a potential for ideal horizontal scaling, and is even resilient to failing Azure workers as long as at least one worker remains intact.</p>
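<p>As a toy illustration of this mapping, the following sketch lets every worker cycle through all services. Round robin is only an assumption here, standing in for whatever scheduling algorithm is actually used, and the service and worker names are made up:</p>

<pre><code>from itertools import cycle, islice

def schedule(services, steps):
    """Return the first `steps` services a single worker would execute, round robin."""
    return list(islice(cycle(services), steps))

services = ["QueueServiceA", "QueueServiceB", "ScheduledServiceC"]
workers = ["worker-0", "worker-1"]

# Complete bipartite mapping: every worker runs every service.
plan = {worker: schedule(services, 4) for worker in workers}
print(plan["worker-0"])
# prints ['QueueServiceA', 'QueueServiceB', 'ScheduledServiceC', 'QueueServiceA']
</code></pre>

<p>Losing a worker simply removes one entry from the plan; the remaining workers still cover all services.</p>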

<h2>Cloud Service Models and Deployments</h2>

<p>The only object that is aware of this mapping is the service scheduler. Yet, from the management and diagnostics perspective it would be interesting to represent the cloud services as first class objects. I&#8217;m therefore introducing the notion of Cloud Service Models for Lokad.Cloud (not part of the current release, open whether it ever will be).</p>

<p>In Azure, web and worker roles are explicitly defined and configured in two xml files. Since the latest update of the Azure tools for Microsoft Visual Studio, they are referred to as the <em>Azure Service Model</em>. Using the Azure management website one can upload an assembly plus the two xml files to create a unique <em>Azure deployment</em>. A deployment can be stopped or running, either in production or in staging mode.</p>

<p>The same concepts can also be applied to Cloud Services, on a slightly higher level of abstraction and orthogonal to the Azure terms.</p>

<p>A <em>Cloud Service Model</em> is a unique entity, associated with a set of assemblies, the cloud services defined in them and their configuration (if applicable). Using the Lokad.Cloud management tools an administrator can upload such a model and create a unique <strong>Cloud Service Deployment</strong>. A deployment can be stopped or running, and of course be removed when no longer needed. A failing or malfunctioning deployment can be diagnosed and dealt with directly in the management UI.</p>

<p>Note that the currently implemented option to upload a zip file containing assemblies and optional configuration is already very close to such a model, but is missing identity and other metadata.</p>

<p><img class="center" src="http://christoph.ruegg.name/images/azure-cloud-service-models/CloudServicesDeployments.png"></p>

<p>In each Azure worker, our scheduler will load the current service model, load the services and schedule them accordingly. From time to time the scheduler will check whether the deployed service model has changed, and update if necessary.</p>

<p>Technically this design would also allow running multiple different deployments in parallel, e.g. by breaking the complete bipartite graph between Cloud Services and Azure workers into a non-complete bipartite one where Azure workers are assigned to a single Cloud Service Deployment:</p>

<p><img class="center" src="http://christoph.ruegg.name/images/azure-cloud-service-models/CloudServicesDeployments2.png"></p>

<p>Or by sharing the Azure workers among Cloud Service Deployments in one way or another (e.g. in parallel, or round robin):</p>

<p><img class="center" src="http://christoph.ruegg.name/images/azure-cloud-service-models/CloudServicesDeployments3.png"></p>

<p>Remember however that some of these scenarios violate the fundamental concept mentioned above. Hence, as usual, there&#8217;s a tradeoff between flexibility and robustness.</p>

<h2>Update</h2>

<p>It seems there&#8217;s a better way to differentiate between cloud service models and deployments:</p>

<ul>
<li><p><strong>Model:</strong> An identity, a set of (named) cloud services, their assemblies and optionally some configuration.</p></li>
<li><p><strong>Deployment:</strong> An identity, a set of models and their mapping to (Azure) worker nodes.</p></li>
</ul>


<p>I.e. only one deployment can run at a time, but there&#8217;s an option to support configuring multiple models in a deployment. Also, there&#8217;s a trivial empty deployment where no models are loaded at all.</p>

<p>Hence, the labels in the figures above should read &#8220;Cloud Service Model A&#8221; instead of &#8220;Cloud Service Deployment A&#8221;, etc.</p>
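<p>Expressed as data structures, the updated definitions of model and deployment could look roughly like this. It is only a sketch of the idea, not the Lokad.Cloud API; all type and field names are invented for the illustration:</p>

<pre><code>from dataclasses import dataclass, field

@dataclass(frozen=True)
class Model:
    identity: str
    services: tuple          # names of the cloud services
    assemblies: tuple        # assembly file names
    configuration: str = ""  # optional

@dataclass
class Deployment:
    identity: str
    # maps a model identity to the set of worker nodes that run it
    assignments: dict = field(default_factory=dict)

    def models_for(self, worker):
        """Return the identities of all models mapped to the given worker."""
        return sorted(m for m, workers in self.assignments.items() if worker in workers)

model_a = Model("model-a", ("QueueServiceA",), ("Services.dll",))
deployment = Deployment("d1", {model_a.identity: {"worker-0", "worker-1"}})
empty = Deployment("empty")  # the trivial deployment without any models

print(deployment.models_for("worker-0"))  # prints ['model-a']
print(empty.models_for("worker-0"))       # prints []
</code></pre>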
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[dnAnalytics + Iridium = Math.NET Numerics]]></title>
    <link href="http://christoph.ruegg.name/blog/2009/8/3/dnanalytics-iridium-mathnet-numerics.html"/>
    <updated>2009-08-03T10:44:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2009/8/3/dnanalytics-iridium-mathnet-numerics</id>
    <category term="Math.NET" />
    <category term="Math.NET Numerics" />
    
    <content type="html"><![CDATA[<p>You may have wondered why the Math.NET Iridium development stopped abruptly almost two months ago. Luckily this is not entirely true: in the last few weeks the .NET numerics library has progressed well - but at a different place:</p>

<p><strong><a href="http://www.mathdotnet.com/Iridium.aspx">Math.NET Iridium</a> is being merged with <a href="http://dnanalytics.codeplex.com/">dnAnalytics</a>, resulting in a new project named <a href="http://numerics.mathdotnet.com">Math.NET Numerics</a></strong></p>

<p>What does that mean for existing Math.NET Iridium users?</p>

<!--more-->


<ul>
<li><p><strong>Higher development momentum</strong> and larger user community (as a direct result of merging two projects).</p></li>
<li><p><strong>Better algorithm and code quality</strong> by picking the best of each project and simply by having new highly skilled developers on board.</p></li>
<li><p>New opensource license model: <strong>MIT/X11</strong>. This is a very open license similar to the so-called New BSD License. This model is much less restrictive than the previous LGPL and is (to my knowledge) compatible with a wide range of licenses, including all GPL-based licenses and the Microsoft opensource licenses.</p></li>
<li><p>Some <strong>API changes</strong>. This is unavoidable since we try to integrate the best of both dnAnalytics and Iridium. At the same time this is a good chance to throw out some old designs that have shown to be improvable and replace them with better approaches. However, we try hard to keep migration as smooth as possible.</p></li>
<li><p>In addition to the completely self-contained managed implementation, we&#8217;ll profit from the dnAnalytics experience with parallelized and native optimizations (MKL, ACML, CUDA etc.) and will therefore provide <strong>optional wrappers around native libraries</strong> which provide <strong>significantly better performance</strong> when working with large data sets.</p></li>
<li><p>Again thanks to the dnAnalytics experience, you can expect better <strong>F#</strong> support, even though the library is still written in C#.</p></li>
<li><p>Although Iridium did support sparse linear algebra for a very short time, we had to remove it due to several issues. You can expect Math.NET Numerics to finally <strong>support sparse linear algebra</strong> in a clean way.</p></li>
</ul>


<p>You&#8217;ll find the new Math.NET Numerics discussion board and tracker at
<a href="http://mathnetnumerics.codeplex.com/">CodePlex</a> and the current sources at
<a href="http://github.com/mathnet/mathnet-numerics/">Github</a> (subversion mirror at
<a href="http://code.google.com/p/mathnet-numerics/source/checkout">google</a>). The full portal website and wikis etc. will be available in a few weeks. Feel free to post your ideas and feedback, or even fork the repository at github to contribute code to the project (note that we will completely reorganize the project structure by mid-August).</p>

<p>We&#8217;ll let you know here and on <a href="http://twitter.com/MathNetNumerics">Twitter</a> as soon as we reach a first milestone and have an api preview ready.</p>

<h2>(Migrated Comments)</h2>

<h4>Joannes Vermorel, August 3, 2009</h4>

<p>Congratulations! Sparse linear algebra is really a nice move (I am sorry I had not been able to push it forward at the time).</p>

<h4>Alexey Zakharov, October 23, 2009</h4>

<p>Good news! C# really needs such library in stable version.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Online API Reference]]></title>
    <link href="http://christoph.ruegg.name/blog/2009/4/17/online-api-reference.html"/>
    <updated>2009-04-17T18:08:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2009/4/17/online-api-reference</id>
    <category term="Math.NET" />
    <category term="Math.NET Numerics" />
    
    <content type="html"><![CDATA[<p>We now finally provide an online api reference in an rdoc-like style, generated by
<a href="http://docu.jagregory.com/">docu</a> (actually by my github <a href="http://github.com/cdrnet/docu/network">fork</a> of it). Note that docu is new and still under heavy development, so the quality is likely to improve over the next months (e.g. right now the class summaries are missing).</p>

<p><a href="http://api.mathdotnet.com/">http://api.mathdotnet.com/</a></p>

<p>It is simple, but (unlike the older NDoc &amp; Sandcastle generated sites) loads very fast.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iridium Statistics Accumulator: Better numerical stability]]></title>
    <link href="http://christoph.ruegg.name/blog/2009/1/7/iridium-statistics-accumulator-better-numerical-stability.html"/>
    <updated>2009-01-07T21:12:00+01:00</updated>
    <id>http://christoph.ruegg.name/blog/2009/1/7/iridium-statistics-accumulator-better-numerical-stability</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>The algorithm that incrementally computes the Mean, Variance and Sigma in the statistics accumulator (MathNet.Numerics.Statistics.Accumulator) was improved last week in Iridium revision 503 to provide better numerical stability when dealing with samples with a very large mean but only a small variance.</p>

<p>For example, the variance of normally distributed samples with mean 10<sup>9</sup> but a variance of only 1 can now be accurately estimated. The previous implementation was very unstable in that case.</p>

<p>The new algorithm continues to support removing samples from the accumulator (and updates the estimates accordingly).</p>
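
<p>The post does not say which update rule revision 503 uses. As a rough illustration of one rule that stays stable in this situation, here is a minimal sketch of a Welford-style running mean and variance that also supports removing samples. It is not the Iridium source; the class name <code>RunningStats</code> and all of its members are hypothetical and exist only for this example.</p>

<pre><code>using System;

// Hypothetical sketch, not the Iridium Accumulator implementation.
public sealed class RunningStats
{
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    public long Count { get { return count; } }
    public double Mean { get { return mean; } }

    // Unbiased sample variance; NaN for fewer than two samples.
    public double Variance
    {
        get { return count &gt; 1 ? m2 / (count - 1) : double.NaN; }
    }

    public double Sigma { get { return Math.Sqrt(Variance); } }

    public void Add(double x)
    {
        count++;
        double delta = x - mean;      // deviation from the old mean
        mean += delta / count;
        m2 += delta * (x - mean);     // old deviation times new deviation
    }

    public void Remove(double x)
    {
        if (count == 0) throw new InvalidOperationException("No samples to remove.");
        if (count == 1) { count = 0; mean = 0; m2 = 0; return; }

        double delta = x - mean;      // deviation from the mean that still includes x
        count--;
        mean -= delta / count;        // mean without x
        m2 -= delta * (x - mean);     // exact inverse of the update in Add
        if (m2 &lt; 0) m2 = 0;           // guard against rounding below zero
    }

    public static void Main()
    {
        RunningStats stats = new RunningStats();
        stats.Add(1e9 + 1);
        stats.Add(1e9 + 2);
        stats.Add(1e9 + 3);
        Console.WriteLine(stats.Variance); // prints 1

        stats.Remove(1e9 + 3);
        Console.WriteLine(stats.Count);    // prints 2
    }
}
</code></pre>

<p>The sketch only ever accumulates deviations from the running mean, so the large offset never enters a squared sum. A naive sum-of-squares approach would have to subtract two numbers of magnitude 10<sup>18</sup> to recover a variance of 1, which loses essentially all significant digits in double precision.</p>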
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iridium 2008 August Release (2008.8.16.470)]]></title>
    <link href="http://christoph.ruegg.name/blog/2008/8/14/iridium-2008-august-release-2008816470.html"/>
    <updated>2008-08-14T14:14:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2008/8/14/iridium-2008-august-release-2008816470</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>Iteration 16 of Math.NET Iridium (Numerics) is now available, as 2008 August Release with Version 2008.8.16.470. <strong>Grab it <a href="http://www.mathdotnet.com/downloads/Iridium-2008-8-16-470.ashx">here</a></strong>.</p>

<p>Please continue reporting issues and bugs you find; it&#8217;s very useful and helps make the whole project better. We&#8217;ve also set up a <a href="http://feedback.mathdotnet.com/">UserVoice</a> page for you to suggest or vote for new features or enhancements.</p>

<!--more-->


<p><strong>Team:</strong> Christoph R&uuml;egg, Joann&egrave;s Vermorel, Matthew Kitchin</p>

<h4>Summary</h4>

<ul>
<li>Bugs: 4 bugs have been fixed.</li>
<li>Completely revised and extended interpolation toolkit.</li>
<li>New complex matrix and vector type (Complex Linear Algebra will follow in the next iteration).</li>
<li>Slightly enhanced real matrix and vector types.</li>
<li>QR decompositions are now unique (positive real R diagonal).</li>
<li>Complex type now has a public constructor, more intuitive.</li>
<li>New Digamma (Psi) special function.</li>
<li>Various other small changes.</li>
</ul>


<p>For more details have a look at the <a href="http://www.mathdotnet.com/downloads/Iridium-2008-8-16-470.ashx">download page</a>.</p>

<h2>(Migrated Comments)</h2>

<h4>Joannes Vermorel, August 14, 2008</h4>

<p>Thanks for this good work. I have also noticed that you&#8217;ve posted the release on Sourceforge.Net. This is really a nice thing to do (so far, I have always found Sourceforge to be a top notch backup provider :-)</p>

<h4>Christoph Ruegg, August 14, 2008</h4>

<p>Thanks. Yes, indeed, the SourceForge mirror infrastructure is unbeatable :)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Revised Interpolation Toolkit in Iridium]]></title>
    <link href="http://christoph.ruegg.name/blog/2008/5/10/revised-interpolation-toolkit-in-iridium.html"/>
    <updated>2008-05-10T17:31:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2008/5/10/revised-interpolation-toolkit-in-iridium</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>The next release of Math.NET Iridium, Iteration 16, comes with a revised interpolation architecture and implementation.</p>

<p>Up to now, the interpolation classes have been a bit awkward to use, partially because of the SampleList collection class you had to use, and because of the design in general. They also provided only two interpolation algorithms, which are both somewhat outdated these days.</p>

<!--more-->


<p>The new implementation provides some newer, more stable algorithms (like Floater and Hormann&#8217;s algorithm for pole-free rational interpolation) together with a cleaner design. Additionally, some of the algorithms can also provide the first and second derivative and, in the case of splines, even a definite integral. There is also a new facade/portal class that reduces building an interpolation to one simple method call. Usually you would just use this facade class to build/precompute an interpolation, but all the algorithms are also publicly available in the Algorithms-namespace, so if you know what you&#8217;re doing you can use them directly.</p>

<p>Sample code, which uses a pole-free rational barycentric interpolation (the default algorithm) with 5 given sample pairs (t, x(t)):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>double[] t = new double[] { -2, -1, 0, 1, 2};
</span><span class='line'>double[] x = new double[] { 1, 2, -1, 0, 1};
</span><span class='line'>var method = Interpolation.Create(t, x);
</span><span class='line'>
</span><span class='line'>double res = method.Interpolate(0.5);</span></code></pre></td></tr></table></div></figure>


<p>Simple, isn&#8217;t it?</p>

<p>Unfortunately it was not possible to fit the new design into the old classes, so the new classes and interfaces replace the old classes completely. These old classes are still there for now and continue to work, but they&#8217;re marked as obsolete and we strongly recommend upgrading your code base to the new architecture.</p>

<p>The new architecture has already been checked in to the <a href="http://www.mathdotnet.com/Repository.aspx">source repository</a>. If you&#8217;re interested, please have a look at it and provide feedback - it&#8217;s not released yet so we can still change it completely :).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iridium 2008 April Release (v2008.4.14.425)]]></title>
    <link href="http://christoph.ruegg.name/blog/2008/4/6/iridium-2008-april-release-v2008414425.html"/>
    <updated>2008-04-06T13:50:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2008/4/6/iridium-2008-april-release-v2008414425</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>Iteration 14 of Math.NET Iridium (Numerics) is now available, as 2008 April Release with Version 2008.4.14.425. <strong>Grab it <a href="http://www.mathdotnet.com/downloads/Iridium-2008-4-14-425.ashx">here</a></strong>.</p>

<p>Sorry for the very short release cycle (just one week after iteration 12). The reason is that I won&#8217;t be able to work on Math.NET for the next three weeks, and some of the fixes and changes are important enough that I don&#8217;t want to keep you waiting three weeks for no reason.</p>

<p>Please continue reporting issues and bugs you find; it&#8217;s very useful and helps make the whole project better. There&#8217;s also a big chance that the issue will actually be fixed: in the last few releases we always managed to fix all bugs we were aware of at that point. Thanks!</p>

<!--more-->


<p><strong>Team:</strong> Christoph R&uuml;egg, Joann&egrave;s Vermorel, Matthew Kitchin</p>

<h4>Summary</h4>

<ul>
<li>Bugs: All 3 known bugs have been fixed.</li>
<li>Better special function precision (Gamma, Beta, Erf, Distributions etc): now up to 12 - 14 digits.</li>
<li>New direct/real gamma function, new harmonic number function.</li>
<li>Interpolation: usability enhancements (better double-array support, less user code)</li>
</ul>


<h4>New Features</h4>

<ul>
<li>IRID-122: Core - New direct Gamma function (additional to GammaLn) with negative value support</li>
<li>IRID-123: Core - New Special Function: Harmonic Numbers</li>
</ul>


<h4>Enhancements</h4>

<ul>
<li>IRID-121: Core - Better numerical precision for Gamma function</li>
<li>IRID-125: Interpolation - Additional interpolation and sample list constructors for double arrays.</li>
<li>IRID-126: Interpolation - Better interpolation order access and defaults</li>
</ul>


<h4>Fixed Bugs</h4>

<ul>
<li>IRID-119: Interpolation - Polynomial Extrapolation in positive direction throws IndexOutOfRangeException</li>
<li>IRID-120: Linear Algebra - Infinite recursion</li>
<li>IRID-124: Linear Algebra - Matrix.CopyToArray - wrong indexer in inner loop condition.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iridium 2008 March Release (v2008.3.12.405)]]></title>
    <link href="http://christoph.ruegg.name/blog/2008/3/31/iridium-2008-march-release-v2008312405.html"/>
    <updated>2008-03-31T11:41:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2008/3/31/iridium-2008-march-release-v2008312405</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>Iteration 12 of Math.NET Iridium (Numerics) is now available, as 2008 March Release with Version 2008.3.12.405. <strong>Grab it <a href="http://www.mathdotnet.com/downloads/Iridium-2008-3-12-405.ashx">here</a></strong>.</p>

<!--more-->


<p><strong>Team:</strong> Christoph R&uuml;egg, Joann&egrave;s Vermorel, Matthew Kitchin<br/>
<strong>New Contributions:</strong> Mike Shugai</p>

<h4>Summary</h4>

<ul>
<li><strong>Bugs:</strong> All 3 known bugs have been fixed.</li>
<li><strong>Linear Algebra:</strong> Matrix Kronecker product, new Vector class, performance work (caching).</li>
<li><strong>Core:</strong> New Sinc function, Neper/Decibel ratio routines.</li>
<li><strong>Probability Distributions:</strong> New distributions: Student&#8217;s T-Distribution, F-Distribution, Skew-Alpha Stable Distribution.</li>
<li>Assemblies no longer signed with a certificate (because verification caused network access).</li>
</ul>


<h4>New Features</h4>

<ul>
<li>IRID-113: Core - Neper and Decibel Helper and Factors</li>
<li>IRID-116: Core - Sinc Function</li>
<li>IRID-111: Linear Algebra - Matrix Kronecker Tensor Product</li>
<li>IRID-59: Linear Algebra - New Vector class (related to the Matrix class)</li>
<li>IRID-99: Probability Distributions - New Distribution: Skew Alpha Stable Distribution</li>
<li>IRID-108: Probability Distributions - New Distribution: F-Distribution</li>
<li>IRID-109: Probability Distributions - New Distribution: Student&#8217;s-T</li>
</ul>


<h4>Enhancements</h4>

<ul>
<li>IRID-106: Linear Algebra - Cache for on-demand computations (like decompositions)</li>
</ul>


<h4>Fixed Bugs</h4>

<ul>
<li>IRID-107: Core - Complex: Unexpected power behavior at zero</li>
<li>IRID-97: Linear Algebra - Matrix.Identity allocation bug in non-square cases</li>
<li>IRID-98: Probability Distributions - ArbitraryDistribution NextInt32 does not consider offset.</li>
</ul>


<h4>Other</h4>

<ul>
<li>IRID-110: Remove Certificate Signing (cert validation causes network access)</li>
</ul>


<h2>(Migrated Comments)</h2>

<h4>Joannes Vermorel, March 31, 2008</h4>

<p>Nice to see that new people are joining Math.NET. If I get the time, I will try to submit some Erlang-related formulas and distribution for the next release :-)</p>

<h4>Christoph Ruegg, April 6, 2008</h4>

<p>Thanks, I&#8217;m looking forward to it! :)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iridium 2008 February Release (v2008.2.10.364)]]></title>
    <link href="http://christoph.ruegg.name/blog/2008/2/3/iridium-2008-february-release-v2008210364.html"/>
    <updated>2008-02-03T12:39:00+01:00</updated>
    <id>http://christoph.ruegg.name/blog/2008/2/3/iridium-2008-february-release-v2008210364</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>Iteration 10 of Math.NET Iridium (Numerics) is now available, as 2008 February Release with Version 2008.2.10.364. <strong>Grab it <a href="http://www.mathdotnet.com/downloads/Iridium-2008-2-10-364.ashx">here</a></strong>.</p>

<p>This is mostly a service release.</p>

<!--more-->


<ul>
<li><strong>Bugs:</strong> All known bugs have been fixed.</li>
<li><strong>Performance:</strong> The linear algebra implementation has been optimized, resulting in <a href="http://christoph.ruegg.name/blog/2007/10/14/matrix-data-structure-optimization-nearly-50-perf-gain.html">nearly 50% perf gain</a>.</li>
<li><strong>Api References:</strong> Inline Xml Documentation has been improved (but is still far from where we&#8217;d like it to be).</li>
<li><strong>Security:</strong> The released binaries now have a strong name, are locked down with code access security and allow partial trusted callers. We now use test-signing internally. Also, the official assemblies are now signed with a certificate.</li>
<li><strong>Build/Release Integration:</strong> We have now moved completely to custom msbuild targets, releases are fully automated (incl. documentation generation) and the continuous integration system has been upgraded. Since releasing is now much easier, you can expect new releases more often.</li>
</ul>


<p>At the same time I also released a first version of Math.NET Neodym (Signal Processing), Iteration 2, which makes it Version 2008.2.2.364. Grab it <a href="http://www.mathdotnet.com/downloads/Neodym-2008-2-2-364.ashx">here</a>. Hopefully I&#8217;ll have more time in the future to work on Neodym&#8230;</p>

<h2>(Migrated Comments)</h2>

<h4>J.Henkel, February 6, 2008</h4>

<p>Excellent. Time to play a bit&#8230;</p>

<p>Out of curiosity, Chris, are you planning to release a build of Yttrium any time soon? I&#8217;ve been playing around with a copy from svn, and just reading through the test files makes me want to play with a compiling version (mine is old&#8230;keeps looking for MathNet.Numerics.Fn.PowInt).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Matrix data structure optimization: Nearly 50% perf gain]]></title>
    <link href="http://christoph.ruegg.name/blog/2007/10/14/matrix-data-structure-optimization-nearly-50-perf-gain.html"/>
    <updated>2007-10-14T21:28:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2007/10/14/matrix-data-structure-optimization-nearly-50-perf-gain</id>
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>Some time ago I did some very basic <a href="http://christoph.ruegg.name/blog/2007/4/16/iridium-performance-analysis.html">performance analysis of solving linear equation systems</a> with Iridium. Since then we decided to rewrite the Matrix class to use jagged arrays instead of rectangular ones, see <a href="http://community.opensourcedotnet.info/forums/t/384.aspx">this forum discussion</a>. There already was a discussion about that issue a long time ago, but at that time we decided to go for the cleaner and safer way of rectangular arrays. Unfortunately the C# compiler today still can&#8217;t optimize loops on rectangular arrays as well as loops on jagged arrays. So we finally moved forward to jagged arrays, and indeed, we got a performance improvement (solving a linear equation system) of nearly 50%.</p>

<!--more-->


<p>Unfortunately, the change of the data structure comes at a cost: the semantics of the following two members change, as they now make deep copies instead of using the given array directly as the internal data structure. Have a look at the mentioned forum discussion on why this might be an issue.</p>

<pre><code>public Matrix(double[,] A)
public static implicit operator double[,] (Matrix m)
</code></pre>

<p>If you want to avoid deep-copying, e.g. for performance reasons, then use double[][] instead of double[,] to fill the matrix.</p>
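
<p>To make the reason for the copy concrete: <code>double[,]</code> and <code>double[][]</code> are different types with different memory layouts, so a matrix that stores its data as a jagged array cannot simply wrap a rectangular one. The following is a minimal sketch of such a conversion; the class and method names are hypothetical and not part of the Iridium API.</p>

<pre><code>using System;

// Hypothetical sketch, not the Iridium Matrix implementation.
public static class ArrayLayoutSketch
{
    // Copies a rectangular array element by element into a new jagged array.
    public static double[][] ToJagged(double[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        double[][] result = new double[rows][];
        for (int i = 0; i &lt; rows; i++)
        {
            double[] row = new double[cols];
            for (int j = 0; j &lt; cols; j++)
            {
                row[j] = source[i, j];
            }
            result[i] = row;
        }
        return result;
    }

    public static void Main()
    {
        double[,] rectangular = { { 1, 2 }, { 3, 4 } };
        double[][] jagged = ToJagged(rectangular);

        // The copy is independent: later changes to the source are not visible in it.
        rectangular[0, 0] = 42;
        Console.WriteLine(jagged[0][0]); // prints 1
    }
}
</code></pre>

<p>Code that previously relied on the matrix sharing storage with the rectangular array it was built from would behave like the last two lines of <code>Main</code>: writes to the original array no longer show up in the matrix.</p>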

<p>The changes are already submitted to the <a href="http://www.mathdotnet.com/Repository.aspx">repository</a>, and will be included in the next <a href="http://www.mathdotnet.com/doc/Releases.ashx">iridium release</a>.</p>

<h2>(Migrated Comments)</h2>

<h4>Joannes Vermorel, October 15, 2007</h4>

<p>This is an excellent news. Thanks for upgrading the matrix to jagged array. The use of rectangular arrays was a (poor) choice of mine when I did initially port the JAMA package to Math.NET.</p>

<p>Among the other benefits that you gain with jagged arrays is the possibility to perform multi-thread operations on matrices (using BLAS).</p>

<p>Best regards, Joannes</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Encrypt local mail folders and the desktop search index]]></title>
    <link href="http://christoph.ruegg.name/blog/2007/8/25/encrypt-local-mail-folders-and-the-desktop-search-index.html"/>
    <updated>2007-08-25T20:10:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2007/8/25/encrypt-local-mail-folders-and-the-desktop-search-index</id>
    <category term="Security" />
    
    <content type="html"><![CDATA[<p>I&#8217;ve been using <a href="http://www.truecrypt.org/">TrueCrypt</a> to encrypt all kinds of files on my computer (and on all external harddrives and most USB thumbdrives) for quite some time now and I really like it. On the PC I have a set of data I want to have encrypted but that I always need, so I&#8217;ve configured TrueCrypt to automatically mount it as <strong>X:\ </strong> after login (that is, I just have to enter my password, hit enter and forget about it). One example of such data is the web browser bookmarks, history and cache. Luckily Firefox allows you to store the profile wherever you want, so I simply store it on that virtual drive <strong>X</strong> mentioned before.</p>

<p>Desktop Search is very useful (be it Google Desktop or Microsoft Desktop Search or whatever). However, what&#8217;s the point of encrypting files when they&#8217;re full-text indexed to some unencrypted location anyway? Hence I want this index to be stored on that encrypted drive <strong>X</strong> too.</p>

<!--more-->


<ul>
<li><strong>Issue:</strong> The Windows Desktop Search service starts before I can enter the password and mount <strong>X</strong>. If it can&#8217;t access the index, it automatically starts generating a new index at the default (and unencrypted) location. So I need some way of delaying the service start.</li>
</ul>


<p>In addition I also want my local mail folder encrypted on <strong>X</strong>. Similar to Firefox, the Mozilla email client Thunderbird allows me to store the profile wherever I want, so I chose to store it directly on <strong>X</strong>. However, I also use Outlook 2007, and there it&#8217;s not that simple:</p>

<ul>
<li><strong>Issue:</strong> While Outlook 2007 allows you to store the main post folder wherever you want, for some yet unknown reason you can&#8217;t move IMAP or Exchange data folders.</li>
</ul>


<p>Luckily there is a workaround for both issues, although it&#8217;s not trivial and involves some coding. Here&#8217;s what I did; let me know if you have a better solution (the NTFS built-in encryption is no option for me), or if you&#8217;re interested in the code:</p>

<ul>
<li>Change the Search Index Service to start manually (instead of automatically)</li>
<li>Write a small windows service that starts automatically and checks every few seconds whether <strong>X</strong> is mounted. If it is, it stops further checking and starts the Search Index Service.</li>
<li>Move all files from the Outlook data folder (something like Documents and Settings..\Local Settings\Application Data\Microsoft\Office) to a new folder in <strong>X</strong> and delete this Outlook folder</li>
<li>Have the small windows service (from point 2) create an <a href="http://www.codeproject.com/cs/files/JunctionPointsNet.asp">NTFS Junction</a> (aka symlink) in place of the just deleted Outlook folder, and point it to the new directory in <strong>X</strong>, as soon as <strong>X</strong> is mounted; and remove this junction again once the service stops. This makes Outlook 2007 think the files are still where it expects them, but in reality they&#8217;re on <strong>X</strong> instead.</li>
</ul>
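
<p>The polling part of the steps above could look roughly like the following sketch. It is not the original service code: the class name, the drive root <code>X:\</code>, the service name <code>WSearch</code> and the timing values are assumptions for illustration, and the junction handling is left out. It needs a reference to the System.ServiceProcess assembly.</p>

<pre><code>using System;
using System.IO;
using System.ServiceProcess;
using System.Threading;

// Hypothetical sketch; names and values below are assumptions, adjust them to your system.
public static class MountWatcherSketch
{
    private const string MountRoot = @"X:\";           // assumed mount point of the encrypted volume
    private const string SearchServiceName = "WSearch"; // assumed name of the search index service

    public static void WaitForMountThenStartSearch()
    {
        // Check every few seconds until the encrypted volume is mounted.
        while (!Directory.Exists(MountRoot))
        {
            Thread.Sleep(TimeSpan.FromSeconds(5));
        }

        // The volume is available now, so the index location is reachable.
        using (ServiceController search = new ServiceController(SearchServiceName))
        {
            if (search.Status == ServiceControllerStatus.Stopped)
            {
                search.Start();
                search.WaitForStatus(ServiceControllerStatus.Running, TimeSpan.FromSeconds(30));
            }
        }
    }
}
</code></pre>

<p>In an actual windows service this method would run on a background thread started from <code>OnStart</code>, so that the service itself reports as started immediately while it keeps waiting for the volume.</p>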

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Math.NET Iridium on Linux and Mono]]></title>
    <link href="http://christoph.ruegg.name/blog/2007/8/19/mathnet-iridium-on-linux-and-mono.html"/>
    <updated>2007-08-19T18:43:00+02:00</updated>
    <id>http://christoph.ruegg.name/blog/2007/8/19/mathnet-iridium-on-linux-and-mono</id>
    <category term="Linux" />
    <category term="Math.NET" />
    <category term="Math.NET Iridium" />
    
    <content type="html"><![CDATA[<p>One of the principles of the Math.NET Iridium numerics library is to not depend on special hardware or some external super-optimized library, but just on the core .Net Framework 2.0. Thanks to this, Iridium also works with Mono on Linux with no special treatment - just add a reference to the assembly in MonoDevelop and start coding&#8230;</p>

<p>Apparently some users are not used to referencing .Net assemblies on Mono yet, so I&#8217;ve created some screen shots showing how to add such a reference step-by-step in MonoDevelop on Ubuntu Linux:</p>

<p><strong><a href="http://www.mathdotnet.com/doc/MathNetOnMonoLinux.ashx">Math.NET on Mono (and Linux)</a></strong></p>
]]></content>
  </entry>
  
</feed>

