Updates
Wednesday
Jul 21 2010

Content-Based Storage in the Cloud

One offshoot of the NoSQL movement, which has lately been rediscovering non-relational storage approaches, is the content-based value store. Such a store is similar to a key-value store but uses a cryptographic hash of the value as its key.

An SHA-1 hash of the value is good enough to identify it

The SHA-1 hash function is deterministic: for every value there's exactly one key that can be computed using SHA-1, hence the value implies the key. We can always compute the unique key of a value.

The probability of an accidental SHA-1 hash collision is extremely low. By the birthday bound, you'd need on the order of 10^24 values to reach a 50% chance of a collision. Even with a whopping 10^18 distinct values, the likelihood of at least one collision is still only about 3×10^-13. Hence a key refers to a single value with extremely high probability. While SHA-1 is not strictly injective, it is close enough to injective for almost all practical applications.
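Such collision figures follow from the birthday bound: with n random keys drawn from N = 2^160 possible values, the probability of at least one collision is approximately 1 - exp(-n(n-1)/(2N)). Here is a small sketch to reproduce the numbers, assuming SHA-1 behaves like a uniformly random function on ordinary, non-adversarial input:

```python
import math

HASH_BITS = 160          # SHA-1 digest size
N = 2 ** HASH_BITS       # number of possible keys

def collision_probability(n: int, space: int = N) -> float:
    """Birthday-bound approximation of P(at least one collision among n keys)."""
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x.
    return -math.expm1(-n * (n - 1) / (2 * space))

def values_for_probability(p: float, space: int = N) -> float:
    """Approximate number of keys needed to reach collision probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

print(f"{values_for_probability(0.5):.2e}")   # about 1.42e+24
print(f"{collision_probability(10**18):.1e}")  # about 3.4e-13
```

In other words, roughly 1.4×10^24 values are needed for even odds, and 10^18 values give a collision probability of about 3.4×10^-13.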

Note that this is different from common non-distributed hash tables, where a very short hash is used to jump directly to an inner data structure (bucket) containing all items that share the same hash. The motivation for hashing in such hash tables is to compute the position where an item is stored directly, avoiding long linear or binary searches.

Verifiable Consistency

A nice side effect of using SHA-1 is that, given the key, the value retrieved from the storage can be verified (to detect data corruption or tampering) simply by recomputing its SHA-1 hash and comparing it to the key. You can even digitally sign a key and thereby implicitly sign its value and all the values it refers to.
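A minimal in-memory sketch of such a store, verifying every value on read. The class and method names are hypothetical; a real implementation would sit on top of blob storage instead of a dict:

```python
import hashlib

class ContentStore:
    """Hypothetical in-memory content-addressed store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def key_of(value: bytes) -> str:
        # The key is simply the hex-encoded SHA-1 digest of the value.
        return hashlib.sha1(value).hexdigest()

    def put(self, value: bytes) -> str:
        key = self.key_of(value)
        self._blobs[key] = value
        return key

    def get(self, key: str) -> bytes:
        value = self._blobs[key]
        # Verify on read: recompute the hash and compare it to the key.
        if self.key_of(value) != key:
            raise ValueError(f"content for {key} is corrupted or was tampered with")
        return value
```

Because the check needs nothing but the key itself, it works the same way for values fetched from a remote store, a local cache or an untrusted peer.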

Keys are uniformly distributed

When using the common hex string format, the 160-bit keys are always 40 characters long and look something like this:

d921970aadf03b3cf0e71becdaab3147ba71cdef

We can safely treat them as if their characters were distributed uniformly (0-9, a-f). This brings some advantages especially when used in a distributed or cloud-like scenario, as simple prefix ranges (like 0-3, 4-7, 8-b, c-f) can be used for partitioning and distributed processing.
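A sketch of prefix-range partitioning, assuming the number of partitions divides 16 so that each partition covers a whole block of first hex digits. The function name is hypothetical:

```python
import hashlib
from collections import Counter

def partition_of(key: str, partitions: int = 4) -> int:
    """Map a hex SHA-1 key to one of `partitions` equal prefix ranges.

    With 4 partitions this yields the ranges 0-3, 4-7, 8-b and c-f.
    """
    if 16 % partitions != 0:
        raise ValueError("partitions must divide 16 (1, 2, 4, 8 or 16)")
    first_digit = int(key[0], 16)
    return first_digit // (16 // partitions)

# Keys of arbitrary values spread roughly evenly across the partitions.
counts = Counter(
    partition_of(hashlib.sha1(str(i).encode()).hexdigest()) for i in range(10_000)
)
print(sorted(counts.items()))  # roughly 2500 keys per partition
```

Each worker could then be assigned one partition and process only the keys in its prefix range, without any coordination beyond agreeing on the partition count.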

On the other hand this means that you can’t use other indexed keys or ordering out of the box without further logic or storage on top of it.

The value of a key is fixed and can’t be changed

The value associated with a key can never change. If you store an updated value, you get a new key for it and have to update any reference to point to this new key. This has severe consequences for where this storage scheme can be used efficiently. For example, a typical relational data model with cyclic relations wouldn't fit such a content-based data store at all.

However, in a cloud-like application this is often not much of an issue in practice, even more so once you realize that the existence of read-only, stale yet still consistent data is not an issue either (see CQRS).

Again, this fits very well with distributed and cloud computing, as it becomes trivial to cache values aggressively on the local machine. If a value is found in the local cache it is guaranteed to be up to date (since values can't change), so you don't even have to check timestamps or whether it has been changed remotely. Since Azure instances come with a lot of local storage, a simple LRU (least recently used) cache of a few GB can save you a lot of downloads and round trips if you use only a relatively small number of instances, or have managed to create some weak affinity between jobs and Azure instances.
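A sketch of such a cache with least-recently-used eviction, bounded by total bytes. The `fetch` callable stands in for a hypothetical remote download; note the absence of any invalidation or timestamp logic:

```python
from collections import OrderedDict
from typing import Callable

class ImmutableValueCache:
    """LRU cache for content-addressed values, bounded by total size in bytes.

    A key always maps to the same value, so every cached entry is up to date
    by construction and never needs to be revalidated.
    """

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # hypothetical remote download function
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self.fetch(key)
        self._entries[key] = value
        self._size += len(value)
        # Evict least recently used entries until we fit again
        # (always keeping at least the entry just fetched).
        while self._size > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._size -= len(evicted)
        return value
```

A production version would keep the entries on the local disk rather than in memory, but the eviction logic stays the same.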

Example: Large Queue Messages

Azure Queues limit the size of message content, which is why Lokad.Cloud implements logic to let messages transparently overflow to blob storage. To do that, it needs a way to store a value in a blob that it can retrieve later by some identifier. This identifier is then packed into the actual message. There's no need to access the blob in any other way, so it's a perfect candidate for a content-based value store.
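A simplified sketch of the overflow idea, not Lokad.Cloud's actual implementation: the queue and blob container are plain Python objects, and both the JSON envelope format and the size limit are assumptions for illustration:

```python
import hashlib
import json

# Hypothetical limit; the real maximum depends on the queue service and version.
MAX_MESSAGE_CHARS = 8 * 1024

def enqueue(queue: list, blobs: dict, payload: bytes) -> None:
    """Put payload on the queue, overflowing large payloads to blob storage."""
    message = json.dumps({"inline": payload.hex()})
    if len(message) > MAX_MESSAGE_CHARS:
        # Too large: store the content under its SHA-1 key and send only the key.
        key = hashlib.sha1(payload).hexdigest()
        blobs[key] = payload
        message = json.dumps({"blob": key})
    queue.append(message)

def dequeue(queue: list, blobs: dict) -> bytes:
    """Take the oldest message and return its payload, inline or from a blob."""
    envelope = json.loads(queue.pop(0))
    if "inline" in envelope:
        return bytes.fromhex(envelope["inline"])
    return blobs[envelope["blob"]]
```

Enqueuing the same large payload twice stores only one blob, and a worker with a local cache of `blobs` could skip the download entirely.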

In my experience, the probability that a message is processed on the same worker that originally enqueued it is often high in real life, or at least not negligible. In all these cases, a cached content-based value store would save you from downloading these blobs at all, while still working correctly otherwise.

Implicit Value-Deduplication

Since the same value leads to the same key, trying to store the same value twice means you get the same storage location and the value gets stored only once. The second attempt can even be aborted early by provoking a precondition violation, or skipped completely if the key is already in the local cache (depending on the deletion plan).
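A sketch of a deduplicating put. `BlobStore.create_only` is a hypothetical stand-in for a conditional write that fails if the blob already exists (similar in spirit to an HTTP `If-None-Match: *` precondition). Treating a locally cached key as proof of existence assumes values are never deleted while they are cached:

```python
import hashlib

class PreconditionFailed(Exception):
    """Raised when a create-only write finds the key already present."""

class BlobStore:
    """Hypothetical in-memory blob service supporting create-only writes."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.uploaded_bytes = 0

    def create_only(self, key: str, value: bytes) -> None:
        if key in self.blobs:
            raise PreconditionFailed(key)
        self.blobs[key] = value
        self.uploaded_bytes += len(value)

def put_deduplicated(store: BlobStore, value: bytes, local_cache: set) -> str:
    key = hashlib.sha1(value).hexdigest()
    if key in local_cache:
        return key  # known to exist already: skip the round trip entirely
    try:
        store.create_only(key, value)
    except PreconditionFailed:
        pass  # same content already stored, possibly by another worker
    local_cache.add(key)
    return key
```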

Example: Daily Backup Snapshots

I recently wrote a small service that periodically takes full snapshots of all tables and blobs of a set of Azure storage accounts to a separate account, keeps the last N snapshots each and removes the rest. Often only a small subset of blobs or table entities actually change in a day. Had I used content-based storage, I could have saved a lot of storage (and thus cost) by deduplication without having to implement complicated incremental or differential backups. Taking a snapshot would likely also have taken less time thanks to some saved uploads.
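A sketch of how content-based snapshots could work: each snapshot is just a manifest mapping item names to content keys, so unchanged items share storage across snapshots. The function names and the dict-based store are hypothetical:

```python
import hashlib

def take_snapshot(items: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Store every item by content and return a manifest (name -> key).

    Unchanged items map to keys that already exist, so they cost no
    additional storage and their upload can be skipped.
    """
    manifest = {}
    for name, value in items.items():
        key = hashlib.sha1(value).hexdigest()
        if key not in store:
            store[key] = value
        manifest[name] = key
    return manifest

def restore(manifest: dict[str, str], store: dict[str, bytes]) -> dict[str, bytes]:
    """Rebuild the full snapshot contents from a manifest."""
    return {name: store[key] for name, key in manifest.items()}
```

Removing old snapshots then means dropping their manifests and cleaning up values no longer referenced by any remaining manifest.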

Trivial Distribution and Replication

Unlike classical relational databases and key-value stores, replication and distribution of data in such a content-based value store is trivial, since there can't be any conflicts. This is why the caching mentioned above works so well. Replication simply means copying the values of all missing keys over to the target. As a consequence, in some scenarios there's no technical need for a single master database: a peer can synchronize with any other peer, resulting in full peer-to-peer support. Distributed hash tables (DHTs), as used by most file-sharing solutions including BitTorrent, work similarly and turn out to be very efficient.
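Replication between two content-based stores fits in a few lines. A sketch with hypothetical dict-based stores:

```python
def replicate(source: dict[str, bytes], target: dict[str, bytes]) -> int:
    """Copy every value whose key the target is missing; return the count.

    No conflict resolution is needed: equal keys always mean equal values,
    so existing entries on the target never have to be compared or merged.
    """
    missing = source.keys() - target.keys()
    for key in missing:
        target[key] = source[key]
    return len(missing)
```

Running it in both directions makes two peers identical, and the result is the same no matter which peer synchronizes first.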

History Consistency and Versioning

Since values can’t change, they remain consistent with each other even when they become stale. That’s why this approach is used by most of the popular distributed version control systems like Git and Mercurial as well.

The Git object model is nicely described in the git community book (the following two images are taken from there). In essence, all objects are stored just as described here. In addition to data blobs (e.g. source code files) there are also tree objects, which represent a folder simply by listing the SHA-1 keys of its child elements and are again stored by their own hash.
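For blobs, the object id Git assigns can be reproduced directly: Git hashes a short header, `blob <size>` followed by a NUL byte, together with the raw file content. A sketch for SHA-1 repositories, assuming no content filters (such as line-ending conversion) are applied:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to raw file content.

    Git hashes the header b"blob <size>\\0" followed by the bytes themselves.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's empty blob
```

The result should match `git hash-object` for the same file.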

If a file changes in a new git revision, it gets a new hash. The folder (tree) containing it updates that hash in its list and in turn gets a new hash itself, and so on up to the root. Both the old and the new version therefore remain completely and consistently available, simply by referring to the hash of the respective version of the tree.
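The propagation of a change up to the root can be illustrated with a small sketch. It uses a hypothetical, simplified object format (not Git's real tree encoding): a file is hashed by its content, a folder by the sorted list of its children's names and keys:

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def store_tree(node, store: dict) -> str:
    """Store a file (bytes) or folder (dict name -> node) and return its key.

    Simplified format for illustration only, NOT Git's actual encoding.
    """
    if isinstance(node, bytes):
        data = b"file\n" + node
    else:
        lines = [f"{name} {store_tree(child, store)}"
                 for name, child in sorted(node.items())]
        data = ("tree\n" + "\n".join(lines)).encode("utf-8")
    key = sha1_hex(data)
    store[key] = data
    return key
```

Storing two versions that differ in a single file adds only that file, its parent folder and the new root; everything else is shared, and both roots remain available.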

Historical consistency can be useful for all kinds of applications. Note that this approach persists snapshots of values and content, not how they were changed. It is thus a dual counterpart to concepts like event sourcing, where only the actions causing the changes are persisted but not the actual values.

Append-Only Storage or Value Scavenging

Unless append-only storage is acceptable for you, you need to be careful about deleting values in such a system. Since there is implicit deduplication, you can't just delete what you've just inserted, as the same value could also be used in other places. There are several ways to tackle this, depending on your scenario:

  • Garbage Collection: If there is a hierarchy where all values are referenced by another value, you can follow the tree from time to time and then remove all values you haven’t seen. This is used by all the distributed version control systems. Be careful about race conditions though.
  • Reference Tracking: Use metadata to list all keys or items referring to a value. When you remove the last reference, remove the value. This can be combined with garbage collection. You can also use reference counters, but they are difficult to handle correctly in an unreliable world like a cloud environment, where instantaneous VM shutdowns without prior notice are to be expected.
  • Time-Based: You “touch” a value (update a timestamp in its metadata) whenever it is used, and from time to time remove all items that haven't been used for a while. Note that this causes a lot of round trips (although they could be performed asynchronously in the background).
  • Limited Lifetime: Sometimes it's good enough to just define that a value can safely be removed after a day or a month.
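The garbage collection approach is a classic mark-and-sweep. In the sketch below, `references` is a hypothetical, format-specific function listing the keys a value points to. Because of the race conditions mentioned above (a value uploaded but not yet referenced would be swept), a real implementation would likely also spare recently written values:

```python
from typing import Callable, Iterable

def collect_garbage(store: dict[str, bytes], roots: Iterable[str],
                    references: Callable[[bytes], Iterable[str]]) -> set[str]:
    """Mark every key reachable from the roots, then delete everything else.

    Returns the set of deleted keys.
    """
    reachable: set[str] = set()
    pending = list(roots)
    while pending:                      # mark phase (iterative graph walk)
        key = pending.pop()
        if key in reachable or key not in store:
            continue
        reachable.add(key)
        pending.extend(references(store[key]))
    garbage = store.keys() - reachable  # a new set, safe to iterate while deleting
    for key in garbage:                 # sweep phase
        del store[key]
    return garbage
```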

Where to get it?

Lokad.Cloud v1 doesn't support this approach out of the box yet, although implementing it on top should be rather simple. Nothing has been decided, but it's not unlikely that the cloud storage library of its successor will provide some infrastructure to simplify it further. Note that this approach will not replace any existing storage provider; it is merely a new alternative that works better in some selected scenarios.

Tuesday
Jul 20 2010

How to create 2048bit Certificate CSRs for Dell's iDRAC6

In case you happen to manage a recent Dell server with a dedicated iDRAC remote management card and you'd like to secure it with your own certificate, you'll have to request the certificate using a CSR created directly in the iDRAC web interface.

Unfortunately, these CSRs have only 1024-bit keys, which get refused by some public certificate authorities like StartCom (for security reasons they require at least 2048 bits). You can't choose the key length in the iDRAC web interface, but luckily there is another way to make it generate 2048-bit or 4096-bit keys for the CSR, using racadm from Dell's System Management Tools:

View the current configuration (all on 1 line):

racadm.exe -r [iDRAC IP] -u [user] -p [password] getconfig -g cfgRacSecurity

Change the key length to 2048 bits (all on 1 line):

racadm.exe -r [iDRAC IP] -u [user] -p [password] config -g cfgRacSecurity -o cfgRacSecCsrKeySize 2048

Wednesday
May 05 2010

git howto: revert a commit already pushed to a remote repository

So you’ve just pushed your local branch to a remote branch, but then realized that one of the commits should not be there, or that there was some typo in it. No problem, you can fix it. But you should do it rather fast before anyone fetches the bad commits, or you won’t be very popular with them for a while ;)

Case 1: Revert the last commit

This is the easiest case. Let’s say we have a remote mathnet with branch master that currently points to commit dd61ab32. We want to remove the top commit. Translated to git terminology, we want to force the master branch of the mathnet remote repository to the parent of dd61ab32:

$ git push mathnet +dd61ab32^:master

Here git interprets “x^” as the parent of x, and the “+” prefix as a forced non-fast-forward push. If you have the master branch checked out locally, you can also do it in two simpler steps: first reset the branch to the parent of the current commit, then force-push it to the remote.

$ git reset --hard HEAD^
$ git push -f mathnet master

Case 2: Revert the second last commit

Let’s say the bad commit dd61ab32 is not the top commit, but a slightly older one, e.g. the second last one. We want to remove it, but keep all commits that followed it. In other words, we want to rewrite the history and force the result back to mathnet/master. The easiest way to rewrite history is to do an interactive rebase down to the parent of the offending commit:

$ git rebase -i dd61ab32^

This will open an editor showing a list of all commits, starting with the one we want to get rid of:

pick dd61ab32
pick dsadhj278
...

Simply remove the line with the offending commit; most likely that will be the first line (in vi, “dd” deletes the current line). Save and close the editor (in vi, type “:wq” and press Return). Resolve any conflicts, and your local branch should be fixed. Force-push it to the remote and you're done:

$ git push -f mathnet master

Case 3: Fix a typo in one of the commits

This works almost exactly the same way as case 2, but instead of removing the line with the bad commit, simply replace its “pick” with “edit” and save/exit. Rebase will then stop right after applying that commit and let you change it as you like. Amend the commit with your fix (git commit --amend, which keeps the original message and author unless you change them) and continue the rebase (git rebase --continue); git prints these hints when it stops. Then push the changes as described above. The same way you can even split commits into smaller ones, or squash commits together.

Monday
Apr 26 2010

Lost in Math.NET Codenames?

Math.NET Numerics? Iridium? dnAnalytics? Yttrium? Huh? …sounds familiar?

It looks like some of you got lost in all the Math.NET subprojects and codenames. Math.NET evolved over time: projects split into separate new projects, codenames were introduced, and new projects replaced older ones with a slightly different focus and approach. Unfortunately this led to a mess (sorry for that!), so I'm trying to shed some light on it with the following small chart depicting the Math.NET project history:

It all started with MathLib, a very verbose object-oriented computer algebra approach with all kinds of numeric routines backing the symbolics, including basic linear algebra. At the same time, dnAnalytics was founded independently of Math.NET, focusing entirely on numerics and statistics and leveraging highly optimized native libraries for better performance.

Soon it became obvious that it would make sense to refactor the numerical aspects of MathLib out into a separate project and develop it independently, so Numerics was born, along with several other non-numeric subprojects. Numerics became Iridium, and in 2009 Iridium and dnAnalytics finally decided to join forces and work together on the new Math.NET Numerics project, which replaces both Iridium and dnAnalytics and is entirely unrelated to the early Numerics 0.1-0.4 from back in 2004.

Mostly thanks to Marcus and Jurgen, Math.NET Numerics is alive, well and active. Check out our source code repository and forums.

Wednesday
Dec 23 2009

Connect from Azure to an SQL Server Named Instance

In some situations you can't or don't want to move all your data to the cloud, be it to connect to your existing infrastructure, because of a company policy, to remain multi-tenant, or simply because you are migrating slowly, step by step. Common to these cases is often the requirement to synchronize with, or connect from Azure to, some local or offsite SQL Server database. For synchronization you may want to try the Microsoft Sync Framework. This post is about the other option: connecting to an external named SQL Server instance.

Connecting to Named SQL Server Instances

In addition to its own storage options like SQL Azure and Azure Table Storage, Azure also allows you to connect to external SQL Servers over TCP/IP. However, there’s a pitfall right now when using named SQL Server instances:

System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified) at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection) at … (source)

Provided your connection string is correct, this is likely an issue with how the client locates your named SQL Server instance.

Instance Resolution using SQL Server Browser

Since SQL Server 2005, the SQL Server Browser service is responsible for enumerating the available instances on a machine and for resolving instance names to the actual named pipe or TCP port (in SQL Server 2000, the database engine itself handled this using the SQL Server Resolution Protocol).

In order to resolve the TCP port of a named instance, the client sends a UDP datagram to port 1434, to which the SQL Server Browser replies with another datagram listing the instance endpoint, which the client then connects to. Thanks to this mechanism, the server no longer has to listen on the standard SQL Server TCP port 1433, so it can fully support multiple (named) instances. In fact, the default for named instances is to use a dynamically assigned TCP port.
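The datagram exchange is specified in Microsoft's SQL Server Resolution Protocol [MC-SQLR]. Below is a rough sketch of a client for it, based on my reading of that spec: a `0x04` request carrying the instance name, answered by a `0x05` response with a 2-byte little-endian length and semicolon-separated key/value text. The byte layout should be double-checked against the spec before relying on it, and the function names are hypothetical:

```python
import socket
import struct

SSRP_PORT = 1434  # UDP port the SQL Server Browser listens on

def build_instance_request(instance: str) -> bytes:
    # CLNT_UCAST_INST: 0x04 followed by the null-terminated instance name.
    return b"\x04" + instance.encode("ascii") + b"\x00"

def parse_instance_response(datagram: bytes) -> dict[str, str]:
    # SVR_RESP: 0x05, 2-byte little-endian size, then "key;value;...;;".
    if not datagram or datagram[0] != 0x05:
        raise ValueError("not an SSRP server response")
    (size,) = struct.unpack_from("<H", datagram, 1)
    # Decoded leniently; the spec uses the server's multibyte code page.
    text = datagram[3:3 + size].decode("latin-1")
    fields = text.rstrip(";").split(";")
    return dict(zip(fields[0::2], fields[1::2]))

def resolve_tcp_port(host: str, instance: str, timeout: float = 2.0) -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # If UDP is blocked (as from Azure), this ends in a socket.timeout.
        sock.settimeout(timeout)
        sock.sendto(build_instance_request(instance), (host, SSRP_PORT))
        datagram, _ = sock.recvfrom(65535)
    return int(parse_instance_response(datagram)["tcp"])
```

Calling `resolve_tcp_port` from a network that drops UDP never receives an answer, which is exactly the failure behind the SqlException above.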

Azure vs. SQL Server Browser

When connecting from Azure, this resolution mechanism fails simply because the UDP datagrams never reach their target (this may change in the future). So the client has no way to find the actual, probably dynamic, TCP port to connect to, and it throws the SqlException cited above.

Solution

To work around this issue, you can configure your named instance to listen on a static TCP port instead of randomly selecting a new dynamic one on every restart (related kb). You can then specify this static port directly in the connection string in your Azure worker role:

Data Source={domain/ip},{port};Network Library=DBMSSOCN;Initial Catalog={dbname};User ID={user};Password={pw}

Note that in this case there’s no need to specify the name of the instance in the connection string. The network library parameter tells the client to use TCP/IP instead of e.g. Named Pipes.
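Before debugging the connection string itself, it can help to confirm from the worker role that the static port is reachable at all. A small sketch with hypothetical helper names; the connection string it builds has the same shape as the one above:

```python
import socket

def can_reach_sql_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False

def build_connection_string(host: str, port: int, database: str,
                            user: str, password: str) -> str:
    # Static port after the comma; DBMSSOCN selects TCP/IP; no instance name needed.
    return (f"Data Source={host},{port};Network Library=DBMSSOCN;"
            f"Initial Catalog={database};User ID={user};Password={password}")
```

If the port check fails, the problem lies in firewalls or the instance's port configuration rather than in the client.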