Using The Guid.Comb Identifier Strategy

Written on May 21st, 2009
Categories: NHibernate, Performance

As you may have read by now, it's a good idea to avoid identity-style identifier strategies with ORMs. One of the better alternatives that I kinda like is the guid.comb strategy. Using regular guids as primary key values leads to fragmented indexes (because the values are random), which leads to bad performance. This is a problem that the guid.comb strategy can solve quite easily for you.

If you want to learn how the guid.comb strategy really works, be sure to check out Jimmy Nilsson's article on it. Basically, this strategy generates sequential guids, which solves the fragmented index issue. You can generate these sequential guids in your database, but the downside is that your ORM would still need to insert each record separately and fetch the generated primary key value each time. NHibernate includes the guid.comb strategy, which generates the sequential guids on the client before the records are actually inserted into your database.

This obviously has some great benefits:

  • you don't have to hit the database immediately whenever a record needs to be inserted
  • you don't need to retrieve a generated primary key value when a record was inserted
  • you can batch your insert statements

Let's see how we can use this with NHibernate. First of all, you need to map the identifier of your entity like this:

    <id name="Id" column="Id" type="guid" >
      <generator class="guid.comb" />
    </id>

And that's actually all you have to do. You don't have to assign the primary key values or anything like that. You don't need to worry about them at all.
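
For reference, an entity that fits this mapping could look something like the sketch below. Only the Id property and the string constructor are implied by the test further down; the Name property and the protected members are assumptions made for illustration.

```csharp
using System;

// Hypothetical entity shape: only Id and the string constructor are implied
// by the test in this post, everything else is assumed for illustration.
public class ProductCategory
{
    // NHibernate needs a parameterless constructor; it doesn't have to be public.
    protected ProductCategory() { }

    public ProductCategory(string name)
    {
        Name = name;
    }

    // No public setter: the guid.comb generator assigns the value during Save().
    public virtual Guid Id { get; protected set; }

    // Assumed property, not shown in the original post.
    public virtual string Name { get; protected set; }
}
```

Until the entity is passed to session.Save(), Id simply holds Guid.Empty.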

Take a look at the following test:

        [Test]
        public void InsertsAreOnlyExecutedAtTransactionCommit()
        {
            var insertCountBefore = sessionFactory.Statistics.EntityInsertCount;
 
            using (var session = sessionFactory.OpenSession())
            using (var transaction = session.BeginTransaction())
            {
                for (int i = 0; i < 50; i++)
                {
                    var category = new ProductCategory(string.Format("category {0}", i + 1));
                    // at this point, the entity doesn't have an ID value yet
                    Assert.AreEqual(Guid.Empty, category.Id);
                    session.Save(category);
                    // now the entity has an ID value, but we still haven't hit the database yet
                    Assert.AreNotEqual(Guid.Empty, category.Id);
                }
 
                // just verifying that we haven't hit the database yet to insert the new categories
                Assert.AreEqual(insertCountBefore, sessionFactory.Statistics.EntityInsertCount);
                transaction.Commit();
                // only now have the records been inserted
                Assert.AreEqual(insertCountBefore + 50, sessionFactory.Statistics.EntityInsertCount);
            }
        }

Interesting, no? The entities have an ID value after they have been 'saved' by NHibernate, but they haven't actually been saved to the database yet. NHibernate always tries to wait as long as possible to hit the database, and in this case it only needs to hit the database when the transaction is committed. If you've enabled batching of DML statements, you could severely reduce the number of times you need to hit the database in this scenario.
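
As a rough sketch, batching is typically switched on through the adonet.batch_size property in the session factory configuration. The value of 50 below is an arbitrary assumption chosen to match the test, and whether batching actually kicks in depends on the driver and database you're using:

```xml
<!-- inside the <session-factory> element of your NHibernate configuration -->
<!-- 50 is an arbitrary example value, not a recommendation -->
<property name="adonet.batch_size">50</property>
```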

And in case you're wondering, the generated guids look like this:

81cdb935-d371-4285-9dcb-9bdb0122f25f
a44baf99-58e9-4ad7-9a59-9bdb0122f25f
a88300c2-6d64-4ae3-a55b-9bdb0122f25f
032c7884-da2f-4568-b505-9bdb0122f25f
....
70d7713c-b38d-4341-953d-9bdb0122f25f

Notice the last part of the guids... this is what prevents the index fragmentation.
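
To give an idea of where that shared last part comes from, here's a minimal sketch of a COMB-style generator along the lines of what Jimmy Nilsson describes. It is an illustration and not necessarily NHibernate's exact implementation: the class and method names are made up, the timestamp is passed in as a parameter to keep the example deterministic, and the byte handling assumes a little-endian platform.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CombGuidSketch
{
    // Keeps the first 10 random bytes of a regular guid and overwrites
    // the last 6 bytes with a timestamp (2 bytes of days, 4 bytes of time).
    public static Guid NewComb(DateTime now)
    {
        byte[] guidBytes = Guid.NewGuid().ToByteArray();

        // Days since 1900-01-01, the base date of SQL Server's DATETIME.
        int days = (now.Date - new DateTime(1900, 1, 1)).Days;

        // Time of day in units of roughly 1/300th of a second,
        // which is the precision of SQL Server's DATETIME.
        long ticks = (long)(now.TimeOfDay.TotalMilliseconds / 3.333333);

        byte[] daysBytes = BitConverter.GetBytes(days);
        byte[] ticksBytes = BitConverter.GetBytes(ticks);

        // Most significant byte first (assumes BitConverter is little-endian here).
        Array.Reverse(daysBytes);
        Array.Reverse(ticksBytes);

        // Low 2 bytes of the day count, then low 4 bytes of the time value.
        Array.Copy(daysBytes, daysBytes.Length - 2, guidBytes, guidBytes.Length - 6, 2);
        Array.Copy(ticksBytes, ticksBytes.Length - 4, guidBytes, guidBytes.Length - 4, 4);

        return new Guid(guidBytes);
    }
}
```

Two guids generated for the same timestamp differ in their random part but end in the same 12 hex characters, just like the values listed above. Since SQL Server gives the most weight to those last 6 bytes when it orders uniqueidentifier values, guids created later end up towards the end of the index instead of being scattered all over it.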

Obviously, this particular test is not a realistic scenario, but I'm sure you understand how much of an improvement this identifier strategy could provide throughout an entire application. The only downside (IMO) is that guids aren't really human readable, so if that is important to you, you should probably look into other identifier strategies. The HiLo strategy would be particularly interesting in that case, but we'll cover that in a later post ;)

  • Em

    Hello,

    Thanks for this post. However, it might not be completely precise from the database point of view.
    We recently ran some tests to try and validate the guid.comb generator as an identifier field. SQL Server 2005 was the back end storage. It works pretty well, provided the insert statements targeting the same table are separated by at least 3.333… milliseconds.

    As Nilsson states “A DATETIME has the “precision” of 1/300th of a second”. Thus, massive inserts (as in your test) lead to higher fragmentation than the “standard” identity generator, because the last 6 bytes end up identical. However, this higher index fragmentation is, of course, far from what one would encounter by relying on plain Guids.

    What we finally concluded was that, for most standard purposes, very few Session.Save() calls would be performed on the same kind of entity within this 1/300th of a second. For more intensive insert processes, an index defragmentation job would have to be run more often.

    I may still have the results of our tests that I could share with you, if this would be of any interest to you.

    Em.

    BTW : I’m not a database guy, but have been compelled to learn and understand some of SQL Server's indexing behavior in order to have proper arguments to oppose the standard “Use the identity. It’s fast. It’s safe. It’s recommended by Microsoft” DBA answer and eventually get the DBA to accept the proposal.

  • http://randomcode.net.nz Neal Blomfield

    How do you address the issue of allowing the database / orm to control object identity?

    I was using the guid.comb approach when building my first NH based app but the problem I found was that implementing things such as object equality and hashcodes gets very complex very fast when you cannot rely on the oid (as this is not assigned until the object is saved). I use numbered versioning now and assign guid based ids at object creation as this greatly simplifies aspects such as equality and hashcode generation.

    See http://www.onjava.com/pub/a/onjava/2006/09/13/dont-let-hibernate-steal-your-identity.html for an interesting discussion on why.

  • http://davybrion.com Davy Brion

    @Em

    i’d love to see the results of those tests :)

    and the purpose of the test in this post was more to show that the insert statements can be batched (which is an important benefit of the Unit Of Work pattern), while avoiding unnecessary roundtrips and still getting good primary key values that cause less index fragmentation than plain guids. For typical usage scenarios, guid.comb will be great IMO.

    For tables that routinely undergo intensive insert processes, i’d probably go with HILO which i think avoids all of the problems you mention, and also avoids the identity-style roundtrips.


  • Em

    >i’d love to see the results of those tests
    I’ve got a folder containing the VS project used to generate the results + an XLS file containing aggregated results. Could you send me through PM an email address or network storage I could upload the archive to ?

    >For typical usage scenario’s, guid.comb will be great IMO.
    I agree with you. However, I think that index related subtleties should also be examined. But, you’re right, this is a rather advanced topic, definitely outside the scope of your article.

    >For tables that routinely undergo intensive insert processes, i’d probably go with HILO
    Once again, I agree. Our DBA, however, was not feeling at ease with the generated table containing the seeding primary key information. He was afraid that someone would/could (un)intentionally tamper with it and cause unexpected duplicate key exceptions.

  • http://davybrion.com Davy Brion

    @Em

    just mail it to ralinx at davybrion.com, thx :)

  • http://mrpmorris.blogspot.com Peter Morris

    I don’t like having the ID as a property on my class so I usually leave it out. I have recently had a scenario where I need to know the GUID before the object was saved to the DB. Is there a way in NH to set the ID manually?

  • http://davybrion.com Davy Brion

    @Peter

    just use assigned ids, as mentioned in the docs

  • BV

    Great article.

    I’d like to learn more about this HILO thing as well. Could anyone here point me in the right direction?

  • Aaron Palmer

    My team recently hashed over the differences between identities vs. guid.comb and hilo. We ended up settling on the hilo strategy because several of our IDs find their way into a url for one reason or another. It’s just as simple to use as guid.comb. Ayende has done a lot to support hilo as well. Check this post of his http://ayende.com/Blog/category/482.aspx/ , looks like he uses hilo for all of his IDs.
