The Inquisitive Coder – Davy Brion's Blog

Trying to walk that thin line between intelligence and ignorance

Archive for the 'Performance' Category

Using Copy-On-Write In Multithreaded Code To Reduce Locking Overhead

Posted by Davy Brion on 8th March 2010

I recently posted some code that i asked you to review.  When i posted it, the code had never even executed (that’s right, not even through a test) and i only thought it would do what i needed it to do.  I consider the actual implementation non-obvious (at least for those who don’t know the copy-on-write approach to avoid traditional locking) so i just wanted to hear some reactions to the code from people who didn’t know the context.  I promised to do a follow-up post to discuss the code in its entirety so here it is.

First, i’ll show the whole class again:

    public class TenantSessionFactoryManager : ITenantSessionFactoryManager
    {
        private readonly ITenantContext tenantContext;
        private readonly ITenantInfoHolder tenantInfoHolder;
        private readonly string mappingAssemblyName;

        private readonly object writeLock = new object();
        private Dictionary<Guid, ISessionFactory> sessionFactories;

        public TenantSessionFactoryManager(ITenantContext tenantContext, ITenantInfoHolder tenantInfoHolder, string mappingAssemblyName)
        {
            this.tenantContext = tenantContext;
            this.tenantInfoHolder = tenantInfoHolder;
            this.mappingAssemblyName = mappingAssemblyName;
            sessionFactories = new Dictionary<Guid, ISessionFactory>();
        }

        public ISession CreateSessionForCurrentTenant()
        {
            var tenantId = tenantContext.CurrentTenantId;

            if (!sessionFactories.ContainsKey(tenantId))
            {
                CreateSessionFactoryForCurrentTenant();
            }

            return sessionFactories[tenantId].OpenSession();
        }

        private void CreateSessionFactoryForCurrentTenant()
        {
            lock (writeLock)
            {
                var tenantId = tenantContext.CurrentTenantId;

                if (!sessionFactories.ContainsKey(tenantId))
                {
                    var connectionString = tenantInfoHolder.GetDatabaseConnectionString(tenantId);

                    var sessionFactory = new Configuration()
                        .Configure()
                        .AddProperties(new Dictionary<string, string>
                                {
                                    { "connection.connection_string", connectionString },
                                    { "cache.region_prefix", "Tenant_" + tenantId }
                                })
                        .AddAssembly(mappingAssemblyName)
                        .BuildSessionFactory();

                    var newDictionary = new Dictionary<Guid, ISessionFactory>(sessionFactories);
                    newDictionary[tenantId] = sessionFactory;
                    sessionFactories = newDictionary;
                }
            }
        }

        public void RemoveSessionFactoryForTenant(Guid tenantId)
        {
            if (!sessionFactories.ContainsKey(tenantId))
            {
                return;
            }

            lock (writeLock)
            {
                if (!sessionFactories.ContainsKey(tenantId))
                {
                    return;
                }

                var sessionFactory = sessionFactories[tenantId];
                var newDictionary = new Dictionary<Guid, ISessionFactory>(sessionFactories);
                newDictionary.Remove(tenantId);
                sessionFactories = newDictionary;

                sessionFactory.Dispose();
            }
        }
    }

Basically, the purpose of this class is to hold a set of ISessionFactory instances, each of which belongs to a particular tenant in a multi-tenant application.  Tenants can be added on the fly (without restarting the application) and when an ISessionFactory doesn’t exist yet for a particular tenant, it must be created when the first request for an ISession for that tenant comes in.  Obviously, access to the sessionFactories dictionary must be thread-safe since multiple threads will be reading from the dictionary as well as occasionally writing to it.

I considered 3 options to make sure access to the dictionary would be thread-safe:

  1. Traditional locking (through the lock statement or the Monitor class)
  2. Using the ReaderWriterLockSlim class
  3. Using the copy-on-write pattern

Traditional locking was quickly scratched from the list because that would require me to lock for every read of the dictionary as well as every write.  Now, pretty much every single request requires an NHibernate session which means that pretty much every single request results in a lookup in the sessionFactories dictionary.  If i need to lock for every read, this significantly hurts overall throughput of the system. 

The ReaderWriterLockSlim might be a good solution here… after all, the short description of this class in MSDN says this:

Represents a lock that is used to manage access to a resource, allowing multiple threads for reading or exclusive access for writing.

Sounds like what i need, right?  But the thing is, i’ve never used the ReaderWriterLockSlim class before and it hasn’t really gained my trust yet.  I know that’s a terrible excuse for not using it, but hear me out.  While the ReaderWriterLockSlim likely reduces locking overhead over traditional locking substantially, there still has to be some overhead for read operations, even if it is small.  In most situations, that small overhead wouldn’t bother me but in this case, that little overhead would be added to pretty much every single request in the system.  Now, writing to the dictionary implies that a new tenant has been added to the system.  In the context of this system, that’s not even gonna happen on a daily basis.  Hell, once a week is probably a best-case estimation and even that is highly optimistic.  So i really don’t want any kind of overhead on read operations when the write operation is only going to happen very occasionally.
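For comparison, here is a minimal sketch of what the second option could look like. It is not code from the application: ReaderWriterLockedMap is a hypothetical name used only for illustration, and the only real framework API in it is ReaderWriterLockSlim with its EnterReadLock, ExitReadLock, EnterWriteLock and ExitWriteLock methods. Note that every read still has to enter and exit the lock, which is the per-request overhead discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration of option 2; not part of the original class.
public class ReaderWriterLockedMap<TKey, TValue>
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();

    // Many threads can hold the read lock at once, but each read
    // still pays for entering and exiting it.
    public bool TryGet(TKey key, out TValue value)
    {
        rwLock.EnterReadLock();
        try
        {
            return items.TryGetValue(key, out value);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    // Writers get exclusive access and mutate the dictionary in place.
    public void Set(TKey key, TValue value)
    {
        rwLock.EnterWriteLock();
        try
        {
            items[key] = value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```

Whether that read-side cost matters in practice is something only a measurement on the real system can tell.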

That leaves the copy-on-write pattern.  I’ve used it before with success (though at the time, i didn’t know it was a known pattern) so this approach has already gained my trust.  It basically implies that we don’t do any locking on the read operations, but whenever a write operation occurs we copy the original set of objects, perform the write on the newly copied set and then set the reference of the original set to the newly created and modified instance.  During this whole time, every single read is safe.  Successive reads within the same logical operation however aren’t, so the following code would not be thread-safe:

            if (sessionFactories.ContainsKey(tenantId))
            {
                return sessionFactories[tenantId].OpenSession();
            }

Because there’s no locking on the reads, the code within the if-block could fail because the sessionFactories reference could be pointing to a new dictionary which no longer contains the element for that key. 
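Two common ways to make such a check-then-read safe without locking are sketched below, under a few assumptions: TenantNameLookup and its members are hypothetical names (strings stand in for the ISessionFactory instances), and the volatile modifier is an extra precaution added for this sketch, not something the class in this post uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class; it only illustrates the read side of copy-on-write.
public class TenantNameLookup
{
    private readonly object writeLock = new object();

    // Assumption of this sketch: volatile, so that readers always pick up
    // the most recently published dictionary reference.
    private volatile Dictionary<Guid, string> names = new Dictionary<Guid, string>();

    // Option 1: copy the reference once, so both reads hit the same dictionary.
    public string FindWithSnapshot(Guid tenantId)
    {
        var snapshot = names;
        if (snapshot.ContainsKey(tenantId))
        {
            return snapshot[tenantId];
        }
        return null;
    }

    // Option 2: collapse the check and the lookup into a single read.
    public string FindWithTryGetValue(Guid tenantId)
    {
        string name;
        return names.TryGetValue(tenantId, out name) ? name : null;
    }

    // Writers still copy, change the copy and swap the reference under the lock.
    public void Add(Guid tenantId, string name)
    {
        lock (writeLock)
        {
            var copy = new Dictionary<Guid, string>(names);
            copy[tenantId] = name;
            names = copy;
        }
    }
}
```

Both options work for the same reason: a published dictionary is never modified again, so any single dictionary instance a reader holds on to stays internally consistent.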

Of course, if you have frequent writes, the overhead of copying the set of objects every time you need to add/remove one might be bigger than you want, so this isn’t a pattern that you should use whenever you need to protect access to a shared resource. For this situation however, i think it’s ideal… though i’d obviously like to hear about better solutions :)

Now, let’s take a closer look at the pieces of code that perform the write operations.  First, adding a new ISessionFactory to the dictionary:

        private void CreateSessionFactoryForCurrentTenant()
        {
            lock (writeLock)
            {
                var tenantId = tenantContext.CurrentTenantId;

                if (!sessionFactories.ContainsKey(tenantId))
                {
                    var connectionString = tenantInfoHolder.GetDatabaseConnectionString(tenantId);

                    var sessionFactory = new Configuration()
                        .Configure()
                        .AddProperties(new Dictionary<string, string>
                                {
                                    { "connection.connection_string", connectionString },
                                    { "cache.region_prefix", "Tenant_" + tenantId }
                                })
                        .AddAssembly(mappingAssemblyName)
                        .BuildSessionFactory();

                    var newDictionary = new Dictionary<Guid, ISessionFactory>(sessionFactories);
                    newDictionary[tenantId] = sessionFactory;
                    sessionFactories = newDictionary;
                }
            }
        }

As you can see, the entire operation is put between a lock on the writeLock object instance.  The downside of this is that creating an ISessionFactory instance is an expensive operation, which means the lock will be held for a long time (could easily be one or more seconds).  Then again, i don’t anticipate this happening frequently so it’s not that big of an issue… especially since reads aren’t being blocked by this anyway.  This approach also prevents the creation of 2 ISessionFactory instances for the same tenant.  Well, unless i missed a bug here :p

Now, once the ISessionFactory instance is created, we create a new Dictionary based on the contents of the old one and then we add the new ISessionFactory instance to it.  After that, we replace the sessionFactories reference with the new dictionary and from that point on, every read will use the new dictionary instance.  During this entire operation, no read operation was impacted negatively.

Now let’s take a look at the other write operation, removing an ISessionFactory instance from the dictionary:

        public void RemoveSessionFactoryForTenant(Guid tenantId)
        {
            if (!sessionFactories.ContainsKey(tenantId))
            {
                return;
            }

            lock (writeLock)
            {
                if (!sessionFactories.ContainsKey(tenantId))
                {
                    return;
                }

                var sessionFactory = sessionFactories[tenantId];
                var newDictionary = new Dictionary<Guid, ISessionFactory>(sessionFactories);
                newDictionary.Remove(tenantId);
                sessionFactories = newDictionary;

                sessionFactory.Dispose();
            }
        }

The first if-check, which happens outside of the lock, is a bug that i missed but that was pointed out in the comments of the original post.  If CreateSessionFactoryForCurrentTenant and RemoveSessionFactoryForTenant were to execute concurrently for the same tenant, it’s possible that the ISessionFactory instance of that tenant is never removed from the dictionary (and also never disposed of…) since the check happens outside of the lock and could be executed before the ISessionFactory of the tenant was added to the dictionary.  In that case, the ISessionFactory instance would stay in the dictionary as long as the application stays up.  This is definitely a race condition that you want to avoid in every other situation, though in this case the odds that we’re simultaneously adding and removing the same tenant are slim to none.  Nevertheless, i don’t want to be accused of promoting race conditions so we’ll make the change anyway :)

        public void RemoveSessionFactoryForTenant(Guid tenantId)
        {
            lock (writeLock)
            {
                if (!sessionFactories.ContainsKey(tenantId))
                {
                    return;
                }

                var sessionFactory = sessionFactories[tenantId];
                var newDictionary = new Dictionary<Guid, ISessionFactory>(sessionFactories);
                newDictionary.Remove(tenantId);
                sessionFactories = newDictionary;

                sessionFactory.Dispose();
            }
        }

Now, as you can see, we once again create a new dictionary based on the previous one, then remove the ISessionFactory instance for the given tenant and then we overwrite the sessionFactories reference once again.

Finally, there’s the read operation that i specifically didn’t want suffering from locking overhead:

        public ISession CreateSessionForCurrentTenant()
        {
            var tenantId = tenantContext.CurrentTenantId;

            if (!sessionFactories.ContainsKey(tenantId))
            {
                CreateSessionFactoryForCurrentTenant();
            }

            return sessionFactories[tenantId].OpenSession();
        }

The only time this code will block is when a new ISessionFactory for the current tenant needs to be created.  Luckily, that only happens once for each tenant.  As i mentioned earlier in the post, using this pattern doesn’t guarantee that successive reads within the same logical operation are thread safe, so there is a bug in here.  If a tenant already has an ISessionFactory instance, it’s possible that the RemoveSessionFactoryForTenant method has been executed between the if-check and accessing the ISessionFactory based on the tenantId.  In that particular scenario, the ISessionFactory instance is no longer in the dictionary which will cause this code to throw an exception.

That’s a bug that i don’t feel like fixing though… Once a tenant has been removed, they are no longer a paying customer.  If they are no longer paying for the software, there is no reason whatsoever why i should care about any possible exceptions they could get while running the software :)

Seriously though, if the RemoveSessionFactoryForTenant method is called, users of that tenant won’t even have access to the system anymore so it’s really a non-issue.
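For anyone who does want to close that window, one possible shape is sketched below. It is a generic stand-in rather than the real class: CopyOnWriteCache, GetOrCreate and the createValue delegate are hypothetical names, the delegate takes the place of building the ISessionFactory, and the volatile modifier is an assumption of this sketch. The idea is to return the value found by a single TryGetValue (or the one just created) instead of going back to the dictionary a second time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic stand-in for the session factory manager.
public class CopyOnWriteCache<TKey, TValue>
{
    private readonly object writeLock = new object();
    private readonly Func<TKey, TValue> createValue;
    private volatile Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();

    public CopyOnWriteCache(Func<TKey, TValue> createValue)
    {
        this.createValue = createValue;
    }

    public TValue GetOrCreate(TKey key)
    {
        TValue value;

        // Fast path: one lock-free lookup against whichever dictionary is current.
        if (items.TryGetValue(key, out value))
        {
            return value;
        }

        lock (writeLock)
        {
            // Re-check: another thread may have created the value while we waited.
            if (items.TryGetValue(key, out value))
            {
                return value;
            }

            value = createValue(key);
            var copy = new Dictionary<TKey, TValue>(items);
            copy[key] = value;
            items = copy;

            // Return what we just created instead of reading the dictionary again.
            return value;
        }
    }

    public bool Remove(TKey key)
    {
        lock (writeLock)
        {
            if (!items.ContainsKey(key))
            {
                return false;
            }

            var copy = new Dictionary<TKey, TValue>(items);
            copy.Remove(key);
            items = copy;
            return true;
        }
    }
}
```

This only gets rid of the exception from the missing key. A caller can still end up holding a value that Remove has just taken out (and, in the ISessionFactory case, disposed), so the underlying race between reading and removing is still there.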

Anyways, i think i’ve covered the implementation in more detail than you probably cared for.  So, any thoughts? Are there still issues that i haven’t thought of? Is there another approach that you would use for this specific scenario?

Posted in Multithreading, Performance | 5 Comments »

Virtual Method Performance Penalty

Posted by Davy Brion on 10th January 2010

You often hear/read that one of the reasons why C# methods aren’t virtual by default is because of performance.  Calling a virtual method is more expensive than calling a regular instance method, because the CLR has to determine the correct override to call at runtime, instead of being able to simply call the instance method directly.  Another reason why virtual methods are more expensive to call is because they can never be inlined.

Is it really that much more expensive though? I ran a little experiment and i’d like to share the results with you.

Suppose you have the following 2 classes:

    public class MyClass
    {
        protected long someLong;

        public void IncreaseLong()
        {
            someLong++;
        }

        public virtual void VirtualIncreaseLong()
        {
            someLong++;
        }
    }

    public class MyDerivedClass : MyClass
    {
        public override void VirtualIncreaseLong()
        {
            someLong += 2;
        }
    }

As you can see, there is no difference between the IncreaseLong and VirtualIncreaseLong methods, except that the latter is virtual and the former is a regular instance method.  According to many people, calling VirtualIncreaseLong instead of IncreaseLong will be more expensive.  I also have a derived class which overrides the VirtualIncreaseLong method with a slightly different implementation.

If we call these methods a bunch of times (like 1000000000 times), we should notice quite a difference according to many people.

I wrote the following test code which calls these methods a bunch of times, times it, and outputs the results.

    class Program
    {
        const int numberOfTimes = 1000000000;

        static void Main(string[] args)
        {
            var myObject = new MyClass();
            var myDerivedObject = new MyDerivedClass();

            // we do this so there’s no first-time performance cost while timing
            EnsureThatEverythingHasBeenJitted(myObject);
            EnsureThatEverythingHasBeenJitted(myDerivedObject);

            TestNormalIncreaseMethod(myObject);
            TestVirtualIncreaseMethod(myObject);

            TestNormalIncreaseMethod(myDerivedObject);
            TestVirtualIncreaseMethod(myDerivedObject);

            Console.ReadLine();
        }

        static void EnsureThatEverythingHasBeenJitted(MyClass theObject)
        {
            theObject.IncreaseLong();
            theObject.VirtualIncreaseLong();
        }

        static void TestNormalIncreaseMethod(MyClass theObject)
        {
            Console.WriteLine(string.Format("calling the IncreaseLong method of type {0} {1} times", theObject.GetType().Name, numberOfTimes));

            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < numberOfTimes; i++)
            {
                theObject.IncreaseLong();
            }
            stopwatch.Stop();

            Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);
        }

        static void TestVirtualIncreaseMethod(MyClass theObject)
        {
            Console.WriteLine(string.Format("calling the VirtualIncreaseLong method of type {0} {1} times", theObject.GetType().Name, numberOfTimes));

            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < numberOfTimes; i++)
            {
                theObject.VirtualIncreaseLong();
            }
            stopwatch.Stop();

            Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);
        }
    }

The output of running this code might surprise you.  On my machine, i got the following results when the code was compiled in debug mode:

[Screenshot: console output of the debug build (manual_compile_debug)]

The difference between calling the regular instance method and the virtual method is quite small.  I’d even say it’s negligible.

When compiling in release mode, i got the following output:

[Screenshot: console output of the release build (manual_compile_optimized)]

I ran the test a bunch of times, and there was no consistent observable performance penalty when calling the virtual methods.  In fact, the virtual methods often performed faster than the regular instance methods and in most cases were equally fast.  I’m not claiming that virtual methods are faster than regular instance methods, but if there really was an extra real-world performance cost associated with virtual methods, it surely should be observable with this test code, no?

Obviously, this test isn’t scientific in any way.  But still, i think it does show that the so called performance cost associated with virtual methods is highly overrated.  There definitely will be cases where virtual methods are more expensive than regular instance methods, but i’m willing to bet that those cases are rare and that the vast majority of .NET developers will never be negatively impacted by it. 

Side note: have you ever noticed that most people who recommend avoiding virtual methods due to their performance cost never put the same emphasis on avoiding the cost of, say, frequent remote operations?  Which is odd, since i wouldn’t be surprised if that were the most common reason for performance problems with .NET applications.  Then again, that’s what you get when the biggest company pushing a platform advocates meaningless performance improvements while at the same time pushing bad architectural decisions/guidelines on the world because the resulting code is supposedly easier to write, use and maintain.

You can download the example code here so you can run the test yourself. 

Posted in C#, Performance | 18 Comments »

Monitoring Production Performance

Posted by Davy Brion on 7th September 2009

It’s always interesting to know how well your applications perform in production. To get a better view on this, i recently added a bit of performance related logging to the RequestProcessor class of my Request/Response service layer. I have the following 2 loggers set up:

        private readonly ILog logger = LogManager.GetLogger(typeof(RequestProcessor));
        private readonly ILog performanceLogger = LogManager.GetLogger("PERFORMANCE");

And here’s a simplified version of the Process method:

        public Response[] Process(params Request[] requests)
        {
            if (requests == null) return null;

            var responses = new List<Response>(requests.Length);

            var batchStopwatch = Stopwatch.StartNew();

            foreach (var request in requests)
            {
                try
                {
                    using (var handler = (IRequestHandler)IoC.Container.Resolve(GetHandlerTypeFor(request)))
                    {
                        var requestStopwatch = Stopwatch.StartNew();

                        try
                        {
                            // NOTE: the real code has a lot more stuff in this block for dealing with
                            // failed requests etc...
                            responses.Add(GetResponseFromHandler(request, handler));
                        }
                        finally
                        {
                            requestStopwatch.Stop();

                            if (requestStopwatch.ElapsedMilliseconds > 100)
                            {
                                performanceLogger.Warn(string.Format("Performance warning: {0}ms for {1}",
                                    requestStopwatch.ElapsedMilliseconds, handler.GetType().Name));
                            }

                            IoC.Container.Release(handler);
                        }
                    }
                }
                catch (Exception e)
                {
                    // NOTE: every single thrown exception in the service layer (and everything below it)
                    // is caught here and logged only once
                    logger.Error(e);
                    throw;
                }
            }

            batchStopwatch.Stop();

            if (batchStopwatch.ElapsedMilliseconds > 200)
            {
                var builder = new StringBuilder();

                foreach (var request in requests)
                {
                    builder.Append(request.GetType().Name + ", ");
                }
                builder.Remove(builder.Length - 2, 2);

                performanceLogger.Warn(string.Format("Performance warning: {0}ms for the following batch: {1}",
                    batchStopwatch.ElapsedMilliseconds, builder));
            }

            return responses.ToArray();
        }

The part that is simplified is simply the part that deals with getting the actual responses for each request, including how to deal with failed requests and how to handle subsequent requests in the batch. It’s not relevant to this post and it only clutters the code more so i left that out. But anyways, the interesting part here is the performance logging. Ok, the code itself isn’t interesting but what you get out of it is pretty nice. Each single request that takes more than 100 milliseconds is logged. Also, each batch of requests that takes more than 200 milliseconds is logged. Both of these ‘events’ are logged to the performance logger which, in our case, is set up to use a different logfile than the typical error log.

Which gives us some interesting looking data, like this:


WARN 2009-09-05 08:58:45 - Performance warning: 122ms for GetOpportunityCardHandler
WARN 2009-09-05 09:00:45 - Performance warning: 159ms for GetCompanyCardHandler
WARN 2009-09-05 09:01:23 - Performance warning: 187ms for GetOpportunityCardHandler
WARN 2009-09-05 09:01:32 - Performance warning: 155ms for GetCompanyCardHandler
WARN 2009-09-05 09:01:41 - Performance warning: 189ms for GetOpportunityCardHandler
WARN 2009-09-05 09:01:41 - Performance warning: 336ms for the following batch: GetSalesTaskCardRequest, GetContactCardRequest, GetOpportunityCardRequest, GetCompanyCardRequest

336ms for a single batch of requests is rather slow, and if we see this output regularly in the logfile, we definitely know that these specific request handlers really need to be optimized because we can be pretty sure that this is too slow in the real production environment.

It’s also useful to identify single request handlers that could use some optimization:


WARN 2009-09-04 02:55:40 - Performance warning: 108ms for PersistIssueHandler
WARN 2009-09-04 05:54:50 - Performance warning: 148ms for PersistTimesheetEntryHandler

We’ve already deployed this for an internal version of one of our applications, which helped us identify some parts that we really needed to optimize before our next external deployment (for our customers). Even better, when we deploy for our customers, each log entry will also contain the name of the tenant (customer) for which the performance warning was generated. Which will only make it easier for us to identify real hotspots, based on real usage from our customers, and try to optimize them as soon as possible, preferably before customers start complaining about specific parts.

I’m sure some of you are already doing something similar, but i’m also pretty sure that most of you aren’t doing this yet. I’d definitely recommend doing something like this, because it definitely makes it easier to fix _real_ performance issues instead of the theoretical ones ;)
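If you want to try this without copying the stopwatch code around, the pattern can be wrapped in a small disposable helper. What follows is a sketch under assumptions, not the code we run: SlowOperationTimer is a hypothetical name, and the warn callback stands in for a call such as performanceLogger.Warn so that the example doesn’t depend on any logging library.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: reports an operation only when it exceeds a threshold.
public sealed class SlowOperationTimer : IDisposable
{
    private readonly Stopwatch stopwatch = Stopwatch.StartNew();
    private readonly string description;
    private readonly long thresholdInMilliseconds;
    private readonly Action<string> warn;

    public SlowOperationTimer(string description, long thresholdInMilliseconds, Action<string> warn)
    {
        this.description = description;
        this.thresholdInMilliseconds = thresholdInMilliseconds;
        this.warn = warn;
    }

    // Runs when the using-block ends, whether or not the operation threw.
    public void Dispose()
    {
        stopwatch.Stop();

        if (stopwatch.ElapsedMilliseconds > thresholdInMilliseconds)
        {
            warn(string.Format("Performance warning: {0}ms for {1}",
                stopwatch.ElapsedMilliseconds, description));
        }
    }
}
```

Usage would then be a using-block around the code you want to measure, for example `using (new SlowOperationTimer("GetCompanyCardHandler", 100, Console.WriteLine)) { ... }`.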

Posted in Performance | 7 Comments »

Reducing ViewState Size

Posted by Davy Brion on 1st September 2009

I dislike ViewState as much as the next guy, but when you’re working with ASP.NET WebForms, you just can’t avoid it. In some cases, the size of the ViewState can become so big that it significantly increases the load time of pages due to the extra bandwidth consumption. The correct solution would obviously be to reduce the size of the ViewState in those pages as much as you can, but it’s not always feasible to do so. So we wanted a more general ‘solution’, and i found this post which discusses compressing the ViewState before you send it to the client and decompressing it when the client sends it back. We used pretty much the same approach, but with some differences.

First of all, ViewState is persisted in the resulting HTML page through an IStateFormatter object. We’ll provide our own CompressedStateFormatter which implements the IStateFormatter interface, and uses the standard IStateFormatter that ASP.NET uses:

    public class CompressedStateFormatter : IStateFormatter
    {
        private readonly IStateFormatter actualFormatter;

        public CompressedStateFormatter(IStateFormatter actualFormatter)
        {
            this.actualFormatter = actualFormatter;
        }

        public string Serialize(object state)
        {
            string decompressedState = actualFormatter.Serialize(state);

            using (MemoryStream memoryStream = new MemoryStream())
            {
                using (Stream zipStream = new GZipStream(memoryStream, CompressionMode.Compress))
                using (StreamWriter writer = new StreamWriter(zipStream))
                {
                    writer.Write(decompressedState);
                }

                return Convert.ToBase64String(memoryStream.ToArray());
            }
        }

        public object Deserialize(string serializedState)
        {
            byte[] data = Convert.FromBase64String(serializedState);

            using (MemoryStream memoryStream = new MemoryStream(data))
            using (Stream zippedStream = new GZipStream(memoryStream, CompressionMode.Decompress))
            using (StreamReader reader = new StreamReader(zippedStream))
            {
                return actualFormatter.Deserialize(reader.ReadToEnd());
            }
        }
    }

The idea is very simple: when the Serialize method is called, we first call the real formatter’s Serialize method, compress its return value and then return the Base64-encoded string of the compressed serialized state. And in the Deserialize method, we do the exact opposite: we first decompress the Base64-encoded string and then we use the real formatter to deserialize the actual ViewState.

In Mamanze’s example, he checks to see if the compressed version is actually larger than the decompressed version and if so, uses the decompressed version instead of the compressed one. And when decompressing he first checks to see if it’s a compressed or decompressed version and obviously only decompresses in case of a compressed version. The only page where i found the compressed version of the ViewState to be larger than the decompressed version was our login page, so i just got rid of that piece of the code.
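If you do want to keep that check, it could look roughly like the sketch below. This is not Mamanze’s code and not what we use: OptionalCompression, Pack, Unpack and the two-character "C:" / "U:" markers are all made up for this example, and it works on plain strings so it can run without ASP.NET.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: only keeps the compressed form when it is actually smaller.
public static class OptionalCompression
{
    // Made-up markers; ':' never occurs in Base64 output.
    private const string CompressedMarker = "C:";
    private const string RawMarker = "U:";

    public static string Pack(string serializedState)
    {
        string compressed = Convert.ToBase64String(Compress(serializedState));

        return compressed.Length < serializedState.Length
            ? CompressedMarker + compressed
            : RawMarker + serializedState;
    }

    public static string Unpack(string packedState)
    {
        // Both markers are two characters long.
        string payload = packedState.Substring(2);

        if (packedState.StartsWith(RawMarker, StringComparison.Ordinal))
        {
            return payload;
        }

        using (var memoryStream = new MemoryStream(Convert.FromBase64String(payload)))
        using (var zipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
        using (var reader = new StreamReader(zipStream))
        {
            return reader.ReadToEnd();
        }
    }

    private static byte[] Compress(string text)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var zipStream = new GZipStream(memoryStream, CompressionMode.Compress))
            using (var writer = new StreamWriter(zipStream))
            {
                writer.Write(text);
            }

            // ToArray still works after the stream has been closed.
            return memoryStream.ToArray();
        }
    }
}
```

The price of the check is the extra marker on every value, which is part of why dropping it is a reasonable trade-off when almost every page benefits from compression anyway.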

Now we still have to plug this into ASP.NET’s behavior somehow… first we add a pagestate.browser file to the App_Browsers folder of your web application (if it doesn’t exist, just create it) with the following content:

<browsers>
  <browser refID="Default">
    <controlAdapters>
      <adapter controlType="System.Web.UI.Page" adapterType="Our.Application.CompressedPageStateAdapter" />
    </controlAdapters>
  </browser>
</browsers>

The CompressedPageStateAdapter looks like this:

    public class CompressedPageStateAdapter : PageAdapter
    {
        public override PageStatePersister GetStatePersister()
        {
            return new CompressedHiddenFieldPageStatePersister(Page);
        }
    }

And the CompressedHiddenFieldPageStatePersister class looks like this:

    public class CompressedHiddenFieldPageStatePersister : HiddenFieldPageStatePersister
    {
        public CompressedHiddenFieldPageStatePersister(Page page) : base(page)
        {
            FieldInfo field = typeof(PageStatePersister).GetField("_stateFormatter", BindingFlags.NonPublic | BindingFlags.Instance);
            // retrieving this property instantiates the default IStateFormatter
            var defaultFormatter = base.StateFormatter;
            var formatter = new CompressedStateFormatter(defaultFormatter);
            field.SetValue(this, formatter);
        }
    }

The HiddenFieldPageStatePersister is the class that ASP.NET WebForms will use by default to store your ViewState into a hidden field in the resulting HTML. By default, the HiddenFieldPageStatePersister uses the default IStateFormatter type that ASP.NET uses, which only uses Base64 encoding but no compression. Unfortunately, there is no clean way to instruct ASP.NET to use a different implementation for IStateFormatter, so we need to use a bit of reflection to overwrite the value of the non-public _stateFormatter field that HiddenFieldPageStatePersister inherits from PageStatePersister. Luckily, this also enables us to first get the value of the StateFormatter property so we can pass this reference (which is the ‘real’ formatter) to our CompressedStateFormatter.

And that is all there is to it… all of your pages will now use this CompressedHiddenFieldPageStatePersister so you get the benefit of ViewState compression in each of your pages. You can also do this selectively if you want, by not using the pagestate.browser file and overriding the PageStatePersister property of your ASPX page:

        private CompressedHiddenFieldPageStatePersister persister;

        protected override PageStatePersister PageStatePersister
        {
            get
            {
                if (persister == null)
                {
                    persister = new CompressedHiddenFieldPageStatePersister(this);
                }

                return persister;
            }
        }

This way, only the pages that contain this code will use the CompressedHiddenFieldPageStatePersister.

Instead of inheriting from HiddenFieldPageStatePersister, you could also inherit from SessionPageStatePersister. SessionPageStatePersister will store your ViewState in the HttpSessionState, and will only include a little bit of ViewState in your HTML page instead of everything. But you do need to be aware of the fact that using the CompressedStateFormatter when inheriting from SessionPageStatePersister will only result in compressing the little bit of ViewState that is included in the HTML, and not the ViewState that is stored in the HttpSessionState.

In case you’re wondering: why should i use this instead of using typical HTTP compression on the IIS level? I believe it has a couple of advantages over HTTP compression. First of all, AFAIK, HTTP compression does not have any benefit on postbacks: it compresses the response that the server sends, not the form data that the browser posts. And since ViewState is always posted back to the server, this can make a pretty big difference. Also, with this approach, the browser doesn’t have to waste time decompressing the ViewState, which isn’t used client-side anyway.

I haven’t used this in production yet, but i will very soon… unless someone knows of a good reason why i shouldn’t ;)

Posted in ASP.NET, Performance | 12 Comments »

Of Course NHibernate Is Slow When You Use It Incorrectly

Posted by Davy Brion on 19th August 2009

Just saw the following post where the performance of NHibernate and Entity Framework is compared for a couple of different operations. Spoiler alert: NHibernate loses. As it typically does in these kinds of ‘benchmarks’ or ‘comparisons’ that seem to pop up frequently lately.

For some reason, a lot of people seem to think that opening an NHibernate session and performing thousands of operations is a valid use case. It’s not. Far from it actually. And with all of the features that NHibernate offers, it can’t possibly perform well in such a scenario. See, an NHibernate session is a unit of work. A unit of work is a business transaction which is typically short and small, but it should never be something huge. Your DBA probably won’t appreciate huge database transactions on the database either.

Whenever you load an object through NHibernate, it will be tracked by the session that loaded it. That means that the NHibernate session keeps a reference to it, and performs a series of checks on it periodically, depending on what you’re doing and some configuration settings such as the FlushMode. For instance, suppose you’ve loaded a hundred different entities in one session. If the FlushMode is set to FlushMode.Auto, NHibernate will perform a dirty check for each entity associated with the session before each query is executed. The more entity instances you’ve loaded, the longer this takes (obviously). If you take this to an extreme level, like loading thousands of entities like most of these ‘benchmarks’ do, performance will naturally be horrible.

Each entity instance is also stored in the first level (or session level) cache. That means that whenever you retrieve a row from the database, NHibernate will check if an instance of that row already exists in the first level cache. Again, the more instances you’ve loaded, the larger the overhead of this will be. There are also a lot of possible extension points where you can plug in custom logic. Again, there is a very minor cost that comes with this extensibility and as you can expect, that minor cost can add up to something much more noticeable once you start dealing with an unreasonably large number of instances in your session.

Always keep in mind that an ORM (and this goes for every ORM) is most suitable for OLTP. Using an ORM for batch processing jobs or large data processing operations in general is, simply put, a bad idea. ORMs will never perform as well as other solutions in these scenarios. So please don’t bother even benchmarking ORM performance in non-OLTP usage because it quite simply doesn’t make sense to do so, and the results will be completely untrustworthy anyway.

An ORM can offer you nice performance gains in OLTP scenarios simply by trying to minimize database connectivity, minimizing the number of database operations, and relatively sane caching usage. Unfortunately, these are aspects that are never tested in these ‘benchmarks’ or ‘comparisons’.

Posted in NHibernate, Performance | 14 Comments »

Avoid Using NHibernate With NUnit 2.4.6

Posted by Davy Brion on 24th June 2009

We just spent about 2 hours trying to find out why our NHibernate tests were about 10x slower on our build server than they were on our local machines. I had noticed lately that the build for one of our projects was taking longer and longer but i hadn’t really timed the difference. This project has about 1200 tests that use NHibernate and they run in about 45-60 seconds on my local machine. It turns out they took around 15 minutes on the buildserver when running them through TeamCity.

I logged into the buildserver and ran the tests manually using nunit’s console runner (with an NUnit-2.4.7 build that i happened to have installed somewhere on the machine) and they only took about 45 seconds. After a lot of guesswork and screwing around, it turned out that we never modified our base build script (why yes, i do believe in build script inheritance) to use a newer version of NUnit. We set up the buildserver about 1 year ago, and at that time, the latest stable NUnit version that TeamCity supported was NUnit 2.4.6. Our base build script was still referring to NUnit 2.4.6, which apparently sets log4net to use debug level logging. Now, NHibernate logs a huge amount of information at the debug level, so this turned out to slow down all of our builds that had NHibernate tests.

We changed the 2.4.6 version in our script to 2.4.7 and the build time of this particular project decreased from around 50 minutes to about 35 minutes. Yes, that’s still a lot but this is a huge project with a lot of legacy tests and the entire build process is pretty complex. Other projects saw their build times drop from around 7 minutes to about 2 minutes.

That’s a pretty nice improvement for simply changing a “6” to a “7” ;)
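If you can't control which runner (and thus which log4net setup) executes your tests, one way to protect yourself is to pin the NHibernate logger to a higher level before the tests start. This is only a minimal sketch, assuming log4net's default hierarchy repository; the `LoggingGuard` class and its method name are made up for illustration, and it is not the fix described above, which was simply upgrading NUnit.

```csharp
using log4net;
using log4net.Core;
using log4net.Repository.Hierarchy;

public static class LoggingGuard
{
    // Hypothetical helper: call it once before the tests start using NHibernate.
    public static void QuietNHibernate()
    {
        // NHibernate names its loggers after its own types ("NHibernate.SQL",
        // "NHibernate.Impl.SessionImpl", ...), so they all inherit from "NHibernate".
        var logger = (Logger)LogManager.GetLogger("NHibernate").Logger;

        // Anything below WARN (including the very chatty DEBUG output) is now
        // filtered out, whatever level the root logger ends up with.
        logger.Level = Level.Warn;
    }
}
```

Whether this would have helped in the NUnit 2.4.6 case depends on when the runner applies its own log4net configuration, so treat it as a safety net rather than a guaranteed fix.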

Posted in NHibernate, Performance, Test Driven Development | 3 Comments »

Keep An Eye On Those Indexes

Posted by Davy Brion on 21st May 2009

We have a multi-tenant application, where each tenant has its own database. We recently were informed about a particular performance problem that one tenant (which we’ll refer to as Tenant A) was experiencing in every screen where data of a certain type needed to be shown. None of the other tenants experienced this problem though.

We tracked down the query that was causing the bad performance and ran it on the database of Tenant B. Tenant B actually had a lot more data in the main table that was used in the query and the query executed immediately whereas it took about 25 seconds to complete for Tenant A. So the query runs fast on another database that actually has more data… at this point i was convinced that it had to be related to indexes.

Turns out that someone recently ran an import process to import a bunch of data into Tenant A’s database. I know very little about databases, but one thing i’ve seen time and time again (with both Oracle and SQL Server) is that you really need to make sure that your indexes are in good shape after any process that performs a lot of inserts (or removals). A couple of years ago, i had a very intensive nightly import process for a particular project that used an Oracle database. As time went on, the application’s queries became painfully (unacceptably even) slow. I managed to restore the performance of those queries by simply instructing Oracle to recalculate all of the index statistics for the tables that were affected heavily during the nightly import.

With that in mind, we simply rebuilt the indexes for Tenant A’s database, and the same query that took 25 seconds completed almost instantly from then on. Now, we did have a weekly job running on that database server to keep the indexes in a healthy shape but that job didn’t really do a good umm… job of it, apparently.

Lessons learned: make sure that you:

  • Have a proper maintenance job set up which keeps your indexes healthy and schedule it to run regularly
  • Run that job manually if you need to perform a manual import process
  • Execute that job in an automated fashion whenever an intensive automated import process has completed

Oh, and consult with your DBAs or at least people who know what they’re doing when it comes to your particular database on how to keep those indexes healthy. In this case, we rebuilt them. In other cases it’s sufficient to recalculate the statistics… i’m not sure which way is the best but you should at least keep an eye on this possible problem :)
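The last bullet (running the maintenance automatically after an import) could look roughly like the following. This is a minimal sketch, assuming SQL Server and the System.Data.SqlClient provider; the `IndexMaintenance` class, its method names and the schema/table arguments are hypothetical, and the choice between rebuilding and only updating statistics is exactly the thing to check with your DBA.

```csharp
using System;
using System.Data.SqlClient;
using System.Text.RegularExpressions;

public static class IndexMaintenance
{
    // Identifiers can't be passed as SQL parameters, so only plain names are accepted.
    private static readonly Regex SimpleName = new Regex(@"^[A-Za-z_][A-Za-z0-9_]*\z");

    // Builds the T-SQL statement for one table; kept separate so it can be
    // inspected or logged without touching a database.
    public static string BuildSql(string schema, string table, bool rebuild)
    {
        if (!SimpleName.IsMatch(schema) || !SimpleName.IsMatch(table))
        {
            throw new ArgumentException("Unexpected characters in schema or table name.");
        }

        string template = rebuild
            ? "ALTER INDEX ALL ON [{0}].[{1}] REBUILD;"
            : "UPDATE STATISTICS [{0}].[{1}];";

        return string.Format(template, schema, table);
    }

    // Call this as the final step of the import process.
    public static void RunAfterImport(string connectionString, string schema, string table, bool rebuild)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(BuildSql(schema, table, rebuild), connection))
        {
            // Rebuilds on large tables can easily exceed the default 30 second timeout.
            command.CommandTimeout = 0;
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

Calling `IndexMaintenance.RunAfterImport(connectionString, "dbo", "Orders", true)` at the end of an import would rebuild all indexes on a (hypothetical) `dbo.Orders` table; passing `false` only refreshes its statistics.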

Posted in Performance | 1 Comment »

Using The Guid.Comb Identifier Strategy

Posted by Davy Brion on 21st May 2009

As you may have read by now, it’s a good idea to avoid identity-style identifier strategies with ORMs. One of the better alternatives that i kinda like is the guid.comb strategy. Using regular guids as a primary key value leads to fragmented indexes (due to the randomness of the guid’s value) which leads to bad performance. This is a problem that the guid.comb strategy can solve quite easily for you.

If you want to learn how the guid.comb strategy really works, be sure to check out Jimmy Nilsson’s article on it. Basically, this strategy generates sequential guids which solves the fragmented index issue. You can generate these sequential guids in your database, but the downside of that is that your ORM would still need to insert each record separately and fetch the generated primary key value each time. NHibernate includes the guid.comb strategy which will generate the sequential guids before actually inserting the records in your database.

This obviously has some great benefits:

  • you don’t have to hit the database immediately whenever a record needs to be inserted
  • you don’t need to retrieve a generated primary key value when a record was inserted
  • you can batch your insert statements

Let’s see how we can use this with NHibernate. First of all, you need to map the identifier of your entity like this:

    <id name="Id" column="Id" type="guid" >

      <generator class="guid.comb" />

    </id>

And that’s actually all you have to do. You don’t have to assign the primary key values or anything like that. You don’t need to worry about them at all.

Take a look at the following test:

        [Test]

        public void InsertsAreOnlyExecutedAtTransactionCommit()

        {

            var insertCountBefore = sessionFactory.Statistics.EntityInsertCount;

 

            using (var session = sessionFactory.OpenSession())

            using (var transaction = session.BeginTransaction())

            {

                for (int i = 0; i < 50; i++)

                {

                    var category = new ProductCategory(string.Format("category {0}", i + 1));

                    // at this point, the entity doesn't have an ID value yet

                    Assert.AreEqual(Guid.Empty, category.Id);

                    session.Save(category);

                    // now the entity has an ID value, but we still haven't hit the database yet

                    Assert.AreNotEqual(Guid.Empty, category.Id);

                }

 

                // just verifying that we haven't hit the database yet to insert the new categories

                Assert.AreEqual(insertCountBefore, sessionFactory.Statistics.EntityInsertCount);

                transaction.Commit();

                // only now have the records been inserted

                Assert.AreEqual(insertCountBefore + 50, sessionFactory.Statistics.EntityInsertCount);

            }

        }

Interesting, no? The entities have an ID value after they have been ‘saved’ by NHibernate, but they haven’t actually been saved to the database yet. NHibernate always tries to wait as long as possible to hit the database, and in this case it only needs to hit the database when the transaction is committed. If you’ve enabled batching of DML statements, you could severely reduce the number of times you need to hit the database in this scenario.
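Batching of DML statements is switched on through the `adonet.batch_size` setting. A minimal sketch of doing that in code, assuming the rest of the configuration lives in hibernate.cfg.xml; the `SessionFactoryBuilder` class is hypothetical and the batch size of 25 is an arbitrary example value:

```csharp
using NHibernate;
using NHibernate.Cfg;

public static class SessionFactoryBuilder
{
    public static ISessionFactory Build()
    {
        var configuration = new Configuration();

        // Reads connection settings and mappings from hibernate.cfg.xml.
        configuration.Configure();

        // Environment.BatchSize is the "adonet.batch_size" key: up to 25
        // insert/update/delete statements are sent to the database per round trip.
        configuration.SetProperty(NHibernate.Cfg.Environment.BatchSize, "25");

        return configuration.BuildSessionFactory();
    }
}
```

AFAIK the setting only has an effect with drivers that support batching, so check that for your database before counting on it.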

And in case you’re wondering, the generated guids look like this:

81cdb935-d371-4285-9dcb-9bdb0122f25f
a44baf99-58e9-4ad7-9a59-9bdb0122f25f
a88300c2-6d64-4ae3-a55b-9bdb0122f25f
032c7884-da2f-4568-b505-9bdb0122f25f
….
70d7713c-b38d-4341-953d-9bdb0122f25f

Notice the last part of the guids… this is what prevents the index fragmentation.
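To make that last part less mysterious, here is a minimal sketch of how such a guid can be built: a random guid whose final six bytes are overwritten with the current date and time. It follows the approach from the comb article and, as far as i can tell, NHibernate's generator works along the same lines, but the `CombGuid` class below is an illustration written for this post, not NHibernate's code. The `now` parameter is an assumption added to keep the example deterministic.

```csharp
using System;

public static class CombGuid
{
    private static readonly DateTime BaseDate = new DateTime(1900, 1, 1);

    public static Guid Generate(DateTime now)
    {
        // Start from a regular random guid.
        byte[] guidBytes = Guid.NewGuid().ToByteArray();

        // Whole days since 1900-01-01, and the time of day in units of 1/300th
        // of a second (the resolution of a SQL Server datetime).
        int days = (now.Date - BaseDate).Days;
        long timeUnits = (long)(now.TimeOfDay.TotalMilliseconds / 3.333333);

        byte[] dayBytes = BitConverter.GetBytes(days);
        byte[] timeBytes = BitConverter.GetBytes(timeUnits);

        // Put the most significant byte first, so that later timestamps
        // produce larger values in the positions SQL Server sorts on first.
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(dayBytes);
            Array.Reverse(timeBytes);
        }

        // The last 6 bytes of the guid become: 2 bytes of days + 4 bytes of time.
        Array.Copy(dayBytes, dayBytes.Length - 2, guidBytes, guidBytes.Length - 6, 2);
        Array.Copy(timeBytes, timeBytes.Length - 4, guidBytes, guidBytes.Length - 4, 4);

        return new Guid(guidBytes);
    }
}
```

Guids generated at the same moment share their final 12 hex characters while the rest stays random, which matches the pattern in the list above. Guids generated later get a larger suffix, so new rows end up at the end of the index instead of in random places.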

Obviously, this particular test is not a realistic scenario but i’m sure you understand how much of an improvement this identifier strategy could provide throughout an entire application. The only downside (IMO) is that guids aren’t really human readable so if that is important to you, you should probably look into other identifier strategies. The HiLo strategy would be particularly interesting in that case, but we’ll cover that in a later post ;)

Posted in NHibernate, Performance | 11 Comments »

Put Performance Concerns Into Perspective

Posted by Davy Brion on 18th April 2009

There is a reported performance issue with NHibernate that i wanted to look into. The reported issue was related to retrieving objects through a generically typed List or through an IList reference.

The following code simulates the issue:

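    using System;

    using System.Collections;

    using System.Collections.Generic;

    using System.Diagnostics;

 

    class MyClass {}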

 

    class Program

    {

        static void Main(string[] args)

        {

            List<MyClass> list = new List<MyClass>();

 

            Stopwatch stopwatch = Stopwatch.StartNew();

 

            for (int i = 0; i < 100000; i++)

            {

                list.Add(new MyClass());

            }

 

            stopwatch.Stop();

 

            Console.WriteLine("Elapsed ms: " + stopwatch.ElapsedMilliseconds);

 

            IList iList = new List<MyClass>();

 

            stopwatch = Stopwatch.StartNew();

 

            for (int i = 0; i < 100000; i++)

            {

                iList.Add(new MyClass());

            }

 

            stopwatch.Stop();

 

            Console.WriteLine("Elapsed ms: " + stopwatch.ElapsedMilliseconds);

 

            Console.ReadLine();

        }

    }

The only difference between both Add operations is that in the first case, the typed Add method of the generically typed List reference is called. In the second case, the untyped Add method of the generically typed List is called through the IList reference.

On my slow Macbook, the first Add operation typically took between 10 and 20 ms. The second Add operation typically took almost twice as long as the first Add operation. As you can see, that is a very minor performance issue, and it actually is only consistently noticeable once you’re dealing with 100000 elements. At 50000 elements, both operations typically take the same amount of time with only minor variations in performance on certain runs.

So yes, once you’re dealing with a large enough set of elements, there is indeed a performance difference. But it’s extremely minor and the extra cost of the Add operation is most definitely the least of your concerns if you’re retrieving that many entity instances through an ORM. The extra amount of memory that needs to be used for those entities and the extra cost of pulling all of that data over the wire is what’s really going to bite you, not the extra cost of the Add operation ;)

Posted in Performance | No Comments »

Transparent Query Batching Through Your Repository

Posted by Davy Brion on 1st April 2009

All of our projects that use NHibernate (which is all of them except those where the customer explicitly doesn’t want us to use it or where it wouldn’t make sense to use it) use the same Repository implementation. After the Future and FutureValue queries were added to NHibernate, i modified the implementation of that Repository class.

Two of the FindAll methods now look like this:

        public virtual IEnumerable<T> FindAll()

        {

            return Session.CreateCriteria<T>().Future<T>();

        }

 

        public virtual IEnumerable<T> FindAll(DetachedCriteria criteria)

        {

            return criteria.GetExecutableCriteria(Session).Future<T>();

        }

The only thing i changed in those methods is calling the Future method, instead of the List method. That’s it. All of our specific Find-methods (those that execute specific queries) pass through the FindAll(DetachedCriteria criteria) method so they all benefit from this change.

That means that all of our queries are suddenly batched transparently whenever possible, without impacting any of the calling code. And that is pretty nice if you ask me. Batching queries can offer a substantial performance benefit, and we didn’t even have to change any of the calling code to achieve it.
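To show what ‘batched transparently’ means for the calling code, here is a minimal sketch that uses the session directly instead of the Repository. The `Category` and `Product` classes are hypothetical entities that are assumed to be mapped, and `FutureExample` is just a name picked for this illustration.

```csharp
using System;
using System.Collections.Generic;
using NHibernate;

// Hypothetical entities, assumed to have NHibernate mappings.
public class Category
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
}

public class Product
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
}

public static class FutureExample
{
    public static void PrintCatalog(ISession session)
    {
        // Nothing is sent to the database yet: both queries are only queued.
        IEnumerable<Category> categories = session.CreateCriteria<Category>().Future<Category>();
        IEnumerable<Product> products = session.CreateCriteria<Product>().Future<Product>();

        // Enumerating the first result executes all queued queries together.
        foreach (var category in categories)
        {
            Console.WriteLine(category.Name);
        }

        // The products were already retrieved along with the categories.
        foreach (var product in products)
        {
            Console.WriteLine(product.Name);
        }
    }
}
```

The calling code looks exactly like it would with two separate queries, which is why nothing had to change. AFAIK, on databases that don't support multiple queries in one round trip, the queries are simply executed separately.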

Obviously, this only works for the queries that return IEnumerables (in our case, that’s every query that doesn’t return a single value). I also added a few more methods to enable query batching for queries that return a single entity, or a scalar value (i kept the original methods in this code snippet as well so you can see the difference):

        public virtual T FindOne(DetachedCriteria criteria)

        {

            return criteria.GetExecutableCriteria(Session).UniqueResult<T>();

        }

 

        public virtual IFutureValue<T> FindFutureOne(DetachedCriteria criteria)

        {

            return criteria.GetExecutableCriteria(Session).FutureValue<T>();

        }

 

        public virtual K GetScalar<K>(DetachedCriteria criteria)

        {

            return (K)criteria.GetExecutableCriteria(Session).UniqueResult();

        }

 

        public virtual IFutureValue<K> GetFutureScalar<K>(DetachedCriteria criteria)

        {

            return criteria.GetExecutableCriteria(Session).FutureValue<K>();

        }

 

        public virtual int Count(DetachedCriteria criteria)

        {

            return Convert.ToInt32(QueryCount(criteria).GetExecutableCriteria(Session).UniqueResult());

        }

 

        public virtual IFutureValue<int> FutureCount(DetachedCriteria criteria)

        {

            return QueryCount(criteria).GetExecutableCriteria(Session).FutureValue<int>();

        }

 

So let’s recap. Queries that return IEnumerables are all batched transparently whenever it’s possible to do so. No calling code had to be modified to get this benefit. Queries that return single values (an entity instance or a scalar value) that still use the ‘old’ FindOne, GetScalar and Count methods obviously couldn’t benefit from the transparent batching without breaking backwards compatibility, but the new methods that were introduced do enable transparent batching for these queries from now on.
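A typical place where the new methods pay off is paging, where you need both a page of entities and the total count. The following is a minimal sketch under a few assumptions: `IProductRepository` is a hypothetical interface that exposes two of the Repository methods shown above, and `Product` is a hypothetical mapped entity with a `Name` property.

```csharp
using System;
using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;

// Hypothetical mapped entity.
public class Product
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
}

// Hypothetical interface mirroring two of the Repository methods from this post.
public interface IProductRepository
{
    IEnumerable<Product> FindAll(DetachedCriteria criteria);
    IFutureValue<int> FutureCount(DetachedCriteria criteria);
}

public static class PagingExample
{
    public static string DescribePage(IProductRepository repository, int pageIndex, int pageSize)
    {
        var allProducts = DetachedCriteria.For<Product>();

        var onePage = DetachedCriteria.For<Product>()
            .AddOrder(Order.Asc("Name"))
            .SetFirstResult(pageIndex * pageSize)
            .SetMaxResults(pageSize);

        // Both calls only queue their query.
        IFutureValue<int> total = repository.FutureCount(allProducts);
        IEnumerable<Product> page = repository.FindAll(onePage);

        // Copying the page into a list enumerates it, which executes both queries.
        var items = new List<Product>(page);

        // Reading Value doesn't cause another round trip at this point.
        return string.Format("Showing {0} of {1} products", items.Count, total.Value);
    }
}
```

The important detail is that `Value` should only be read once all the queries you want in the batch have been queued, since reading it triggers execution if that hasn't happened yet.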

Does all of this sound too good to be true? I’d be skeptical too if i were you but i made these changes a few months ago actually and we have been using this stuff on a couple of projects with zero problems.

Obviously, you need NHibernate 2.1 Alpha 1 (or later) or the current trunk for this, both of which i would recommend over NH 2.0 at this point.

Posted in NHibernate, Performance | 15 Comments »