The Inquisitive Coder – Davy Brion's Blog

Trying to walk that thin line between intelligence and ignorance

Monitoring Production Performance

Posted by Davy Brion on September 7th, 2009

It’s always interesting to know how well your applications perform in production. To get a better view on this, i recently added a bit of performance related logging to the RequestProcessor class of my Request/Response service layer. I have the following 2 loggers set up:

        private readonly ILog logger = LogManager.GetLogger(typeof(RequestProcessor));

        private readonly ILog performanceLogger = LogManager.GetLogger("PERFORMANCE");

And here’s a simplified version of the Process method:

        public Response[] Process(params Request[] requests)

        {

            if (requests == null) return null;

 

            var responses = new List<Response>(requests.Length);

 

            var batchStopwatch = Stopwatch.StartNew();

 

            foreach (var request in requests)

            {

                try

                {

                    using (var handler = (IRequestHandler)IoC.Container.Resolve(GetHandlerTypeFor(request)))

                    {

                        var requestStopwatch = Stopwatch.StartNew();

 

                        try

                        {

                            // NOTE: the real code has a lot more stuff in this block for dealing with

                            // failed requests etc...

                            responses.Add(GetResponseFromHandler(request, handler));

                        }

                        finally

                        {

                            requestStopwatch.Stop();

 

                            if (requestStopwatch.ElapsedMilliseconds > 100)

                            {

                                performanceLogger.Warn(string.Format("Performance warning: {0}ms for {1}",

                                    requestStopwatch.ElapsedMilliseconds, handler.GetType().Name));

                            }

 

                            IoC.Container.Release(handler);

                        }

                    }

                }

                catch (Exception e)

                {

                    // NOTE: every single thrown exception in the service layer (and everything below it)

                    // is caught here and logged only once

                    logger.Error(e);          

                    throw;

                }

            }

 

            batchStopwatch.Stop();

 

            if (batchStopwatch.ElapsedMilliseconds > 200)

            {

                var builder = new StringBuilder();

 

                foreach (var request in requests)

                {

                    builder.Append(request.GetType().Name + ", ");

                }

                builder.Remove(builder.Length - 2, 2);

 

                performanceLogger.Warn(string.Format("Performance warning: {0}ms for the following batch: {1}",

                    batchStopwatch.ElapsedMilliseconds, builder));

            }

 

            return responses.ToArray();

        }

The part that is simplified is simply the part that deals with getting the actual responses for each request, including how to deal with failed requests and how to handle subsequent requests in the batch. It’s not relevant to this post and it only clutters the code more so i left that out. But anyways, the interesting part here is the performance logging. Ok, the code itself isn’t interesting but what you get out of it is pretty nice. Each single request that takes more than 100 milliseconds is logged. Also, each batch of requests that takes more than 200 milliseconds is logged. Both of these ‘events’ are logged to the performance logger which, in our case, is set up to use a different logfile than the typical error log.

Which gives us some interesting looking data, like this:


WARN 2009-09-05 08:58:45 - Performance warning: 122ms for GetOpportunityCardHandler
WARN 2009-09-05 09:00:45 - Performance warning: 159ms for GetCompanyCardHandler
WARN 2009-09-05 09:01:23 - Performance warning: 187ms for GetOpportunityCardHandler
WARN 2009-09-05 09:01:32 - Performance warning: 155ms for GetCompanyCardHandler
WARN 2009-09-05 09:01:41 - Performance warning: 189ms for GetOpportunityCardHandler
WARN 2009-09-05 09:01:41 - Performance warning: 336ms for the following batch: GetSalesTaskCardRequest, GetContactCardRequest, GetOpportunityCardRequest, GetCompanyCardRequest

336ms for a single batch of requests is rather slow, and if we see this output regularly in the logfile, we definitely know that these specific request handlers really need to be optimized because we can be pretty sure that is too slow in the real production environment.

It’s also useful to identify single request handlers who could use some optimization:


WARN 2009-09-04 02:55:40 - Performance warning: 108ms for PersistIssueHandler
WARN 2009-09-04 05:54:50 - Performance warning: 148ms for PersistTimesheetEntryHandler

We’ve already deployed this for an internal version of one of our applications, which helped us identify some parts that we really needed to optimize before our next external deployment (for our customers). Even better, when we deploy for our customers, each log entry will also contain the name of the tenant (customer) for which the performance warning was generated. Which will only make it easier for us to identify real hotspots, based on real usage from our customers, and try to optimize them as soon as possible, preferably before customers start complaining about specific parts.

I’m sure some of you are already doing something similar, but i’m also pretty sure that most of you aren’t doing this yet. I’d definitely recommend doing something like this, because it definitely makes it easier to fix _real_ performance issues instead of the theoretical ones ;)

7 Responses to “Monitoring Production Performance”

  1. Jan Van Ryswyck Says:

    What made you choose for integrating the logging code into the processor itself instead of taking the AOP route or implementing a custom WCF behavior?

  2. Davy Brion Says:

    custom WCF behavior isn’t an option because i need to do 2 kinds of performance logging here… one for the entire batch of requests, and one for each kind of request. With a custom WCF behavior i’d only get the performance warning for the entire batch. If you’re using a classic WCF service layer though, then a custom behavior is obviously good enough.

    As for the AOP solution… i wouldn’t be able to use Windsor for this because that would intercept each method call with the same interceptor, essentially giving me only kind of performance logging. So i could use postsharp, if i moved the handling of a single request (and its timing) to a separate method and used separate attributes for both methods.

    But first of all, this library doesn’t have a dependency on Postsharp yet, and i’m unwilling to introduce it because this piece of code (which granted, is not very clean or even nice) only occurs once, and it’s in the infrastructure code. If this kind of code would occur in multiple places then i’d obviously come up with a cleaner approach, but if it’s only in one place? I don’t really see the point then… i’d even argue that it would purely be intellectual masturbation ;)

  3. Ryan Roberts Says:

    Was the fact your app is multitennant the reason you avoided creating performance counters? I have done similar tricks using a bit of type discovery to install perf counters for messages. I guess in multitennant you would need to install a set per customer, which could be gnarly with the..quirky api.

    You do get some good stuff with perf counters though – it makes it easy to cross reference your application specific performance information with memory pressure / requests per second etc and the potential to integrate into existing health monitoring infrastructure if IT is on the ball enough to use it (a rare event indeed :) ).

  4. Davy Brion Says:

    @Ryan

    i actually hadn’t even considered performance counters (i hardly have any experience with them)

    it wasn’t so much because of the multi-tenant thing since that is just one application that uses the library which contains the request/response wcf stuff… but with the current approach, all of our applications benefit from it, provided that they enabled the performance logger and have an appender for it of course ;)

  5. Reflective Perspective - Chris Alcock » The Morning Brew #429 Says:

    [...] Monitoring Production Performance – Davy Brion shows a simple method of gaining performance reporting of your code allowing you to define a threshold for a particular process and log when it takes longer – simple stuff, and something I’m sure can be built on. [...]

  6. Daily tech links for .net and related technologies - September 9-12, 2009 - Sanjeev Agarwal Says:

    [...] Web Development Optimize Download Time for AJAX Web ApplicationsASP NET MVC Application Screencast Fluent wrapper for ASP.NET MVC TagBuilder Getting Started with ASP.NET MVC 1.0 Refcard Monitoring Production Performance [...]

  7. Simon Says:

    This is what i am trying to achive with this project

    http://accelero.codeplex.com/

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>