Archive for May, 2008

How a simple foreach statement can waste an afternoon

Written on May 27th, 2008 (4 comments)
Categories: Performance

We all like the foreach statement, right? It's easy to use. It looks good. It does a good job of what it's supposed to do. What's not to like? Well... today I learned it can actually be a great hiding place for performance issues.

I wrote the following code a while ago:

        public void ProcessGroupsAndTheirMembers(ActiveDirectoryConfiguration adConfig)
        {
            List<GroupPrincipal> groupPrincipals = GetABunchOfGroupsFromActiveDirectory(adConfig);

            foreach (var groupPrincipal in groupPrincipals)
            {
                HandleGroup(groupPrincipal);
                DealWithMembers(groupPrincipal.Members);
            }
        }

It's actually a simplified version of the code I wrote, but you get the idea. It doesn't look so bad, right? It fetches a bunch of groups from an Active Directory store, then it processes the groups and the members of those groups. It turns out there are actually a few problems with this code. First of all, when you retrieve a GroupPrincipal, there's no way to make it fetch its Members collection in the same roundtrip (if I'm mistaken, please do correct me). So the Members property of the GroupPrincipal is a lazy-loaded collection. When you access it, it goes back to the Active Directory to fetch all the member Principals. There's not really anything I can do about that, due to the limitations in how you can retrieve GroupPrincipals (again, unless I'm mistaken).

So basically, we fetch a bunch of data (the groups) and then, when we loop through the retrieved data, we fetch more data (the members) for each item in the loop. So we are making a hell of a lot of roundtrips if we have a lot of groups. I despise situations like that, and I never do this unless I can't avoid it. As unfortunate as that is, it's not the real problem that lurks in this code.

If you don't have a lot of groups, then this code works perfectly: the data is processed quickly and the memory is cleaned up pretty soon after we leave the ProcessGroupsAndTheirMembers method. Unless you suddenly have to loop through 6000 groups, that is. Almost all of them have at least a few Members, and some even have a lot of them. Keep in mind that for each group, we go back to the Active Directory store to retrieve the members. So that is at least one big query (to retrieve all of the groups) and another 6000 to retrieve all the members. As if that's not bad enough, the Active Directory store turns out to be pretty slow. All of a sudden, the code that used to run in a matter of seconds takes 9 minutes.

So you fire up your tools to help you diagnose the problem... the profiler quickly shows that the code spends most of its time in the ProcessGroupsAndTheirMembers method. Process Explorer shows stable CPU usage (low at 25%, but stable... no peaks) and ever-increasing memory usage (all the way up to 400 MB). This is the time where you get that warm and fuzzy "oh fuck..."-feeling. So you start experimenting with changes, and you test them... and each time you test, you basically have to wait 9 minutes to find out that the change didn't have any effect. Joy...

It's actually really simple once you figure it out... each GroupPrincipal object takes up some memory space. If its Members collection is filled up, the GroupPrincipal will hold references to each member Principal in the collection. The object graph that you are holding in memory basically increases each time you pass through the loop because each GroupPrincipal will hold all of its Members after we've processed it.

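But hey, we have garbage collection! It'll clean up the used memory! Yeah, it does... eventually. But the collector can only reclaim objects that are no longer reachable, and every processed GroupPrincipal (along with its loaded Members) is still referenced from the groupPrincipals list. Do you know how many garbage collections can occur in a period of 9 minutes? A lot of them, actually, especially if your code keeps aggressively requesting more memory. And none of them can free the groups we're already done with.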

The problem, of course, is with the foreach statement (duh, I already gave it away in the title). As you can see, we don't really do anything with the GroupPrincipal once we've processed it. Yet it's still kept in the groupPrincipals list for the duration of the entire loop. And we can't remove it from the list while we're in the foreach, because then the underlying enumerator will throw an exception once we move to the next item. The trick was simply to replace the foreach with a do-while-loop (how old-school!) and to get rid of the GroupPrincipal once it was processed:

        public void ProcessGroupsAndTheirMembers(ActiveDirectoryConfiguration adConfig)
        {
            List<GroupPrincipal> groupPrincipals = GetABunchOfGroupsFromActiveDirectory(adConfig);

            // A do-while always runs its body once, so bail out early on an empty list
            if (groupPrincipals.Count == 0) return;

            do
            {
                GroupPrincipal groupPrincipal = groupPrincipals[0];
                HandleGroup(groupPrincipal);
                DealWithMembers(groupPrincipal.Members);

                // Drop the list's reference and release the group's resources right away
                groupPrincipals.RemoveAt(0);
                groupPrincipal.Dispose();
            } while (groupPrincipals.Count > 0);
        }

When I ran this code, memory usage remained stable and CPU usage actually went down to 5%. The time needed to process the groups went from 9 minutes to 5. That's still a lot, but as evidenced by the very low CPU usage, the code is constantly waiting for the data from Active Directory to cross the wire. It then quickly processes that data, and then it waits for the next bunch.
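A variation on the same idea, as a rough sketch: List<T>.RemoveAt(0) has to shift every remaining element, so walking the list from the back and removing the last element is cheaper. It only applies if the processing order doesn't matter. FakeGroup and DrainExample are made-up stand-ins (not part of the original code) so the example runs without an Active Directory store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GroupPrincipal: disposable, and it counts live instances.
public sealed class FakeGroup : IDisposable
{
    public static int LiveCount;

    public string Name { get; private set; }

    public FakeGroup(string name)
    {
        Name = name;
        LiveCount++;
    }

    public void Dispose()
    {
        LiveCount--;
    }
}

public static class DrainExample
{
    // Processes every group, releasing each one as soon as it has been handled.
    // Note: this visits the groups in reverse order.
    public static int ProcessAndRelease(List<FakeGroup> groups)
    {
        int processed = 0;

        for (int i = groups.Count - 1; i >= 0; i--)
        {
            FakeGroup group = groups[i];

            // Removing the last element doesn't shift anything in the list
            groups.RemoveAt(i);

            using (group)
            {
                // The real code would call HandleGroup and DealWithMembers here
                processed++;
            }
        }

        return processed;
    }
}
```

Because the list never holds on to a processed group, the garbage collector is free to reclaim it (and whatever it loaded) while the loop is still running.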

So, as this story clearly demonstrates, the foreach statement can be quite the evil bitch even though it's usually the nice girl-next-door kinda statement. It's too bad I wasted a few hours on this... well... honestly, a part of me loves situations like these in a weird, sick and twisted kinda way. You always learn something very interesting from them :)

Hope you enjoyed this episode of How The Code Turns.

The Multithreaded Task Executor

Written on May 26th, 2008 (4 comments)
Categories: Multithreading, Patterns

Sometimes, you've got a bunch of actions that you need to execute in a loop. The problem is that those actions are all performed synchronously so this could take some time depending on the action. But, if the action itself is thread-safe, and the actions are not dependent on the results of previous actions, you might get much better performance if you spread that workload over a few different threads. Especially if you have multiple CPU cores.

Wouldn't it be cool if you could do something like this:

            MultiThreadedTaskExecutor taskExecutor = new MultiThreadedTaskExecutor(numberOfThreadsToUse);

            foreach (Input input in inputs)
            {
                Input newVariable = input;
                taskExecutor.QueueTask(() => ProcessInput(newVariable));
            }

            taskExecutor.RunTasksAndWait();

Note: you need a new variable inside the loop to keep the 'input' variable itself (not the reference to the object, but the actual variable) from being captured by the anonymous method. If you don't use a new variable, each created anonymous method refers to the same 'input' loop variable, which by the time the tasks are executed points to the last Input instance in the inputs collection. The result would be that each task is executed on the same Input instance. This is a known issue with variable capturing and anonymous methods. (C# 5 later changed foreach so that each iteration gets its own variable, but for loops still behave this way.)
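To make the capturing issue concrete, here is a small sketch that uses a for loop, where the loop variable is shared by all iterations. CaptureDemo and its method names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Every lambda captures the same variable 'i', so they all see its final value.
    public static List<int> RunWithSharedVariable()
    {
        var actions = new List<Func<int>>();

        for (int i = 0; i < 3; i++)
        {
            actions.Add(() => i);
        }

        var results = new List<int>();
        foreach (Func<int> action in actions)
        {
            results.Add(action());
        }

        return results; // 3, 3, 3
    }

    // Each iteration declares its own 'copy', so every lambda captures a different variable.
    public static List<int> RunWithCopiedVariable()
    {
        var actions = new List<Func<int>>();

        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            actions.Add(() => copy);
        }

        var results = new List<int>();
        foreach (Func<int> action in actions)
        {
            results.Add(action());
        }

        return results; // 0, 1, 2
    }
}
```

The lambdas only run after the loop has finished, which is exactly the situation the queued tasks are in.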

Anyways... the code above basically spreads the workload over the given number of threads.

The rough, not-quite-production-ready code of the MultiThreadedTaskExecutor class looks like this:

    public class MultiThreadedTaskExecutor
    {
        private readonly List<Thread> threads = new List<Thread>();
        // One handle per thread; each thread signals its handle when it runs out of work
        private readonly List<EventWaitHandle> eventWaitHandles = new List<EventWaitHandle>();
        private readonly List<Type> swallowedExceptionTypes;

        private readonly Queue<Action> taskQueue;
        private readonly object queueMonitor = new object();

        public MultiThreadedTaskExecutor(int numberOfThreads)
        {
            swallowedExceptionTypes = new List<Type>();
            taskQueue = new Queue<Action>();
            CreateThreads(numberOfThreads);
        }

        // Not synchronized: queue all tasks before calling RunTasksAndWait
        public void QueueTask(Action task)
        {
            taskQueue.Enqueue(task);
        }

        public void AddExceptionTypeToSwallow(Type type)
        {
            swallowedExceptionTypes.Add(type);
        }

        public void RunTasksAndWait()
        {
            foreach (Thread thread in threads)
            {
                thread.Start();
            }

            // Blocks until every thread has signaled its handle
            WaitHandle.WaitAll(eventWaitHandles.ToArray());
        }

        private void CreateThreads(int number)
        {
            for (int i = 0; i < number; i++)
            {
                var eventWaitHandle = new EventWaitHandle(false, EventResetMode.ManualReset);
                eventWaitHandles.Add(eventWaitHandle);
                threads.Add(new Thread(() => ProcessTasks(eventWaitHandle)));
            }
        }

        private void ProcessTasks(EventWaitHandle eventWaitHandle)
        {
            try
            {
                Action action;

                while ((action = GetTask()) != null)
                {
                    try
                    {
                        action();
                    }
                    catch (Exception e)
                    {
                        // Only exact type matches are swallowed; anything else is
                        // rethrown on this worker thread
                        if (!swallowedExceptionTypes.Contains(e.GetType())) throw;
                    }
                }
            }
            finally
            {
                // Always signal, even when a task failed, so RunTasksAndWait can't hang
                eventWaitHandle.Set();
            }
        }

        // Returns the next task, or null when the queue is empty
        private Action GetTask()
        {
            lock (queueMonitor)
            {
                if (taskQueue.Count == 0) return null;
                return taskQueue.Dequeue();
            }
        }
    }

So how does it work? It uses a queue to hold each task that was added by the consumer of the class. It creates the given number of threads and also creates an EventWaitHandle for each thread. Then, when the user starts the execution with a call to RunTasksAndWait, each thread is started and the call to RunTasksAndWait waits until each thread is finished. In the meantime, each thread gets the next task off the queue and executes it. If an exception is thrown within the task, it is caught and is either swallowed or rethrown (the consumer can add exception types that can be swallowed). Each thread keeps doing this until it can't get a new task off the queue. When that happens, the thread signals its EventWaitHandle and then it dies. When all threads have signaled, RunTasksAndWait stops blocking and control is returned to the caller. All of the tasks have been executed and the workload has been spread over the given number of threads.

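Note: due to the call to WaitHandle.WaitAll, this won't work on STA threads, because WaitHandle.WaitAll simply throws a NotSupportedException when it is asked to wait on multiple handles on an STA thread. WaitAll also refuses to wait on more than 64 handles, so that is the upper limit on the number of threads here.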

Keep in mind that this is a rough version of this code... it definitely needs a bit more polish (better exception handling for when the threads are interrupted and stuff like that mostly) but you get the idea :)

But if you know a better way to do this, or if you spot flaws in this implementation, I'd love to hear about it :)
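One possible variation, sketched here under the assumption that you only need to know when all threads are done: wait with Thread.Join instead of wait handles. Join has no 64-handle limit and doesn't go through WaitHandle.WaitAll. JoiningTaskExecutor is a made-up name, and the sketch deliberately leaves out the exception swallowing of the class above, so a throwing task goes unhandled on its worker thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical variation on MultiThreadedTaskExecutor that waits with Thread.Join.
public class JoiningTaskExecutor
{
    private readonly Queue<Action> taskQueue = new Queue<Action>();
    private readonly object queueMonitor = new object();
    private readonly int numberOfThreads;

    public JoiningTaskExecutor(int numberOfThreads)
    {
        this.numberOfThreads = numberOfThreads;
    }

    public void QueueTask(Action task)
    {
        lock (queueMonitor)
        {
            taskQueue.Enqueue(task);
        }
    }

    public void RunTasksAndWait()
    {
        var threads = new List<Thread>();

        for (int i = 0; i < numberOfThreads; i++)
        {
            var thread = new Thread(ProcessTasks);
            threads.Add(thread);
            thread.Start();
        }

        // Join blocks until the given thread has finished
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }

    private void ProcessTasks()
    {
        while (true)
        {
            Action task;

            lock (queueMonitor)
            {
                if (taskQueue.Count == 0) return;
                task = taskQueue.Dequeue();
            }

            // Run the task outside the lock so other threads can keep dequeuing
            task();
        }
    }
}
```

Usage looks the same as before: queue the tasks, then call RunTasksAndWait. Since the threads are created inside RunTasksAndWait, the same instance can be filled and run again.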

The First 100 Posts

Written on May 25th, 2008 (4 comments)
Categories: About The Blog

This is my 100th post. I started this blog on June 17th, 2007, so it took me a little less than a year to reach this first milestone. To be honest, when I started I never thought I'd stay interested in maintaining a blog for so long. And I sure didn't expect to reach 100 posts. Granted, about 20 of those posts merely link to other content, but a good 80 of them are indeed 'original content'... not a bad ratio if I may say so myself ;)
Anyways, it's always good to look back when you reach a milestone (even if it is one you made up yourself), so let's have a look, shall we?

These are the posts that have received the most views (ordered descending):

  1. NHibernate Mapping Examples

    I remember spending a full week of vacation time working on this... I guess the work paid off because this post really brought a lot of people to this blog. I can only hope that it actually helped a few people as well :)

  2. Visual Studio 2008 Release Date

    Wow, what a fluke. I merely thought it was an interesting link for the 3 people who were following my blog at that time. A couple of weeks later it was the 3rd result in Google when you searched for 'visual studio 2008 release date'. That was actually a bit frustrating because posts that I was actually proud of didn't get any attention at all, while this one was getting a lot of views, but you knew the visitors wouldn't stick around.

  3. Multilingual Data And NHibernate

    I kinda liked this one... I think it's a pretty nice approach, but due to some unfortunate circumstances I was never able to use it in a real project. Too bad, 'cause I really wanted to see how it would turn out.

  4. Read Only Generic Dictionary

    Not much to say about this one... apparently I'm not the only one who'd like something like this to be included in the .NET Framework :)

  5. Integration Tests With NHibernate

    This one wasn't bad either... I wouldn't go with this kind of integration testing anymore, but the research and experiments that led to this post resulted in my first (and only, so far) patch to the NHibernate source code, which was accepted and has been included since the 1.2.1 version. Yay :)

  6. Creating A Sortable Bindinglist

    Meh... there's not really a lot I can say about this one since it's a pretty simple and short post. Apparently a lot of people were looking for something like this though.

  7. Native ID Generation With NHibernate

    Apparently, NHibernate's native ID generator is something that leaves a few people confused.

  8. Easy And Fast Sorting Of Objects

    Again, something a lot of people are looking for. I could've done a much better job on that post, but it is what it is. It did introduce me to Marc Brooks' DynamicComparer, which I really like a lot. It's too bad that a lot of people will probably use LINQ's ordering capabilities instead, since this approach will definitely have much better performance.

  9. ObjectDataSource And Sorting Collections

    I liked this one... it's short yet not too short. I love using this approach :)

  10. Sending NHibernate Entities Over The WCF Wire

    Mixed feelings about this one... The post itself wasn't bad, but I'd never use this approach on a real project. (In case you're wondering: NHibernate entities at the business side, flat and screen-based DTOs at the client side.)

So those are the posts that have gotten the most views so far... and none of them is among my list of personal favorites (simply ordered by date):

Simply browsing the entire list of posts made me realize just how much I've learned in the past year. A lot has changed in that year as well... due to an unfortunate situation I quit working at a client I'd been working for for over 5 years. It was an awkward situation for everyone involved, but I did learn a lot from it. It's too bad some friendships were hurt or even destroyed in the process, but then again, the mere fact that those friendships were impacted probably says a lot about how real or valuable they were. Oh well, fake friends were lost, real friends were recognized and valuable lessons were learned. So for the last couple of months I've been working at another client, and it's definitely nice to see what kind of things are different, and also what kind of things are the same in these large companies. And I've come to the conclusion that it's just not right for me. I don't wanna work at large companies anymore, at least not for the next few years. I've got about one month to go, and then I'll get some experience in another scenario: working at a relatively small software development company. I don't know yet what to expect, but I'm looking forward to finding out. Oh, and I'm also looking forward to writing the next 100 posts, obviously :)


For those of you who've been following this blog: thanks, I appreciate it :)

Storing data in the HttpSession

Written on May 25th, 2008 (no comments)
Categories: ASP.NET

As we all know (I'd hope!), storing data in the HttpSession is pretty bad. HttpSessions remain in memory long after (depending on the configured timeout) the user has performed his/her last action on your site. Think about that for a second... everything you store in the HttpSession will remain in memory for a while even though it has no chance of being used again, since the user is already gone. If you don't have a large number of concurrent users, this might not cause any noticeable problems. But you have to think about the multiplier effect. As your user base increases, so do your memory requirements. You'll need more memory to keep serving those users. But if you store a lot of data in the HttpSession, your wasted memory will increase along with your user base. While you're serving your active users, you're still holding a lot of data in memory from users who left your site 5 or 10 minutes ago. Because that unneeded data is still referenced from HttpSessions that will never be used again, the garbage collector can't release that memory. When your memory usage gets to the point where the operating system needs to start paging, you'll notice dramatic slowdowns in your site's performance.

So you really want to limit the information you store in the HttpSession to that which pertains to the actual 'session' of the user. Identification of the user, chosen language, stuff like that. But don't use the HttpSession to store data that is related to the current screen the user is in. This is unfortunately commonly done, and it can easily lead to the problems mentioned above. It's better to work as statelessly as possible, as this allows your application to scale to a large user base more easily. Don't store retrieved data in the HttpSession, just retrieve it again when it's needed. However, some kinds of data are very expensive to retrieve. In these cases, it might actually be better to store the data in the HttpSession to avoid having to retrieve it excessively. So how do you keep these stored pieces of data from having a negative impact on the memory usage of your application?

The book Release It! mentions a great trick for this. I haven't used it yet in an application, but it definitely makes a lot of sense. Instead of storing a reference to data in your HttpSession (and thus preventing that data from being garbage collected until the HttpSession is garbage collected), you should use a 'soft reference'. A soft reference is kinda like a simple pointer to the data, but it does not prevent the data from being garbage collected. The soft reference doesn't count as a real reference to the garbage collector, so when the collector needs to clear memory, the data you reference with a soft reference will be garbage collected (if it's not referenced anywhere else, that is...). If that happens, your application code has to deal with it by retrieving the data again. Will this cause you to retrieve the expensive piece of data more times than you would have to if you had stored it with a normal reference in the HttpSession? Possibly. You can't give a definitive answer to that question because it depends on how frequently the garbage collector needs to clear memory. But you will most likely benefit from possibly still having that data in memory. If it's still there, great! Use it. If it's no longer there, fetch it again. You'll probably reduce the number of times you need to retrieve it, but you'll also avoid keeping the data from being garbage collected when the server is low on memory.

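In .NET, the closest thing to a 'soft reference' is the WeakReference class. Keep in mind that it is weaker than the soft reference described above: once no normal references to the data remain, it can be reclaimed by the next garbage collection that covers it, not only when memory is running low. Let's demonstrate this approach with a quick-n-dirty example: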

When you want to store data in the HttpSession, instead of doing this:

            IEnumerable<OrderView> outstandingOrders
                = orderManagementServiceProxy.GetOverviewOfAllOutstandingOrders();

            Session["outstandingOrders"] = outstandingOrders;

You could do this:

            IEnumerable<OrderView> outstandingOrders
                = orderManagementServiceProxy.GetOverviewOfAllOutstandingOrders();

            Session["outstandingOrders"] = new WeakReference(outstandingOrders);

And when you'd need to retrieve that data in a later request, you could do this:

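            WeakReference reference = Session["outstandingOrders"] as WeakReference;

            // Read Target once into a normal reference: checking IsAlive first is racy,
            // because the data could be collected between the check and the read
            IEnumerable<OrderView> outstandingOrders =
                reference != null ? reference.Target as IEnumerable<OrderView> : null;

            if (outstandingOrders == null)
            {
                // Never stored, or already collected: fetch it again and store a fresh WeakReference
                outstandingOrders = orderManagementServiceProxy.GetOverviewOfAllOutstandingOrders();
                Session["outstandingOrders"] = new WeakReference(outstandingOrders);
            }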

Like I said, this is a quick-n-dirty example... normally you'd want to prevent direct access to the HttpSession, and you'd probably write some kind of class that takes care of wrapping the reference in a WeakReference and unwrapping it again, but this code is just to illustrate the approach.
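Such a wrapper class could look roughly like the sketch below. WeakSessionStore and GetOrLoad are made-up names, and a plain IDictionary<string, object> stands in for the HttpSession so the example runs outside of ASP.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that hides the WeakReference handling from calling code.
public class WeakSessionStore
{
    // Stand-in for the HttpSession; any string-keyed object store would do
    private readonly IDictionary<string, object> store;

    public WeakSessionStore(IDictionary<string, object> store)
    {
        this.store = store;
    }

    // Returns the stored value if it is still in memory, otherwise loads and stores it again.
    public T GetOrLoad<T>(string key, Func<T> load) where T : class
    {
        object entry;
        WeakReference reference = store.TryGetValue(key, out entry) ? entry as WeakReference : null;

        // Take a normal reference right away so the value can't disappear while we use it
        T value = reference != null ? reference.Target as T : null;

        if (value == null)
        {
            value = load();
            store[key] = new WeakReference(value);
        }

        return value;
    }
}
```

With something like this in place, the retrieval code from the example shrinks to a single GetOrLoad call that passes the service call as the load delegate.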

NHibernate and virtual methods/properties

Written on May 24th, 2008 (1 comment)
Categories: NHibernate, PostSharp

I love NHibernate, but one of the things that bothers the hell out of me is that I keep forgetting to add the virtual keyword to each method or property in my entities. And since NHibernate needs your classes' properties and methods to be virtual, this causes run-time errors when I run my tests. Since I'm already using custom compile-time checks, I figured I might as well add another one... from now on, I want my compilation to fail if any of my NHibernate entities have public methods/properties that aren't marked virtual.

Once again, it's PostSharp to the rescue:

using System;
using System.Reflection;

using PostSharp.Extensibility;
using PostSharp.Laos;

namespace Northwind.Aspects
{
    [Serializable]
    [AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Method | AttributeTargets.Property)]
    [MulticastAttributeUsage(MulticastTargets.Method, TargetMemberAttributes = MulticastAttributes.Managed |
        MulticastAttributes.NonAbstract | MulticastAttributes.Instance |
        MulticastAttributes.Protected | MulticastAttributes.Public)]
    public class RequireVirtualMethodsAndProperties : OnMethodBoundaryAspect
    {
        public override bool CompileTimeValidate(MethodBase method)
        {
            if (!method.IsVirtual)
            {
                string methodName = method.DeclaringType.FullName + "." + method.Name;

                var message = new Message(SeverityType.Fatal, "MustBeVirtual",
                    string.Format("{0} must be virtual", methodName), GetType().Name);
                MessageSource.MessageSink.Write(message);

                return false;
            }

            return true;
        }
    }
}

And then we make sure this check is applied to my NHibernate entities:

#if DEBUG
[assembly: RequireVirtualMethodsAndProperties(AttributeTargetTypes = "Northwind.Domain.Entities.*")]
#endif

Now, whenever I forget to mark my properties/methods as virtual, I get this:

EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.RemoveTerritory must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.AddTerritory must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.get_Territories must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.set_Description must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.get_Description must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.set_Id must be virtual
EXEC : error MustBeVirtual: Northwind.Domain.Entities.Region.get_Id must be virtual

...

Done building project "Northwind.csproj" -- FAILED.

And there we go :)
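If PostSharp isn't an option, roughly the same rule can be checked from a unit test with plain reflection. The sketch below is an assumption-laden alternative and not the PostSharp approach above: VirtualMemberCheck and SampleEntity are made-up names, and it fails at test time rather than at compile time. Unlike the aspect, it also rejects methods that are virtual but sealed (IsFinal), since those can't be overridden either.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class VirtualMemberCheck
{
    // Returns the public/protected instance methods (including property accessors)
    // declared on the given type that a subclass would not be able to override.
    public static List<string> FindNonVirtualMembers(Type type)
    {
        var offenders = new List<string>();

        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
            BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        foreach (MethodInfo method in type.GetMethods(flags))
        {
            bool visible = method.IsPublic || method.IsFamily || method.IsFamilyOrAssembly;
            bool overridable = method.IsVirtual && !method.IsFinal;

            if (visible && !overridable)
            {
                offenders.Add(type.FullName + "." + method.Name);
            }
        }

        return offenders;
    }
}

// Hypothetical entity used only to try out the check
public class SampleEntity
{
    public virtual int Id { get; set; }

    // Deliberately not virtual, so the check should report both accessors
    public string Description { get; set; }
}
```

A test could loop over all types in the entities namespace and assert that FindNonVirtualMembers returns an empty list for each of them.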