How a simple foreach statement can waste an afternoon

4 commentsWritten on May 27th, 2008 by
Categories: Performance

We all like the foreach statement, right? It's easy to use. It looks good. It does a good job of what it's supposed to do. What's not to like? Well... today i learned it can actually be a great hiding place for performance issues.

I wrote the following code a while ago:

        public void ProcessGroupsAndTheirMembers(ActiveDirectoryConfiguration adConfig)

        {

            List<GroupPrincipal> groupPrincipals = GetABunchOfGroupsFromActiveDirectory(adConfig);

 

            foreach (var groupPrincipal in groupPrincipals)

            {

                HandleGroup(groupPrincipal);

                DealWithMembers(groupPrincipal.Members);

            }

        }

it's actually a simplified version of the code i wrote, but you get the idea. It doesn't look so bad, right? It fetches a bunch of groups from an Active Directory store, then it processes the groups and the members of those groups. It turns out there are actually a few problems with this code. First of all, when you retrieve a GroupPrincipal, there's no way to make it fetch its Members collection in the same roundtrip (if i'm mistaken, please do correct me). So the Members property of the GroupPrincipal is a lazy-loaded collection. When you access it, it goes back to the Active Directory to fetch all the member Principals. There's not really anything i can do about that, due to the limitations in how you can retrieve GroupPrincipals (again, unless i'm mistaken).

So basically, we fetch a bunch of data (the groups) and then when we loop through the retrieved data we fetch more data (the members) for each item in the loop. So we are making a hell of a lot of roundtrips if we have a lot of groups. I despise situations like that. And i never do this unless i can't avoid it. As unfortunate as that is, it's not the real problem that lurks in this code.

If you don't have a lot of groups, then this code works perfectly and the data is processed quickly and the memory is cleaned up pretty soon after we leave ProcessGroupsAndTheirMembers method. Unless you suddenly have to loop through 6000 groups. And almost all of them have at least a few Members, some even have a lot of them. Keep in mind that for each group, we go back to the Active Directory store to retrieve the members. So that is at least one big query (to retrieve all of the groups) and another 6000 to retrieve all the members. As if that's not bad enough, the Active Directory store turns out to be pretty slow. All of a sudden, the code that used to run in a matter of seconds takes 9 minutes.

So you fire up your tools to help you diagnose the problem... the profiler quickly shows that the code spends most of its time in the ProcessGroupsAndTheirMembers method. Process Explorer shows stable cpu usage (low at 25%, but stable... no peaks) and ever-increasing memory usage (all the way up to 400mb). This is the time where you get that warm and fuzzy "oh fuck..."-feeling. So you start experimenting with changes, and you test it... each time you test it you basically have to wait 9 minutes if the change didn't have any effect. Joy...

It's actually really simple once you figure it out... each GroupPrincipal object takes up some memory space. If its Members collection is filled up, the GroupPrincipal will hold references to each member Principal in the collection. The object graph that you are holding in memory basically increases each time you pass through the loop because each GroupPrincipal will hold all of its Members after we've processed it.

But hey, we have garbage collection! It'll clean up the used memory! Yea it does... eventually. Do you know how many garbage collections could occur in a period of 9 minutes? A lot of them actually. Especially if your code is aggressively requesting more and more memory space.

The problem, of course, is with the foreach statement (duh, i already gave it away in the title). As you can see, we don't really do anything with the GroupPrincipal once we've processed it. Yet it's still kept in the groupPrincipals list, for the duration of the entire loop. And we can't remove it from the list while we're in the foreach because then the underlying iterator will throw exceptions once we move to the next item. The trick was simply to replace the foreach with a do-while-loop (how old-school!) and to get rid of the GroupPrincipal once it was processed:

        public void ProcessGroupsAndTheirMembers(ActiveDirectoryConfiguration adConfig)

        {

            List<GroupPrincipal> groupPrincipals = GetABunchOfGroupsFromActiveDirectory(adConfig);

 

            do

            {

                GroupPrincipal groupPrincipal = groupPrincipals[0];

                HandleGroup(groupPrincipal);

                DealWithMembers(groupPrincipal.Members);

                groupPrincipals.RemoveAt(0);

                groupPrincipal.Dispose();

            } while (groupPrincipals.Count > 0);

        }

When i ran this code, memory usage remained stable and cpu usage actually went down to 5%. The time needed to process the groups went from 9 minutes to 5. Still a lot, but as evidenced by the very low cpu usage, the code is constantly waiting for the data from Active Directory to cross the wire and then it quickly processes it, and then it waits for the next bunch of data.

So, as this story clearly demonstrates, the foreach statement can be quite the evil bitch even though it's usually the nice girl-next-door kinda statement. It's too bad i wasted a few hours on this... well... honestly, a part of me loves situations like these in a weird, sick and twisted kinda way. You always learn something very interesting from it :)

Hope you enjoyed this episode of How The Code Turns.

  • http://musingmarc.blogspot.com Marc Brooks

    You realize that the exact symptom you are running into can simply be handled in the original code by calling the groupPrinciple.Dispose() method at the end of the loop. You’re not changing the collection (which would assert), you’re changing the objects it points at.

  • http://davybrion.com Davy Brion

    Depending on how GroupPrincipal.Dispose is implemented, that might not release every reference the instance uses. If i only call Dispose without removing it from the list, i’ll still hold a reference to it, and it won’t be eligible for garbage collection even though it will indeed use less memory already due to the call to Dispose.

    I’d need to look at the GroupPrincipal.Dispose method in reflector to be sure, but i’d rather dispose it and get rid of the reference entirely once i no longer need it, especially if we’re talking about a loop that still takes a few minutes.

  • Luc

    Davy,

    Yes, you can retrieve all members in a AD group with one LDAP request and suppress request roundtrips to the AD.
    I don’t know your application, but my experience is that using the DirectorySearcher class in such situations has some advantages.

    1. You can work with disconnected search results
    2. By specifying the attributes in which you are interested in, you can limit the amount of data transported between the AD and the application.
    3. You can use Attribute scope queries. This might be important for your application.
    Example: when your are interested in for example the e-mail addresses of the users in a group (or distribution list), you can use the group as root for the directory search, but specify which objects (and which attributes, in this case mail address) you want to return based on an attribute (in this case ‘member’) of the search root. As such the search root must not be transported to the application, and the results which may be spread around in the AD can be retrieved whithout any navigation.

    When accessing AD from .NET, following thing are imho important:
    - Choose the correct root object.
    - Use disconnected model if possible.
    - Be specific about attributes you want to access
    - Use the correct scoping (Base, OneLevel or Subtree)
    - Avoid sorting
    - Use indexed attributes in filters
    - Do not treat AD as a classic relational database, it isn’t one. So learn LDAP queries

    Example code:

    static void Main(string[] args)
    {
    string[] attribs = new string[]{“distinguishedName”, “sAMAccountName”, “name”, “mail”};
    DirectoryEntry users = new DirectoryEntry(“LDAP://S2008:389/CN=Users,DC=lvk,DC=local”,
    @”User”, “Password”, AuthenticationTypes.Secure);

    DirectorySearcher ds = new DirectorySearcher(users, “(objectClass=group)”,
    new string[] { “distinguishedName” ,”member”});
    ds.CacheResults = true;

    using (SearchResultCollection src = ds.FindAll())
    {
    Console.WriteLine(“Returning {0}”, src.Count);

    foreach (SearchResult sr in src)
    {
    Console.WriteLine(“Processing group:{0}”, sr.Properties["distinguishedName"][0]);

    DirectorySearcher ds2 = new DirectorySearcher(sr.GetDirectoryEntry(),
    “(&(objectClass=user)(objectCategory=person))”, attribs);
    ds2.SearchScope = SearchScope.Base;
    ds2.AttributeScopeQuery = “member”;
    ds2.CacheResults = true;

    using (SearchResultCollection src2 = ds2.FindAll())
    {
    foreach (SearchResult sr2 in src2)
    {
    foreach (string s in attribs)
    {
    if (sr2.Properties.Contains(s) && sr2.Properties[s].Count > 0)
    {
    Console.WriteLine(“{0}: {1}”, s, sr2.Properties[s][0]);
    }
    }
    Console.WriteLine();
    }
    }
    Console.WriteLine(“——————————————————————–”);
    }
    }
    Console.ReadLine();
    }

    If you use a sniffer, you should see that all the members in a group (and the requested attributes) are retrieved with one LDAP query.

  • http://davybrion.com Davy Brion

    @Luc

    excellent stuff… i no longer have access to that particular application because i’m not working at that client anymore, but i’ll definitely use this if i do any future AD work :)