Efficient data fetching
Posted by Davy Brion on June 8th, 2008
Suppose you have the following table structure:
And we have the following classes:
What is the most efficient way to retrieve all Product instances, fully populated (with complete Supplier and ProductCategory references)? There are a couple of ways to retrieve this data, depending on what kind of data access techniques you’re using. These are the goals we should aim to achieve:
- We only want one roundtrip to the database
- No joins… Depending on the amount of data in each of the tables, a 3-table join could easily lead to a query which is way too expensive for what we’re trying to achieve
- We don’t want to write boring code to construct the object graph. The object graph should be built up automagically
Ok… let’s see. Since we only want one trip to the database, that rules out using lazy loading (which would be a terrible idea in this case anyway). And we don’t want to join either, so our best bet is probably to fetch all the categories, all the suppliers, and all the products with 3 simple queries. Obviously, we’d have to batch those 3 queries so the 3 results are retrieved with only one roundtrip. And then we need to figure out a way to join the results together in an easy to use object graph.
Luckily for me i’m using NHibernate which makes all of this incredibly easy. First, we create the 3 queries:
IMultiCriteria multiCriteria = Session.CreateMultiCriteria();
multiCriteria.Add(Session.CreateCriteria(typeof(ProductCategory)));
multiCriteria.Add(Session.CreateCriteria(typeof(Supplier)));
multiCriteria.Add(Session.CreateCriteria(typeof(Product)));
A Criteria instance is kinda like a programmatic query. If you create a Criteria that is only based on the type of an object, you basically create a query which returns each record from the table that is associated with that type without any other conditions. In the code above, we create 3 queries this way to retrieve all the instances of the ProductCategory, Supplier and Product types. At this point, the queries have only been created, they haven’t been executed yet.
To retrieve the results of these 3 queries with one database call, we simply do this:
IList results = multiCriteria.List();
When this line is executed, we can see the following happening on the database (through SQL Server Profiler):
Alright, so we’ve accomplished the goal of using only one database roundtrip without using joins. Now, we want to use the data we retrieved without having to write boring code to make sure each product refers to the correct ProductCategory and Supplier references. Actually, we don’t really have to do anything for that. The list of products was the third Criteria instance that we added, so it is also the third item in the result list.
IEnumerable<Product> products = ((IList)results[2]).Cast<Product>();
Ok, so now we have a list of Products… Now how do we make sure that each product points to the correct ProductCategory and Supplier instances? Fortunately, we don’t even have to worry about that. When we access a Product’s Category or Supplier properties, NHibernate first checks its current session’s identity map to see if those objects are already present in the session’s first-level cache. Because we retrieved the ProductCategories and the Suppliers, all of them are already present. So if we would write the following code, no more extra queries would be executed:
foreach (Product product in products)
{
string dummy = product.Name + " - " +
product.Supplier.CompanyName + " - " +
product.Category.Name;
}
This code is pretty much pointless, but it does show that we can simply access whatever data we need, without having to write boring code, and do it in an efficient way as well.
Now, you might be thinking "who cares if you retrieve the data in one roundtrip instead of three?". Well, you should care about that stuff... what if you used this technique throughout your application (where applicable of course)? Even if the data you need to retrieve is not related, this is still a potentially large performance improvement if you simply try to batch statements wherever you can. The more you reduce the number of network roundtrips, the fewer time your application spends waiting on data to cross the wire. And as your number of concurrent users increases, you really want to minimize the places in your code base where the application is just waiting on something.


