On October 20th, i wrote this in a comment:
i have zero interest in LINQ To Sql because i think it will ultimately be neglected (or even dropped) by Microsoft to increase adoption of Entity Framework
In my previous post, i showed how you can configure NHibernate to batch create/update/delete statements and what kind of performance benefits you can get from it. In this post, we're going to take this a bit further so we can actually use NHibernate in bulk data operations, an area where ORMs traditionally perform pretty badly.
First of all, let's get back to our test code from the last post:
var testObjects = CreateTestObjects(500000);
var stopwatch = new Stopwatch();
stopwatch.Start();
using (ITransaction transaction = Session.BeginTransaction())
{
    foreach (var testObject in testObjects)
    {
        Session.Save(testObject);
    }
    transaction.Commit();
}
stopwatch.Stop();
var time = stopwatch.Elapsed;
The only thing that changed since the previous post is the number of objects that are created. In the previous post we only created 10000 objects, whereas now we'll be creating 500000 objects.
The batch size is configured like this:
<property name="adonet.batch_size">100</property>
This means that NHibernate will send its DML statements in batches of 100 statements instead of sending all of them one by one. The above code runs in 2 minutes and 24 seconds with a batch size of 100.
However, if we use NHibernate's IStatelessSession instead of a regular ISession, we can get some nice improvements. First of all, here's the code to use the IStatelessSession:
var testObjects = CreateTestObjects(500000);
var stopwatch = new Stopwatch();
stopwatch.Start();
using (IStatelessSession statelessSession = sessionFactory.OpenStatelessSession())
using (ITransaction transaction = statelessSession.BeginTransaction())
{
    foreach (var testObject in testObjects)
    {
        statelessSession.Insert(testObject);
    }
    transaction.Commit();
}
stopwatch.Stop();
var time = stopwatch.Elapsed;
As you can see, apart from the usage of the IStatelessSession instead of the regular ISession, this is pretty much the same code.
With a batch-size of 100, this code creates and inserts the 500000 records in 1 minute and 26 seconds. While not a spectacular improvement, it's definitely a nice improvement in duration.
The biggest difference however is in memory usage while the code is running. A regular NHibernate ISession keeps a lot of data in its first-level cache (this enables a lot of the NHibernate magical goodies). The IStatelessSession however, does no such thing. It does no caching whatsoever and it also doesn't fire all of the events that you could usually plug into. This is strictly meant to be used for bulk data operations.
To give you an idea on the difference in memory usage, here are the memory statistics (captured by Process Explorer) after running the original code (with the ISession instance):
And here are the memory statistics after running the modified code (with the IStatelessSession instance):
Quite a difference for what is essentially the same operation. We could even improve on this because the code in its current form keeps all of the object instances in its own collection, preventing them from being garbage collected after they have been inserted in the database. But i think this already demonstrates the value in using the IStatelessSession if you need to perform bulk operations.
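A rough sketch of that further improvement could look like the following. It assumes CrudTest is a plain class with an Id and a Description property (redeclared here so the snippet stands on its own), and the TestData class and CreateTestObjectsLazily method are names made up for this illustration:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the mapped entity, assumed to match the CrudTest mapping.
public class CrudTest
{
    public virtual Guid Id { get; set; }
    public virtual string Description { get; set; }
}

public static class TestData
{
    // Yields one object at a time instead of building a List up front, so this
    // code no longer holds a reference to an object once the caller's loop has
    // moved on to the next one, which makes it eligible for garbage collection.
    public static IEnumerable<CrudTest> CreateTestObjectsLazily(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return new CrudTest { Id = Guid.NewGuid(), Description = Guid.NewGuid().ToString() };
        }
    }
}
```

Passing TestData.CreateTestObjectsLazily(500000) to the foreach loop would be the only change to the insert code. Keep in mind that the objects would then be created inside the timed section, so the measured duration would include their creation as well.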
Obviously, this will never perform as well as a bulk data operation that directly uses low-level ADO.NET code. But if you already have the NHibernate mappings and infrastructure set up, implementing those bulk operations could be cheaper while still being 'fast enough' for most situations.
Just wanted to give you guys a bit of a heads up about some of the upcoming content i've got planned for this blog. I've been posting more about NHibernate lately, and i've actually got 12 other NHibernate-related posts planned for the coming weeks/months. For those of you who aren't really interested in the NHibernate-related content, i hope you bear with me and keep reading for the other stuff as well. I write all of this stuff in my spare time, so naturally i always pick subjects that i like to spend that spare time on. Right now, NHibernate is at the top of that list but i'll obviously keep covering other topics as well.
And since the subscriber count finally surpassed 600 today, i guess this is a good time to thank all of you who've started following this blog, and obviously everyone who's been reading for a while now. So thanks, and i hope you enjoy your stay.
An oft-forgotten feature of NHibernate is that of batching DML statements. If you need to create, update or delete a bunch of objects you can get NHibernate to send those statements in batches instead of one by one. Let's give this a closer look.
I have an 'entity' with the following mapping:
<class name="CrudTest" table="CrudTest">
    <id name="Id" column="Id" type="guid" >
        <generator class="assigned" />
    </id>
    <property name="Description" column="Description" type="string" length="200" not-null="true" />
</class>
Nothing special here, just a Guid Id field and a string Description field.
First, let's see how much time it takes to create 10000 records of this without using the batching feature. I use the following method to create a bunch of dummy objects:
private IEnumerable<CrudTest> CreateTestObjects(int count)
{
    List<CrudTest> objects = new List<CrudTest>(count);
    for (int i = 0; i < count; i++)
    {
        objects.Add(new CrudTest { Id = Guid.NewGuid(), Description = Guid.NewGuid().ToString() });
    }
    return objects;
}
Then, the code to persist these objects:
var testObjects = CreateTestObjects(10000);
var stopwatch = new Stopwatch();
stopwatch.Start();
using (ITransaction transaction = Session.BeginTransaction())
{
    foreach (var testObject in testObjects)
    {
        Session.Save(testObject);
    }
    transaction.Commit();
}
stopwatch.Stop();
Without enabling the batching, this code took 23 seconds to run on my cheap MacBook. Now let's enable the batching in the hibernate.cfg.xml file:
<property name="adonet.batch_size">5</property>
A batch size of 5 is still very small, but for this test it means that it only has to do 2000 trips to the database instead of the original 10000. The code above now runs in 5.5 seconds. Setting the batch size to 100 made it run in 1.8 seconds. Going from 23 to 1.8 seconds with a small configuration change is a pretty nice improvement with very little effort. Obviously, these aren't real benchmarks so your results may vary but i think it does show that you can easily get some performance benefits from it.
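To make the roundtrip arithmetic explicit, here's a small sketch that computes how many trips are needed for a given number of statements and batch size. It is not part of NHibernate: the BatchMath class and its RoundTrips method are made up for this illustration, and it assumes that every batch is filled completely except possibly the last one:

```csharp
using System;

public static class BatchMath
{
    // Number of database roundtrips needed to send 'statementCount' statements
    // when up to 'batchSize' statements are sent per trip (ceiling division).
    public static int RoundTrips(int statementCount, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");
        return (statementCount + batchSize - 1) / batchSize;
    }
}
```

BatchMath.RoundTrips(10000, 5) gives the 2000 trips mentioned above, and BatchMath.RoundTrips(10000, 100) gives 100.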
You can get performance benefits like this whenever you need to create/update/delete a bunch of records simply by enabling this setting. Keep in mind that this batching of statements doesn't apply to select queries... for that you need to use NHibernate's MultiCriteria or MultiQuery features.
Another thing to keep in mind is that for this test i used the 'assigned' Id generator... which means that the developer is responsible for providing the Id value for new objects. One of the consequences of this is that NHibernate does not have to go to the database to retrieve the Id values like it would have to do if you were using (for instance) Identity Id values. If you were using the Identity Id generator, this configuration setting would have no effect whatsoever for inserts, although the benefits would still apply to update and delete statements.
Note that this approach is good for regular applications, but it's still not good enough if you need to process very large data sets (like import processes and things of that nature). Obviously, an ORM isn't well suited for those purposes, but we will examine another NHibernate feature in a future post which makes it possible to use NHibernate in such bulk operations with a pretty low performance overhead.
One of the new features that NHibernate 2.0 introduced is NHibernate Statistics. This feature can be pretty useful during development (or while debugging) to keep an eye on what NHibernate is doing. Not a lot of people know about this feature, so i've decided to write a short series of posts about it. In this first episode, we'll explore some stats which can show you some useful information regarding the efficiency of your (simple) data fetching strategies. Later episodes will cover insert/update/delete statistics, query specific statistics and caching statistics. I don't know yet when the other episodes will be posted, but they are definitely on my TODO list so they will get written eventually.
First of all, here's how you can enable this feature. In your hibernate.cfg.xml file, you can add the following setting within the session-factory element.
<property name="generate_statistics">true</property>
Now, there are two levels of statistics. The first is at the level of the SessionFactory. These statistics basically keep count of everything that happens for each Session that was created by the SessionFactory. You can access these stats through the Statistics property of the SessionFactory instance, which you can access from your Session instance if you don't have a reference to the SessionFactory, for instance:
var count = Session.SessionFactory.Statistics.EntityFetchCount;
To give you an idea of what kind of stats are available on the SessionFactory level, here's a quick listing of most of the available properties: EntityDeleteCount, EntityInsertCount, EntityLoadCount, EntityFetchCount, EntityUpdateCount, QueryExecutionCount, QueryExecutionMaxTime, QueryCacheHitCount, QueryCacheMissCount, QueryCachePutCount, FlushCount, ConnectCount, SecondLevelCacheHitCount, SecondLevelCacheMissCount, SecondLevelCachePutCount, SessionCloseCount, SessionOpenCount, CollectionLoadCount, CollectionFetchCount, CollectionUpdateCount, CollectionRemoveCount, CollectionRecreateCount, SuccessfulTransactionCount, TransactionCount, PrepareStatementCount, CloseStatementCount, OptimisticFailureCount.
There are even more properties and methods available. For instance to retrieve the executed queries, statistics for a specific query, statistics for a specific entity type, for a specific collection role, and to get the second level cache statistics for a specific cache region.
As you can see, lots of useful properties to help you examine where your NHibernate usage might not be the way it should be. Or just useful in case you're trying to figure out what kind of stuff NHibernate is doing behind the scenes for features you don't fully understand and want to experiment with. There's also a useful LogSummary() method which (obviously) logs a summary of these stats to NHibernate's logger.
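As a minimal sketch of how LogSummary() could be used to examine one specific piece of code, the helper below resets the counters, runs the code and then logs the summary. The StatisticsExample class and LogStatisticsFor method are hypothetical names, and the sketch assumes that generate_statistics is enabled and that a logger is configured for NHibernate:

```csharp
using System;
using NHibernate;
using NHibernate.Stat;

public static class StatisticsExample
{
    // Resets the SessionFactory's counters, runs some data access code and then
    // writes a summary of everything that happened in between to NHibernate's logger.
    public static void LogStatisticsFor(ISessionFactory sessionFactory, Action work)
    {
        IStatistics stats = sessionFactory.Statistics;
        stats.Clear();       // reset all counters
        work();              // the code we want to examine
        stats.LogSummary();  // log the counters gathered since Clear()
    }
}
```

Because these statistics are shared by every Session of the SessionFactory, calling Clear() like this only makes sense in a test or a debugging session where nothing else is using the SessionFactory at the same time.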
Let's get into a couple of examples that will tell us more about the efficiency of our simple data access strategies. We'll start with some really simple stuff, and then move to some more interesting statistics.
For brevity, i'm using the following simple property to quickly access the SessionFactory's statistics:
private IStatistics GlobalStats
{
    get { return Session.SessionFactory.Statistics; }
}
For the first example, we'll retrieve all of the records in the Product table. After fetching this result, the EntityLoadCount statistic will reflect the number of entities we've loaded:
[Test]
public void TestEntityLoadCount()
{
    long entityLoadCountBefore = GlobalStats.EntityLoadCount;
    var allProducts = Session.CreateCriteria(typeof(Product)).List<Product>();
    Assert.AreEqual(entityLoadCountBefore + allProducts.Count, GlobalStats.EntityLoadCount);
}
What happens if we start using NHibernate's lazy loading features? For instance, if we access a Product's Category reference NHibernate has to retrieve that record from the database if it's not already loaded. Does this count as an EntityLoad, or an EntityFetch? It counts as both an EntityLoad and an EntityFetch actually:
[Test]
public void TestEntityFetchCountForManyToOnePropertiesWithLazyLoading()
{
    long entityLoadCountBefore = GlobalStats.EntityLoadCount;
    long entityFetchCountBefore = GlobalStats.EntityFetchCount;
    var allProducts = Session.CreateCriteria(typeof(Product)).List<Product>();
    foreach (var product in allProducts)
    {
        // this makes NHibernate fetch the Category from the database
        var categoryName = product.Category.Name;
    }
    var entitiesFetched = GlobalStats.EntityFetchCount - entityFetchCountBefore;
    Assert.AreEqual(entityLoadCountBefore + allProducts.Count + entitiesFetched, GlobalStats.EntityLoadCount);
    Assert.That(entitiesFetched != 0);
}
With the data in my database, this test loads 77 Product entities, and fetches 8 Categories for a total of 85 loaded entities. The Product entities are loaded in one roundtrip, but for each Category another roundtrip is performed which is not ideal from a performance perspective. The stats actually reflect that through the PrepareStatementCount property. Let's modify the previous test to highlight this:
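// captured at the start of the test, next to the other 'before' values:
// long prepareStatementCountBefore = GlobalStats.PrepareStatementCount;
var entitiesFetched = GlobalStats.EntityFetchCount - entityFetchCountBefore;
Assert.AreEqual(entityLoadCountBefore + allProducts.Count + entitiesFetched, GlobalStats.EntityLoadCount);
Assert.AreEqual(prepareStatementCountBefore + entitiesFetched + 1, GlobalStats.PrepareStatementCount);
Assert.That(entitiesFetched != 0);
We get the count of the fetched entities, and add 1 to it to reflect the roundtrip to fetch all the products. This total equals the number of statements that were prepared during the test, which is the increase in the PrepareStatementCount value since the start of the test (the statistics are kept for the whole SessionFactory, so the value has to be compared to what it was before the test started).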
This example shows that it can be pretty important to try to keep the EntityFetchCount and PrepareStatementCount property values as low as possible when you need to fix a performance problem. Let's give it a shot:
[Test]
public void TestEntityFetchCountForManyToOnePropertiesWithoutLazyLoading()
{
    long entityFetchCountBefore = GlobalStats.EntityFetchCount;
    // the statistics are shared by the whole SessionFactory, so compare against the value before the query
    long prepareStatementCountBefore = GlobalStats.PrepareStatementCount;
    var allProducts = Session.CreateCriteria(typeof(Product))
        .CreateCriteria("Category", JoinType.InnerJoin)
        .List<Product>();
    foreach (var product in allProducts)
    {
        // the Categories have already been retrieved, so this doesn't cause a db roundtrip
        var categoryName = product.Category.Name;
    }
    var entitiesFetched = GlobalStats.EntityFetchCount - entityFetchCountBefore;
    Assert.AreEqual(prepareStatementCountBefore + 1, GlobalStats.PrepareStatementCount);
    Assert.AreEqual(0, entitiesFetched);
}
Instead of fetching all of the Products and then relying on NHibernate's lazy loading to fetch the Categories, we fetch all of them in one go. The result is that NHibernate only performs one DB statement and it still loads all 85 records.
These statistics are nice if you just want to get information about loading entities, but what about collections? Well, we can use the CollectionLoadCount and CollectionFetchCount properties for this:
[Test]
public void TestCountsForOneToManyPropertyWithLazyLoading()
{
    var loadCountBefore = GlobalStats.EntityLoadCount;
    var collectionLoadCountBefore = GlobalStats.CollectionLoadCount;
    var collectionFetchCountBefore = GlobalStats.CollectionFetchCount;
    var prepareStatementCountBefore = GlobalStats.PrepareStatementCount;
    var allRegions = Session.CreateCriteria(typeof(Region)).List<Region>();
    var territoryCount = 0;
    foreach (var region in allRegions)
    {
        // uses lazy-loading to fetch the Territories for this region
        territoryCount += region.Territories.Count();
    }
    Assert.AreEqual(loadCountBefore + allRegions.Count + territoryCount, GlobalStats.EntityLoadCount);
    Assert.AreEqual(collectionLoadCountBefore + allRegions.Count, GlobalStats.CollectionLoadCount);
    Assert.AreEqual(collectionFetchCountBefore + allRegions.Count, GlobalStats.CollectionFetchCount);
    Assert.AreEqual(prepareStatementCountBefore + allRegions.Count + 1, GlobalStats.PrepareStatementCount);
}
Btw, using the Count() result of the Territories property is something i'd never do, but i'm just using it here to illustrate these stats. In this case, we have 4 Regions, which we retrieve in one roundtrip, and then we fetch each Region's Territories in separate trips which brings the total of PrepareStatementCount up to 5. As you can see, the CollectionLoadCount is equal to the CollectionFetchCount. In some cases, it can be better to retrieve the Territories while we're also retrieving the Regions:
[Test]
public void TestCountsForOneToManyPropertyWithoutLazyLoading()
{
    var loadCountBefore = GlobalStats.EntityLoadCount;
    var collectionLoadCountBefore = GlobalStats.CollectionLoadCount;
    var collectionFetchCountBefore = GlobalStats.CollectionFetchCount;
    var prepareStatementCountBefore = GlobalStats.PrepareStatementCount;
    var allRegions = Session.CreateCriteria(typeof(Region))
        .SetFetchMode("Territories", FetchMode.Join)
        .SetResultTransformer(CriteriaUtil.DistinctRootEntity)
        .List<Region>();
    var territoryCount = 0;
    foreach (var region in allRegions)
    {
        // the Territories have already been retrieved, so this doesn't use lazy-loading
        territoryCount += region.Territories.Count();
    }
    Assert.AreEqual(loadCountBefore + allRegions.Count + territoryCount, GlobalStats.EntityLoadCount);
    Assert.AreEqual(collectionLoadCountBefore + allRegions.Count, GlobalStats.CollectionLoadCount);
    Assert.AreEqual(collectionFetchCountBefore, GlobalStats.CollectionFetchCount);
    Assert.AreEqual(prepareStatementCountBefore + 1, GlobalStats.PrepareStatementCount);
}
With this code, there's only one roundtrip, yet all of the Regions and their Territories are loaded. This approach won't always be better than retrieving the collections separately though... it kinda depends on the size and shape of the resultset of the joined query compared to the size and shapes of the resultsets of retrieving the root entities and their child collections separately.
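To get a feel for that trade-off, here's a back-of-the-envelope sketch that compares the number of values ('cells') coming back from the database in both approaches. The ResultSetSize class is made up for this illustration, it assumes that every Region has at least one Territory, and it ignores everything else that matters in practice (column sizes, network latency, and so on):

```csharp
public static class ResultSetSize
{
    // Joined query: one row per Region/Territory pair, and every row repeats
    // the Region columns next to the Territory columns.
    public static long JoinedCells(long territoryCount, int regionColumns, int territoryColumns)
    {
        return territoryCount * (regionColumns + territoryColumns);
    }

    // Separate queries: one resultset with the Regions, plus the Territory rows
    // spread over the lazy-loaded collections (one extra roundtrip per Region).
    public static long SeparateCells(long regionCount, long territoryCount, int regionColumns, int territoryColumns)
    {
        return regionCount * regionColumns + territoryCount * territoryColumns;
    }
}
```

With hypothetical numbers (4 Regions with 2 columns each, 50 Territories with 3 columns each) the joined query returns 250 cells in 1 roundtrip, while the separate queries return 158 cells in 5 roundtrips. The wider the root entity and the larger its collections, the more the duplicated data in the joined resultset starts to weigh against the saved roundtrips.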
This post only showed a couple of the (many) interesting statistics that NHibernate can give you, but it could already help you troubleshoot bad-performing parts of your application. Keep an eye on those EntityFetchCount and PrepareStatementCount values... If the EntityFetchCount is rather low compared to the total EntityLoadCount, then there's probably nothing bad going on. If the EntityFetchCount is a rather large percentage of the total EntityLoadCount value, then you can be pretty sure that you can get some solid performance improvements in the code that drives up the EntityFetchCount value.
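That comparison can be turned into a tiny helper, sketched below. The FetchRatio class is a made-up name and not part of NHibernate, and what counts as 'a rather large percentage' remains a judgment call for your own application:

```csharp
public static class FetchRatio
{
    // Fraction of the loaded entities that were fetched through a separate
    // lazy-loading roundtrip. Returns 0 when nothing was loaded at all.
    public static double Calculate(long entityFetchCount, long entityLoadCount)
    {
        if (entityLoadCount == 0) return 0.0;
        return (double)entityFetchCount / entityLoadCount;
    }
}
```

It could be called with FetchRatio.Calculate(GlobalStats.EntityFetchCount, GlobalStats.EntityLoadCount). With the numbers from the lazy loading example above (8 fetches out of 85 loads) the result is roughly 0.09.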