Build Your Own Data Access Layer: Enabling Bulk Inserts
Posted by Davy Brion on October 12th, 2009
Note: This post is part of a series. Be sure to read the introduction here.
I know i wrapped up the series already, but i just had to add the ability to do bulk inserts to this data layer so i figured i’d might as well post about it. Ayende already talked about how to enable the ability to batch inserts (or updates and deletes) here and here so i’m going to skip that part. I used the exact same trick and created a PublicSqlCommandSet class which wraps the hidden SqlCommandSet class. Again, if you have no idea what i’m talking about in that last sentence then you need to read Ayende’s 2 posts that i just linked to
After that, adding the bulk insert feature to the DAL was as simple as creating this class:
public class BulkInsertAction : DatabaseAction
{
public BulkInsertAction(SqlConnection connection, SqlTransaction transaction, MetaDataStore metaDataStore,
EntityHydrater hydrater, SessionLevelCache sessionLevelCache)
: base(connection, transaction, metaDataStore, hydrater, sessionLevelCache) {}
public void Insert<TEntity>(IEnumerable<TEntity> entities, int batchSize, int commandTimeOut)
{
var tableInfo = MetaDataStore.GetTableInfoFor<TEntity>();
var insertStatement = tableInfo.GetInsertStatementWithoutReturningTheIdentityValue();
var sqlCommandSet = new PublicSqlCommandSet { CommandTimeout = commandTimeOut, Connection = GetConnection(), Transaction = GetTransaction() };
foreach (var entity in entities)
{
var currentCommand = CreateCommand();
currentCommand.CommandText = insertStatement;
foreach (var parameterInfo in tableInfo.GetParametersForInsert(entity))
{
currentCommand.CreateAndAddInputParameter(parameterInfo.DbType, "@" + parameterInfo.Name, parameterInfo.Value);
}
sqlCommandSet.Append(currentCommand);
if (sqlCommandSet.CommandCount == batchSize)
{
ExecuteCurrentBatch(sqlCommandSet);
sqlCommandSet = new PublicSqlCommandSet { CommandTimeout = commandTimeOut, Connection = GetConnection(), Transaction = GetTransaction() };
}
}
if (sqlCommandSet.CommandCount > 0)
{
ExecuteCurrentBatch(sqlCommandSet);
}
}
private void ExecuteCurrentBatch(PublicSqlCommandSet sqlCommandSet)
{
try
{
sqlCommandSet.ExecuteNonQuery();
}
finally
{
sqlCommandSet.Dispose();
}
}
}
And then adding this to the Session class:
public void BulkInsert<TEntity>(IEnumerable<TEntity> entities)
{
CreateAction<BulkInsertAction>().Insert(entities, 50, 200);
}
Obviously, the method signature was also added to the ISession interface. The batch-size and commandtimeout parameters are currently hardcoded but they should come from some kind of configuration file.
All in all, pretty easy stuff
October 13th, 2009 at 9:44 am
[...] Build Your Own Data Access Layer: Enabling Bulk Inserts – Davy Brion resumes his series on creating his own ORM / Data Access Layer from scratch with a look at enabling his framework to support bulk inserts of data [...]
October 13th, 2009 at 11:26 am
Some good posts.
What would you suggest in regards to deep loading and using stored procedures, say you have a list of 100 of something each having a foreign key to another table … (using batch queries)
We have to use stored procedures at our place and I am looking at building our own basic ORM.
1. Get all of the list and the FK records
2. Get all of the list and attempt to get unique FK records (so you bring back less from DB).
October 13th, 2009 at 11:40 am
@Tom
depends on the dataset in the referenced table… if it’s more likely that the number of needed records is somewhat similar to the number of records in the referring table then i’d just join the two in one query
if the number of needed records in the referenced table will be much smaller than the number of records in the referring table, i’d go for two statements and try to bring back only the ones i really need in the second statement. your stored procedure could then still send back both resultsets… with a datareader you can process each resultset returned by a statement so that wouldn’t be an issue
either way you do it, as long as you combine both operations in one stored procedure then you don’t really need to worry about the ‘overhead’ of a second query since you’re not really paying for the network latency of multiple remote calls
October 13th, 2009 at 2:53 pm
So really you need some sort of fetching strategy (not sure how you would implement)? because as always.. ‘it depends’..
My example query:
Fixture (say 100 of them)
Fixture.HomeTeam (type Team), Fixture.AwayTeam (type Team)
Stored procedures would be as its for an ORM … FixtureListAll, FixtureGetByID, TeamGetByID
Options I can see are either:
1.
FixtureList
and then work out what unique TeamIDs there is and then do a batch call on
TeamGetByID
So 2 trips to DB
OR
2.
FixtureList and deep load 1 lot of children underneath, so bring back the Teams also, but this would be for all entities so performance would be hit and miss
So 1 trip to DB but bring back more than you need
OR
3.
A fetching strategy and you create a single stored procedure like option 2 but only when you need it, so default to lazy load or something
So 1 trip but because you programmed for this you are aware of what your bringing back
Any advice/suggestions welcome
October 13th, 2009 at 2:55 pm
Sorry when I say “for all entities” i mean for anything using the framework so like a default way of getting things from the DB
October 13th, 2009 at 3:13 pm
advice… umm, can you get a new job? :p
the fetching strategy is going to take the most effort out of the 3 options, but it gives you the flexibility to optimize for each specific scenario that you encounter
October 13th, 2009 at 4:02 pm
I assume you would just call something similar to .net tiers…?
Fetch(entity or collection, levels, types to include, types to exclude)
Fetch(fixtures, 1, typeof(Team), typeof(Venue))… you just call this in your service layer and default to lazy loading otherwise.. or define something on the domain entity to say what to types to get by default? I would be using ActiveRecord under the service layer.
While this isn’t one trip it would still be less than several 100 potential
public class Fixture
{
// This would be ignored as the deeploading options excldues this
[DeepLoad]
public Venue Venue
{
return this.venue;
}
[DeepLoad]
public Team HomeTeam
{
get
{
return this.homeTeam;
}
}
[DeepLoad]
public Team AwayTeam
{
get
{
return this.awayTeam;
}
}
}
October 15th, 2009 at 1:25 pm
Any thoughts in regards to fetching strategys?
October 15th, 2009 at 2:09 pm
Sorry, haven’t really had the time to look into this