Keeping our code readable and easy to understand is a goal that many of us try to reach as much as possible. Ideally, any developer should be able to look at a piece of code and be able to grasp exactly what the code does in a short amount of time. Data retrieval code is the kind of code that often fails to achieve that goal, at least IMO. A lot of applications will have at least a couple or more non-trivial queries that need to be executed. What do i mean by non-trivial queries? Queries with joins spanning several relationships, queries that need to take certain 'non-optimal' data structure specialties into account, pretty much any query that makes you read it twice (or more) before you understand what it does.
You can hide the complexity of those queries in views or stored procedures but by doing that, you're moving some very important details outside of your code. You're introducing one more little hurdle the next developer will have to get over when trying to comprehend a certain piece of code which has to use that view or that stored procedure.
Ideally, you want something in your code which clearly states the intention of a non-trivial query so that it's easy to understand immediately, or at least without having to read it multiple times. Personally, i think that straight SQL is a poor choice to achieve this. HQL (Hibernate Query Language) is a bit better, but even that isn't always as readable as it should be. NHibernate's Criteria API can offer you pretty good readability however. Of course, it does require some familiarity with the Criteria API before those queries are easy to comprehend, but i'd argue that SQL, HQL and LINQ suffer from the same problem.
I'm going to a use a query from a real application as an example. This application unfortunately uses a legacy database which we can't really change. Well we can change some parts of it, but not the whole thing. Anyways, there are some parts of the database model which are somewhat complex from a table structure point of view, but from a domain point of view, it's not that hard at all if you map everything properly. Do i want to be faced with the database complexity every time we need to use data from these areas of the data model in our code? Hell no. Do i want to know exactly what is going on from a functional point of view? Hell yes.
Alright, let's just get to the specific query:
var teamIdsForUser = DetachedCriteria.For<User>()
.Add(Restrictions.Eq("Id", userId))
.CreateCriteria("Teams", "team")
.SetProjection(Projections.Property("team.Id"));
var incidents = Session.CreateCriteria(typeof(Incident), "incident")
.CreateCriteria("Configurations", "configuration", JoinType.InnerJoin)
.CreateCriteria("configuration.Structure", "structure", JoinType.InnerJoin)
.Add(Subqueries.PropertyIn("configuration.OrganisationalItem.Id", teamIdsForUser))
.Add(Restrictions.EqProperty("incident.ApprovedPhase", "structure.Phase"))
.Add(Restrictions.IsNull("incident.EndDate"))
.SetResultTransformer(CriteriaSpecification.DistinctRootEntity)
.List<Incident>();
If you have experience with NHibernate, this query will probably make sense to you rather quickly. If you don't have NHibernate experience, this query will probably be as clear to you as a complex SQL query is clear to the average developer who claims to know SQL (not every developer who claims to know SQL actually knows more than the very basics of it).
I'm not going to explain the query, or what exactly makes it special, just yet. Read it again, try to figure out what it does, and try to see if there's really anything special there. I'll show the actual SQL below, but if you want to play along, try to figure it out first.
Ready?
Ok, here's the actual SQL that is generated (by NHibernate, so the readability of this query could definitely be improved):
SELECT this_.ID as ID21_2_,
this_.StartedByUserID as StartedB2_21_2_,
this_.EndedByUserID as EndedByN3_21_2_,
this_.Name as Name21_2_,
this_.Description as Descript5_21_2_,
this_.StartDate as StartDate21_2_,
this_.EndDate as EndDate21_2_,
this_.ScenarioID as ScenarioID21_2_,
this_.ApprovedPhaseID as Approved9_21_2_,
this_.ApprovedPhaseDate as Approve10_21_2_,
this_.ApprovedPhaseUserID as Approve11_21_2_,
this_.SuggestedPhaseID as Suggest12_21_2_,
this_.SuggestedPhaseDate as Suggest13_21_2_,
this_.SuggestedPhaseUserID as Suggest14_21_2_,
configurat1_.ID as ID15_0_,
configurat1_.OrgItemid as OrgItemid15_0_,
configurat1_.OrgChartID as OrgChartID15_0_,
configurat1_.IncidentID as IncidentID15_0_,
structure2_.orgchartid as orgchartid33_1_,
structure2_.isadviser as isadviser33_1_,
structure2_.roleid as roleid33_1_,
structure2_.parentroleid as parentro4_33_1_,
structure2_.phaseid as phaseid33_1_,
structure2_.descriptionentryid as descript6_33_1_
FROM console_incident this_
inner join console_incident_organisation configurat1_
on this_.ID = configurat1_.IncidentID
inner join org_chart structure2_
on configurat1_.OrgChartID = structure2_.orgchartid
WHERE configurat1_.OrgItemid in (SELECT team1_.orgitemid as y0_
FROM application_user this_0_
inner join org_item this_0_1_
on this_0_.userid = this_0_1_.orgitemid
inner join org_item_to_org_item teams3_
on this_0_.userid = teams3_.childid
inner join team team1_
on teams3_.parentid = team1_.orgitemid
left outer join org_item team1_1_
on team1_.orgitemid = team1_1_.orgitemid
WHERE this_0_.userid = 55397 /* ?p0 */)
and this_.EndDate is null
and this_.ApprovedPhaseID = structure2_.phaseid
First of all, if anyone were to write this SQL query manually, you'd hope the author would use much clearer aliases. That would obviously improve the readability of this specific SQL query, but there is another issue here. Take a close look at the SQL statement in the subselect, and compare it to the DetachedCriteria which is assigned to the teamIdsForUser variable in the C# code listed earlier.
The DetachedCriteria looks really simple, doesn't it? So why is the actual SQL for that part more complex than expected? Well, in this particular database there is an inheritance relationship which complicates things. A 'user' inherits from an 'org_item', and a 'team' inherits from an 'org_item' as well. This inheritance relationship was implemented using the Table Per Class strategy. But wait, there's another twist. There is a many-to-many relationship between records in the org_item table. Each user can be linked to multiple teams, and each team can be linked to multiple users. But since the many-to-many relationship uses the org_item table (the common base class really), many-to-many associations can be made with teams on both sides of the relationship, or users on both sides of the relationship. Confused already? I can imagine. Surely, this complexity is not something you want exposed in your actual code, right? Keep in mind that this complexity would need to be dealt with practically every time you deal with users and teams in code. Both concepts are pretty important in the domain of this application so you can imagine how often they are used ![]()
In our case, this complexity is handled once in our NHibernate mapping files. We can keep it out of our queries while we let NHibernate handle the complexity for us. At the same time, our code still clearly communicates the intent of each query. This, IMO, is one of the huge benefits of using a powerful ORM.
Thoughts?
Pingback: Weekly Links #44 | GrantPalin.com