There Is No Excuse For Failing Queries In Production

5 commentsWritten on August 30th, 2009 by
Categories: Opinions

Bertrand Le Roy left a few interesting comments on a previous post. Here's a 'transcript' of the comments (i'm only showing the parts that pertain to this particular topic):

Bertrand Le Roy: Would it make sense for the query to implement IDisposable so that you can put it in a using block? I would be a little concerned about the query failing and never getting disposed of. Me: well, if a query fails the underlying SqlCommand object will eventually be garbage collected so i wouldn’t really worry about that. Failed queries shouldn’t happen frequently anyway IMO because they typically are the result of incorrect code. Bertrand Le Roy:Incorrect code happens. When it happens, you may exhaust your connection pool before that stuff gets collected. Me:code that causes failing queries should never even make it to your source code repository… you do write tests for queries, no? If you deploy code which causes frequently failing queries, then you have some fundamental problems in the way you create, test and deliver software.

I wanted to expand on this since i think it's an important topic. Simply put, i believe that failing queries in a production environment are simply inexcusable.

First of all, every single query in your application should be tested properly with automated tests. Doing this consistently will avoid a lot of problems. In fact, i can only think of 3 reasons why queries could still fail in production, even if you do write tests for each and every one of them:

  • The query is composed dynamically, based on a set of conditions (for instance, a complex search screen with lots of filtering possibilities).
  • Queries that use a dynamic number of parameters (like with a WHERE yaddiyadda IN ({your parameters here}) clause) and the parameter list grows bigger than the maximum allowed by your database.
  • Some idiot modified parts of the database structure without redeploying an updated version of the application.

In case of dynamically composed queries, you obviously need some tests for each and every possible filter that can be added to the query. Parameter values shouldn't cause problems either, unless you're adding their values straight into the SQL statement (which is terrible for a host of reasons). Different filter combinations shouldn't cause issues either since you should have tests for those as well anyway. In any case, this can only fail due to not testing each possible combination.

The second option (dynamic list of parameters) is a bit more difficult. In most cases, you truly don't expect the number of parameters to exceed the limit (which is 2100 on SQL Server 2005) because you typically use this approach when you have a fixed set of data. I'd even go as far saying that using the IN clause with a parameter list that could very reasonably grow much larger over time (based on data in another table for instance) is negligent programming. Simply put, you should only use this approach when you know that the list of parameters won't grow too large. If the list of parameters can grow beyond the allowed limit, you simply need to come up with a different query.

And then there's the last situation... and it's an oh-so-typical one unfortunately. I've come across more than my fair share of people who think it's 'ok' to make changes directly into a production database without going through the proper deployment procedures. This is simply just asking for problems. Whenever a database change needs to be made, you should always make the change first in a development (and preferably an acceptance) environment, and at least run your entire test suite on the modified database. If there are failing tests, then you will at least have the option to fix the code and run the testsuite again until everything is alright. Only then should you upgrade both the application and the database in production.

Proper testing practices and proper deployment procedures can virtually eliminate the possibility of having failed queries in production. Having said all that, we all know that even the best of intentions and practices won't prevent all bugs from showing up in production. I'll once again quote from Bertrand's comment:

Incorrect code happens. When it happens, you may exhaust your connection pool before that stuff gets collected

He does have a point there. However, i personally don't think that you should keep this particular situation in mind whenever you need to execute a query in your code. Like i said, i consider failed queries in production inexcusable and therefore, i do not think it's worth the effort of protecting yourself against a situation that shouldn't occur in every possible place where it might occur.

For instance, in Bertrand's particular example of exhausting a connection pool, you could easily avoid this situation in one single part of the code by making use of a circuit breaker. As soon as you know that your application is having problems connecting to the database, why would you even bother to keep trying? It's much better to fail instantly when the database is known to be down instead of trying to connect over and over again. You could easily use a circuit breaker (either at the service level like in the example i linked to, or at the lowest possible level by using a custom DriverConnectionProvider in NHibernate with circuit breaker logic at that level) to prevent such a problem from getting out of hand.

Now, don't get me wrong... i don't want to advocate improper usage of the disposal pattern or incorrect resource utilization in general, but i don't agree that we should litter our code with checks and precautions for things that shouldn't occur frequently anyway. I do believe that protective countermeasures should be put in place, but only where they make sense. I think this is somewhat similar to proper exception handling practices: don't put a try/catch/finally block in every piece of code, use it where you can deal with it... etc.

  • http://weblogs.asp.net/bleroy Bertrand Le Roy

    Thanks for the explanation but your comment may not be entirely fair (yes, I do write test, thank you very much). Although I saw the testing argument coming from a mile away, it falls short for a number of reasons. Even if you were right that only those three reasons could cause that sort of problem, is there the slightest possibility that on this occasion or others, you might have missed something? On principle, we should never assume that we have thought of everything because most of the time we haven’t. That’s why we write tests, and that’s why we are aware that 100% code coverage is no guarantee for bug-free code.
    I thought that it would be fairly simple to make the objects IDisposable and be done with it, and that’s what I would do in my own code. In the same way, I don’t put try/catch finally around every bit of code because it would be wrong to do so but I do put using blocks around the use of any disposable resource. It’s fine that you disagree with the IDisposable thing.

    Thanks for the great series of posts by the way, I learned a lot reading it and I apologize for nitpicking.

    Cheers,
    Bertrand

  • http://weblogs.asp.net/bleroy Bertrand Le Roy

    mmh, nitpicking again, but incomplete quotations should include “[...]” so that it’s clear they are incomplete. Case in point, correctly managing your disposable objects or letting the calling code do it by implementing IDisposable was kind of the whole point I was trying to make. :)

  • http://davybrion.com Davy Brion

    you don’t need to apologize at all :)

    my ‘problem’ with your comment was simply the ‘incorrect code happens’ statement… yes, it _can_ happen but when it comes to queries it can very easily be avoided. And no, i don’t believe in code coverage to avoid it… But a comprehensive testsuite can.

    As for the whole disposable thing… i’m generally very sensitive when it comes to implementing disposable and using it correctly, but in this case i don’t really see the point. It’s harder to test for proper disposal usage than it is to test the correctness of your queries… and if a developer writes queries that can fail in production, i’d wonder if he/she also considers disposing disposable objects.

  • Richard Lowe

    my ‘problem’ with your comment was simply the ‘incorrect code happens’ statement… yes, it _can_ happen but when it comes to queries it can very easily be avoided. And no, i don’t believe in code coverage to avoid it… But a comprehensive testsuite can.

    A “comprehensive test suite” is presumably your automated tests, yes? Well, if you accept the premise that ‘incorrect code happens’ – which you, Bertrand and I apparently do, than you must also accept that incorrect *test* code can also happen. Incorrect in the real ‘bug’ sense, or in the ‘didn’t think of everything’ sense as Bertrand pointed out before.

    Maybe I misunderstand your points about testing, but no empirical process is going to guarantee 100% that something is true or false. Empirical science certainly doesn’t, and it sounds like you are being categorical, so I guess I’m nitpicking too ;)

  • Richard

    What if the sysadmin is installing an update on your SQL box, and forgets to take your application off-line?

    What if SQL or Windows crashes?

    “Incorrect” code is only one reason why a query might fail. Surely it’s better to code defensively? I’d rather have one page generating errors for a few minutes than an unresponsive server needing a restart.