The Inquisitive Coder – Davy Brion's Blog

Trying to walk that thin line between intelligence and ignorance

Continuous Integration On A Real, Big Project

Posted by Davy Brion on May 24th, 2009

Some of you may remember a post of mine where i showed the complete lack of attention that was being paid to the Continuous Integration of one of our projects. This particular project is pretty big. The development of this project has gone on for years, and will keep going for years to come as it is a strategically important project for us. The actual system is used internally by our company as well as a growing list of customers.

There are essentially 2 big problems with this project. One is a mountain of legacy code with so much technical debt (due to the evils of code generation) it sorta resembles the current economic recession (as in: it’ll take a long time to get everything sorted out). The other problem is more a matter of organization. We’re a pretty small company and while we have great ideas for products, we simply can’t assign a steady, stable team of developers to work on any of these projects on a continuous basis since we all often need to work on stuff that is simply more lucrative at that particular point in time. The result is that this particular project typically has an ever rotating group of developers working on it, usually for short periods of time. Some people work almost full-time on it, but that list is pretty limited.

Not exactly the ideal situation for a large project to apply CI and other agile development practices to, right? Luckily for us, we’re quite stubborn and we try to make the best out of every situation. So back when this project’s CI success rate was only showing a lack of success, we all agreed to follow the CI rules and at this point, i’m pretty proud to be able to show you guys the following picture:

[Image: chart of the project's CI build statistics (success rate, number of tests, build duration)]

Note: we moved to TeamCity in July 2008, so i can’t show you any data from before that.

Anyways, as you can see, after the dismal months of February and March, the success rate of the CI build gradually started improving again. The average time to fix failing tests also decreased sharply. We still have failing tests from time to time, but at least now they’re all dealt with in a timely fashion. We still have build failures too, though those have decreased a lot as well and are now always dealt with quickly. I’m not sure if this is because of my ‘Continuous Bitching’ whenever the build fails, or because everyone bought into the concept of CI again (for my own sake, i’ll just assume that it’s the latter) but i’m pretty happy with the results that we’re getting.

Also, take a look at the number of tests (we’re at 16000+ now) and the duration of the build. The build time is about 48 minutes on average now, which is obviously way above the recommended 10 minutes. I’d love to see this go down to about 20 minutes (which i’d find very acceptable considering the size of the project) but that’s gonna take a long while. Of those 16000 tests, there are about 13000 tests that cover the legacy code and they all use the database. And since those 13000 tests use a generated data layer, we can’t just let it run on an in-memory database nor can we mock the database in those tests because all of that generated code, and pretty much everything that was written on top of it, is coupled more tightly than Siamese Twins. We also lose a couple of minutes of build time due to our Genesis processing but that is simply something that we can’t go without anymore so we don’t really mind the extra build time of that part.

So there you have it… the reason i wanted to post this is because when the topic of CI comes up, you always read about ‘instant feedback’ and really quick builds and things like that. It’s simply not always like that in the real world. But with a bit of effort and focus, you can get many of the benefits that are usually attributed to CI, even on huge projects with lots of legacy code.

7 Responses to “Continuous Integration On A Real, Big Project”

  1. whaley Says:

    The build time is about 48 minutes on average now, which is obviously way above the recommended 10 minutes. I’d love to see this go down to about 20 minutes (which i’d find very acceptable considering the size of the project) but that’s gonna take a long while.

    I have no clue how modular your system is, but perhaps restrict your on-commit builds to compiling and running *unit tests* for a particular module/library, and do a “full” build nightly, twice a day, or four times a day that compiles everything and runs the unit and integration tests for all of it.

    Also, how beefy is your build box/cluster? It never hurts to upgrade that thing. Hardware is cheap compared to developer time.

  2. Davy Brion Says:

    well, as i mentioned in the post, there are about 13000 legacy tests that all use the database… only running them once or twice a day would sorta defeat the whole purpose of CI.

    Also, the build server itself is plenty beefy… most of the time is wasted on running those old tests, during which the CPU activity of the build server isn’t really high (around 30%) so upgrading the buildserver isn’t going to make a huge difference

  3. Arjan`s World » LINKBLOG for May 25, 2009 Says:

    [...] Continuous Integration On A Real, Big Project – Davy Brion [...]

  4. Drew Says:

    IMHO, the purpose of CI is short feedback time for breaking code changes. Maybe tests running once an hour is good enough. Maybe not. I don’t know how frequent your checkins are or what your other constraints/requirements are. You know all of that better than I possibly can.

    If you’re trying to cut running time, though, here are some things I’d look into:
    - Have you profiled the test run to find the perf bottleneck(s)?
    - Can any of the tests be parallelized?
    - Assuming the tests have to run serially, is splitting the CI server into several instances (virtual servers or whatever), each running different subsets of the tests an option?
    - Is there a better way to test your generated code than hand-crafting unit tests for it? Maybe that’s not what you’re doing and I misunderstood you. If so, I apologize.
    - Alternately, is there a subset of those tests that give you the biggest test bang for the time it takes to run them (80/20 rule)? If a big bunch of tests always passes, it’s probably not helping you to find new problems to run everything as frequently as tests for problem areas. Yes, this is similar to Whaley’s suggestion. I like his suggestion, too, though.
    - Again, maybe I misunderstood you, but if you’re having problems stubbing db accesses in the generated code, why not create a “test build” wherein the generator creates stubbed code? Sure, that’s not exactly the same as release code and it’s a good idea to run against your “real” code, too, but if you ran against a test build frequently and a real build less frequently I bet you could speed up the feedback.
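To make the subset-splitting idea from the list above a bit more concrete, here is a minimal sketch in Python, used purely for illustration. It assumes that per-test durations from an earlier run are available and that every shard can run against its own database, neither of which the post confirms. The function name `split_into_shards` and the test names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def split_into_shards(durations: Dict[str, float], shard_count: int) -> List[List[str]]:
    """Assign tests to shards, longest test first, always picking the
    shard with the smallest total duration so far (a greedy heuristic)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # Heap entries are (total seconds assigned so far, shard index).
    heap: List[Tuple[float, int]] = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards: List[List[str]] = [[] for _ in range(shard_count)]

    # Sort by descending duration; ties are broken by name for determinism.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


if __name__ == "__main__":
    # Hypothetical timings in seconds, not taken from the real project.
    timings = {
        "test_orders": 40.0,
        "test_invoices": 30.0,
        "test_customers": 20.0,
        "test_stock": 10.0,
    }
    for number, shard in enumerate(split_into_shards(timings, 2)):
        print(number, shard)
```

The greedy assignment does not guarantee a perfectly even split, but it tends to keep the slowest shard close to the average, and the slowest shard is what determines how long you wait for feedback.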

  5. Davy Brion Says:

    @Drew

    i think we’re averaging around 8 builds a day… it would be more if we allowed concurrent builds of the same project on our buildserver but we’ve disabled that so the builds for this project won’t prevent other projects’ builds from running frequently

    as for performance bottlenecks in the old tests… we did that once, there was one part that was executed before every test which caused a lot of slowdown. We modified that last year, which actually took the build time from 90 minutes to around 40 minutes back then. Other than that, there aren’t really any specific performance issues we can easily fix for those tests. Going with several instances of the CI server is unfortunately not an option.
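The comment does not say what the modification was. One common form of this kind of fix, shown here only as an illustration in Python and not as the project's actual code, is to move expensive shared setup from "before every test" to "once per test class". `ExpensiveFixture` is a hypothetical stand-in for whatever the slow step was.

```python
import unittest


class ExpensiveFixture:
    """Hypothetical stand-in for slow shared setup, e.g. loading reference data."""

    build_count = 0  # counts how often the expensive setup actually ran

    def __init__(self):
        ExpensiveFixture.build_count += 1
        self.reference_data = {"currency": "EUR"}


class LegacyStyleTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class; putting this in setUp() instead
        # would rebuild the fixture before every single test.
        cls.fixture = ExpensiveFixture()

    def test_first(self):
        self.assertEqual(self.fixture.reference_data["currency"], "EUR")

    def test_second(self):
        self.assertIn("currency", self.fixture.reference_data)


if __name__ == "__main__":
    unittest.main()
```

This is only safe when the tests treat the shared fixture as read-only; tests that modify shared state still need their own setup or a cleanup step.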

    We’re not writing unit tests for the generated code… the codegenerator actually generates a whole bunch of tests for the actual code it generates. I seriously dislike that (IMO the codegen should have its own tests to make sure it generates correct code) but there’s really not that much we can change about the whole thing at this point. The whole codegen process is way too complicated for what it originally should’ve been and we’re kind of stuck with it for parts that haven’t been rewritten yet.

    Generating stubbed code purely for testing would make a lot of those old tests useless as well… the generated code is basically an extensive datalayer (and a pretty bad one at that) and a lot of those tests use parts that are generated, and parts that were added to it (such as custom SQL queries). If we were to generate stubbed code, the tests would fail the moment they would execute any custom query which happens very frequently.

    Making changes to the codegen is something we try to avoid as much as possible. Every time we try to make a change there, it causes problems and we end up reverting to the way it was before. It’s also hard to convince management to spend time on the codegen since the idea is to phase it out and basically rewrite everything eventually.

  6. Andreas Grabner Says:

    Thanks for sharing that post with us.

    Analyzing build process performance can be done by using the appropriate tools when executing your unit or integration tests. Profilers, coverage tools, or diagnostics solutions can identify how long each method takes and how often it is executed by your unit tests. This helps you to identify methods that already have very good coverage, and also lets you find the impact a single method has on the overall test execution.
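As a rough illustration of the simplest version of this, the following Python sketch records how long each test takes and lists the slowest ones. It only measures whole tests, not individual methods, and it is not how any particular profiler or diagnostics product works. `TimingResult`, `slowest`, and `SampleTests` are hypothetical names.

```python
import time
import unittest


class TimingResult(unittest.TestResult):
    """Test result that also records the wall-clock time of every test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seconds_by_test = {}
        self._started_at = {}

    def startTest(self, test):
        super().startTest(test)
        self._started_at[test.id()] = time.perf_counter()

    def stopTest(self, test):
        elapsed = time.perf_counter() - self._started_at.pop(test.id())
        self.seconds_by_test[test.id()] = elapsed
        super().stopTest(test)


def slowest(seconds_by_test, count):
    """Return the `count` slowest (test id, seconds) pairs, slowest first."""
    ranked = sorted(seconds_by_test.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


class SampleTests(unittest.TestCase):
    def test_quick(self):
        self.assertTrue(True)

    def test_slow(self):
        time.sleep(0.05)  # simulates a slow, database-bound test


if __name__ == "__main__":
    result = TimingResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(SampleTests).run(result)
    for test_id, seconds in slowest(result.seconds_by_test, 5):
        print(f"{seconds:.3f}s  {test_id}")
```

A ranking like this is usually enough to tell whether the time is spread evenly across thousands of tests or concentrated in a few, which decides whether a proper profiler is worth setting up.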

    Another interesting aspect is optimizing compile time. I’ve recently worked with a client that is running a build for every single check-in that is done by the developers. Here it is very critical to optimize compile times and unit test execution times. They use dynatrace to analyze test execution and are now also looking into diagnostics of their compile times.
    Alois, a colleague of mine, wrote an interesting article about Performance Management in CI: http://blog.dynatrace.com/2009/05/04/performance-management-in-continuous-integration/

    Hope my comment helps

  7. Drew Says:

    Ah. Nightmarish. On the bright side, the problems you’re running into might be justification for a future refactor.

    I can’t wait to see how things progress. Thanks for posting all of this.
