Beware The Evils Of Code Generation
Posted by Davy Brion on 22nd September 2008
We have this rather large project at work, one that’s been in development for a few years now. This project has had many releases already, and from a business perspective, is quite successful. From a technical point of view, there were definitely some things that could’ve been better. The largest problem is that this project uses a very extensive code-generation process. This code generation process basically retrieves all of the database metadata and generates an entire Data Access Layer (based on basic ADO.NET), a shitload of automated tests to cover that entire DAL, a whole lot of extra classes which form a data-driven business layer (real business logic can still be added though), and again, a shitload of automated tests that cover the data-driven business layer.
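To make the process above concrete, here is a minimal sketch of metadata-driven code generation. It is written in Python for brevity (the project described here was .NET), and everything in it is an assumption made for illustration: the `SCHEMA` dictionary stands in for metadata read from a database, and the `Gateway` class shape is hypothetical, not what our templates actually produced.

```python
from string import Template

# Hypothetical schema metadata. A real generator would read this
# from the database catalog instead of hard-coding it.
SCHEMA = {
    "customer": ["id", "name", "email"],
    "invoice": ["id", "customer_id", "total"],
}

# Hypothetical template for one generated data-access class per table.
CLASS_TEMPLATE = Template('''\
class ${class_name}Gateway:
    """Generated data-access class for table '${table}'."""
    TABLE = "${table}"
    COLUMNS = ${columns}

    def select_by_id_sql(self):
        return "SELECT ${column_list} FROM ${table} WHERE id = ?"
''')


def generate_gateway(table, columns):
    """Render the source code of one gateway class from table metadata."""
    return CLASS_TEMPLATE.substitute(
        class_name=table.title().replace("_", ""),
        table=table,
        columns=repr(columns),
        column_list=", ".join(columns),
    )


def generate_all(schema):
    """Render one class per table and join them into one module's source."""
    return "\n\n".join(generate_gateway(t, cols) for t, cols in schema.items())


generated_source = generate_all(SCHEMA)

# Stand-in for writing the generated files to disk and compiling them.
namespace = {}
exec(generated_source, namespace)
```

Every table added to the schema yields another class with no extra effort, which is exactly where the short-term productivity comes from. It is also why the amount of generated code grows so quickly.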
Now, the people who originally came up with the code generation obviously had good intentions. They wanted to maximize developer productivity so everyone could implement the required features as fast as possible. And that’s pretty much the reason why people turn to code generation: to increase productivity. Code generation, however, is not a good way to do that. In the short term it definitely increases productivity, but it comes at a terrible cost: an incredibly large technical debt. The thing about code generation is that it’s basically a shortcut. When it comes to developing software, each shortcut brings some kind of technical debt with it, and sooner or later you have to pay off that debt, or risk having your kneecaps shattered (why yes, I am watching my Sopranos DVDs again!).
Now, the technical debt you incur by implementing a certain feature in the quick-n-dirty way instead of doing it properly from the start is in most cases small and easy to pay back, as long as you don’t put it off too long (obviously, it’s best to avoid incurring any technical debt in the first place). Generating an incredibly large library of code and then using that code all over the place is different: once the project has been in development for a long time, the impact is something you simply cannot recover from. If you want to make serious changes to your generated code so that you can, for instance, reduce coupling to concrete classes, you may be in for a rough ride.
In our case, most of the things you could do with the generated code could be done in more than one way. Every possible usage scenario was being used at least somewhere in the application’s code base. And not just in a few places, but pretty much all over the code base. This made it pretty much impossible to change the templates used to generate the code, because any change would introduce compiler errors somewhere in the system. And not just a few of them, but hundreds of them. Sure, you could start fixing them. And then you could start on your next change in the templates, and it would lead to a few hundred more compiler errors. If you simply had to change a bit of syntax here and there, it would be painful, but at least it would be something you could overcome. But if you want to change the behavior of the generated code, you’re in pretty big trouble. All of the code that uses the generated code expects certain behavior to occur. Changing the templates would mean changing a lot of the real code.
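The behavior problem is the nastier one, because the compiler can’t find it for you. Here is a small sketch of what that looks like, again in Python for brevity and with entirely hypothetical names: `CustomerGatewayV1` and `CustomerGatewayV2` stand in for the same generated class before and after a template change, and `display_name` stands in for hand-written code that relies on the old behavior.

```python
class CustomerGatewayV1:
    """Stand-in for generated code: returns None when no row is found."""

    def __init__(self, rows):
        self._rows = rows

    def find(self, customer_id):
        return self._rows.get(customer_id)


class CustomerGatewayV2:
    """The same class after a hypothetical template change: a missing
    row now raises instead of returning None."""

    def __init__(self, rows):
        self._rows = rows

    def find(self, customer_id):
        if customer_id not in self._rows:
            raise LookupError(f"no customer with id {customer_id}")
        return self._rows[customer_id]


def display_name(gateway, customer_id):
    """Hand-written code that depends on 'None means missing'."""
    row = gateway.find(customer_id)
    return row["name"] if row is not None else "(unknown)"


rows = {1: {"name": "Tony"}}

# Works with the old generated code.
old_result = display_name(CustomerGatewayV1(rows), 2)

# With the regenerated code, the same call now blows up at run time.
try:
    new_result = display_name(CustomerGatewayV2(rows), 2)
except LookupError as error:
    new_result = f"failed: {error}"
```

Both versions expose the same method with the same signature, so nothing fails to compile. Now multiply that one call site by every place in the code base that uses the generated class, in every one of the usage styles it allows.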
When you’re in a situation like this, Michael Feathers (author of Working Effectively With Legacy Code) can’t even help you out. Hell, even Batman would run away like a frightened schoolgirl. Which is basically what we did too. We decided to leave the generated code and the real code as is for now, and we came up with a new architecture (with no code generation) which we’re going to use to develop the new functionality. The old code will slowly be migrated to the new architecture whenever we have room to do it, or whenever we need to make serious changes. It’s not perfect, but starting completely from scratch is an approach that wouldn’t make a lot of business sense anyway, and you have no guarantee whatsoever that the ‘new’ system will actually catch up with the ‘old’ system in a timely manner.
I think this serves as a nice example of how a code generation based approach can really come back to bite you in the ass like Jaws with a vengeance. Everything you gain in initial productivity costs you a lot of flexibility in the long term, which (also in the long term) ends up costing you more productivity than you ever gained in the first place.
Always keep in mind that code generation is usually a solution to a problem that wasn’t solved properly in the first place.