Code Quality

Take Advantage Of Your BDD Framework

10 commentsWritten on August 3rd, 2011 by
Categories: Code Quality, testing

NOTE: the examples in this post use JavaScript and Jasmine, but the advice is applicable to whatever BDD framework you use.

I'm using Jasmine for the automated tests in my breakable toy project. It's a BDD framework, very similar to RSpec so it makes it really easy to write your tests in the given-when-then style which i'm (finally) starting to like more and more. What makes given-when-then testing interesting is that you have 3 explicit steps in a test: setting up the 'given' part, doing something in the 'when' part, and asserting on it in the 'then' part. It makes it especially easy to reuse the 'givens' and even the 'whens' if you just want to add a few 'thens'.

Let's first start off with a bad example, one that i wrote about 2 weeks ago:

describe('given an existing customer', function() {

    describe('when it is retrieved from the database', function() {

        it('should contain the same values that have been inserted', function() {
            var customer = new CustomerBuilder()
                .withIncludeContactOnInvoice()
                .build();

            customer.save(function(err) {
                Customer.findById(customer.id, function(err, result) {
                    helper.customersShouldBeEqual(result, customer);                    
                    asyncSpecDone();
                });
            });
            asyncSpecWait();
        });

    });

});

It's organized in a given-when-then style, but in a bad way. The only benefit that i'm getting from it here, is that the structure is sort of easy to read: given an existing customer, when it is retrieved from the database, it should contain the same values that have been inserted. When i wrote it, i knew that saving the customer should be done in the 'given' step, retrieving it should be done in the 'when' step and comparing the fields of the inserted customer and the retrieved customer should be done in the 'then' step. In this case, everything is being done in the 'then' step.

When i wrote that code, i figured it would just be easier to do it this way, because on Node.JS every I/O call is asynchronous and i thought it would hurt readability if i were to split everything up according to the given-when-then rules due to the asynchronous calls. But then i wanted to add tests for updating and deleting a customer. In both cases, the 'given' part would again be 'given an existing customer'. So i wanted to add them in the right place, which meant i had to choose between duplicating the code to save the customer in each test, or bite the bullet and split it up properly and deal with the asynchronous calls.

Let's start with our original example, and move the saving of the customer to the 'given' step, and the retrieval to the 'when' step:

describe('given an existing customer', function() {

    var customer = null;

    beforeEach(function(err) {
        customer = new CustomerBuilder()
            .withIncludeContactOnInvoice()
            .build();

        customer.save(function(err) {
            expect(err).toBeNull();
            asyncSpecDone();
        });
        asyncSpecWait();
    });

    describe('when it is retrieved from the database', function() {

        var retrievedCustomer = null;

        beforeEach(function() {
            Customer.findById(customer.id, function(err, result) {
                expect(err).toBeNull();
                retrievedCustomer = result;
                asyncSpecDone();
            });
            asyncSpecWait();
        });

        it('should contain the same values that have been inserted', function() {
            helper.customersShouldBeEqual(retrievedCustomer, customer);
        });

    });

});

Despite ending up with more lines of code, there are some notable improvements here. We take advantage of the beforeEach method, which is executed once before each spec (in the case of Jasmine, a call to the 'it' method is a spec) is executed and once before each spec in each nested suite (in the case of Jasmine, a call to the 'describe' method creates a new suite) is executed. Most BDD-frameworks have something similar. Obviously, due to the asynchronous nature of Node.JS we use the asyncSpecWait() and asyncSpecDone() calls (added to Jasmine by jasmine-node) to wait until the asynchronous calls have completed before we move to the next step. In production Node.JS code, you really don't want to do this since that completely takes away the benefits of the platform, but for automated tests, it makes sense to do so. This enables us to put the right code in the right place: saving the customer is done in the 'given' step, retrieving it is done in the 'when' step, and the 'then' step only contains the code to verify that both instances are equal. If we need to verify something else about retrieved customers, we could easily add more specs (calls to the 'it' method) without having to repeat any of the setup work.

Now we can also add the tests for the update and delete scenario, within the context of the 'given an existing customer' scenario.

describe('given an existing customer', function() {

    var customer = null;

    beforeEach(function(err) {
        customer = new CustomerBuilder()
            .withIncludeContactOnInvoice()
            .build();
            
        customer.save(function(err) {
            expect(err).toBeNull();
            asyncSpecDone();
        });
        asyncSpecWait();
    });

    describe('when it is retrieved from the database', function() {

        var retrievedCustomer = null;

        beforeEach(function() {
            Customer.findById(customer.id, function(err, result) {
                expect(err).toBeNull();
                retrievedCustomer = result;
                asyncSpecDone();
            });
            asyncSpecWait();
        });
    
        it('should contain the same values that have been inserted', function() {
            helper.customersShouldBeEqual(retrievedCustomer, customer);
        });
        
    });
    
    describe('when it is modified and updated', function() {
            
        beforeEach(function() { 
            customer.name = 'some other customer';
            customer.vatNumber = '0456.876.235';
            customer.address = {
                street: 'some other street',
                postalCode: '12345',
                city: 'some other city'
            };
            customer.phoneNumber = '123456789';
            customer.contact = {
                name: 'some name',
                email: 'some_email@hotmail.com'
            };
            customer.save(function(err) {
                expect(err).toBeNull();
                asyncSpecDone();
            });
            asyncSpecWait();
        });

        it('contains the updated values in the database', function() {
            Customer.findById(customer.id, function(err, result) {
                helper.customersShouldBeEqual(result, customer);
                asyncSpecDone();
            });
            asyncSpecWait();
        });

    });

    describe('when it is deleted', function() {
        
        beforeEach(function() {
            customer.remove(function(err) {
                expect(err).toBeNull();
                asyncSpecDone();
            });
            asyncSpecWait();
        });     

        it('can no longer be retrieved', function() {
            Customer.findById(customer.id, function(err, result) {
                expect(result).toBeNull();
                asyncSpecDone();
            });
            asyncSpecWait();
        });

    });
    
});

In this case, the only duplication we have are the calls to asyncSpecWait and asyncSpecDone, which can't really be avoided with this style of testing on the Node.JS platform. Other than that, each part of the code is focused solely on that what it needs to do. If you're using a BDD framework, be sure you leverage it to make sure each part of your testcode is as focused on its task as it can be.

Clean Code Versus Great Code

24 commentsWritten on July 21st, 2011 by
Categories: Code Quality, Opinions

I've had some interesting discussions with other developers about writing code recently. I often have the impression that some developers put too much emphasis on clean code. Don't get me wrong, i strive for clean code as well, and have written about its importance quite a lot in the past couple of years. But when i'm coding, clean code is my secondary goal and it could never take the place of my primary goal: making it work. And preferably, i want to make it work great.

A lot of people love to talk the talk when it comes to writing clean code. They'll stress their dedication to it, in some cases even wearing Uncle Bob's green band while coding so they'll never lose sight of what's incredibly important to them: writing clean code. Unfortunately, i've noticed on numerous occasions that many of these people don't always put as much emphasis on what the code is doing compared to how it looks. Sometimes they don't really bother to learn what their ORM is doing behind the scenes. Or they'll prefer to use something like Automapper to map entities to DTO's even though it is woefully inefficient compared to simply retrieving projected data. They don't always think about the cost of multiple remote calls or sending way too much data over the wire. And when they're not perfecting the art of writing bowling games over and over again, they just might hit the database in loop.

Clean code is not necessarily great code, nor is great code necessarily clean code. To me, great code is code that works great, performs great, is easy to understand and easy to change. In that order. I know all too well how important it is to easily understand code when you first read it, and to be able to easily and safely make changes to it. But no matter how easy it is to read or change, if it's not doing what it should be doing (including covering all the corner cases) or if it's taking more than its fair share of time to do it, it's not good code. Sure, it might be clean, but it's not great, is it?

That doesn't mean that you should indulge in premature optimization. Unless you have Neo-like skills in this coding Matrix, you're unlikely to be right in even a quarter of all scenarios that you want to optimize prematurely. There are however a few guidelines which will help you avoid most common performance problems. Most other situations are better left ignored until proven by a profiler to be a bottleneck. But you should at least think about what the code is actually doing and whether or not any downsides to that are worth the cleanliness. Don't hesitate to go with the slightly less clean looking code if that code makes more sense from a correctness and performance point of view.

By all means, strive to write clean code. But think twice before you sacrifice its ability to be great.

There’s Only One Valid Metric For Developer Productivity And Quality

17 commentsWritten on February 19th, 2011 by
Categories: Code Quality, Opinions, Software Development

Depending on who you ask, software developers have been programming for over 40 to 50 years. And we still have no truly objective way to measure a developer's productivity and quality level. People can think or say that a developer is fast or productive, but they can't truly quantify it. People can think or say that a developer is great, good or bad, but again, they can't truly quantify it with numbers. We often look at other metrics to give us an idea on quality or productivity such as code metrics, or velocity but those numbers often reflect the efforts of a group of people. After all, a codebase is more often the result of the efforts of more than one person, and a team's velocity is typically the result of the productivity of, well, a group of people. Those numbers can't really be used to gain insight to the quality or productivity of a specific developer.

Not only is the current way of trying to determine the productivity or quality of a developer rather subjective, it's quite often based on things that are only one part of the equation: what they did, and the time in which they did it. I've always felt that that only really tells you half the story. I don't hold much value on knowing how long a developer took to write a certain piece of code or to complete a specific feature. I also don't really care a lot about the code metrics for that piece of code or that specific feature. All i know is that it took some effort. The quality or productivity of that effort can't truly be measured without knowing how much extra effort will be introduced later on because of that effort.

Suppose you have 2 developers, John and Sally. You give them both the same task and you tell them that the estimated workload for that task is 12 hours. John completes the task in 10 hours, and Sally needs 14 hours to complete it. It would be easy to think that John is more productive than Sally based on this. Especially if results like those were to repeat quite often. But what if John's solution requires an extra 6 hours of effort down the line to fix some bugs, or because his code was such a mess to follow that it slowed down future tasks which required another developer to comprehend the code of John's task? That would put John's eventual total to 16 hours of effort, instead of the originally reported 10 hours. And what if Sally's solution didn't introduce any future effort whatsoever down the line? Hell, maybe Sally's solution saved someone else an hour here or there because her solution was so easy to comprehend or perhaps even to reuse. Her total for that task remains at 14 while at the same time maybe even reducing future numbers. In this scenario, Sally is clearly a more productive and better programmer than John though we wouldn't know about it if we'd based ourselves on the initial numbers.

Consider another scenario though. John again spends 10 hours on the same task, and Sally again spends 14 hours. John kept it as simple and as straightforward as he could. Hell, he actually took a shortcut or two to finish faster. Sally is an outspoken member of the Software Craftsmanship movement and stays up to date with all of the latest and greatest patterns and approaches that are making their rounds on the internet. Sally spent a little bit of extra effort to make sure the code is as great as it possibly can be. It's incredibly readable and uses a great new pattern that all the cool kids on the internet rave about. The code metrics clearly show that Sally's solution is of a much higher quality. Unfortunately, it turns out that the other team members are having trouble understanding that code and they often lose time trying to figure it out whenever they need to do something which involves that part of the codebase. While Sally may claim that she took more time because she did it 'better', the extra effort wasn't worth it because no matter how well her intentions with that code, it ended up costing other people time. And what if John's shortcuts didn't introduce any future effort?

There are countless variations on this scenario which makes it virtually impossible to come up with a way to measure productivity or quality based on hours spent or code metrics. The only way to objectively measure this is to define a new metric which holds into account the extra effort that will be introduced later on. I'd call this metric Eventual Efficiency. Quality or productivity by themselves don't really mean anything if one influences the other negatively. Efficiency however can be seen as a combination of the two. Did you write truly fantastic code which ended up not mattering at all? Not efficient. Did you write mediocre code that got the job done and didn't (or hardly) introduced future effort? Sounds pretty efficient to me. Did you write great code that reduced future effort? Very efficient! Did you write bad code quickly that ended up introducing a lot of effort? Not efficient at all.

Unfortunately, we'll probably never get to the point where we can actually measure the Eventual Efficiency of developers. Until we do, it's worth keeping in mind that whatever numbers we might be looking at don't really mean anything without the link to the related future effort.

What Do You Think Of This Hack?

14 commentsWritten on November 16th, 2010 by
Categories: C#, Code Quality

I have a class which exposes a fluent interface to build something. Instances of this class contain some state based on the methods of the fluent interface you called and the arguments you passed to those methods. Now, that state is currently private but it's not private in the "oh my god we need to encapsulate this so nobody can read it!11!!!1" sense. In fact, i actually want to access that state from another class. The only 'problem' is that i don't want to add methods or properties to the class to expose this state, because it sort-of pollutes the fluent interface. I know, that's not really a huge issue but still, it'd be nice to keep the fluent interface clean and focused.

One way to expose this state without polluting the fluent interface is to create a separate interface which defines the methods/properties and then have the class implement that interface explicitly. That way you could only access those methods/properties when you cast the instance to the interface type. While there's nothing really wrong with doing that, i kinda have a bad feeling about that because it introduces an interface which is only there to support this little exercise in Intellectual Masturbation.

Instead, i tried this:

    public class MyClassWhichUsesAFluentInterface
    {
        private List<string> someState = new List<string>();

        public MyClassWhichUsesAFluentInterface SomethingFluent(string blah)
        {
            someState.Add(blah);
            return this;
        }

        // ...

        public static List<string> GetState(MyClassWhichUsesAFluentInterface constructedThingie)
        {
            return constructedThingie.someState;
        }
    }

Is it fancy? nope. Is it cool? nope. Does it work? yup. Is it simple? yup.

Good enough for me.

The Truth Is In The Code! Unless It Isn’t…

12 commentsWritten on October 13th, 2010 by
Categories: Code Quality

I'm sure you've all heard phrases like "the truth is in the code" or "only the code is the truth" and stuff like that. These statements are typically used to make it clear that you can't just rely on documentation (which is very likely to be outdated anyway, if not downright incomplete) and that you need to look at the actual source code to know what a piece of software is doing in a certain situation. And in most cases, the code is indeed the truth. But i just wanted to share a little story from my past on this topic.

I was working at a client's office and one of their development policies/rules was to use stored procedures for everything you do with the database. Every single CRUD operation and every single query had to be done through stored procedures. During my time there, i was given the choice to either extend and 'refactor' (it was such a mess it would've been more like refracturing) an existing system, or to just rewrite it. I chose the rewrite option but since there wasn't any documentation at all, i had to figure out the behavior of the system by going through the current source code... which was in a pretty bad shape (hence the decision to rewrite instead of going forward with that code). But as bad as the code was, it was possible (though sometimes frustrating) to figure out what it was doing.

Except for one particular piece of code. I don't remember the actual details but there was a certain situation where the system should've created a new record, but for some reason it didn't. The code was (for once) very clear: it called a stored procedure to create a new record. But at runtime, i could never get it to insert that record at the time it should've been created. I actually spent about 2 hours debugging, going over the possibilities of why it wouldn't create that record even though the code was clearly telling me that it should've created a new record.

And then it hit me. "What if the stored procedure is silently preventing this insert?". Every single stored procedure that i had seen/reviewed from that system didn't contain any logic at all. They were all very straightforward and didn't do anything but the actual database operation. But for some reason, this insert stored procedure had a simple IF statement and it was silently ignoring a set of inserts based on a certain condition (the actual reason behind that IF statement was something i never figured out). I felt like an idiot for wasting time on that without checking with the stored procedure first, but then again, i didn't really have any reason to suspect the stored procedure since they normally didn't contain any logic.

In this case, the code did not contain the truth and actually caused me to waste time believing something would be the case when it wasn't. That just goes to show that as soon as an external system is involved which may contain additional logic which isn't immediately visible from the code, there is a small chance that the code is actually lying to you.