(Ab)Using Conventions To Enforce Good Practices

14 commentsWritten on January 26th, 2012 by
Categories: NHibernate, Performance

I always tell people to explicitly specify the lengths of their string properties in their NHibernate mappings for performance reasons. If you don't specify them, the ADO.NET parameter lengths of those strings will always be set to the length of the actual string value that's been assigned to the parameter. This is a problem for SQL Server, because it can't cache those statements as efficiently as it would if the parameter lengths were always the same for a given statement. Simply put, if you don't specify the lengths, SQL Server's statement cache gets polluted with a bunch of statements that are often the same, but they're considered to be different simply because of the lengths of those string parameters. And this can really hurt the performance of your application.

Of course, not everyone remembers to set those lengths, so I thought it'd be great if I could force people to do this. With a little creative use of Fluent NHibernate's conventions, it's quite easy to enforce this:

public class StringsMustHaveLengthConvention: IPropertyConvention, IPropertyConventionAcceptance
{
    public void Apply(IPropertyInstance instance)
    {
        var msg = string.Format("The string property '{0}' of type '{1}' does not have a length value specified, " +
            "which is required for performance reasons. Add something like this to your mapping override:\r\n" + 
            "\tmapping.Map(e => e.{0}).Length(50); // with an appropriate length for this property",
            instance.Property.Name, instance.EntityType.Name);

        throw new MappingException(msg);
    }

    public void Accept(IAcceptanceCriteria<IPropertyInspector> criteria)
    {
        criteria.Expect(x => x.Type == typeof(string)).Expect(x => x.Length == 0);
    }
}

With that convention in place, you won't even be able to run your code until you've specified the string lengths :)

Stop Storing Passwords Already!

16 commentsWritten on January 23rd, 2012 by
Categories: security

This is largely common sense already, but I still frequently run into people who don't know how dangerous this is or how to properly store user credentials. The many Anonymous hacks in the past year that resulted in the leaking of users' passwords also show that many sites still store passwords in either clear-text or encrypted form. It's actually quite simple to store credentials safely, so here's a quick recap and example.

The biggest issue with storing passwords is that you have to assume that it's always possible that someone can get access to your database. Yes, even if it's not directly exposed to the outside world, which it never should be. Whatever security measures you've put in place to protect your database, it's a good idea to assume that sooner or later, someone will be able to punch a hole through your security measures and be able to read the data. So obviously, you really don't want to store clear-text passwords. You also don't want to store encrypted passwords because encrypted data can always be decrypted. And if people get access to those encrypted passwords even if they weren't supposed to, it'd be wise to assume that they also know how to decrypt them, or that it won't take them long to figure it out.

A much better approach is to store a hashed representation of the password instead, using a strong one-way cryptographic algorithm and a unique salt value per password. If the cryptographic algorithm is one-way, it means you can't apply another algorithm to get the original source value again. The only way to compare passwords is to apply the cryptographic algorithm on a given password using the originally used salt value, and then compare the resulting hash with the one you've stored. If they are identical, the given password is the same as the one that was used originally. If they differ, the password is invalid.

Attackers can still employ rainbow tables to try to find password values that generate the same hashes as the ones in your database. Luckily, generating rainbow tables takes time and plenty of space as well so it makes it much harder for attackers to find the passwords. This is why it's so important to use a unique salt value per password. It effectively means that a rainbow table would have to be generated for every single salt value that you've used, making it practically infeasible to find the original password values.

Let's demonstrate this with a simple example. The example is from a Node.js application, but this technique can be applied with whatever technology stack you're using.

This is my User model:

var mongoose = require('mongoose'),
    crypto = require('crypto'),
    uuid = require('node-uuid'),
    Schema = mongoose.Schema,
    ObjectId = Schema.ObjectId;

var userSchema = new Schema({
    name: { type: String, required: true, unique: true },
    email: { type: String, required: true },
    salt: { type: String, required: true, default: uuid.v1 },
    passwdHash: { type: String, required: true }
});

var hash = function(passwd, salt) {
    return crypto.createHmac('sha256', salt).update(passwd).digest('hex');
};

userSchema.methods.setPassword = function(passwordString) {
    this.passwdHash = hash(passwordString, this.salt);
};

userSchema.methods.isValidPassword = function(passwordString) {
    return this.passwdHash === hash(passwordString, this.salt);
};

mongoose.model('User', userSchema);
module.exports = mongoose.model('User');

Notice that the salt property of my User type has its default value set to 'uuid.v1'. In this case, uuid.v1 is a function which will be invoked by Mongoose whenever a new User instance is created. Every User instance will thus have a UUID value stored in its salt property. You can also see that I'm not storing the given passwordString in the setPassword function, but that I calculate the hash value based on the passwordString and the UUID salt value.

Suppose I create a user with the following code:

var user = new User({
    name: 'test_user',
    email: 'blah'
});
user.setPassword('test');

user.save(function(err, result) {
    if (err) throw err;
}); 

Its database representation will look like this:

{ 
    "passwdHash" : "b604367796274cf64177eec345532fc6ca66c6f0501906f82bb03f7916265e9d", 
    "name" : "test_user", 
    "email" : "blah", 
    "_id" : ObjectId("4f1dbb2cfa6157b118000001"), 
    "salt" : "304a33f0-45fc-11e1-80d2-43c594a44fa0" 
}

If an attacker would get access to this, he'd have to generate a rainbow table using the salt value, which takes time, and even then he has no guarantee that the rainbow table will actually contain the correct password. Again, this is why it's so important to use a unique salt for every password. Also, you can use whatever value you want as the salt value so if you can determine it based on some other fields or by using a specific formula you don't need to store the actual salt value. It's recommended to use a long salt value though. Theoretically speaking, it's safer if the salt value isn't stored so clearly as I'm doing here, but even with the salt value clearly visible to a possible attacker, it would still be practically infeasible for him to generate all those rainbow tables.

And of course, my actual authentication function is still very simple as well:

var authenticate = function(username, password, callback) {
    User.findOne({ name: username }, function(err, user) {
        if (err) return callback(new Error('User not found'));
        if (user.isValidPassword(password)) return callback(null, user);
        return callback(new Error('Invalid password'));
    });
};

So as you can see, there's nothing hard or complicated about storing credentials in a secure manner. It's quite easy to do so and there are no downsides to doing this.

How Do You Pick Open Source Libraries?

22 commentsWritten on January 16th, 2012 by
Categories: Opinions, Software Development

I'm currently looking into which library I'm going to use to handle authentication in a breakable toy project. Now, despite it being just a breakable toy, I want to do it with as few constraints on technical quality as possible because I want to maximize the learning experience I'm going to get from it. That means I don't just want to quickly put something together that just works. I want something that works, but that would also hold up in real world scenarios, even though the project will at best only be used by myself. Which means that I'm going to be picky about any libraries that I take a dependency on, just as I would if this were a project that I'd be getting paid to work on.

So as I was browsing through a few possible alternatives for my authentication needs, I started thinking about my thought process when evaluating libraries/frameworks to use. I generally base my decision on the following items, listed in order of importance (to me):

  • How well does it work for my scenario? If a library satisfies all other items on this list, that certainly doesn't mean it's an automatic lock. How it works and the impact it has on my code is definitely the most important factor.
  • Popularity. I've noticed that I let the number of watchers/forks on sites like Github influence my opinion. If a project has many watchers and many forks, odds are high that there's a relatively large group of happy users as well as people involved with the project. It also increases the odds that the project will be around for a while. Of course, inactive Open Source projects often remain available as well but if nobody's working on it, I'm not exactly tempted to take a dependency on it. Log4net is a notable exception to this, obviously. But when a project has a lot of people interested in it, or better yet, contributing to it, it's a good sign that you'll easily get help if needed, it's only going to get better in the long run and that it might get forked should the original developers stop working on it. As the author of an Open Source project that doesn't have a lot of watchers/forks (Agatha), I'm aware that my point of view on this is rather hypocritical but hey, it is what it is.
  • Code quality. I don't have the time to do an in-depth review of the code as I'm sure most of us don't do either. But I do like to glance over the code to get a general feel of the quality of the code. I focus mostly on the clarity of the code and also keep an eye open for sloppiness or downright WTF's. I guess the questions I'm mostly trying to answer when doing this are: "is this code I'd like to try to improve or fix if I need to?" and "how easy would it be to debug this when I need to troubleshoot some non-obvious issues?".
  • Location of code and issue tracker. A lot of people will probably take issue with this, but I consider it to be a major plus if the project is on Github. Not just because of my personal preference of Github, but because they truly encourage people to collaborate and contribute to projects and they make it very easy to do so. Also, the site is fast! I cringe when I have to look over issues of projects on Codeplex because it's just terribly slow. And the UI doesn't come close to that of Github either. I've heard that Bitbucket is pretty similar to Github, but I've never even looked for projects there. In any case: I want to be able to download the latest version of the code at any time, or of a particular branch if I need to, as easily as possible. I also prefer an issue tracker which is fast, responsive and easy to search. It doesn't have to be Github, but those 2 requirements are important to me.
  • License. If it's GPL, I don't use it. Also, I check whether or not a commercial license needs to be purchased when you want to use the library/framework in production. Pay attention to dual-licensed projects because that Open Source license might not apply to commercial/production use!

I'd love to hear your thoughts on this. Did I miss any important factors? I just quickly put this post together so it's likely that I missed some good ones :)

Hosting A Node.js Site Through Apache

2 commentsWritten on January 15th, 2012 by
Categories: node.js

I recently lost some time trying to figure out how to host a Node.js site through Apache, so I figured I might as well write a post about how I got it working. Of course, this approach only makes sense if you already have a server that's running Apache and you want to add your Node.js site with minimal impact/changes on your server. If you're not using Apache already but just want to publish a Node.js site, you're better off using Node.js to host it directly, or to put Nginx in front of it since it's more lightweight than Apache. But anyways, here's how you get it working with Apache.

First of all, you need a way to start your Node.js process automatically when your system boots, and to shut it down when the system is shut down. This will depend on the type of server you're running on. In my case, it's a Debian server so I just went with the sysv init script from Nico Kaiser. Another popular alternative is the upstart utility, which is already preinstalled if you're using Ubuntu. Once you have a start|stop|restart script in place, you'll want something to monitor the Node.js process to restart it in case it goes down. An easy to use tool for this is monit. Nico Kaiser again has a good example script available for Node.js on Github.

Once you have your sysv init or upstart script in place, as well as monit, your Node.js process can stay running on your server. Of course, you probably have it set to listen to connections on some other port than port 80 because that's what your Apache server is listening on. So now, the only thing you have to do is configure Apache to proxy all requests coming in on port 80 through the URL of your Node.js site to your local Node.js process. You'll first need to install mod_proxy and mod_proxy_http. After that, the configuration to make it work is quite easy:

<VirtualHost 109.74.199.47:80>
    ServerAdmin davy.brion@thatextramile.be
    ServerName thatextramile.be
    ServerAlias www.thatextramile.be

    ProxyRequests off

    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    <Location />
        ProxyPass http://localhost:3000/
        ProxyPassReverse http://localhost:3000/
    </Location>
</VirtualHost>

And that's it. Every request coming in at http://thatextramile.be or http://www.thatextramile.be will be forwarded to http://localhost:3000 where Node.js is listening. Note that the ProxyPassReverse is required to make sure that all HTTP response headers will contain the proxied URL instead of the real one (localhost).

If you need the raw throughput that Node.js offers, this solution is far from optimal. Every request that comes in through Apache will cause an Apache thread to wait/block until the response is returned from your Node.js process. This is essentially the same as when hosting PHP or Ruby through Apache, so it's not a problem, but it does take away one of the benefits of using Node.js. Again, this approach only makes sense if you're already using Apache to host other sites and you just want to add a Node.js site with minimal impact to your server.

Company Website Finally Online

2 commentsWritten on January 15th, 2012 by
Categories: Off Topic

I finally got around to putting my new company website online. I originally planned to do this sometime last year, but it just kept getting postponed for a variety of reasons. I'm pretty bad at graphic/web design so I had the design done by Ken Bekaert since I thought he had already done a great job on my logo and business cards. Next up was the slicing of the design to HTML and CSS. I could theoretically do that myself, but it would've taken me way too long and I would not have enjoyed it one bit, so I preferred to have it done by some old friends who know what they're doing. After that, it was finally my turn to play around with it :)

So of course, I couldn't resist doing the site with Express/Node.js. And I'm pretty happy with the result too: about 130 lines of server-side code that allows me to write the content of the site in Markdown (including caching and automatically updating the cache if the Markdown files are updated) and my preferred solution to displaying feed items. Does a simple site like this call for Express/Node.js? Of course not. But I knew it would be simple and fun, so that's why I used it.

The hardest part was actually writing the content of the site. Apparently, writing content for a company website is a lot harder than churning out a few blog posts occasionally, and I definitely underestimated that. Not because there's a lot of content (there's very little actually), but because you want to avoid ending up with a lot of content. It has to be concise and yet still contain everything you want it to say. After many revisions, it says everything I want it to say, and unlike on this blog I actually managed to keep it short :)

An extra benefit of doing it with Express/Node.js is that it reminded me how much fun I have working with that stack, so I'm going to get back to work on my breakable toy which I've been neglecting for too long. I suppose this will lead to a few future posts as well :)