Author Archive

DTO’s Should Transfer Data, Not Entities

6 commentsWritten on February 19th, 2012 by
Categories: Architecture, Code Quality

I've read a couple of posts recently where the authors were complaining about excessive mapping to DTO's and whether or not it's worth it. I've been a fan of DTO's for a long time, but I'm not a fan of how I frequently see them being used. More specifically, I very much dislike the all-too-common approach of creating a DTO for each entity in your domain model. In the worst cases, the DTO's actually reference each other and try too hard to mimic the structure of the entities. For example, an OrderDto which has a reference to a CustomerDto and a collection of OrderLineDto instances. In those cases, I absolutely agree that all of the mapping involved introduces way too much ceremony to the codebase without really offering any concrete benefits.

Of course, it really depends on the architecture of your system. If you're using services that are only used by the front-end of your system, then I'd still advise against making those entities available outside of the service boundary for reasons that I've discussed earlier. That doesn't mean that you should go the entity-mimicking-DTO-route though. In fact, entity-mimicking-DTO's introduce a few of the same downsides you'd get from exposing entities directly through your services. For an 'internal' service (i.e. only used by your application), I think it makes a lot more sense to use DTO's per service operation which are optimized for the use case that that service operation is meant to implement.

Essentially, an internal service's operations will be either queries or commands. When it comes to queries, why not just do the simplest thing possible? In most cases, it's sufficient to just return the data in the exact same form that the data will be displayed in. Quite often, that means a denormalized set of data where the data comes from more than one type of entity/table. There's no reason to send a bunch of entity-mimicking DTO's that reference each other to the client if you're going to use the data to populate a grid. Just send a list of DTO's which are already in the most optimal form. Populating those DTO's can then simply be done through straight SQL, a view, a stored procedure, a projection through your ORM, a map/reduce operation, or whatever else that makes sense. The point is that in most cases, you should just populate those DTO's as directly as you can instead of mapping to those DTO's. In this case, the DTO's offer a clear-cut benefit and don't really introduce tedious ceremony code in your codebase.

As for the commands (inserts, updates, anything that creates something or results in something happening), why even use DTO's in the first place? People will often argue that the ability to reuse the entity-mimicking-DTO's for these operations is a benefit. Personally, I don't really see the benefit. The mere presence of entity-mimicking-DTO's only encourages people to use them for the queries as well. Instead, I prefer to go with types that encapsulate all of the data relevant to the current command (the data to be inserted/updated, or whatever else the command needs). In a very simple scenario, this could be an InsertCustomerCommand type which simply has properties for the data that needs to be inserted. Nobody will be confusing these types with any other purpose than what their name communicates.

If you're building a web app where the server-side code does everything in process (i.e. without a WCF service), then you can in many cases just use the entities directly without a real drawback, though I'd recommend keeping an eye on unexpected select N+1 problems, since those will frequently come up when you're preparing your views. But even in this case, using DTO's that are optimized for the scenarios in which they're being used can really simplify the query-side of your system a lot.

There's nothing inherently wrong with DTO's if you use them to simply transfer data. If you're using them to transfer entities, you're robbing yourself of their biggest benefit while only making things more complex than they need to be.

NHibernate Training Course In Mechelen, Belgium – April 23/25

No Comments »Written on February 12th, 2012 by
Categories: NHibernate

I'm organizing a public NHibernate Training Course from April 23rd through April 25th in Mechelen, Belgium. This time, I'm taking care of the whole thing myself instead of working together with a more established training company (except for renting a suitable training location). This enables me to offer the course at a lower price than the previous public course. All details can be found here. Feel free to spread the word! :)

(Ab)Using Conventions To Enforce Good Practices

15 commentsWritten on January 26th, 2012 by
Categories: NHibernate, Performance

I always tell people to explicitly specify the lengths of their string properties in their NHibernate mappings for performance reasons. If you don't specify them, the ADO.NET parameter lengths of those strings will always be set to the length of the actual string value that's been assigned to the parameter. This is a problem for SQL Server, because it can't cache those statements as efficiently as it would if the parameter lengths were always the same for a given statement. Simply put, if you don't specify the lengths, SQL Server's statement cache gets polluted with a bunch of statements that are often the same, but they're considered to be different simply because of the lengths of those string parameters. And this can really hurt the performance of your application.

Of course, not everyone remembers to set those lengths, so I thought it'd be great if I could force people to do this. With a little creative use of Fluent NHibernate's conventions, it's quite easy to enforce this:

public class StringsMustHaveLengthConvention: IPropertyConvention, IPropertyConventionAcceptance
{
    public void Apply(IPropertyInstance instance)
    {
        var msg = string.Format("The string property '{0}' of type '{1}' does not have a length value specified, " +
            "which is required for performance reasons. Add something like this to your mapping override:\r\n" + 
            "\tmapping.Map(e => e.{0}).Length(50); // with an appropriate length for this property",
            instance.Property.Name, instance.EntityType.Name);

        throw new MappingException(msg);
    }

    public void Accept(IAcceptanceCriteria<IPropertyInspector> criteria)
    {
        criteria.Expect(x => x.Type == typeof(string)).Expect(x => x.Length == 0);
    }
}

With that convention in place, you won't even be able to run your code until you've specified the string lengths :)

Stop Storing Passwords Already!

16 commentsWritten on January 23rd, 2012 by
Categories: security

This is largely common sense already, but I still frequently run into people who don't know how dangerous this is or how to properly store user credentials. The many Anonymous hacks in the past year that resulted in the leaking of users' passwords also show that many sites still store passwords in either clear-text or encrypted form. It's actually quite simple to store credentials safely, so here's a quick recap and example.

The biggest issue with storing passwords is that you have to assume that it's always possible that someone can get access to your database. Yes, even if it's not directly exposed to the outside world, which it never should be. Whatever security measures you've put in place to protect your database, it's a good idea to assume that sooner or later, someone will be able to punch a hole through your security measures and be able to read the data. So obviously, you really don't want to store clear-text passwords. You also don't want to store encrypted passwords because encrypted data can always be decrypted. And if people get access to those encrypted passwords even if they weren't supposed to, it'd be wise to assume that they also know how to decrypt them, or that it won't take them long to figure it out.

A much better approach is to store a hashed representation of the password instead, using a strong one-way cryptographic algorithm and a unique salt value per password. If the cryptographic algorithm is one-way, it means you can't apply another algorithm to get the original source value again. The only way to compare passwords is to apply the cryptographic algorithm on a given password using the originally used salt value, and then compare the resulting hash with the one you've stored. If they are identical, the given password is the same as the one that was used originally. If they differ, the password is invalid.

Attackers can still employ rainbow tables to try to find password values that generate the same hashes as the ones in your database. Luckily, generating rainbow tables takes time and plenty of space as well so it makes it much harder for attackers to find the passwords. This is why it's so important to use a unique salt value per password. It effectively means that a rainbow table would have to be generated for every single salt value that you've used, making it practically infeasible to find the original password values.

Let's demonstrate this with a simple example. The example is from a Node.js application, but this technique can be applied with whatever technology stack you're using.

This is my User model:

var mongoose = require('mongoose'),
    crypto = require('crypto'),
    uuid = require('node-uuid'),
    Schema = mongoose.Schema,
    ObjectId = Schema.ObjectId;

var userSchema = new Schema({
    name: { type: String, required: true, unique: true },
    email: { type: String, required: true },
    salt: { type: String, required: true, default: uuid.v1 },
    passwdHash: { type: String, required: true }
});

var hash = function(passwd, salt) {
    return crypto.createHmac('sha256', salt).update(passwd).digest('hex');
};

userSchema.methods.setPassword = function(passwordString) {
    this.passwdHash = hash(passwordString, this.salt);
};

userSchema.methods.isValidPassword = function(passwordString) {
    return this.passwdHash === hash(passwordString, this.salt);
};

mongoose.model('User', userSchema);
module.exports = mongoose.model('User');

Notice that the salt property of my User type has its default value set to 'uuid.v1'. In this case, uuid.v1 is a function which will be invoked by Mongoose whenever a new User instance is created. Every User instance will thus have a UUID value stored in its salt property. You can also see that I'm not storing the given passwordString in the setPassword function, but that I calculate the hash value based on the passwordString and the UUID salt value.

Suppose I create a user with the following code:

var user = new User({
    name: 'test_user',
    email: 'blah'
});
user.setPassword('test');

user.save(function(err, result) {
    if (err) throw err;
}); 

Its database representation will look like this:

{ 
    "passwdHash" : "b604367796274cf64177eec345532fc6ca66c6f0501906f82bb03f7916265e9d", 
    "name" : "test_user", 
    "email" : "blah", 
    "_id" : ObjectId("4f1dbb2cfa6157b118000001"), 
    "salt" : "304a33f0-45fc-11e1-80d2-43c594a44fa0" 
}

If an attacker would get access to this, he'd have to generate a rainbow table using the salt value, which takes time, and even then he has no guarantee that the rainbow table will actually contain the correct password. Again, this is why it's so important to use a unique salt for every password. Also, you can use whatever value you want as the salt value so if you can determine it based on some other fields or by using a specific formula you don't need to store the actual salt value. It's recommended to use a long salt value though. Theoretically speaking, it's safer if the salt value isn't stored so clearly as I'm doing here, but even with the salt value clearly visible to a possible attacker, it would still be practically infeasible for him to generate all those rainbow tables.

And of course, my actual authentication function is still very simple as well:

var authenticate = function(username, password, callback) {
    User.findOne({ name: username }, function(err, user) {
        if (err) return callback(new Error('User not found'));
        if (user.isValidPassword(password)) return callback(null, user);
        return callback(new Error('Invalid password'));
    });
};

So as you can see, there's nothing hard or complicated about storing credentials in a secure manner. It's quite easy to do so and there are no downsides to doing this.

How Do You Pick Open Source Libraries?

22 commentsWritten on January 16th, 2012 by
Categories: Opinions, Software Development

I'm currently looking into which library I'm going to use to handle authentication in a breakable toy project. Now, despite it being just a breakable toy, I want to do it with as few constraints on technical quality as possible because I want to maximize the learning experience I'm going to get from it. That means I don't just want to quickly put something together that just works. I want something that works, but that would also hold up in real world scenarios, even though the project will at best only be used by myself. Which means that I'm going to be picky about any libraries that I take a dependency on, just as I would if this were a project that I'd be getting paid to work on.

So as I was browsing through a few possible alternatives for my authentication needs, I started thinking about my thought process when evaluating libraries/frameworks to use. I generally base my decision on the following items, listed in order of importance (to me):

  • How well does it work for my scenario? If a library satisfies all other items on this list, that certainly doesn't mean it's an automatic lock. How it works and the impact it has on my code is definitely the most important factor.
  • Popularity. I've noticed that I let the number of watchers/forks on sites like Github influence my opinion. If a project has many watchers and many forks, odds are high that there's a relatively large group of happy users as well as people involved with the project. It also increases the odds that the project will be around for a while. Of course, inactive Open Source projects often remain available as well but if nobody's working on it, I'm not exactly tempted to take a dependency on it. Log4net is a notable exception to this, obviously. But when a project has a lot of people interested in it, or better yet, contributing to it, it's a good sign that you'll easily get help if needed, it's only going to get better in the long run and that it might get forked should the original developers stop working on it. As the author of an Open Source project that doesn't have a lot of watchers/forks (Agatha), I'm aware that my point of view on this is rather hypocritical but hey, it is what it is.
  • Code quality. I don't have the time to do an in-depth review of the code as I'm sure most of us don't do either. But I do like to glance over the code to get a general feel of the quality of the code. I focus mostly on the clarity of the code and also keep an eye open for sloppiness or downright WTF's. I guess the questions I'm mostly trying to answer when doing this are: "is this code I'd like to try to improve or fix if I need to?" and "how easy would it be to debug this when I need to troubleshoot some non-obvious issues?".
  • Location of code and issue tracker. A lot of people will probably take issue with this, but I consider it to be a major plus if the project is on Github. Not just because of my personal preference of Github, but because they truly encourage people to collaborate and contribute to projects and they make it very easy to do so. Also, the site is fast! I cringe when I have to look over issues of projects on Codeplex because it's just terribly slow. And the UI doesn't come close to that of Github either. I've heard that Bitbucket is pretty similar to Github, but I've never even looked for projects there. In any case: I want to be able to download the latest version of the code at any time, or of a particular branch if I need to, as easily as possible. I also prefer an issue tracker which is fast, responsive and easy to search. It doesn't have to be Github, but those 2 requirements are important to me.
  • License. If it's GPL, I don't use it. Also, I check whether or not a commercial license needs to be purchased when you want to use the library/framework in production. Pay attention to dual-licensed projects because that Open Source license might not apply to commercial/production use!

I'd love to hear your thoughts on this. Did I miss any important factors? I just quickly put this post together so it's likely that I missed some good ones :)