The Inquisitive Coder – Davy Brion's Blog

Trying to walk that thin line between intelligence and ignorance

Archive for the 'Code Quality' Category

Compiler|Interpreter Warnings Are Important Learning Opportunities

Posted by Davy Brion on 28th August 2010

When i first started to write C# code, i paid a lot of attention to compiler warnings. I wanted to avoid them at all costs. And by that i don’t mean suppressing them, but preventing them from being issued in the first place. I learned quite a few things from actually trying to understand why a certain compiler warning was issued, instead of just ignoring it like so many other developers do. In fact, when it comes to C#, i’d recommend turning on the Treat Warnings As Errors option on each project since i’ve never come across a C# compiler warning that you couldn’t avoid. And in practically every single case, avoiding the warning led to better code. When writing C#, i’ve never seen a single warning that was pointless. There might be a few esoteric ones that aren’t worth fixing, but the vast majority of us will never run into those. So do yourself a favor: if you get a compiler warning, make sure you understand why the warning was issued, and fix your code based on what you just learned while researching the warning. There simply is no reason not to do so, unless you happen to bump into the few cases where it really doesn’t matter, but those cases will be few and far between. In fact, i’d bet that only people like Ayende will run into them while us mortals never will.

So, how do i feel about warnings in the context of my ongoing Ruby journey? I have so little experience with Ruby that i can’t state that every single warning should be avoided. But i am of the opinion that you should at least be aware of every warning, and investigate whether or not you should modify your code to avoid it. Today i wrote my first Rakefile which automatically runs all of my RSpec tests for my EventPublisher module and one of the options of the SpecTask was to enable warnings from the Ruby interpreter. I was actually surprised that i hadn’t yet run my Ruby code with warnings turned on. Maybe i was just too busy being impressed with the whole Ruby + TextMate package. Anyways, i turned on the warnings, ran ‘rake’ and watched a few warnings show up, much to my disappointment. Well, i was disappointed at first because i thought the code was in good shape but then i figured “ok, this is no big deal… i just gotta fix my code and i’ll learn from it”. And i did learn from it. The first one was a simple RSpec assertion that i wrote which looked like this:

	it "should know about the subscribed method for the correct event" do
		@publisher.subscribe :first_event, method(:first_event_handler)
		subscribed = @publisher.subscribed? :first_event, method(:first_event_handler)
		subscribed.should == true
	end

This generated the following warning:
warning: useless use of == in void context

I looked into it, and learned that when using RSpec, the last line should’ve been written like this:

		subscribed.should be_true

You’re probably thinking “now that’s a tiny difference and not really worth it”. And you’d be wrong. While the resulting modification in code is indeed minor, i learned about RSpec’s Matchers and how they work. And that knowledge is gonna help me in future code.
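To make the idea behind matchers a bit more concrete, here is a minimal sketch in plain Ruby that does not use RSpec at all. The names BeTruthyMatcher and expect_match are hypothetical and only illustrate the general shape: a matcher is an object that can check a value and describe a failure. RSpec’s real implementation is more involved. Because the matcher is handed over as an argument, there is no comparison whose result gets thrown away, which is presumably why the warning disappears.

```ruby
# A hand-rolled matcher: an object that knows how to check a value
# and how to describe a failure. Hypothetical names, not RSpec's API.
class BeTruthyMatcher
  def matches?(actual)
    @actual = actual
    actual ? true : false
  end

  def failure_message
    "expected #{@actual.inspect} to be truthy"
  end
end

# Stand-in for what `should` does: hand the value to the matcher
# and raise when the matcher says no.
def expect_match(actual, matcher)
  raise matcher.failure_message unless matcher.matches?(actual)
  true
end

expect_match(1 == 1, BeTruthyMatcher.new)   # passes, returns true
```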

I also had the following piece of code:

			define_method getter do
				event = instance_variable_get variable

				if event == nil
					event = Event.new(symbol.to_s)
					instance_variable_set variable, event
				end

				event
			end

When i wrote it and saw that it worked, i was pretty happy. But it turns out that this code causes the interpreter to issue the following warning:

warning: instance variable @first_event not initialized

In this case, @first_event was the value of the ‘variable’ variable, which is why the warning looks like that when the code is actually executed. So again, i looked into it, and learned that i should’ve written that code like this:

			define_method getter do
				if !instance_variable_defined? variable
					event = Event.new(symbol.to_s)
					instance_variable_set variable, event
				end

				instance_variable_get variable
			end
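For readers who want to run the fixed getter on its own, here is a self-contained sketch. The Event class, the `event` class-level helper and the Publisher class are assumptions made for this example; the post does not show the real EventPublisher module. Note also that, as far as i know, Ruby 3.0 and later no longer emit the "not initialized" warning, so the difference is only visible on older interpreters run with warnings enabled.

```ruby
# Minimal stand-in for the Event class used in the post.
class Event
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical stand-in for the EventPublisher module from the post.
module EventPublisher
  def event(symbol)
    getter = symbol
    variable = :"@#{symbol}"

    define_method getter do
      # Check for existence first, so the variable is never read
      # before it has been assigned.
      unless instance_variable_defined? variable
        instance_variable_set variable, Event.new(symbol.to_s)
      end

      instance_variable_get variable
    end
  end
end

class Publisher
  extend EventPublisher
  event :first_event
end

publisher = Publisher.new
first = publisher.first_event
puts first.name                            # prints "first_event"
puts first.equal?(publisher.first_event)   # prints "true"
```

The second call returns the very same Event object, so the getter still creates the event lazily and only once.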

Moral of the story? Do not ignore compiler/interpreter warnings. They are there to help you improve not only your current code, but also your future code. That is, if you’re willing to pay attention to them.

Note: if you have experience with a variety of C/C++ compilers from back in the day, i can imagine that your opinion differs greatly from mine. However, i’m not talking about C/C++, so please keep the context (not to mention the decade…) in mind before you start typing a reply about how warnings in C/C++ could easily be justified depending on the compiler you were using ;)

Posted in Code Quality | 2 Comments »

The Downside Of Providing An API Through Extension Methods

Posted by Davy Brion on 26th August 2010

I recently used the excellent Moq mocking library for the first time, and i noticed a difference between Moq and Rhino Mocks (what i usually use) that i found interesting.

Consider the following useless and contrived example code:

    public interface ISomeComponent
    {
        void DoSomething();
    }

    public class SomeClass
    {
        private ISomeComponent someComponent;

        public SomeClass(ISomeComponent someComponent)
        {
            this.someComponent = someComponent;
        }

        public void DoSomethingReallyImportant()
        {
            someComponent.DoSomething();
        }
    }

Now suppose that we want to verify in a test that the DoSomethingReallyImportant method of SomeClass actually calls the DoSomething method of its ISomeComponent dependency.

With Moq, we could do that like this:

    [TestFixture]
    public class TestWithMoq
    {
        [Test]
        public void CallsDoSomethingOnSomeComponent()
        {
            var mock = new Mock<ISomeComponent>();
            var someObject = new SomeClass(mock.Object);

            someObject.DoSomethingReallyImportant();

            mock.Verify(m => m.DoSomething());
        }
    }

And with Rhino Mocks, it would look like this:

    [TestFixture]
    public class TestWithRhinoMocks
    {
        [Test]
        public void CallsDoSomethingOnSomeComponent()
        {
            var mock = MockRepository.GenerateMock<ISomeComponent>();
            var someObject = new SomeClass(mock);

            someObject.DoSomethingReallyImportant();

            mock.AssertWasCalled(m => m.DoSomething());
        }
    }

Not much of a difference, right? Except that Rhino Mocks provides you with a proxy that implements the ISomeComponent interface and Moq provides you with a generic Mock object, which contains a proxy that implements the ISomeComponent interface and is exposed through the Object property. Other than that, the tests are very similar.

The key difference is what you experience when you write the tests, as the 2 pictures below will illustrate:

Since Moq’s API is not fully based on Extension Methods, you get a normal and clean IntelliSense experience. Rhino Mocks on the other hand provides its API (at least the non-legacy stuff) solely through extension methods, which leads to all of them being included in your IntelliSense, even when they don’t make any sense at all.

It’s obviously not a major issue, but i was surprised by how much i liked not seeing all of the extension methods all the time while writing tests.

Posted in Code Quality, Test Driven Development | 7 Comments »

Once Again: Comments In Code Aren’t Necessarily Bad

Posted by Davy Brion on 18th August 2010

I happened to come across some blog posts and tweets that once again mentioned how evil it is to use comments in code. Some people still like to advocate the “if you need comments, your code sucks!” sentiment. As is often the case with this kind of statement, the only correct (or dare i say realistic) point of view is: it depends!

I generally agree that you should strive to avoid comments in code. That is, you should write your code in a way that doesn’t require comments for the reader to easily understand and grasp what the code is doing. However… there are situations where comments are helpful (or even required), since you simply can’t write everything in such a manner that it doesn’t require any comments.

To back up my point, i’d like to point you to this challenge i posted over a year ago. That code is clean. But pretty much everyone could use some comments to understand why that certain approach was necessary since it’s just not that obvious when focusing solely on the code. That code applies a certain pattern which isn’t well known, but there is a very specific reason why the pattern is needed. And i still believe that most developers need comments to understand it properly. That’s not to say that the guy who wrote it (who happened to be me) is better than those who’re going to have to maintain it. Hell, i wrote the code, and i would certainly be glad to see some comments in that code if i had to make a change 6 months after the fact.

The key thing to remember is this: don’t blindly follow what the books and/or the ‘legends’ say. If you need to write some code in a very non-obvious way, then you could really make things easier for those who need to maintain it (which could very well be you, btw) by including some comments to explain why a certain solution/pattern was chosen. There are some details you simply can’t show through good naming practices or clean code in general… some things simply need to be explained. It’s quite possible that a WHY comment or two will benefit you, and whoever else is going to read the code, more than an excessive and futile exercise in making the code as easy to understand as possible without comments ever will.

In short, strive to avoid the need to write comments by writing clean code. But don’t be afraid to use comments wisely either when clean code simply doesn’t cut it. There’s nothing wrong with that. After all, we don’t all have the luxury to restrict our efforts to contrived examples.

Posted in Code Quality, Opinions | 32 Comments »

Can An Object Grow?

Posted by Davy Brion on 17th August 2010

As you probably know, i’m currently learning Ruby. And as you may or may not know, Ruby is a very dynamic language. And one of the things that i found immensely interesting about it is the ability to add methods to existing classes or to existing objects, which isn’t the same as adding a method to a class in Ruby btw. And it sorta got me thinking about how the whole static vs dynamic thing can be applied to how we are often taught about objects and classes.

Chances are that when you learned about OO, you were introduced to the concept of objects using real world items. For instance, a door could be an object. It has certain attributes, and it has certain operations that can be invoked on it. But once a door has been produced, it will never gain new capabilities. It won’t learn anything new. If anything, the attributes and operations that it was created with will only deteriorate over time.

Then again, some of us will undoubtedly have heard that a person is an example of an object too. A Person is an instance of the Human class. There is a huge difference between a Person instance and a Door instance. While the Door will never change (except for possible deterioration), a Person will change during its lifetime. It will gain new capabilities depending on what the Person instance goes through during its lifetime. It will also lose capabilities. It will gain new attributes, and some of them will no longer apply to that Person instance after a while. Its relationships can vary wildly from each Person to the next. A Person instance is highly dynamic throughout its lifetime, while a Door will always be the same.
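A small Ruby sketch of what such ‘growing’ could look like. The Person class and its methods are made up for this illustration; the point is only the difference between adding a method to one object and adding it to the class.

```ruby
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

alice = Person.new("Alice")
bob = Person.new("Bob")

# Add a method to one object only (a singleton method).
def alice.juggle
  "#{name} juggles"
end

puts alice.juggle                  # prints "Alice juggles"
puts bob.respond_to?(:juggle)      # prints "false"

# Reopen the class: every Person, existing or new, gains the method.
class Person
  def greet
    "Hi, i'm #{name}"
  end
end

puts bob.greet                     # prints "Hi, i'm Bob"

# An object can lose a capability again.
alice.singleton_class.send(:remove_method, :juggle)
puts alice.respond_to?(:juggle)    # prints "false"
```

Only alice ever learns to juggle, while greet becomes available to every Person, including the instances that already existed.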

The weird thing is… a lot of us use OO to model or mimic real world behavior. But real world behavior is pretty dynamic. Not for everything, but it definitely is for some things. And all this time, a large majority of us has been trying to model or mimic that behavior with static languages. In a static language, a Person instance would never be able to grow. You could define multiple types of Persons, but hey, a lot of people generally don’t like to be put in groups based on attributes or capabilities and you’ll always run into problems when doing so.

I’m not trying to convince anyone about anything specific in this post. But i am trying to get you to think about what i’m saying. I do believe that objects can ‘grow’ in some or maybe even a lot of cases. And i think a lot of us can agree that static languages aren’t exactly a great solution to dealing with this problem (though i prefer the term ‘reality’ instead). Dynamic languages on the other hand can offer better ways to deal with this, but then again you do have to keep in mind that they come with some downsides as well. In software development, there are very, very few win-win situations. Everything is a trade-off. For everything that you know and love, there will be an alternative that is more suitable in some cases, and less suitable in others.

So now i’d like to ask you the following 2 questions:
1) do you think that objects can ‘grow’?
2) can developers grow? and how does that answer correspond to your answer to the first question?

Posted in Code Quality | 20 Comments »

Is That How You Talk To People?

Posted by Davy Brion on 26th July 2010

I just spotted the following 2 methods in a piece of code:

        public void ShowPanelWindow(bool isVisible)
        {
            Visibility = isVisible ? Visibility.Visible : Visibility.Collapsed;
        }

        public void ShowBusy(bool isBusy)
        {
            BusyIndicator.ShowIsBusy = isBusy;
        }

And i cringed. Personally, i think it’s weird to read lines of code that say view.ShowPanelWindow(false) or view.ShowBusy(false).  Instead, i’d go for something like this:

        public void HidePanelWindow()
        {
            Visibility = Visibility.Collapsed;
        }

        public void ShowPanelWindow()
        {
            Visibility = Visibility.Visible;
        }

        public void LookBusy()
        {
            BusyIndicator.ShowIsBusy = true;
        }

        public void StopLookingBusy()
        {
            BusyIndicator.ShowIsBusy = false;
        }

Sure, i’m not gonna win the fewest-lines-of-code contest but then again, we’re not participating in that contest anyway.  And while my version is a little bit longer, it doesn’t take more than a few extra seconds to write that extra code, and the improved readability of the consuming code is more than worth it.  After all, you do prefer reading view.HidePanelWindow() over view.ShowPanelWindow(false), no?

I’ve always liked the following approach to avoid (or at least, minimize) bad method names.  Just pretend that the classes are people and that the method names are messages between them.  There is after all a reason why we referred to it as “sending a message to an object” originally in OO-terms.  Give it a shot, and you’ll notice that your code will become more readable with hardly any extra effort.
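The same idea carries over to other languages. Below is a rough Ruby sketch; the PanelView class is hypothetical, and its instance variables merely stand in for the Visibility and BusyIndicator properties of the C# code above. The consuming code at the bottom reads like a series of messages sent to the view.

```ruby
# Hypothetical view class; the instance variables stand in for the
# Visibility and BusyIndicator properties of the C# example.
class PanelView
  attr_reader :visibility, :busy

  def initialize
    @visibility = :collapsed
    @busy = false
  end

  def show_panel_window
    @visibility = :visible
  end

  def hide_panel_window
    @visibility = :collapsed
  end

  def look_busy
    @busy = true
  end

  def stop_looking_busy
    @busy = false
  end
end

view = PanelView.new
view.show_panel_window
view.look_busy
# ... the long-running work would happen here ...
view.stop_looking_busy
view.hide_panel_window
```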

Posted in Code Quality | 17 Comments »

The MVVM Pattern Is Highly Overrated

Posted by Davy Brion on 21st July 2010

Update: check out my MVP In Silverlight/WPF series which discusses the MVP approach as an alternative to MVVM

If you’re doing Silverlight or WPF, you’ve no doubt come across the MVVM (Model-View-ViewModel) pattern. It seems to be the most popular client-side architecture pattern used among Silverlight/WPF developers. I find the pattern to be highly overrated, and actually have some big issues with the whole thing.

First, let’s briefly cover what MVVM is about for those of you who don’t know yet. MVVM virtually eliminates all of the code that would typically be placed in the code-behind file of your View (a user control, a screen, whatever) by putting all of that logic in the ViewModel. The ViewModel itself is never tightly coupled to the View. If it does have a reference to it, it’s typically through an interface that the View implements instead of a reference of the actual type of the View. This increases, or better yet, introduces testability to a large part of your UI code that you normally wouldn’t be able to cover with unit tests if you’d go with the traditional “put it all in the code-behind”-approach.

The ViewModel typically contains properties for the data that is to be shown in the View, and also raises notification events when the data in those properties changes. The View uses the powerful data-binding capabilities of Silverlight/WPF to bind the properties of the ViewModel to other user controls that the View is composed of. User-events that are captured by the View are sent to the ViewModel through Commands. Typically, these commands execute a method in the ViewModel which contains some kind of logic… usually to either send the updated data in the ViewModel’s properties to the Model (usually a Service Layer in Silverlight, in WPF it could be either a true Domain Model or also a Service Layer), to perform some business logic in the Model, or to retrieve data from the Model so it can be placed in the ViewModel’s properties so the View can bind to it.

That is, in a nutshell, how the MVVM pattern works. So why do i have issues with this? You can develop and test most of the application’s logic without being dependent on a physical View and that is typically a Good Thing, no? It sure is! However, my problem with this approach is that too many responsibilities are concentrated within the ViewModel. Its main responsibilities are to facilitate databinding and to interact with the Model. And that, in my opinion, isn’t very clean. In some ways, you could actually think of the ViewModel as a glorified code-behind, with the only difference being that it’s not tightly coupled to the (physical) View.

In most (if not all) MVVM implementations, a ViewModel has many possible reasons to be changed. It might need to be changed because of different data-binding requirements. Then again, it might also require changes when a part of the Model is modified. Now, i’m sure many of you can agree with me when i say that two of the most important principles in software development are Separation of Concerns (SoC) and the Single Responsibility Principle (SRP). Obviously, the ViewModel doesn’t fare well when it comes to either of these principles.

Let’s forget about MVVM for a second and focus on the concerns and responsibilities that a user control can have… say, a user control that shows customer information and allows the user to edit that data so it can be persisted:

  • visualization of the actual control (its own layout as well as drawing other user controls that it is composed of)
  • communication/interaction with the Model
  • making data (from the Model) available so it can be displayed
  • outputting data in the correct user controls (for instance: various textboxes)
  • simple validation of modified/inputted data (for instance: no string values for numeric fields, etc…)

Without MVVM, all of these would be taken care of in the View. Obviously, not a good idea right? After all, if it were a good idea, we’d never have had a reason to start using MVVM in the first place.

Now, with MVVM, a lot of people would divide these concerns and responsibilities like this:

View:

  • visualization of the actual control (its own layout as well as drawing other user controls that it is composed of)
  • outputting data in the correct user controls (for instance: various textboxes)
  • simple validation of modified/inputted data (for instance: no string values for numeric fields, etc…)

ViewModel:

  • communication/interaction with the Model
  • making data (from the Model) available so it can be displayed

In this case, the View still has 3 responsibilities, which is still too many according to ‘the guidelines’, but it’s not that big of a deal (though plenty of people would argue that the simple validation would be better placed in the ViewModel). You’re highly unlikely to actually want to write automated tests for pure visualization anyway and the SRP is not something that you absolutely need to follow to the extreme in every single place. For the View, this is really not a problem and very much acceptable.

The ViewModel however has 2 important responsibilities in this case, and i’d argue that these 2 things should not be done in the same place. Making data available is done through data-binding. To do this, you need a set of properties and you need to raise the necessary events. In most cases, raising those events is very straightforward, but in more complex controls you might need a bit of additional logic to determine which event should be raised at what point. The other important responsibility is the communication/interaction with the Model. In most Silverlight applications, the Model will be a Service Layer. To communicate with this Service Layer, you need Service Proxies. That means that your ViewModel is essentially responsible for communicating with the Service Layer, dealing with business exceptions that might be thrown by some service calls, and dealing with technical exceptions that can occur simply because of network-related problems. Group all of those together and i don’t think i’m going out on a limb here by saying that that is a lot of logic to put in one class, no?

(Sidenote: what i don’t really understand is that many people who vigorously advocate adherence to SRP and SoC in their domain and business code don’t seem to hold their UI code to the same standards. I do.)

At work, we do a lot of Silverlight development. We typically have around 5 Silverlight projects in active development at the same time. And it’s been that way for over a year now. That equals a lot of Silverlight code that we’ve written and experience and knowledge that we’ve built up. And we haven’t used MVVM for any of it. All this time, we’ve been using the MVP pattern (Supervising Controller variant) with a slight twist. That twist being a slimmed down version of a Presentation Model. Our Presentation Model’s sole responsibility is to facilitate data-binding, and in some cases, a touch of validation is added as well.

If we go back to our previous example of the customer screen, the responsibilities and concerns would be divided like this in our MVP-PMlight approach:

View:

  • visualization of the actual control (its own layout as well as drawing other user controls that it is composed of)
  • outputting data in the correct user controls (for instance: various textboxes)
  • simple validation of modified/inputted data (for instance: no string values for numeric fields, etc…)

Presenter:

  • communication/interaction with the Model based on the contents of the Presentation Model

Presentation Model:

  • making data (from the Model) available so it can be displayed

Which leads to classes which are more focused on their task instead of trying to focus on too many things at the same time. In my opinion, this approach is much better/cleaner than MVVM. Not only is there a noticeable benefit in code quality (classes are more focused), there is also increased potential to reuse our ‘light Presentation Models’ in multiple controls. Testability is increased over MVVM as well since it’s always easier to test classes which are focused versus testing classes which have too many responsibilities. All in all, a couple of important benefits and we still haven’t thought of a real drawback compared to MVVM.
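The split between Presenter and Presentation Model can be sketched in a few lines. This is a language-neutral illustration written in Ruby rather than Silverlight code, and every name in it (CustomerPresentationModel, CustomerPresenter, FakeCustomerService, on_change) is an assumption made for the example. A plain callback list stands in for data-binding change notifications, and a fake in-memory service stands in for the Service Layer.

```ruby
# Presentation Model: only holds display data and notifies on change.
class CustomerPresentationModel
  attr_reader :name

  def initialize
    @listeners = []
  end

  def on_change(&listener)
    @listeners << listener
  end

  def name=(value)
    @name = value
    @listeners.each { |listener| listener.call(:name) }
  end
end

# Presenter: only talks to the Model (here a fake service).
class CustomerPresenter
  attr_reader :presentation_model

  def initialize(service)
    @service = service
    @presentation_model = CustomerPresentationModel.new
  end

  def load(customer_id)
    @presentation_model.name = @service.fetch_name(customer_id)
  end

  def save(customer_id)
    @service.store_name(customer_id, @presentation_model.name)
  end
end

# Fake in-memory service standing in for the Service Layer.
class FakeCustomerService
  def initialize
    @names = { 1 => "Alice" }
  end

  def fetch_name(id)
    @names[id]
  end

  def store_name(id, name)
    @names[id] = name
  end
end

presenter = CustomerPresenter.new(FakeCustomerService.new)
changed = []
presenter.presentation_model.on_change { |property| changed << property }
presenter.load(1)
puts presenter.presentation_model.name   # prints "Alice"
puts changed.inspect                     # prints "[:name]"
```

The Presentation Model can be tested without any service, and the Presenter can be tested with a fake service and no View, which is the kind of focus the list above is after.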

Posted in Architecture, Code Quality, Silverlight | 75 Comments »

Which Language Do You Code In?

Posted by Davy Brion on 18th July 2010

I got into an interesting discussion on twitter today on what language you use when coding. We weren’t talking about programming languages, but the actual real world language that is used for class names, method names, variable names, comments, etc… My take on the matter is to just always do it in English. And for the record, i’m a native Dutch speaker and i work in Belgium. Our official languages are Dutch and French, not English.

A few years ago, i worked at a large financial institution in Belgium where we developed software for internal use in the main offices. The UI was pretty much always done in Dutch, but most of the time we wrote all of our code in English. There were a few projects where the developers had used Dutch in their code though. At first, this wasn’t an issue and seemed to be more of a personal preference thing (though mixing within a project is just horrible). After a while, the company got involved in the outsourcing game and suddenly, they were sending a lot of the projects that were in maintenance to a group of Indian developers. It’s hard enough to get developers who use a different language and are part of a different culture on the same page when working on software, but it’s obviously even harder if the code they are supposed to read, maintain and extend is in a language they don’t even know.

At my current job, we also have a branch in Hungary. We often mix teams so naturally, we all write our code in English. We not only write our code in English, but we also use English as our language in any kind of documentation. Now, we do have one customer who insists that the functional designs are all done in Dutch. Maintaining 2 sets of documents is something that we won’t do, and since our development process requires developers to link our code to our functional designs, we’ll never be able to use our Hungarian developers on the projects that we do for this customer. To me, that is a huge downside to having to write the documents in Dutch. And it’s the exact same problem we’d have if we would write our code in Dutch.

Both those examples should tell you why i’m so much in favor of just writing code in English all the time. Generally speaking, if you pick a non-English language to code in, you are limiting the pool of possible future developers to those developers who know the language you’ve chosen. It’s that simple. The more people who know the language, the bigger the pool of future developers. And when it comes to languages that aren’t used by a lot of people… well, that’s a pretty big restriction on your future options.

Another downside to using non-English in your code, is that you’re effectively already mixing multiple languages in the code base. Your programming language already is in English, and using non-English in your code leads to something that just sounds (or reads, i guess) wrong.

The biggest problem that people bring up with always using English, is that you obviously need to translate business concepts. Especially for people who are doing DDD and who want to leverage the Ubiquitous Language, this isn’t always an easy decision to make. For one, your domain experts might not be willing to take on the translation burden. If they are unwilling to do so, you can still use a translated version of the Ubiquitous Language but then the entire team (except for the domain experts) has to deal with that burden on a daily basis.

Now, i’m not sure if that’s really that big of a problem. For starters, English terms are becoming more pervasive in ‘business language’ every day. I’m not saying that everyone is already using English for their business concepts/terms, but i would argue that it is increasing, that it’s only going to increase more and more and most importantly, that more and more business people don’t really have a problem with using English terms anymore.

And let’s not forget that the percentage of teams that is truly doing DDD and leveraging the Ubiquitous Language is probably still small. After all, most true DDD experts will say that DDD is only a good idea for 10% of all projects ;) , so i can’t help but wonder how big of an issue the translation burden really is.

Thoughts?

Posted in Code Quality | 11 Comments »

Using The Dynamic Keyword To Avoid Difficulties With Generics

Posted by Davy Brion on 16th July 2010

A coworker was working on some kind of base EntityBuilder class to use in his tests.  One of the requirements of the EntityBuilder class was that it would need to automatically set the ID of an entity to a ‘real’ value (as in: not the default value of the type).  The EntityBuilder would use some kind of IdGenerator based on the type of the ID of the entity.   First of all, the example i’m gonna show is highly simplified and might not look like it makes much sense, but it’s only to illustrate some C# stuff with regards to generics and the dynamic keyword.  So bear with me, and just focus on the language details :)

Suppose you’ve got something like this:

    public abstract class Entity<TId>
    {
        public virtual TId Id { get; set; }
    }

    public interface IIdGenerator<TId>
    {
        TId GenerateId();
    }

    public class IntIdGenerator : IIdGenerator<int>
    {
        private static int lastIssuedId;

        public int GenerateId()
        {
            return ++lastIssuedId;
        }
    }

    public class GuidIdGenerator : IIdGenerator<Guid>
    {
        public Guid GenerateId()
        {
            return Guid.NewGuid();
        }
    }

The idea was to write the EntityBuilder somewhat along these lines:

    public abstract class TestEntityBuilder<TEntity, TId> where TEntity : Entity<TId>
    {
        public TEntity Build()
        {
            var entity = CreateEntityWithDefaultProperties();
            entity.Id = GetIdGeneratorFor(typeof(TId)).GenerateId();
            return entity;
        }

        protected abstract TEntity CreateEntityWithDefaultProperties();

        private IIdGenerator<TId> GetIdGeneratorFor(Type type)
        {
            if (type == typeof(int))
            {
                // Does not compile: the compiler cannot prove that TId is int here.
                return new IntIdGenerator();
            }

            // Does not compile either, for the same reason (TId vs. Guid).
            return new GuidIdGenerator();
        }
    }

Of course, that doesn’t even compile… you’ll get the following compiler errors:

error CS0266: Cannot implicitly convert type ‘MyProject.IntIdGenerator’ to ‘MyProject.IIdGenerator<TId>’. An explicit conversion exists (are you missing a cast?)
error CS0266: Cannot implicitly convert type ‘MyProject.GuidIdGenerator’ to ‘MyProject.IIdGenerator<TId>’. An explicit conversion exists (are you missing a cast?)

So, how exactly do you get this working with generics? That’s when he asked for my help, and i didn’t know the answer either… i’ve struggled with this exact problem in a few previous situations and i never really got a clean solution either.  But then i thought “wait, can’t we just avoid the problems with generics through the dynamic keyword?”

We changed the code to look like this:

    public abstract class TestEntityBuilder<TEntity, TId> where TEntity : Entity<TId>
    {
        public TEntity Build()
        {
            var entity = CreateEntityWithDefaultProperties();
            // GenerateId is bound at runtime; the result is converted to TId on assignment.
            entity.Id = GetIdGeneratorFor(typeof(TId)).GenerateId();
            return entity;
        }

        protected abstract TEntity CreateEntityWithDefaultProperties();

        // Returning dynamic defers the member lookup to runtime.
        private dynamic GetIdGeneratorFor(Type type)
        {
            if (type == typeof(int))
            {
                return new IntIdGenerator();
            }

            // Every type other than int falls through to the Guid generator.
            return new GuidIdGenerator();
        }
    }

We just changed the return type of the GetIdGeneratorFor method to ‘dynamic’, and the call to the GenerateId method is now a dynamic call instead of a normal method call.  And it works.  No messing around with generics voodoo, no (direct) usage of reflection either.  Just clean code.
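A dynamic call looks up GenerateId at runtime, which is what Ruby does for every method call. The sketch below is a rough Ruby analogue of the generators above, offered only as a comparison: the snake_case names are adapted for Ruby, and id_generator_for is a hypothetical helper. It also shows the trade-off: nothing checks up front that both generators really respond to generate_id.

```ruby
require "securerandom"

class IntIdGenerator
  @@last_issued_id = 0

  def generate_id
    @@last_issued_id += 1
  end
end

class GuidIdGenerator
  def generate_id
    SecureRandom.uuid
  end
end

# No common interface is needed: the call to generate_id is resolved
# at runtime against whatever object comes back.
def id_generator_for(type)
  type == Integer ? IntIdGenerator.new : GuidIdGenerator.new
end

puts id_generator_for(Integer).generate_id          # prints 1
puts id_generator_for(String).generate_id.length    # prints 36
```

A typo in the method name would only surface as a NoMethodError when that line runs, much like a misspelled member on a dynamic value in C# only fails at runtime.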

I’ll probably use this trick a lot more times in the future when i run into the limitations of generics :)

Posted in C#, Code Quality | 15 Comments »

Check Out QuickGenerate

Posted by Davy Brion on 30th June 2010

One of several interesting things in Mark Meyers’ QuickNet project is the whole input generation thing that you need for property-based testing.  It turns out that those input generators are very usable for far more purposes than just property-based testing, so it’s evolved into its own library.  It can generate object instances of almost any kind, while you can still have fine-grained control over the generation if you want to.  You can use it for simple types, complex objects or even entire object graphs. I wish i had time to write a more in-depth post about this, but for now i’m just gonna point you guys in the right direction, and i hope that you’ll see the value in this :)

The announcement of the first release can be found here, and an example can be found here.  Here’s a little glimpse at the code of one of the examples:

[Image: screenshot of a QuickGenerate code example]

I think that piece of code is a nice illustration of how powerful and flexible this is :)

Posted in C#, Code Quality, Software Development, Test Driven Development | 1 Comment »

Why You Shouldn’t Expose Your Entities Through Your Services

Posted by Davy Brion on 17th May 2010

I sometimes still get questions from people who want to expose their entities through their WCF Services.  Regardless of whether these are entities that are populated through NHibernate or any other ORM, this is just not a good thing to do.  Many people prefer to accept and return entities through their services because they believe this is an easier programming model.  They believe that it takes less work than mapping to DTO’s and that as a whole, this solution is much more manageable.  Rest assured that this is a fallacy.  Any perceived benefit that you’ll get from exposing entities outside of your service layer will only last a very short time and will quickly be dwarfed by added complexity, increased maintenance overhead and a performance overhead which must not be ignored. 

In this post, i’d like to take the chance to explain the downsides to exposing entities through services.  Though i’ll probably miss quite a few of the downsides (feel free to add to the list through comments), the ones i will mention are IMO important enough to take note of.

Exposing entities to clients means your clients are very tightly coupled to your service(s)

Entities are a part of your domain.  These entities in your domain can change for various reasons.  Sometimes because functional changes are required, but quite often also for optimizations (whether they are for performance reasons or to improve the clarity and maintainability of your domain).  Functional changes can impact your clients, though that is not necessarily the case.  Optimizations hardly ever have an impact on your clients (other than possibly improved response times from your service calls obviously).  If your service layer accepts and returns domain entities, each possible change is highly likely to have an impact on your clients.  And this impact is not cheap.  In the best case scenario, it means updating your service contracts, regenerating your service proxies and redeploying your clients.  In the worst case scenario, it means making actual changes to the code of your clients.  And for what? Because of changes that shouldn’t have impacted your clients in the first place?

Ideally, your clients are as dumb as they can be.  They should know as little as possible about the actual implementation of the domain because that implementation is simply not relevant to them.  They should present users with data and give them the option to modify that data, to trigger actions and to perform certain tasks.  They should focus squarely on those tasks and pretty much everything else is typically better suited to be done behind your service layer.  If you build your clients with no real knowledge of the actual domain model, but of DTO’s and possible actions to be performed then you can reduce the level of coupling between your clients and your services substantially.

Many of the people who prefer to expose entities often claim that going for the DTO approach introduces too much extra work and too many extra, seemingly unnecessary classes.  For starters, they don’t want to write code that maps entities to DTO’s.  First of all, the amount of code that this requires is in reality very small, not to mention very easy.  Secondly, you can just as well use a library such as AutoMapper to take that pain away from you.  And contrary to what you might think, there is a big performance gain to be had from returning DTO’s over entities, but i’ll get to that in the next section.
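To give an idea of how little code hand-written mapping takes, here is a minimal sketch. Customer, CustomerSummaryDto and CustomerMapper are hypothetical names invented for this illustration; they do not come from any particular project.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain entity with more data than most clients need.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string VatNumber { get; set; }
    public decimal CreditLimit { get; set; }
}

// Hypothetical DTO: only what a listing or a dropdown would display.
public class CustomerSummaryDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerMapper
{
    // Copies only the properties the client needs.
    public static CustomerSummaryDto ToSummaryDto(Customer customer)
    {
        return new CustomerSummaryDto { Id = customer.Id, Name = customer.Name };
    }

    public static List<CustomerSummaryDto> ToSummaryDtos(IEnumerable<Customer> customers)
    {
        return customers.Select(c => ToSummaryDto(c)).ToList();
    }
}
```

The service contract then only mentions CustomerSummaryDto, so adding, renaming or removing a property on Customer does not ripple through to the clients unless the DTO itself changes.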

Entities are hardly ever the most optimal representation of data

I think we can safely say that most applications need to show data in the following 3 ways:

  • In a grid view, either as a total listing of all instances of a certain type of data or the result of a search query or some kind of filtering action
  • In dropdown controls or anything else that lets users select pieces of data
  • In edit screens where a piece of data needs to be displayed in its entirety, perhaps even to be modified by the user

There are undoubtedly more ways in which data can be presented to the user but i think it’s safe to say that most business applications will certainly rely on these 3 ways quite heavily.

In the case of a grid view, you’re frequently showing data that is related to more than one entity.  You’ll often need to include the name or the description of some associated entities.  So what exactly is it that you want to do in this situation?  Do you want to return a list of the main entities of the grid view, which all have their required association properties filled in so you can display the columns that you need in the grid view?  Do you actually need all of the properties of these entities (for both the main entities and the associated entities)?  Odds are high that you’re going to be returning a lot more data to the client than you actually need.  And that is what is realistically going to hurt the performance of your system.  Any piece of unnecessary data that you transmit to your clients has a cost associated with it.  The unnecessary data is retrieved from the database.  The entities are then serialized at the service end.  Then they are transmitted to the client.  Then they are deserialized by your client.  All of this is pretty costly, so the more unnecessary data that is included in this operation, the more your performance and the responsiveness of your client (not to mention your database and your server) are impacted negatively.

In the case of dropdown controls or anything else that lets users select pieces of data, you typically only need very few of the properties of that piece of data.  In many cases, the primary key and a name or a description are sufficient.  Do you really need to transmit the entire entity every time for usages like this? Again, keep in mind that all of that extra data that will never be used by your client needs to be retrieved, serialized, transmitted and deserialized again.  Surely, this is an awful waste, no? 

And then there’s the case where a piece of data needs to be displayed in its entirety.  In these cases, you will almost always need all of the properties of the entity that is displayed, but you’ll most often also need to show other data (things that can be selected, or linked to the main entity).  This other data will in most cases fall into the previous category where you’ll only need very little information about the actual entity.  If you’re smart, you’ve chosen the DTO approach to retrieve the data that can be selected, and in that case, you already have all of the infrastructural code in place to project entities or data into DTO’s.  So you might as well reuse it for the main entity too.

Always keep in mind that your entities will frequently either contain more data than needed, or less data than needed.  As such, it just doesn’t make much sense to expose entities to your clients since they are hardly ever optimal for client-side usage.  If you really want to think about performance, stop worrying about the supposed cost of mapping to DTO’s (which is truly negligible) and start focusing on what you’re actually sending to and from your service because this is far more costly than any kind of DTO-mapping really is.

Must your data really come from entities?

If you are displaying data to your user, does that data really need to come from your domain model?  Does it really need to be retrieved by populating a collection of entities to then return them to the client?  Again, keep the form of the data in mind when thinking about this.  In many cases, as i mentioned above, an entity is not the most optimal form of the data that your client needs.  So why even retrieve it through entities? Sure, asking your ORM to retrieve a set of entities based on a set of criteria is often the easiest thing to do, but if the easiest path were the best path, the overall quality of software projects wouldn’t be in the sad state that it’s in today.  If the form of the required data is not identical to the structure of an entity, it’s often far more optimal to simply populate a DTO directly from the data.  With NHibernate, you can easily do this by adding a list of projections to your query and then using a ResultTransformer to populate the DTO’s based on the direct output of the query.  In this case, no entity instance ever needs to be created when you’re just retrieving data, and no extra mapping between the entity and the DTO’s needs to be performed.  Your data access code simply retrieves the resulting data from a query, and puts that data directly in your DTO’s.  There’s no reason why usage of an ORM should prevent you from doing this.   Once again, this approach will offer far more performance benefits than avoiding DTO mapping at all costs ever can.
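A rough sketch of the projection approach with NHibernate’s Criteria API follows. It assumes a mapped Product entity and an already opened ISession; Product, ProductListItemDto and ProductQueries are hypothetical names made up for this example, and the code needs a reference to the NHibernate assembly plus a working mapping and session factory before it can actually run.

```csharp
using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;
using NHibernate.Transform;

// Hypothetical mapped entity (members are virtual so NHibernate can proxy it).
public class Product
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual string Description { get; set; }
    public virtual decimal Price { get; set; }
}

// Hypothetical DTO, populated straight from the query output.
public class ProductListItemDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProductQueries
{
    public static IList<ProductListItemDto> GetProductListItems(ISession session)
    {
        return session.CreateCriteria<Product>()
            // Only select the two columns the DTO needs; the aliases
            // must match the property names on the DTO.
            .SetProjection(Projections.ProjectionList()
                .Add(Projections.Property("Id"), "Id")
                .Add(Projections.Property("Name"), "Name"))
            // Fill the DTO's from the raw rows; no Product instance is created.
            .SetResultTransformer(Transformers.AliasToBean(typeof(ProductListItemDto)))
            .List<ProductListItemDto>();
    }
}
```

The generated SQL then only selects the projected columns, so the Description and Price of each product are never read from the database, let alone serialized and sent to the client.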

What about the behavior of your entities?

Do your entities have any behavior in them?  If not, they are already more of a DTO than a true entity.  In fact, if your entities have no behavior at all, you could even wonder why you’re using an ORM in the first place.  Now, behavior can mean many things.  It could mean lazy loading of associations.  It could mean actual business logic.  Obviously, lazy-loading doesn’t (and shouldn’t!) work client-side, but what about your business logic? Do you have business logic that can be executed client-side? Or is it business logic that should only be executed behind the service layer? If so, how do you make that distinction, and how do you prevent client-side code from calling that logic on these entities? Whatever you do, you’re pretty much opening up a can of worms that really is better avoided in the first place.

How are you going to deal with technical issues?

Accepting and returning entities from services introduces a host of technical issues that can be quite substantial.  Serialization and deserialization specifically are issues that you need to be worried about.  If you’re using an ORM which does lazy-loading of associations, this will certainly cause serialization issues that you need to work around.  You can either disable lazy loading, or you can make sure that your entities are always fully initialized (as in: always have their associations fully loaded) before they are sent back to the client.  Disabling lazy-loading will cause performance problems in your service layer, either in places where you don’t expect them to be or in places that you haven’t thought of before it’s too late.  Fully loading your entities and their associations before returning them is another performance nightmare waiting to happen so that’s really not an ideal solution either.  You can try to hook into the serialization process or even the lazy-loading features of your ORM but whatever you do in that case will be a hack that will cause issues sooner or later.  And again, all of these problems can very easily be avoided with a solution which, i hope you realize by now, offers plenty more benefits than any solution where you accept/return entities in your service.

Conclusion

Every single downside to exposing entities through services that i’ve listed is an issue that i have encountered in past projects, either ones i’ve worked on myself, or ones that i’ve seen other people work on.  If that’s not enough for you, then maybe you’ll find it interesting to know that some of the brightest and most respected people (like Udi Dahan and Ayende for instance) in the .NET community also actively recommend against exposing entities through services because of the same downsides that i mentioned, though they could probably give you even more downsides that i forgot to cover in this post.  These downsides are not figments of anyone’s imagination.  They are very real, and you really, really ought to think twice before dismissing this advice. 

Posted in Architecture, Code Quality, Opinions, Performance, WCF | 23 Comments »