Why You Shouldn’t Expose Your Entities Through Your Services

Written on May 17th, 2010
Categories: Architecture, Code Quality, Opinions, Performance, WCF

I sometimes still get questions from people who want to expose their entities through their WCF Services.  Regardless of whether these are entities that are populated through NHibernate or any other ORM, this is just not a good thing to do.  Many people prefer to accept and return entities through their services because they believe this is an easier programming model.  They believe that it takes less work than mapping to DTO’s and that as a whole, this solution is much more manageable.  Rest assured that this is a fallacy.  Any perceived benefit that you’ll get from exposing entities outside of your service layer will only last a very short time and will quickly be dwarfed by added complexity, increased maintenance overhead and a performance overhead which must not be ignored. 

In this post, i’d like to take the chance to explain the downsides to exposing entities through services.  Though i’ll probably miss quite a few of the downsides (feel free to add to the list through comments), the ones i will mention are IMO important enough to take note of.

Exposing entities to clients means your clients are very tightly coupled to your service(s)

Entities are a part of your domain.  These entities can change for various reasons.  Sometimes because functional changes are required, but quite often also for optimizations (whether they are for performance reasons or to improve the clarity and maintainability of your domain).  Functional changes can impact your clients, though that is not necessarily the case.  Optimizations should hardly ever have an impact on your clients (other than possibly improved response times from your service calls obviously).  But if your service layer accepts and returns domain entities, every such change is highly likely to have an impact on your clients.  And this impact is not cheap.  In the best case scenario, it means updating your service contracts, regenerating your service proxies and redeploying your clients.  In the worst case scenario, it means making actual changes to the code of your clients.  And for what? Because of changes that shouldn’t have impacted your clients in the first place?

Ideally, your clients are as dumb as they can be.  They should know as little as possible about the actual implementation of the domain because that implementation is simply not relevant to them.  They should present users with data and give them the option to modify that data, to trigger actions and to perform certain tasks.  They should focus squarely on those tasks and pretty much everything else is typically better suited to be done behind your service layer.  If you build your clients with no real knowledge of the actual domain model, but of DTO’s and possible actions to be performed then you can reduce the level of coupling between your clients and your services substantially.

Many of the people who prefer to expose entities often claim that going for the DTO approach introduces too much extra work and too many extra, seemingly unnecessary classes.  For starters, they don’t want to write code that maps entities to DTO’s.  First of all, the amount of code that this requires is in reality very small, not to mention very easy.  Secondly, you can just as well use a library such as AutoMapper to take that pain away from you.  And contrary to what you might think, there is a big performance gain to be had from returning DTO’s over entities, but i’ll get to that in the next section.
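To give an idea of how little mapping code is involved, here is a minimal sketch.  It is written in Python purely to keep it short (the same shape applies in C#), and `Customer`, `Address`, `CustomerDto` and `to_dto` are hypothetical names made up for this illustration, not types from any library.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    """Hypothetical domain entity; a real one would also carry behavior."""
    id: int
    first_name: str
    last_name: str
    address: Address
    credit_limit: float  # internal detail that clients never need to see


@dataclass(frozen=True)
class CustomerDto:
    """Hypothetical DTO: the only shape that clients get to know about."""
    id: int
    display_name: str
    city: str


def to_dto(customer: Customer) -> CustomerDto:
    # The whole mapping is a handful of assignments. If the entity is later
    # restructured, only this function changes, not the service contract.
    return CustomerDto(
        id=customer.id,
        display_name=f"{customer.first_name} {customer.last_name}",
        city=customer.address.city,
    )


customer = Customer(1, "Ada", "Lovelace", Address("12 Example St", "London"), 5000.0)
dto = to_dto(customer)
print(dto)
```

The point of the sketch is that the entity can grow, shrink or be restructured behind `to_dto` while clients keep seeing the same three fields.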

Entities are hardly ever the most optimal representation of data

I think we can safely say that most applications need to show data in the following 3 ways:

  • In a grid view, either as a total listing of all instances of a certain type of data or the result of a search query or some kind of filtering action
  • In dropdown controls or anything else that lets users select pieces of data
  • In edit screens where a piece of data needs to be displayed in its entirety, perhaps even to be modified by the user

There are undoubtedly more ways in which data can be presented to the user but i think it’s safe to say that most business applications rely on these 3 ways quite heavily.

In the case of a grid view, you’re frequently showing data that is related to more than one entity.  You’ll often need to include the name or the description of some associated entities.  So what exactly is it that you want to do in this situation?  Do you want to return a list of the main entities of the grid view, which all have their required association properties filled in so you can display the columns that you need in the grid view?  Do you actually need all of the properties of these entities (for both the main entities and the associated entities)?  Odds are high that you’re going to be returning a lot more data to the client than you actually need.  And that is what is realistically going to hurt the performance of your system.  Any piece of unnecessary data that you transmit to your clients has a cost associated with it.  The unnecessary data is retrieved from the database.  The entities are then serialized at the service end.  Then they are transmitted to the client.  Then they are deserialized by your client.  All of this is pretty costly, so the more unnecessary data that is included in this operation, the more the performance and responsiveness of your client (not to mention your database and your server) are impacted negatively.
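As a rough illustration of the grid case, the sketch below flattens an entity graph into one small row per grid line.  It is Python used as neutral pseudocode, and `Order`, `OrderLine`, `Customer`, `OrderGridRow` and `to_grid_rows` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    id: int
    name: str
    email: str
    notes: str


@dataclass
class OrderLine:
    product_name: str
    quantity: int
    unit_price: float


@dataclass
class Order:
    id: int
    customer: Customer
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.quantity * line.unit_price for line in self.lines)


@dataclass(frozen=True)
class OrderGridRow:
    """Hypothetical DTO holding exactly the columns the grid displays."""
    order_id: int
    customer_name: str
    total: float


def to_grid_rows(orders: List[Order]) -> List[OrderGridRow]:
    # Only the customer's name crosses the wire, not the whole customer,
    # and the order lines are reduced to a single computed total.
    return [OrderGridRow(o.id, o.customer.name, o.total()) for o in orders]


orders = [
    Order(
        1,
        Customer(7, "Acme", "info@acme.example", "long internal notes"),
        [OrderLine("Widget", 2, 10.0), OrderLine("Gadget", 1, 5.0)],
    )
]
rows = to_grid_rows(orders)
print(rows)
```

The client receives three values per row instead of an order, its customer and every order line.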

In the case of dropdown controls or anything else that lets users select pieces of data, you typically only need very few of the properties of that piece of data.  In many cases, the primary key and a name or a description are sufficient.  Do you really need to transmit the entire entity every time for usages like this? Again, keep in mind that all of that extra data that will never be used by your client needs to be retrieved, serialized, transmitted and deserialized again.  Surely, this is an awful waste, no? 
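The difference in payload is easy to make visible.  The following sketch compares the serialized size of full entities with that of id/label pairs.  JSON merely stands in for whatever wire format your service uses, and `Country` and `LookupItem` are hypothetical types made up for this comparison.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Country:
    """Hypothetical entity with the kind of fields a dropdown never shows."""
    id: int
    name: str
    iso_code: str
    currency: str
    population: int
    description: str


@dataclass(frozen=True)
class LookupItem:
    """Hypothetical DTO for dropdowns: a key and a label, nothing more."""
    id: int
    label: str


countries = [
    Country(1, "Belgium", "BE", "EUR", 11_000_000, "Long descriptive text " * 10),
    Country(2, "Norway", "NO", "NOK", 5_000_000, "Long descriptive text " * 10),
]

# What goes over the wire when the service returns the entities themselves.
full_payload = json.dumps([asdict(c) for c in countries])

# What goes over the wire when the service returns lookup DTOs.
lookup_payload = json.dumps([asdict(LookupItem(c.id, c.name)) for c in countries])

print(len(full_payload), "characters when sending entities")
print(len(lookup_payload), "characters when sending lookup DTOs")
```

The exact numbers depend entirely on the entity, but the gap only widens as entities gain properties and associations.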

And then there’s the case where a piece of data needs to be displayed in its entirety.  In these cases, you will almost always need all of the properties of the entity that is displayed, but you’ll most often also need to show other data (things that can be selected, or that are linked to the main entity).  This other data will in most cases fall into the previous category, where you only need very little information about the actual entity.  If you’re smart, you’ve chosen the DTO approach to retrieve the data that can be selected, and in that case you already have all of the infrastructural code in place to project entities or data into DTO’s.  So you might as well reuse it for the main entity.

Always keep in mind that your entities will frequently either contain more data than needed, or less data than needed.  As such, it just doesn’t make much sense to expose entities to your clients since they are hardly ever optimal for client-side usage.  If you really want to think about performance, stop worrying about the supposed cost of mapping to DTO’s (which is truly negligible) and start focusing on what you’re actually sending to and from your service because this is far more costly than any kind of DTO-mapping really is.

Must your data really come from entities?

If you are displaying data to your user, does that data really need to come from your domain model?  Does it really need to be retrieved by populating a collection of entities to then return them to the client?  Again, keep the form of the data in mind when thinking about this.  In many cases, as i mentioned above, an entity is not the most optimal form of the data that your client needs.  So why even retrieve it through entities? Sure, asking your ORM to retrieve a set of entities based on a set of criteria is often the easiest thing to do, but if the easiest path were the best path, the overall quality of software projects wouldn’t be in the sad state that it’s in today.  If the form of the required data is not identical to the structure of an entity, it’s often far more optimal to simply populate a DTO directly from the data.  With NHibernate, you can easily do this by adding a list of projections to your query and then using a ResultTransformer to populate the DTO’s based on the direct output of the query.  In this case, no entity instance ever needs to be created when you’re just retrieving data, and no extra mapping between the entity and the DTO’s needs to be performed.  Your data access code simply retrieves the resulting data from a query, and puts that data directly in your DTO’s.  There’s no reason why usage of an ORM should prevent you from doing this.   Once again, this approach will offer far more performance benefits than avoiding DTO mapping at all costs ever can.
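The idea of projecting query output straight into DTO’s can be sketched without any ORM at all.  The example below is not NHibernate code: it uses Python’s built-in `sqlite3` module and plain SQL as a stand-in for projections plus a result transformer, and the schema, `OrderSummaryDto` and `fetch_order_summaries` are assumptions made up for the illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummaryDto:
    """Hypothetical DTO shaped after what the client displays."""
    order_id: int
    customer_name: str
    total: float


connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL, notes TEXT);
    INSERT INTO customers VALUES (1, 'Acme', 'info@acme.example');
    INSERT INTO orders VALUES (10, 1, 25.0, 'internal remark');
    INSERT INTO orders VALUES (11, 1, 40.0, 'another remark');
    """
)


def fetch_order_summaries(conn):
    # Select only the columns the client needs (the "projections") ...
    rows = conn.execute(
        """
        SELECT o.id, c.name, o.total
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
        """
    )
    # ... and build the DTOs straight from the rows (the "transformer" step).
    # No order or customer entity is ever instantiated.
    return [OrderSummaryDto(*row) for row in rows]


summaries = fetch_order_summaries(connection)
print(summaries)
```

Columns such as `email` and `notes` never leave the database, so there is nothing to map away afterwards.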

What about the behavior of your entities?

Do your entities have any behavior in them?  If not, they are already more of a DTO than a true entity.  In fact, if your entities have no behavior at all, you could even wonder why you’re using an ORM in the first place.  Now, behavior can mean many things.  It could mean lazy loading of associations.  It could mean actual business logic.  Obviously, lazy-loading doesn’t (and shouldn’t!) work client-side, but what about your business logic? Do you have business logic that can be executed client-side? Or is it business logic that should only be executed behind the service layer? If so, how do you make that distinction, and how do you prevent client-side usage of logic that belongs behind the service layer? Whatever you do, you’re pretty much opening up a can of worms that really is better avoided in the first place.

How are you going to deal with technical issues?

Accepting and returning entities from services introduces a host of technical issues that can be quite substantial.  Serialization and deserialization specifically are issues that you need to be worried about.  If you’re using an ORM which does lazy-loading of associations, this will certainly cause serialization issues that you need to work around.  You can either disable lazy loading, or you can make sure that your entities are always fully initialized (as in: always have their associations fully loaded) before they are sent back to the client.  Disabling lazy-loading will cause performance problems in your service layer, either in places where you don’t expect them to be or in places that you haven’t thought of before it’s too late.  Fully loading your entities and their associations before returning them is another performance nightmare waiting to happen so that’s really not an ideal solution either.  You can try to hook into the serialization process or even the lazy-loading features of your ORM but whatever you do in that case will be a hack that will cause issues sooner or later.  And again, all of these problems can very easily be avoided with a solution which, i hope you realize by now, offers plenty more benefits than any solution where you accept/return entities in your service.
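The interaction between lazy-loading and serialization can be mimicked in a few lines.  This is a toy model, not how any particular ORM implements its proxies: `LazyCollection`, `Order`, `naive_serialize` and `to_summary_dto` are hypothetical names, and the "database roundtrip" is just a counter.

```python
class LazyCollection:
    """Hypothetical stand-in for an ORM's lazy-loading proxy."""

    def __init__(self, loader):
        self._loader = loader
        self._items = None
        self.load_count = 0

    def __iter__(self):
        if self._items is None:
            # In a real ORM this would be a database roundtrip, and it could
            # fail outright if the session had already been closed.
            self._items = self._loader()
            self.load_count += 1
        return iter(self._items)


class Order:
    def __init__(self, order_id, lines):
        self.id = order_id
        self.lines = lines


def naive_serialize(order):
    # A generic serializer walks every property, including the lazy ones.
    return {"id": order.id, "lines": list(order.lines)}


def to_summary_dto(order):
    # A DTO mapping only touches what the client actually asked for.
    return {"id": order.id}


lazy_lines = LazyCollection(lambda: ["line 1", "line 2"])
order = Order(1, lazy_lines)

summary = to_summary_dto(order)
loads_after_dto = lazy_lines.load_count

serialized = naive_serialize(order)
loads_after_serialize = lazy_lines.load_count

print(loads_after_dto, loads_after_serialize)
```

Mapping to the DTO leaves the association untouched, while serializing the entity itself forces the load whether the client wanted the lines or not.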

Conclusion

Every single downside to exposing entities through services is an issue that i have encountered in past projects, either ones i’ve worked on myself or ones that i’ve seen other people work on.  If that’s not enough for you, then maybe you’ll find it interesting to know that some of the brightest and most respected people in the .NET community (like Udi Dahan and Ayende for instance) also actively recommend against exposing entities through services because of the same downsides that i mentioned, though they could probably give you even more downsides that i forgot to cover in this post.  These downsides are not figments of anyone’s imagination.  They are very real, and you really, really ought to think twice before dismissing this advice.

  • http://jonkruger.com/blog Jon Kruger

    The one exception here (IMO) would be if you are writing the web service and the application that consumes it and you aren’t making that service public. In that case, you might just be using the web service so that a thick client app doesn’t have access to your database, but it’s nothing more than that.

  • http://blog.phatboyg.com/ Chris Patterson

    Jon:
    Not public is just semantics – the fact that you are deploying a thick client and providing data via the web service is a perfect example of where you do not want to expose the service’s privates on the wire. The service and clients will almost always evolve independently, even if you swear on day 1 that they will always update at the same time.

  • David Martines

    Davy, you hit the nail on the head. I recently worked on a project where this same issue was deliberated and we eventually came to the same conclusion. I think this ties into the whole command/query separation principle: return DTOs from queries where data needs to be displayed, and send back command objects. The commands could be DTOs as well wrapped in an Agatha request, but they might as well just BE the Agatha requests themselves, right? Bottom line, clients shouldn’t be using the entities directly.

    We also see this confusion in the MVC space, where you have a pure web app with no real physical service-layer, and the entities and the current session/context are available at the time of data binding. The “Models” in MVC are NOT the same as domain entities, but unfortunately most of what you read on ASP.NET MVC suggests this.

  • http://davybrion.com Davy Brion

    @David

    “The commands could be DTOs as well wrapped in an Agatha request, but they might as well just BE the Agatha requests themselves, right?”

    Yup… but you don’t _have_ to use agatha for something like that, though it does tend to make it easier ;)

    “We also see this confusion in the MVC space, where you have a pure web app with no real physical service-layer, and the entities and the current session/context are available at the time of data binding.”

    agreed… i didn’t really want to get into that with this post, but i cringe pretty much every time i see an ASP.NET MVC related post/article for pretty much that exact reason. it’s almost just as bad, the only difference is that they at least don’t have to pay the cost of serialization/deserialization but other than that, they’d greatly benefit from a more strict separation as well IMHO.

  • Dan Jensen

    Davy, I agree with all of your points, but there is one area I’m struggling with. I want to encapsulate as much behavior as possible in my entities, such as calculations for fields that the user really shouldn’t edit by hand. I don’t want to duplicate the rules for these types of calculations in my UI View Models if I can avoid it. For example, let’s say two fields are editable. Once the user fills those in, a third field is automatically calculated based on the values from the first two. So how would you approach this? Would you go ahead and duplicate the rules on the front end? Or maybe create some sort of helper class that performs the calcs, which both the VMs and entities could call? In most cases I also need to validate the numbers before persisting them, and it would essentially be using the same formula to make sure the numbers are correct.

  • Andreanta

    I agree with this vision. In my past projects i have always tried to have 2 different models for the UI and for the service layer, but i have to do a manual / semi-automatic data/state transfer from the UI model to the service model.

    Something like views and tables in a RDBMS.

    This separates my models, tests and layers between UI programmers and Business\Data programmers.

    Now i’m trying to use an ORM like NHibernate and i’m studying the Agatha and NCommon projects.

    “With NHibernate, you can easily do this by adding a list of projections to your query and then using a ResultTransformer to populate the DTO’s based on the direct output of the query. In this case, no entity instance ever needs to be created when you’re just retrieving data, and no extra mapping between the entity and the DTO’s needs to be performed. Your data access code simply retrieves the resulting data from a query, and puts that data directly in your DTO’s. There’s no reason why usage of an ORM should prevent you from doing this. Once again, this approach will offer far more performance benefits than avoiding DTO mapping at all costs ever can.” From this passage it seems that i can do this automatically. Very fine.
    I’ll try to study more on this.

    Thanks a lot for a very useful article.

  • Henning Anderssen

    Good post and couldn’t agree more. I have one comment though, on the part where you mention projecting the raw data to your DTO’s or whatever. When you do it like that, you/ORM still have to do all the nasty SQL junk, such as join and what not.
    Perhaps it would be better to actually save the DTO’s to the database. That particular method is called Persistent View models by either Udi Dahan or Greg Young. That might give you some extra overhead when changing names or other modifications, where you have to now update many other tables and rows, etc, but do you really want to sacrifice read performance for that (probably highly) rare update?

    @Dan Jensen,
    In that case you’d want to extract the calculation into a “Domain service” and inject the calculation class into your entity via Double Dispatch, and into your MVC controller (or whatever else you use). Remember to use interfaces when injecting and not the actual implementation of the calculation!!

  • http://davybrion.com Davy Brion

    @Dan Jensen

    “Would you go ahead and duplicate the rules on the front end? Or maybe create some sort of helper class that performs the calcs, which both the VMs and entities could call? In most cases I also need to validate the numbers before persisting them, and it would essentially be using the same formula to make sure the numbers are correct.”

    sort of depends on the actual calculation… if it’s a very easy one or one that is highly unlikely to ever change, i’d just duplicate it to be honest. sure, some people would call me heretic for that but a little bit of pragmatism never hurt anybody. If it’s a tough calculation, i’d either go with the helper class, or just making it a server-side only thing. You could trigger the request to calculate the value asynchronously when the user inputs both required fields and then just update the calculated value when the result is retrieved. then you can reuse the same logic when you’re performing your final server-side validation as well.

    @Henning

    i’d rather not sacrifice read-performance… people’s impression of a system’s responsiveness is largely based on read-performance. Writes being a bit slower than reads is somewhat more easily acceptable to them (“because the system is working!”) than slow reads are (“it’s just supposed to show me data!”)

  • Henning Anderssen

    @Davy

    We actually considered doing something like you mention, where we create NHibernate mappings directly to the view models, but because of various reasons we chose to change our domain model a bit and denormalize some of the data. Not sure what the effect was, since I’m not working on that part of the system at the moment. We also discussed persistent viewmodels, or rather persistent reportmodels, since it is our big reports that eat up most of the resources. The app is quite reporting heavy.

    If one chooses to use some kind of persistent viewmodels, you could do the updates to the viewmodels async, so they’ll be eventually consistent. I guess more in line with the whole CQRS thinking. Depends on your needs I guess.

    Btw, I’d also choose read performance in most of my apps any day of the week. However, some apps will be more read heavy, and such a solution would not be as effective.

  • http://davybrion.com Davy Brion

    @Henning

    yeah, we occasionally map views with NHibernate as well and in some cases even use stored procedures for really complex queries

  • http://dgoyani.blogspot.com/ Dhananjay Goyani

    Is ViewModel (the MVVM pattern) specialized case of DTO? Just want to know your view.

  • http://davybrion.com Davy Brion

    We don’t use MVVM so i’m probably a bit rusty when it comes to the details of a ViewModel, but i wouldn’t consider them related at all.

    You’d probably use the data from the DTO’s, or even the DTO’s themselves in the ViewModel but a ViewModel itself is in no way a DTO IMHO

  • http://dgoyani.blogspot.com/ Dhananjay Goyani

    well, yah I got your point.

  • http://dgoyani.blogspot.com/ Dhananjay Goyani

    And also, I am wondering what kind of data binding approach you prefer? Both for Silverlight and MVC apps.

  • http://davybrion.com Davy Brion

    @Dhananjay

    my only requirement is basically that databinding can _never_ trigger an automatic service call or database roundtrip… that’s pretty much it as far as i care about it

  • Lars-Erik Roald

    I know I am a bit late in commenting on this blog post… but I totally agree with you. Exposing the backend entities to the client is a dangerous game. I have done this myself in a pretty large project (wcf ria services) – it was a disaster. The layers will be transparent – a change in the db will cause a change in all the clients. And exposing an IQueryable interface against the entities at the client side is not good – specifications, not depending on entities, are much better.
    What surprises me is that developers tend to fall into this pitfall again and again – even experienced developers. And Microsoft helps us fall… by introducing OData, WCF RIA…

  • http://weblogs.sqlteam.com/markc AjarnMark

    Davy, thanks for this post! This clears up some confusion I was having while studying examples of Entity Framework in action. We’ve been writing most of the plumbing code by hand in our systems for a while, and I have been reviewing O/R options recently, but have felt that there was something just not quite “right” with the examples, even though they worked. It seems what my gut was telling me is the same thing your post highlights and that David Martines alludes to: the examples are overly simplified to illustrate a particular feature, and do not reflect best practices. After reading several of your articles and a few by Craig Stuntz as well, I am much more comfortable with blending the good parts of our existing approach with the benefits that an O/R Mapper brings to the equation. Keep up the good fight (and great writing)!

  • Amine

    Hi All,
    Thanks Davy, great post.
    But I am wondering if all that was said here is still true after the release of WCF RIA Services. Surely the problem of domain changes still exists, but concerning data validation, WCF RIA Services offers a good way to share the validation logic between client and server, on top of DataAnnotations. Is there anyone who agrees with me on that, and if yes, is that a personal opinion or a result of real world experience?
    A second question: if I try to apply what was said here, I will have to develop many DTOs by hand. Suppose that in some cases I require a ConsumerDTO with only 2 fields (let’s say ID and Name for example), and later I require a ConsumerDTO1 with 5 fields to be shown and/or modified by the end user, and later a third ConsumerDTO2 … and so on… I know that it can be resolved with a little copy & paste, but I am not very convinced that this is the best way.
    Third and final question: I still don’t have a clear idea about the ResultTransformer mentioned with NHibernate. Is there any further clarification?

    Thanks in advance .

  • http://davybrion.com Davy Brion

    @Amine

    RIA Services doesn’t change my opinion about this at all. The more you share between client and server, the worse off you’ll be. It’s really as simple as that.

    And yes, it could lead to a lot of DTO’s. And yes, you have to write them by hand. People quickly say that it’s a lot of work, but seriously, if that takes up too much of your time then you’ve got bigger problems to worry about. And yes, you can have multiple DTO types which essentially all ‘show’ data from the same entity, but in different ways. But they all will be used in the most optimal manner… the one-size-fits-all approach simply doesn’t hold up for long.

    As for the ResultTransformer… you can use projections in your NHibernate Criterias and then let NHibernate hydrate your DTOs directly through the ResultTransformer, based on the projections. A quick google search should tell you more.

  • Datta Jadhav

    thanks for really great post.

    i agree with u.

  • Haipeng Jiang

    Could anyone please advise me on this design of a data integration solution:

    We’re doing a data integration project between a MS Sql Server database and a Microsoft CRM system (through its web services).

    We’re trying to build a “service” layer on top of the database. The design of our current solution is to use web services for CRUD, with xml being the format of data.

    Views are created to consolidate related tables into one entity, and we query these views, using the “SELECT * FROM someview ” + “For XML” to generated xml that will be returned from our web services.

    For update we’re trying to use the same approach – using SQL XML to map updates views, we have “instead of” triggers defined on top of these views, and in these “instead of” triggers we update the underlying tables.

    The views/triggers are generated by tools so don’t be too concerned with coding efficiency here…

    what do you think if we use WCF data Provider to publish an enterprise data model (essentially DTOs)? p.s., we don’t have a BL layer for now, it’s all in the stored procedures!!!

    What’s your opinion on this / any better design? much appreciated!
