Stop Exposing Collections Already!

20 commentsWritten on October 28th, 2009 by
Categories: Code Quality

Every time i see code with properties that return List<T>, IList<T>, ICollection<T> or any other of the types which allow you to modify the contents of a collection, i cringe a little. In most cases, the only reason to expose the contents of the collection is for read-only usage. You want people to be able to use the objects in the collection, but why on earth do so many developers continuously enable someone else to modify the internal state of the collection that they are holding?

The problem that i have with exposing a collection's values through a type which enables other pieces of code to modify the state of that collection is that it is effectively a pretty big breach of encapsulation. You no longer have sole control over the contents of the collection. Anyone can effectively add and remove elements from the collection, or even clear them entirely, without your object knowing about it. Obviously, this can cause subtle side-effects in the behavior of your object. Which can (and sooner or later will) lead to some quality time between you and your debugger. You also no longer have the ability to easily change the type of collection you're using which might prevent certain future improvements that you can make to a class. And in case you're wondering why you'd want to change the type of collection, check out an example where the benefit was huge.

In a lot of cases, people do this because they don't know any better or because it's the easiest thing to do. I can't tell you how many times i've seen people expose List<T> simply so that some calling code could use the ForEach method of that list. How long does it take to write a ForEach extension method for IEnumerable<T>? Granted, it should've been included in the .NET framework already but just because it's not doesn't mean you can't do so.

Here's my philosophy on exposing collections: if i'm writing a class that is responsible for holding a collection, i take away every ability to directly modify the contents of the collection. I typically expose the collection through an IEnumerable<T> property, and if necessary i will provide a specific AddXxx and RemoveXxx method. That's it. You can do every single thing you need to do with the collection through IEnumerable<T>'s extension methods and through the AddXxx and RemoveXxx methods that i provide. Meanwhile, i have every ability to choose the collection type that i want to use, or to modify it later on. Furthermore, i can eliminate the possibility of side-effects in my code due to unexpected externally caused mutations in the state of the collection.

The only situation in which i think it's OK to return a real collection type is if you have a method which is responsible for creating a list, but is not supposed to hold on to it. In that case, the created collection is clearly intended to be used in whatever way makes sense by the consumer. After all, you created the collection specifically for the consumer so they can do whatever they want with it. But if you need to hold a reference to the collection for some reason, then you probably also have more responsibilities than merely creating the collection. In that case, the collection is yours and you should encapsulate it as such.

  • http://www.kenegozi.com/blog Ken Egozi

    I second everything, however I’d replace “IEnumerable” with “T[]“.
    IEnumerable implies “infinite” and “lazy-evaluated”, and many times this is not the case.
    it’s good within contexts of building linq (or other) queries, that benefit directly from the lazy loading. once the data is loaded in-memory, an Array is the simplest representation of the data.

  • http://gabriel.lozano-moran.name Gabriel Lozano-Moran

    Basically this only becomes a problem when you are creating a public interface that will not only be consumed by your code. Then again exposing a collection as IEnumerable or even as an Array as Ken mentions, it only provides you shallow immutability. When returning an array, you could decide to return a Clone() but the again Clone() = shallow copy.

    Better would be to not expose a collection at all. If you need to do see then it is time to reconsider the design.

  • http://awkwardcoder.blogspot.com/ Awkward Coder

    Total agree all exposed collections should be immutable, why not expose ReadOnlyCollection? not ideal as it’s not an interface but it is explicit in the intention.

  • http://davybrion.com Davy Brion

    a ReadOnlyCollection is basically a collection that only allows enumeration or access by index… both are things that are possible with IEnumerable and you avoid being tied to a specific implementation

  • http://blog.ashmind.com Andrey Shchekin

    I strongly disagree on AddXXX/RemoveXXX methods. Should I re-implement my AddRange/RemoveRange/RemoveWhere methods for all objects just because original developer haven’t thought of these scenarios? What about Clear?

    I think providing rich API and managing side-effects (which is not hard, using a event-sending or custom collection if really needed) is a sign of better design than limited modification points. Just because in second case it is much easier to forget some useful method and much harder to work with this API in a generic way.

  • Pingback: Reflective Perspective - Chris Alcock » The Morning Brew #464

  • http://dnagir.blogspot.com Dmitriy Nagirnyak

    You have valid points and concerns. But I do not think you are 100% correct on these:

    > “Anyone can effectively add and remove elements from the collection, or even clear them entirely, without your object knowing about it.”
    > “I can eliminate the possibility of side-effects in my code due to unexpected externally caused mutations in the state of the collection.”
    True. Using IEnumerable you prevent *collection changes*, but there is no way you can prevent *modification of objects themselves*:

    customer.ReadOnlyListOfBills.First().Amount = 0; // Still customer is not aware of the change

    To prevent this, all the objects must be linked between themselves and subscribed for any possible modifications. This leads to the whole nes set of issues.

    So generally speaking you can minimise possible side-effects, but you cannot totally eliminate them.

    And yes, the wired thing about exposing IEnumerable is that we have to use Count() extension method to obtain number of elements which is ineffective (as it iterates the collection).

    What would you do in this case? Expose a new property:

    customer.CountOfBills // returns Count from internal collection

    Cheers,
    Dmitriy.

  • http://awkwardcoder.blogspot.com/ Awkward Coder

    @Davy – obivious now you’ve pointed it out ;)

    As for the entities in the collection being mutable, I was thinking of having using a ‘ReadOnly’ version of an entity that uses aggregation of the entity to expose the state but not any behaviour. These would then be returned as the contents of the collection.

    e.g.

    public class ReadOnlyOrder
    {
    private Order _order;

    public ReadOnlyOrder(Order order)
    {
    _order = order;
    }

    public int Id
    {
    get { return _order.Id; }
    }
    }

  • http://kearon.blogspot.com Sean Kearon

    I strongly disagree also. I consider AddXXX/RemoveXXX methods noisy and very much prefer direct interaction with the collections. I find it a more natural approach. As Andrey mentioned, it also tends toward bloating the API.

    My collections have changing and changed events (can be cancelled in the case of changing), business logic is handled through configurations that use these events.

  • Steve

    @Dmitriy – That’s not entirely true, the Count() extension method will first check if it can cast to IList and then access the Count property, otherwise yes it will enumerate to get the count.

  • http://dnagir.blogspot.com/ Dmitriy Naginryak

    @Steve, ok, that is a poimnt.
    But exposing IEnumerable there is no any guarantee at all about the actual implementation (IList, ICollection, IDictionary, whatever?) and we should know how to use the class’ interface – using Count() extension or some other way. Otherwise there is no point of abstraction and, as the post’s idea is, in ability to replace the implementation.

    But that is not my main point. Points are:
    1. You just cannot prevent possibility of side-effects only by exposing non-modifable collection.
    2. It is bad to expose CLASS, instead INTERFACE should be used. I agrre that all List should be replaced by IList or similar but you cannot use IEnumarable as a guard for protecting collections/objects from modifications.

    Cheers.

  • http://dnagir.blogspot.com/ Dmitriy Naginryak

    @Awkward Coder,

    Having read-only objects?? Come on… You must have too much free time.

    There is only one framework that tracks all the possible changes: http://www.capableobjects.com/

  • Philip

    I see your argument for List. But IList, ICollection? If you’re exposing a collection and providing your own Add/Remove methods, why wouldn’t you simply make your class (or a collection class) implement IList?

    That’s what the interfaces are there for, so that you can have complete control of the collection (like you want), but still allow people to use your collection in a standard and natural way.

    Personally, If I see a component that forces me to use AddWidget, GetIndex, RemoveWidget, etc., and doesn’t implement IList I wonder if they designed the interface for me (the consumer) or for themselves.

  • http://awkwardcoder.blogspot.com/ Awkward Coder

    @Dimitriy – you got a point there ;)

    @Philip – may be there is more behaviour than just inserting into a list when AddWidget, RemoveWidget is called…

  • http://www.jphamilton.net J.P. Hamilton

    @Philip

    To implement IList, don’t you also need to implement ICollection as well as IEnumerable? That’s about 13 methods. I can think of many scenarios where that is just too much behavior. For instance, inserting and removing elements at specific indexes. This is not only about preventing modifications to collections, but also about encapsulation, and the ISP to some degree.

  • Hack A Day

    For the developer who thinks T[] is better, T[] is still mutable and doesn’t solve the issue. All of the above is a matter of style, anyone who thinks “my way is the best” should be taken out in back a shot. Every situation is different, no one solution fits all cases. It’s all a combination of easy-of-use, scalability and manageability. The best APIs are the ones that fit nicely for which they are intended, scale when needed and are easy to debug.

  • http://morten.lyhr.dk Morten Lyhr

    I think this is another case of data on the inside vs. data on the outside.

    If I have to calculate the Sum of several order lines or validate the order lines, do the class need to expose its internal state?… not really.

    If I have a class that I need to expose to the UI, it does not really matter if it is a T[], IEnumerable, List, IList. Why ? Because the UI need to know the internals of the class in order to render it correctly.

    Perhaps your entity is responsible for more than one thing? Both being used by UI, where it is treated as data, and by other classes where it is used as and encapsulated object?

    Also when adding the AddXXX and RemoveXXX, you have all but said that your class is some sort of IList, why not implement the interface?

  • herzmeister_der_welten

    We need to use the System.ObjectModel.ReadOnlyCollection (or a custom read-only type for that matter) to wrap our inner collection either way. We can publish it as an IEnumerable if we like. But we must wrap it up, otherwise the caller obviously can cast it back to a List.

  • http://grantpalin.com Grant Palin

    I agree with the principle, but there is a caveat. I have some code I would like to fix up, a class with a number of members. Some of those members are Lists of other types. The trick is, the main class, and the ones it uses, are persisted through XML serialization, which handles public members and/or properties. Collections may not be exposed via interfaces (e.g. IList) because the serialization will not work correctly. So at this point I have two properties for each collection member: one for consumption by code, and one for use by the serializer. Not an ideal situation.

    Any suggestions on how to remove the duplication?

  • Tom

    @Grant Palin: In my opinion, the code and time savings of using the XML serializer far outweigh the disadvantages.

    Instead of shunning List properties, it is smarter to instead get used to them. In most cases they are harmless. In my few years’ of .NET experience, I have yet to run into a problem such as having nefarious code modifying a list when it shouldn’t. I understand the argument against them, but I think it is somewhat academic.

    Now I wouldn’t advocate putting a List property in a public API used by a lot of people, but using them in general code is okay in my opinion, especially when you are using XML serialization.

    If I were in your shoes, I would eliminate the duplication by removing the redundant interface property types. You haven’t gained anything by putting them in – after all, the lists can still be modified by the other property – and it messes up the code a lot.