MSDTC Woes With NServiceBus And NHibernate

Written on March 19th, 2010 (19 comments)
Categories: .NET bugs, MSDTC, NHibernate, nservicebus

I’ve spent about 3 days trying to get something working that should’ve just worked.  I basically wanted some .NET code to update some data in a database and then publish a message on the service bus, all within a single distributed transaction.  I want a distributed transaction because if something goes wrong, i want to roll back both operations (the database change and the published message).  Normally, this should just work if you have MS DTC configured correctly.  On my machine, i enabled Network DTC Access and allowed outbound transaction communication.  On the database server, Network DTC Access was already enabled and both outbound and inbound communication were allowed.

Now the thing is, i’d either expect DTC to fail outright or to just work.  But it shouldn’t fail in one situation, and work in another.  On my machine, it failed in the following situation (which i’ll further refer to as Situation A):

  1. open a transaction scope
  2. open an nhibernate session
  3. hit the db
  4. publish a message through nservicebus
  5. close the nhibernate session
  6. complete and close the transaction scope

Steps 4 and 5 could be switched around but it didn’t make a difference.  In Situation A, i always got a TransactionManagerCommunicationException with the following message:

Network access for Distributed Transaction Manager (MSDTC) has been disabled. Please enable DTC for network access in the security configuration for MSDTC using the Component Services Administrative tool.

Everyone who’s worked with MSDTC before probably knows that exception since it usually takes some fiddling with the settings to make things work.  The thing is, i was pretty sure that my settings, as well as the ones on the database server were correct.  Unfortunately, DTCPing didn’t confirm that since that too failed.

However, i also tried the following sequence of events (Situation B):

  1. open a transaction scope
  2. open an nhibernate session
  3. publish a message
  4. hit the db
  5. close the nhibernate session
  6. complete and close the transaction scope

And guess what.  That actually worked.  With full DTC transaction semantics.  The DTC statistics on the server confirmed that it was indeed using a DTC transaction, and if i made the code fail with an exception both the database action and the published message were correctly rolled back.

So the question is: why on earth does it only work when i publish a message before i hit the db?

During my investigation i noticed that in Situation A, the internal transaction that the transaction scope was using was a SqlDelegatedTransaction which, if i’m not mistaken, is an LTM (Lightweight Transaction Manager) transaction.  When trying to send a message to a message queue, the transaction manager tries to promote the current transaction to an OletxCommittableTransaction, since the OleTx transaction protocol is required when using MSMQ (it doesn’t support LTM transactions).  For some reason, promoting the SqlDelegatedTransaction to a full DTC (OleTx) transaction fails on my machine.

In Situation B, the internal transaction is promoted to an OletxCommittableTransaction as soon as you try to send the message to a message queue.  Once it’s time to hit the DB, NHibernate nicely works together with the OletxCommittableTransaction and everything just works.
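One way to watch the promotion described above is to inspect the ambient transaction at each step. The sketch below is not part of the original investigation. It relies only on `TransactionInformation.DistributedIdentifier`, which stays `Guid.Empty` while the transaction is still a lightweight one and becomes non-empty once it has been promoted to a DTC transaction. The class and method names (`TransactionDiagnostics`, `IsDistributed`, `Describe`) are made up for illustration.

```csharp
using System;
using System.Transactions;

// Hypothetical diagnostic helper; the names are illustrative only.
public static class TransactionDiagnostics
{
    // True once the transaction has been promoted to a distributed (DTC)
    // transaction. While the Lightweight Transaction Manager still owns it,
    // DistributedIdentifier is Guid.Empty.
    public static bool IsDistributed(Transaction transaction)
    {
        if (transaction == null) throw new ArgumentNullException("transaction");
        return transaction.TransactionInformation.DistributedIdentifier != Guid.Empty;
    }

    // Builds a one-line summary that is handy for logging between steps.
    public static string Describe(Transaction transaction)
    {
        if (transaction == null) return "no ambient transaction";

        TransactionInformation info = transaction.TransactionInformation;
        return string.Format(
            "type: {0}, local id: {1}, distributed id: {2}, status: {3}",
            transaction.GetType().Name,
            info.LocalIdentifier,
            info.DistributedIdentifier,
            info.Status);
    }
}
```

Logging `TransactionDiagnostics.Describe(Transaction.Current)` right after opening the scope, after hitting the db, and after publishing should show at which step the distributed identifier appears (or at which step promotion blows up).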

Now, i have no idea why promotion of a SqlDelegatedTransaction fails, but after a large number of attempts and experiments to get it working correctly, i sorta gave up and figured i’d have to resort to a hack.  What i basically needed was for the transaction scope’s internal transaction to automatically be promoted to an OletxCommittableTransaction before i’d hit the database, without having to publish a dummy message at the beginning of the transaction.

I found one way of doing this which, while being a huge hack, is still relatively clean i think.  I wrote the following class:

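    using System;
    using System.Transactions;

    // A resource manager that takes part in the two-phase commit but does no work.
    public class DummyEnlistmentNotification : IEnlistmentNotification
    {
        // Identifier passed to EnlistDurable as the resource manager id.
        public static readonly Guid Id = new Guid("E2D35055-4187-4ff5-82A1-F1F161A008D0");

        // Phase 1: nothing to prepare, so vote to commit right away.
        public void Prepare(PreparingEnlistment preparingEnlistment)
        {
            preparingEnlistment.Prepared();
        }

        // Phase 2: nothing to commit, just acknowledge.
        public void Commit(Enlistment enlistment)
        {
            enlistment.Done();
        }

        // Nothing to undo, just acknowledge.
        public void Rollback(Enlistment enlistment)
        {
            enlistment.Done();
        }

        // Outcome unknown; there is no state to reconcile, so acknowledge.
        public void InDoubt(Enlistment enlistment)
        {
            enlistment.Done();
        }
    }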

Then, right after opening the transaction scope and before doing anything else, i do this:

    Transaction.Current.EnlistDurable(DummyEnlistmentNotification.Id, new DummyEnlistmentNotification(), EnlistmentOptions.None);

 

This basically tells the System.Transactions infrastructure that we’re adding our own Resource Manager to the current transaction.  And because it’s a durable Resource Manager, it now automatically promotes the internal transaction to an OletxCommittableTransaction and everything just works.  While our Resource Manager participates in the two-phase commit process, it doesn’t actually do anything.  Its sole purpose is to force the creation of an OletxCommittableTransaction.

Like i said, it’s a hack but it’s still relatively clean.  I still have no idea why i needed to resort to this hack though… If anyone can shed some light on this, i’d highly appreciate it :)

Also, if you ever want to learn more about transactions in .NET or distributed transactions in particular, you really need to check out this article.  Without it, i probably wouldn’t have figured out what to do :)

  • Chris Geihsler

    Davy,

    We are actively working on getting WCF/NHibernate/NSB/TransactionScopes integrated in our application as well, so this post is appreciated.

    We’re having a problem where any time we publish an NSB message before the NHibernate session is flushed, the TransactionScope times out on Dispose() with a “The operation is not valid for the state of the transaction” exception being thrown. Have you run into this issue? Like you, it only happens when we publish a message after we’ve done some SQL work, but before we’ve completely flushed the session.

    We’ll give your IEnlistmentNotification “hack” a try and see if our problem goes away as well.

    Again, thanks for the posts. It’s nice to find someone else working on the exact same thing we are!

    -cg

  • http://davybrion.com Davy Brion

    @Chris

    i did run into that, but IIRC, it only happened when i enabled both Inbound and Outbound DTC access on my machine. i don’t need inbound access so i disabled that again, and then the problem went away. it certainly looks like a bug somewhere though (my guess would be NHibernate), though i didn’t focus on it since i was trying to fix my original problem :)

  • Chris Geihsler

    @Davy

    It worked! The IEnlistmentNotification trick made the hang on Dispose go away. I have a feeling it’s an NHibernate bug as well, so I’m going to try and create a test case that reproduces the issue.

    Do you know if there is any additional overhead to using an OletxCommittableTransaction instead of a SqlDelegatedTransaction?

    -cg

  • http://davybrion.com Davy Brion

    @Chris

    it’s most certainly more expensive, though i don’t have any numbers

  • Corey

    Have you tried the below property?
    SetProperty(Environment.TransactionStrategy, "NHibernate.Transaction.AdoNetTransactionFactory")

  • Greg Menounos

    Have you tried creating the transaction scope using the EnterpriseServicesInteropOption.Full option? That should immediately create a MSDTC transaction rather than trying to escalate to one later.

    However the fact that DTCPing is failing does indicate that you’ve got a network or DTC configuration issue of some sort.

    – Greg

  • http://davybrion.com Davy Brion

    @Corey

    no, that makes NHibernate use low-level ado.net transactions, which don’t participate in DTC transactions (unless i’m very mistaken)

    @Greg

    i’ll have to try that with the actual code at work, but if i try it in a simple test, the internal transaction is not an OletxCommittableTransaction right after instantiating the transaction scope.

    and yeah, i’m sure there is some kind of network or DTC configuration issue… which makes the fact that it actually works when i already have an OletxCommittableTransaction all the more weird IMO :)

  • Vadim Kantorov
  • Pingback: Forcibly creating a distributed .NET transaction | Build. Optimize. Make Awesome.

  • http://www.make-awesome.com David Boike

    I don’t think this method is a hack – I think it’s actually pretty elegant. I like it and it helped me with a similar problem I was having with NServiceBus.

    The only thing is it’s a lot to remember how to do, so I wrapped it all up in an extension method so that it’s really easy.

    Thanks! You saved me a lot of time and heartache!

  • http://jonathan-oliver.blogspot.com Jonathan Oliver

    I’ve been posting about two-phase commits and their usage relative to NServiceBus recently. The issues that people are experiencing above are symptoms of a much bigger problem, one that we should actively be attempting to solve:

    http://jonathan-oliver.blogspot.com/2010/04/nservicebus-distributed-transaction.html
    http://jonathan-oliver.blogspot.com/2010/04/extending-nservicebus-avoiding-two.html

  • Quang

    Does anyone think using DTC should be avoided altogether?

    This means publishing of events is one transaction, and database calls are in their own transaction scope.

    It seems the service bus should only be used as a communications mechanism, and shouldn’t affect your service-level code?

  • http://davybrion.com Davy Brion

    @Quang

    i don’t know… if an event is published _because_ of a database call (insert/update/delete), then i’d really like to keep them in the same transaction

  • Pingback: NServiceBus Distributed Transaction Woes

  • http://profiles.google.com/david.boike David Boike

    How did you come up with the Guid E2D35055-4187-4ff5-82A1-F1F161A008D0 and does it bear any special significance? Should it really be a static readonly?

    I’ve been using this method for quite some time with my NServiceBus endpoints with a lot of success, but I tried using it in a web application environment, and started having sporadic issues:

    System.Runtime.InteropServices.COMException: A resource manager with the same identifier is already registered with the specified transaction coordinator. (Exception from HRESULT: 0x8004D102)

    I wonder if this occurs when 2 webapp threads try to use the dummy enlistment simultaneously. I wonder, then, if using a new Guid each time would be safe. After all, we’re just forcing a distributed transaction, we’re not really *DOING* anything.

    Thoughts?

    • http://davybrion.com Davy Brion

      i ran into that as well when i had 2 web apps in the same application pool… if app A used that dummy enlistment before app B, app B would run into that exception when it first tries to use it

      i wasn’t sure whether using a new guid every time would be safe so i just used a guid per web app instance (just a call to Guid.NewGuid in the static initializer of the property) and there were no more issues

      • http://profiles.google.com/david.boike David Boike

        I don’t have more than one application in the application pool I was using, so I did some more research and ran into some anecdotal evidence regarding SQL Server and distributed transactions. The upshot was that if you cloned a SQL Server instance without running sysprep, the two instances would have the same ID and the distributed transaction coordination system as a whole would get confused.

        I realized that my load balanced environment could essentially be doing this – requests to the same app on two different servers would be enlisting identical IDs originating from different servers. I believe your suggestion of using the NewGuid() in the static initializer should probably alleviate this – I’ll let you know if I run into any more troubles.
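The per-instance identifier described in this thread, combined with the extension-method wrapper mentioned in an earlier comment, might look roughly like the sketch below. It is an illustration under assumptions, not code from either commenter: the names `NoOpResourceManager` and `ForceDistributed` are hypothetical, and generating the id with `Guid.NewGuid()` assumes the dummy resource manager never needs to take part in recovery, which seems reasonable here since it holds no state.

```csharp
using System;
using System.Transactions;

// Hypothetical variant of the dummy resource manager from the post.
public sealed class NoOpResourceManager : IEnlistmentNotification
{
    // Assumption: one identifier per process / app instance, generated once
    // when the type is first used, instead of a hard-coded Guid that several
    // apps or servers would share.
    public static readonly Guid Id = Guid.NewGuid();

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        preparingEnlistment.Prepared();
    }

    public void Commit(Enlistment enlistment)
    {
        enlistment.Done();
    }

    public void Rollback(Enlistment enlistment)
    {
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment)
    {
        enlistment.Done();
    }
}

public static class TransactionExtensions
{
    // Hypothetical helper: enlists the no-op durable resource manager so that
    // the transaction is promoted to a distributed one up front.
    public static void ForceDistributed(this Transaction transaction)
    {
        if (transaction == null) throw new ArgumentNullException("transaction");

        transaction.EnlistDurable(
            NoOpResourceManager.Id,
            new NoOpResourceManager(),
            EnlistmentOptions.None);
    }
}
```

With this in place, the call right after opening the transaction scope becomes `Transaction.Current.ForceDistributed();`. Whether a fresh identifier per instance avoids the 0x8004D102 error in every environment is only backed by the anecdotal reports in this thread.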

  • Bhoomi Kakaiya

    +1000 for the hack.. M scratching my head since 5 days!  Ur hack worked. Lovely…

  • Bhoomi Kakaiya

    I am facing an issue with this hack intermittently. 1st transaction of the day fails giving this error :
    {System.Runtime.InteropServices.COMException (0x8004D00E): The transaction has already been implicitly or explicitly committed or aborted (Exception from HRESULT: 0x8004D00E)
       at System.Transactions.Oletx.ITransactionShim.Export(UInt32 whereaboutsSize, Byte[] whereabouts, Int32& cookieIndex, UInt32& cookieSize, CoTaskMemHandle& cookieBuffer)
       at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)}

    Immediate submit will go through. Then if the system remains idle for some time, then again it will throw same error for 1st save.

    Pl help..