The Inquisitive Coder – Davy Brion's Blog

Trying to walk that thin line between intelligence and ignorance

Is This A Good Approach For Multi-Tenancy Or Not?

Posted by Davy Brion on December 17th, 2008

Here’s the situation: we have an application which is used by multiple customers. The application consists of various functional modules. Each customer can use one or many (or obviously all) of these modules. In the past we used to deploy this application for each customer. The configuration file contained various settings that could differ from customer to customer and obviously, each deployed version had its specific configuration file depending on the settings required for each customer. This approach worked, but it was not really ideal.

Multi-tenancy to the rescue! Not sure if this counts as the official definition of multi-tenancy, but Wikipedia defines it like this:

Multitenancy refers to a principle in software architecture where a single instance of the software runs on a software-as-a-service (SaaS) vendor’s servers, serving multiple client organizations (tenants). Multitenancy is contrasted with a multi-instance architecture where separate software instances (or hardware systems) are set up for different client organizations. With a multitenant architecture, a software application is designed to virtually partition its data and configuration so that each client organization works with a customized virtual application instance.

Sounds like this is exactly what we’re looking for. So i’ve recently been working on changing the application to support this, and i came up with the approach i will outline in the rest of this post. The approach does not strictly comply with the definition above, but it does seem to comply with Ayende’s definition of it. I’d like to get some feedback from you guys as to whether you believe this approach is good or not, what could be better, what we need to keep in mind, etc…

First of all, i would like to point out that this application already exists. We already have a database with about 200 tables in it and there are about 150 pages (it’s a web app obviously). So obviously, we can’t just change everything to make it fully compliant with ‘the definition’.

Let’s start with the database. Instead of trying to use one huge database to keep all of the tenants’ data, which would require modifications in a large number of the existing tables, we’ve decided to go with a separate database for each tenant. Each database will have an identical structure, regardless of whether some functional modules are used or not by a particular tenant. There is also one ‘master’ database which contains tenant-specific data. Basically it contains all of the configuration settings for each tenant, including the connection string to each tenant’s specific database. The connection strings do not contain user names and passwords as we will use Windows’ Integrated Security to connect to the specific databases (more on that in a bit).

Now for the application itself. Our first idea was to have one actual instance of the application, and the application would be able to determine the current tenant for each request based on the URL of the incoming request. Tenant A would have a URL like tenanta.ourproduct.com, tenant B would have tenantb.ourproduct.com etc. The application would basically use the URL to get the correct tenant info from the master database (note that the connection string to the master database would be the only connection string we’d have in the application’s configuration file) and it would then use the tenant-specific database for each request with the tenant’s application URL.
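To make the URL-based lookup concrete, here is a minimal sketch in Python (not the application's actual .NET code). It assumes the tenant is identified by the subdomain in front of ourproduct.com; `TenantInfo`, `MASTER_DB`, `resolve_tenant` and the connection strings are hypothetical names and values, with a dictionary standing in for the master database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantInfo:
    name: str
    connection_string: str  # no credentials: Integrated Security is assumed


# Hypothetical stand-in for the tenant table in the master database.
MASTER_DB = {
    "tenanta": TenantInfo("Tenant A", "Server=db1;Database=TenantA;Integrated Security=SSPI"),
    "tenantb": TenantInfo("Tenant B", "Server=db1;Database=TenantB;Integrated Security=SSPI"),
}

PRODUCT_DOMAIN = "ourproduct.com"


def resolve_tenant(host: str) -> Optional[TenantInfo]:
    """Map a request's Host header to a tenant, or None if it is unknown."""
    hostname = host.split(":")[0].strip().lower()  # drop an optional port
    suffix = "." + PRODUCT_DOMAIN
    if not hostname.endswith(suffix):
        return None  # not one of our tenant URLs
    subdomain = hostname[: -len(suffix)]
    return MASTER_DB.get(subdomain)
```

A request for an unknown host returns `None`, so the caller can reject it before any tenant database is touched.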

The idea was to use an NHibernate SessionFactory for each specific tenant. You obviously can’t use just one because you’re using multiple databases. But we also use NHibernate’s 2nd Level Cache, which is problematic when you’re using multiple SessionFactories. The 2nd Level Cache is great, but it doesn’t differentiate between multiple SessionFactories. So if you have the result of a specific query cached, you could get that data back for a different tenant than the one the data actually belongs to if the query’s parameters happen to be identical. Btw, if i’m wrong about this please let me know.

So then we figured we could still use one physical deployment (as in: one physical folder where the application is located), and then we’d use multiple virtual directories in IIS which all point to the same physical folder. We’d basically have one virtual directory (and one instance of the application) per tenant. We still have the benefit of one physical deployment, and because each tenant’s ‘virtual’ application runs in its own AppDomain, the caching problem is no longer an issue. Each virtual directory is configured for the URL of its tenant and the running instance of the application can still retrieve the actual tenant’s data from the master database based on the URL. Each virtual directory can run in its own Application Pool which can be set up to run under a Windows account which is specific to the current tenant. This allows us to use Integrated Security when connecting to the tenant-specific database. So each tenant would only have access to the master database and its own specific database.

Obviously, this approach would require a bit more effort in our ‘management module’ (which is yet to be written). Whenever we need to add a tenant, we not only have to create a specific database for the tenant, we’d also have to create a new Windows account for the tenant, set up a new virtual directory, and an application pool if each tenant indeed runs in its own application pool under its own Windows account.

Apart from the management module, i’ve modified the application to work with this approach in just a few days’ work. But, nothing is final just yet… so now i’d like to hear from you guys whether this is a good approach or not. What are the possible problems we need to take into account? Is this really still multi-tenancy? I guess the opinions on this will be divided, but it does largely solve the issue of multiple deployment and multiple configuration files. True, the configuration is now mostly in the master database and those settings need to be maintained as well. However, they could now be maintained without having to redeploy the application which is a plus. Some settings could even be modified by the tenant itself (at least, the users who have the proper privileges to do so).

So anyways, i’m awaiting your feedback :)

UPDATE: i’ve been told that i can simply use a different cache region for each tenant in NHibernate’s 2nd Level Cache… so i could avoid having to use multiple instances of the application. Still not sure on whether one instance for all tenants would be better than one instance for each tenant though.
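The idea behind per-tenant cache regions can be illustrated with a toy cache. This is a sketch in Python and says nothing about how NHibernate's 2nd Level Cache is actually implemented; `QueryCache`, the region names and the sample query are all hypothetical.

```python
class QueryCache:
    """A toy query cache keyed by region, query text and parameters."""

    def __init__(self):
        self._entries = {}

    def get(self, region, query, params):
        return self._entries.get((region, query, tuple(params)))

    def put(self, region, query, params, result):
        self._entries[(region, query, tuple(params))] = result


cache = QueryCache()
query = "from Customer c where c.Id = ?"

# Shared region: a lookup for tenant B finds what tenant A cached,
# because the query text and parameters are identical.
cache.put("shared", query, [42], "customer 42 of tenant A")
leaked = cache.get("shared", query, [42])

# Per-tenant regions: the same query and parameters no longer collide.
cache.put("tenant-a", query, [42], "customer 42 of tenant A")
isolated = cache.get("tenant-b", query, [42])
```

Because the region is part of the key, tenant B's lookup misses and falls through to its own database.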

4 Responses to “Is This A Good Approach For Multi-Tenancy Or Not?”

  1. Jeremy Gray Says:

    “we’ve decided to go with a separate database for each tenant”

    Then you aren’t doing multi-tenanting, at least in that most if not all descriptions of multi-tenanting that I have encountered have required that multi-tenanting be taking place within a database instance (as doing multi-tenanting up at the web tier is the easy part regardless of how the database instances are set up.)

    “Each database will have an identical structure”

    Good call. Each and every time I’ve seen people go down the path of having different schema per instance it has become a very large headache.

    “Our first idea was to have one actual instance of the application”

    Also a good call in general, but make sure to at least consider being able to configure your load-balancing so as to focus a given client’s users on a given subset of servers so as to maximize web tier in-memory cache coherency and to keep from overrunning your database servers’ ability to accept incoming connections for each schema instance. You’re going to need to monitor, analyze, and adjust a number of balancing factors between things like the number of web servers and the number of database servers, clients per web server, maximum db connections per web-server-client-instance, the number of db schema instances per physical db server, etc.

    “We still have the benefit of one physical deployment, and because each tenant’s ‘virtual’ application runs in its own AppDomain, the caching problem is no longer an issue.”

    But memory usage will then be an issue. This will in turn push you towards installing components in the GAC so as to get shared code in memory. This in turn will push you towards having to deal with annoying component deployment and versioning issues. I’m not going to suggest that this is a horrible route, or that the alternatives are perfect, but do be aware that there be dragons down the path of separation by AppDomain. :)

    Jeremy

  2. Ayende Rahien Says:

    All databases having the same schema – Take care of problematic versioning in this scenario. You are likely to run into situations where you cannot change the system all at once.

    tenantb.ourproduct.com – that is the most common way to do that, the other way of doing that is to have a common login page and redirect based on the user object (shared across all tenants).
    The 2nd Level Cache is great, but it doesn’t differentiate between multiple SessionFactories. – not quite accurate, you can define regions, and IIRC, it uses the session factory name as part of the cache space.
    I wouldn’t go with the virtual dir path, myself.

  3. Mai Kalange Says:

    Sounds similar to a problem that we had to solve, only we did not have the benefit of NHibernate.
    What we did have was a single DB instance, the Enterprise Caching Block and DAAB. Yes the good old days ;)

    The application had to service the devices of different mobile network operators (the tenants) as part of the initial phone configuration workflow. Each tenant had a contract definition which included the handsets that they supported and the associated device settings (as XML documents).

    Access to the application was via the following mechanism

    configuration.service.com?contractId=00000000239. This triggered a FrontController that went about initialising the session, loaded core data into the cache and keyed it by contractId; peripheral data was lazy loaded.
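The contractId-keyed cache described in this comment might look roughly like the sketch below. It is written in Python under assumptions, not taken from the commenter's system: `ContractSession`, `FrontController` and the two loader callbacks are hypothetical names. Core data is loaded when the session is first created; peripheral data is fetched only on first access.

```python
class ContractSession:
    """Per-contract data: core loaded eagerly, peripheral loaded lazily."""

    def __init__(self, contract_id, load_core, load_peripheral):
        self.contract_id = contract_id
        self.core = load_core(contract_id)  # eager: needed by every request
        self._load_peripheral = load_peripheral
        self._peripheral = None

    @property
    def peripheral(self):
        if self._peripheral is None:  # lazy: fetched on first access only
            self._peripheral = self._load_peripheral(self.contract_id)
        return self._peripheral


class FrontController:
    """Creates one cached session per contractId and reuses it afterwards."""

    def __init__(self, load_core, load_peripheral):
        self._load_core = load_core
        self._load_peripheral = load_peripheral
        self._cache = {}

    def handle(self, contract_id):
        if contract_id not in self._cache:
            self._cache[contract_id] = ContractSession(
                contract_id, self._load_core, self._load_peripheral
            )
        return self._cache[contract_id]
```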

  4. Rohland de Charmoy Says:

    Hi

    I have been involved in a few applications that require this type of architecture. For the database I would suggest the following (I am assuming you are using SQL Server):

    1. Create only one instance of the database with all the customers data (see below for more information regarding partitioning).
    2. Add an account column to each table (you can do this with a script) with a default value of suser_sname(). This function returns the name of the currently logged-in user, so you are basically implementing row-level security.
    3. For each customer, create a separate login to SQL Server
    4. Create a view for every table that looks like this:

    CREATE VIEW vw_mytable AS SELECT col1, col2, col3 FROM mytable WHERE account = suser_sname()

    5. Update your DAL to query the views and not the tables. Inserts, updates, deletes will still work.
    6. Develop some code so that when a user logs in, it checks what their associated customer login is. Subsequent database calls should be made with this login.

    This works really well. In terms of maintenance, you only ever work with one database. If you are fixing a bug or checking out an issue for Customer B, you can log into the database with Customer B’s associated login. If you run a query that looks like this “DELETE FROM vw_mytable” you will only ever affect Customer B as the views have effectively partitioned your context. I would suggest adding each customer’s account to a database role which only allows inserts, updates, and deletes on the relevant views (this ensures you haven’t missed a table).

    Hope this helps.
    Rohland
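The view-based partitioning in Rohland's comment can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so two things are assumptions made for illustration: a one-row `session_ctx` table stands in for suser_sname(), and `rows_visible_to` is a hypothetical helper that simulates switching logins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-row table standing in for SQL Server's suser_sname().
    CREATE TABLE session_ctx (account TEXT NOT NULL);
    INSERT INTO session_ctx VALUES ('customer_a');

    CREATE TABLE mytable (col1 TEXT, account TEXT NOT NULL);
    INSERT INTO mytable VALUES
        ('a-1', 'customer_a'),
        ('a-2', 'customer_a'),
        ('b-1', 'customer_b');

    -- The view only exposes rows owned by the current account.
    CREATE VIEW vw_mytable AS
        SELECT col1 FROM mytable
        WHERE account = (SELECT account FROM session_ctx);
""")


def rows_visible_to(account):
    """Switch the simulated login, then read through the view only."""
    conn.execute("UPDATE session_ctx SET account = ?", (account,))
    return [row[0] for row in conn.execute("SELECT col1 FROM vw_mytable ORDER BY col1")]
```

Note that SQLite views are read-only unless INSTEAD OF triggers are defined, so this sketch only demonstrates the read side of the approach.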
