Multi-Tenancy 101
Karthick Chandrasekar
Passionate Payments Executive | Fintech Innovator | Driving Payment Business Growth
What is Multi-Tenancy?
Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers. Each customer is called a tenant. On the same server, the application appears to run as multiple incarnations, one per tenant, and the server may cater for differently configured instances of that application. Tenants may be given the ability to customize some parts of the application, such as the colour of the user interface (UI) or business rules, but they cannot customize the application's code.
Multi-tenancy can be economical because software development and maintenance costs are shared. It can be contrasted with single-tenancy, an architecture in which each customer has their own software instance and may be given access to the code. With a multi-tenancy architecture, the provider only has to make updates once. With a single-tenancy architecture, the provider has to touch multiple instances of the software in order to make updates.
In cloud computing, the meaning of multi-tenancy architecture has broadened because of new service models that take advantage of virtualization and remote access. A software-as-a-service (SaaS) provider, for example, can run one instance of its application on one instance of a database and provide web access to multiple customers. In such a scenario, each tenant's data is isolated and remains invisible to other tenants.
Multi-Tenancy Use Case:
First, we should discuss why someone might implement multi-tenancy. The most common use case is where a multi-tenant engine drives business processes for a SaaS application. In this case, each tenant is operated by a completely different client. When clients log into their tenant, they expect to see only their own data and functions. Nothing a client does in their tenant should ever interfere with any other tenant, so data isolation is paramount. So why is multi-tenancy so useful in a case like this? Scalability. With multi-tenancy, granting a new client access to a tenant has the lowest cost and the quickest implementation time. The alternative is to deploy and maintain a brand new server, database and software installation each time a new client joins.
Data Isolation vs. Resource Consumption:
Each type of multi-tenancy implementation walks a line between data isolation and resource consumption. Maximum data isolation (i.e. a single database per tenant) requires a lot of resources and is generally quite wasteful of them, so it is often avoided. At the other end of the spectrum is perfectly optimized resource consumption (i.e. all tenants sharing the same database and tables), but in this case all it takes is one lazily written database query to break down the data isolation between tenants completely. The ideal scenario is to have the most robust data isolation with the fewest possible wasted resources.
Tenant Identifiers:
At the end of the scale with the least data isolation is the tenant identifier approach. All tenants run separately but write data to exactly the same tables. Each running tenant requires a unique marker to gain access to its tenant-specific data. Every table must therefore carry this tenant identifier and every query must filter on it, or data isolation is lost.
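To make the shared-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `invoices`, the column `tenant_id` and the tenant names are hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
import sqlite3

# Hypothetical shared table: every row carries a tenant_id marker.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
    [("acme", 100.0), ("acme", 250.0), ("globex", 75.0)],
)


def invoices_for(tenant_id: str) -> list[float]:
    """Return one tenant's amounts; the WHERE clause is the only isolation."""
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [amount for (amount,) in rows]


def invoices_unfiltered() -> list[float]:
    """The 'lazy' query: without the tenant filter, every tenant's rows leak."""
    rows = conn.execute("SELECT amount FROM invoices ORDER BY id")
    return [amount for (amount,) in rows]
```

Calling `invoices_for("acme")` returns only that tenant's rows, while `invoices_unfiltered()` returns rows from every tenant, which is exactly the failure mode this approach has to guard against.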
There are some advantages and disadvantages to this approach. The main advantage is that because the data for all tenants is stored in the same tables, writing queries for data across a number of tenants is quite easy. This can be useful if reporting and analytics on your tenants are a strong focus. It is also the most resource-efficient implementation of multi-tenancy. To deploy definitions for a single tenant, the tenant identifier has to be set on the deployment. If no tenant identifier is set, then the deployment and its definitions belong to all tenants.
In this case, all tenants can access the deployment and the definitions, and usually a new process instance is associated with a specific tenant when it is started by one of the tenant’s users. Working with tenant identifiers therefore enables shared definitions, which in turn removes a lot of deployment and operations overhead in such scenarios.
One of the new options that comes with shared definitions is for a service provider to deploy a shared process definition for all tenants, which then calls tenant-specific DMN tables or call activities. Another option is to use a shared definition for the majority of the tenants while still being able to provide specific variants of that definition to single tenants. The easiest way of deploying a definition to a specific tenant is to add the tenant ID to the deployment descriptor of your process application.
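The relationship between shared and tenant-specific definitions can be illustrated with a small sketch. The `Definition` and `DefinitionRegistry` classes below are hypothetical, and the lookup order (tenant-specific variant first, shared definition as fallback) is an assumption made for illustration, not a description of how any particular engine resolves definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    key: str                  # e.g. "invoice-approval"
    tenant_id: Optional[str]  # None means shared: belongs to all tenants
    version: str


class DefinitionRegistry:
    """Hypothetical registry holding shared and tenant-specific definitions."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, Optional[str]], Definition] = {}

    def deploy(self, definition: Definition) -> None:
        # A deployment without a tenant ID is stored under tenant_id=None.
        self._by_key[(definition.key, definition.tenant_id)] = definition

    def resolve(self, key: str, tenant_id: str) -> Definition:
        # Assumed rule: prefer the tenant's own variant, else the shared one.
        specific = self._by_key.get((key, tenant_id))
        if specific is not None:
            return specific
        shared = self._by_key.get((key, None))
        if shared is None:
            raise KeyError(f"no definition '{key}' visible to tenant '{tenant_id}'")
        return shared
```

With a shared `invoice-approval` definition deployed plus one variant for a single tenant, every other tenant resolves to the shared definition and only that one tenant sees its variant.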
The disadvantages of this approach relate to data isolation and query performance. Because each tenant’s data is distinguished only by a marker, that identifier must be included in every query performed by the process. Because it can be cumbersome to pass the tenant ID to every API call, we use transparent access restrictions for tenants, which allow the tenant ID to be omitted from queries once an authenticated user has been set. To achieve that, a list of tenant IDs needs to be provided when setting the authentication programmatically. Alternatively, the authentication can be provided by the REST API.
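One way to picture transparent access restrictions is a context that stores the authenticated user's tenant IDs and applies them to every query automatically. The sketch below uses Python's `contextvars`; the names `authenticated`, `query_tasks` and `TASKS` are hypothetical, and raising an error when no authentication is set is a design choice of this sketch rather than documented engine behaviour.

```python
import contextvars
from contextlib import contextmanager

# Holds (user_id, tenant_ids) for the current caller, or None if unset.
_authentication = contextvars.ContextVar("authentication", default=None)


@contextmanager
def authenticated(user_id, tenant_ids):
    """Set the authentication once; queries inside the block need no tenant ID."""
    token = _authentication.set((user_id, frozenset(tenant_ids)))
    try:
        yield
    finally:
        _authentication.reset(token)


# Hypothetical data: tenant_id None marks a row that is shared by all tenants.
TASKS = [
    {"id": 1, "tenant_id": "acme"},
    {"id": 2, "tenant_id": "globex"},
    {"id": 3, "tenant_id": None},
]


def query_tasks():
    """Return task IDs visible to the authenticated user's tenants."""
    auth = _authentication.get()
    if auth is None:
        # Assumption of this sketch: refuse to run without authentication.
        raise PermissionError("no authentication set")
    _user_id, tenant_ids = auth
    return [
        task["id"]
        for task in TASKS
        if task["tenant_id"] is None or task["tenant_id"] in tenant_ids
    ]
```

Inside `with authenticated("mary", ["acme"]):` a plain `query_tasks()` call is already restricted to that user's tenants, so application code never passes the tenant ID itself.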
It’s also possible that performance could suffer from the introduction of tenant markers, especially when querying process variables. For internal queries, performance has been highly optimized by our developers, but that aspect has to be kept in mind when writing your own queries. Probably the biggest disadvantage of this approach is the risk of disclosing data that belongs to other tenants. A bug or careless application programming is all it would take for data to be returned to the wrong tenant.
Schema Per Tenant:
The Schema per Tenant architecture is the recommended multi-tenancy implementation for scenarios where a higher level of data isolation is required. With this approach, each tenant has its own process engine, and the process engines share the same database but not the same tables. This is achieved by creating a number of schemas within a single data source. Each tenant connects to its own schema. The result is that data isolation is guaranteed by the process engine and, at the same time, database resources can be shared between the various tenants.
We can say that data isolation is guaranteed because of how the process engine’s internal architecture implements queries. Every query made by the process engine carries a database prefix: a prefix variable sits directly before the name of the table. If a tenant has been set up for this process engine, the tenant's prefix is added to the query automatically. If there is no tenant, the prefix variable is empty and the query goes to the database tables directly.
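The prefix mechanism described above can be sketched as a query template with a placeholder in front of the table name. This is an illustration only: the template, the table name `tasks`, the `render` helper and the schema names are hypothetical and do not reproduce the engine's actual query code.

```python
import re
from typing import Optional

# Hypothetical query template: {prefix} sits directly before the table name.
SELECT_TASKS = "SELECT id, name FROM {prefix}tasks WHERE assignee = ?"

# Schema names are interpolated into SQL, so only plain identifiers are allowed.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render(sql_template: str, schema: Optional[str] = None) -> str:
    """Fill in the table prefix: '<schema>.' for a tenant, empty otherwise."""
    if schema is None:
        prefix = ""
    elif _IDENTIFIER.match(schema):
        prefix = f"{schema}."
    else:
        raise ValueError(f"invalid schema name: {schema!r}")
    return sql_template.format(prefix=prefix)
```

An engine configured for `tenant_a` would render `FROM tenant_a.tasks`, while an engine without a tenant renders `FROM tasks`. Because the prefix is fixed per engine, application code has no way to reach another tenant's schema through these queries.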
Another important mechanism to understand is the job executor, which is fundamental to how resources are managed efficiently between tenants. As described in the diagram above, when a process engine needs a job executed, it writes the job to a table in its database. One or more job acquisition processes can be running, and they pick up jobs from that table. The acquisition process then passes each job to the executor.
There is one job executor, which is in contact with all tenants. It has control over the thread pool and so picks a thread to run the job for the process engine. As a result, all threads are equally available to all tenants. Tenants that require a lot of threads are not slowed down by a per-process-engine allocation, while tenants with lower usage don’t have a pool of unused threads sitting idle.
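The shared executor idea can be sketched with Python's standard thread pool. In this simplified model each tenant's job table is an in-memory queue and each job is a callable; `SharedJobExecutor` and `acquire_and_run` are hypothetical names, and a real engine would persist jobs in database tables rather than in queues.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor


class SharedJobExecutor:
    """One thread pool serving the jobs of every tenant's process engine."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, tenant_id: str, job) -> Future:
        # Any free thread may run the job; no thread is reserved per tenant.
        return self._pool.submit(job, tenant_id)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def acquire_and_run(job_tables: dict, executor: SharedJobExecutor) -> list:
    """Job acquisition: drain each tenant's job table and hand jobs to the pool."""
    futures = []
    for tenant_id, table in job_tables.items():
        while True:
            try:
                job = table.get_nowait()
            except queue.Empty:
                break
            futures.append(executor.submit(tenant_id, job))
    # Results come back in submission order, whichever thread ran each job.
    return [future.result() for future in futures]
```

A busy tenant with many queued jobs can use most of the pool at a given moment, while a quiet tenant holds no idle threads of its own.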
For multi-tenancy reporting specifically, where no existing software is in place, I would suggest looking into implementing a Business Intelligence (BI) solution. The benefit is that the data from all tenants is uploaded independently to a single location and manipulated within the BI tool itself. The alternative, if setting up a BI solution isn’t appealing, is to write queries directly against the history tables that combine the data from the various tenants. It should be noted, however, that these queries would require a good deal of work to ensure they perform well.
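As a rough illustration of the direct-query alternative, the sketch below combines per-tenant history tables with UNION ALL. It uses attached in-memory SQLite databases to stand in for schemas; the schema names, the `history` table and its columns are hypothetical, and a production database would need its own schema syntax and performance tuning.

```python
import sqlite3

SCHEMAS = ("tenant_a", "tenant_b")  # hypothetical, fixed list of tenant schemas

conn = sqlite3.connect(":memory:")
# Attach one in-memory database per tenant to play the role of a schema.
for schema in SCHEMAS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(
        f"CREATE TABLE {schema}.history (process_key TEXT, duration_ms INTEGER)"
    )

conn.execute("INSERT INTO tenant_a.history VALUES ('order', 120)")
conn.executemany(
    "INSERT INTO tenant_b.history VALUES (?, ?)",
    [("order", 300), ("refund", 90)],
)


def combined_report(schemas=SCHEMAS):
    """Combine every tenant's history rows, labelling each row with its tenant."""
    # Schema names come from the trusted list above, never from user input.
    parts = [
        f"SELECT '{schema}' AS tenant, process_key, duration_ms "
        f"FROM {schema}.history"
        for schema in schemas
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY tenant, process_key"
    return conn.execute(sql).fetchall()
```

Each additional tenant adds another branch to the UNION ALL, which is one reason such queries need careful work as the number of tenants grows.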
Conclusion
Multi-tenant SaaS architecture brings long-term benefits for all users in terms of maintenance, cost of investment and development. However, whichever architecture you use, it entails a few challenges that you need to identify and take in your stride in the long run.
As you reap the benefits and face the challenges that this architecture presents, there are things you will need to figure out in the context of your organization and the goals you are aiming at.
Senior Solutions Architect
3 years ago: Karthick Chandrasekar, thanks for the article. Since your background is Banking and Finance, are there any compliance restrictions that favor one logical separation model over the other? Which approach have you adopted and why? From reading the article, it looks like your team went for a tenant identifier approach, which gives weaker data separation as a trade-off for maintenance and cost savings.