With a sizable number of companies offering SaaS software, architects face the recurring problem of how to manage their customers' data.
Questions about data security, backup, recovery, and cost (ROI) are what an architect typically needs to answer when proposing a multitenancy strategy.
Whenever my peers ask me, "What is the best database strategy to achieve multitenancy?", my answer is "It depends!"
- It depends on the domain you are targeting, or rather the data your software will manage and the isolation it requires.
- It depends on the dollars your business is willing to spend (read: licensing fees, hardware at your disposal, and the ability to quickly ramp up production capacity).
- It depends on how fast you need to provision and set up a new customer.
As far as databases are concerned, there is only a finite set of approaches to multitenancy, and they range from an isolated to a shared database architecture. I order them from isolated to shared because, over the past decade, many software companies that offered "single installation" products have reworked them to be "SaaS" compatible, and as they try to manage cost they knowingly or unknowingly move from an isolated database to a shared database architecture.
Separate database
Each customer/tenant has a separate database. This is probably the easiest approach to multitenancy. The customer's data is fully isolated: there is no way another tenant can access or corrupt it. If customization is needed, the data model can be extended per customer's requirements. Backup and restore are straightforward and have no impact on other tenants' data. Little to no application logic is required to achieve isolation. It is, however, a costly approach as you add more and more customers to your offering: hardware and processing costs will be higher, a single server can only hold a finite number of databases, so more hardware will be required, and you will need a substantial budget and strategy as you gear up for horizontal scalability. This strategy is best suited for customers willing to pay a premium for added security and customization.
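To make the per-tenant routing concrete, here is a minimal sketch, assuming a simple tenant-to-database registry (the registry, tenant names, and `get_connection` helper are illustrative assumptions, not from any particular product). SQLite stands in for whatever database server you actually run:

```python
import sqlite3

# Hypothetical registry mapping each tenant to its own database.
# In a real deployment this would live in a catalog database or config service.
TENANT_DATABASES = {
    "acme": "tenant_acme.db",
    "globex": "tenant_globex.db",
}

def get_connection(tenant_id: str) -> sqlite3.Connection:
    """Open a connection to the tenant's dedicated database.

    Isolation comes for free: a connection scoped to one database
    physically cannot touch another tenant's tables.
    """
    try:
        return sqlite3.connect(TENANT_DATABASES[tenant_id])
    except KeyError:
        raise ValueError(f"unknown tenant: {tenant_id}") from None

# Every request resolves its connection from the tenant making the call.
conn = get_connection("acme")
```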
Shared database, separate schema
This approach is similar to the separate database approach in that it provides logical isolation of data. Each tenant's data is accessed using the schema name plus the table name, so some application logic is required to resolve the schema for an entity (sketched below). You can accommodate more customers per server than with the separate database approach. Data restoration, however, poses some problems when it comes to restoring a specific customer's data. This approach is suitable when your application uses a small number of tables.
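As a rough sketch of that resolution logic (the one-schema-per-tenant naming convention and the helper below are my own assumptions), the application derives the schema from the tenant id and qualifies every table reference with it:

```python
def qualified_table(tenant_id: str, table: str) -> str:
    """Resolve the fully qualified name for one tenant's table.

    Assumes a naming convention of one schema per tenant, e.g. the
    orders of tenant 42 live in tenant_42.orders.
    """
    if not tenant_id.isalnum():
        # Schema names are interpolated into SQL, so validate them strictly.
        raise ValueError(f"invalid tenant id: {tenant_id}")
    return f"tenant_{tenant_id}.{table}"

# The same statement template serves every tenant; only the schema differs.
query = f"SELECT * FROM {qualified_table('42', 'orders')} WHERE status = ?"
# -> SELECT * FROM tenant_42.orders WHERE status = ?
```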
(A variation on this model is a shared database, shared schema, and separate tables per tenant. Additional logic would be required to resolve which set of tables to use for a specific customer.)
This approach, while costly, is still cheaper to implement than the separate database approach. It is suited for customers who accept co-location of their data.
Shared database, shared schema (and shared tables)
In this model every customer's data is stored in the same set of tables. A tenant id column is required in all relevant tables to identify each tenant's data.
This strategy is the most cost efficient as far as your hardware budget is concerned. However, extensive application logic is required to ensure security and separation of data, and added care has to be taken from a software architecture perspective to ensure that a tenant can NEVER see or corrupt another tenant's data.
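One common safeguard, sketched below with sqlite3 and an assumed `tenant_id` column (the helper and table names are illustrative), is to force every read through a data-access layer that appends the tenant filter centrally, so no individual query can forget it:

```python
import sqlite3

ALLOWED_TABLES = {"orders", "invoices"}  # whitelist; never user input

def fetch_for_tenant(conn, tenant_id, table, where="1=1", params=()):
    """Run a read scoped to a single tenant.

    The tenant_id predicate is added here, in one place, so application
    code cannot accidentally omit it and leak another tenant's rows.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table}")
    sql = f"SELECT * FROM {table} WHERE tenant_id = ? AND ({where})"
    return conn.execute(sql, (tenant_id, *params)).fetchall()

# A tiny demonstration: tenant 1 can never see tenant 2's order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "open")])
rows = fetch_for_tenant(conn, 1, "orders", "status = ?", ("open",))
assert len(rows) == 1
```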
Data restoration after a failure can be a big pain point, especially if you are trying to restore a specific tenant's data. It involves importing your backup into a temporary database and then copying the relevant tenant's data over into the production database. This can be a fairly complex (or, as a friend of mine would say, "non trivial") and time-consuming task.
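To give a feel for that copy step, here is a sketch in which SQLite's ATTACH stands in for the restored temporary database; the table list and `tenant_id` column are assumptions about your schema:

```python
import sqlite3

TENANT_TABLES = ["orders", "invoices"]  # every table that carries a tenant_id

def restore_tenant(prod_db: str, restored_backup_db: str, tenant_id: int) -> None:
    """Copy one tenant's rows from a restored backup into production."""
    conn = sqlite3.connect(prod_db)
    conn.execute("ATTACH DATABASE ? AS backup", (restored_backup_db,))
    with conn:  # one transaction: either every table is restored or none
        for table in TENANT_TABLES:
            # Drop whatever partial or corrupt rows survived the failure,
            # then re-import that tenant's rows from the backup copy.
            conn.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
            conn.execute(
                f"INSERT INTO {table} "
                f"SELECT * FROM backup.{table} WHERE tenant_id = ?",
                (tenant_id,),
            )
    conn.close()
```

Even this sketch glosses over foreign-key ordering and id collisions between the backup and live rows, which is where much of the real complexity hides.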
As far as performance is concerned, one customer's data volume could potentially impact another customer's experience.
As you keep adding customers, you could reach OLAP-scale data volumes while still needing OLTP-style operations, so your indexing, partitioning, and archival strategies play an important role. But more on that later.
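As a small preview of the indexing point (table and index names are made up for illustration), the usual first move is to lead every index with the tenant id, so one tenant's queries stay narrow however large the shared table grows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id INTEGER, created_at TEXT, total REAL)")
# Leading with tenant_id gives the optimizer a highly selective prefix,
# so a tenant's query never walks other tenants' rows.
conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id, created_at)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE tenant_id = ? AND created_at >= ?",
    (1, "2024-01-01"),
).fetchall()
print(plan)  # reports a SEARCH using idx_orders_tenant, not a full table SCAN
```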
Irrespective of the approach you take, it is important to understand the trade-offs being made. The choice also plays an important role in shaping your strategy for scalability, extensibility, and configuration capabilities.