The future of multi-tenancy #6710
Why does this always come up as the silver bullet? Containerization/Kubernetes is the solution only if you also tightly integrate with that ecosystem. Merely having a simple way to run instances side-by-side on the same machine is not the reason for multi-tenancy (at least not for me). And simply putting stuff in containers doesn't mean it runs well in that environment. So, just to paint the picture: at the height of educast.nrw we had 17 tenants (with hopes to scale further). The main reason we decided to use multi-tenancy back then, and the reason I now also use it at shio solutions for tales.media, is to avoid the overhead that each individual Opencast cluster comes with. And this in multiple ways:
I don't care for multi-tenancy if the overhead is minimal. We don't yet run Tobira, but I imagine it's very low on resource usage and has other properties that are very advantageous for containerized workloads. For Opencast to fall in this category, it would need to have the following properties:
We have run Opencast in containers for a very long time (started in 2.1.x times) and have obviously seen advantages in doing so. But Opencast is not the "ideal" workload for today's cluster solutions. Improving on these properties would make it more reasonable to run single-tenant systems. Now, some comments on the mentioned problems:
Kubernetes' scheduler supports multi-resource scheduling and, with additional components, has explicit support for GPUs. I personally have not yet configured a Kubernetes cluster with GPU support (educast.nrw was shut down before we got to it), but my university runs a managed Kubernetes cluster with GPU workloads. However, non-datacenter GPUs might be a problem (NVIDIA doesn't want you to run them in servers anyway).
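For concreteness, here is a minimal sketch of what a GPU request looks like in a pod spec. This assumes a cluster where the NVIDIA device plugin is installed (which is what advertises the `nvidia.com/gpu` extended resource to the scheduler); the pod name and image are made up:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: encoder-gpu                # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: encode
      image: example.org/opencast-encoder:latest   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1        # extended resource exposed by the device plugin
```

The scheduler will only place this pod on a node that actually advertises a free GPU, which is exactly the multi-resource scheduling mentioned above.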
Well, you can make dynamic scaling work, but making use of all available hardware resources is challenging, as mentioned above.
I had some ideas around this as well. Basically, pushing everything forward to Kubernetes jobs. Or something like the GitLab CI systems with multiple implementations for runner environments. In my research, I actually work on a prototype system for cloud and edge-native multimedia workflows. I thought of making this more production-ready after my PhD and having an Opencast operation that pushes workflows to this system 🙈.
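The "push processing to Kubernetes Jobs" idea could look roughly like this. This is only a sketch of the concept, not an existing Opencast feature; the image, names, and arguments are all made up:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: workflow-op-encode-1234    # hypothetical: one Job per workflow operation
spec:
  backoffLimit: 2                  # retry a failed operation twice
  ttlSecondsAfterFinished: 3600    # garbage-collect finished Jobs after an hour
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: encode
          image: example.org/opencast-worker:latest    # hypothetical image
          args: ["encode", "--mediapackage", "1234"]   # hypothetical CLI
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```

The appeal is that Kubernetes then handles scheduling, retries, and cleanup per operation, much like GitLab CI hands individual jobs to interchangeable runners.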
Yes.
No. Kubernetes already has good documentation, but a production-ready Kubernetes cluster is hard regardless. There are way too many environments, from on-prem to cloud to edge, to provide common documentation. I don't see Opencast Matrix giving advice on setting up Kubernetes. We also have to consider that getting familiar with Kubernetes to the point of managing workloads is one thing; actually setting up and maintaining a Kubernetes cluster is a whole different thing. I'm therefore very torn on forcing the whole community in this direction.
The asset manager is actually separated into different folders, and you can now use different S3 buckets. But until recently, assets could be hard-linked between orgs. Having one process connect to different databases sounds somehow wrong to me, but I don't really have strong arguments against it. It's more like not going all the way to getting rid of multi-tenancy.
Moving discussion to
Hello community,
as part of the data model discussion, multi-tenancy came up again. We talked about it a bit and identified a few things that could be improved, but it was also questioned whether we could get rid of it completely (gasp!). Therefore, we decided to get feedback from the community on this.
If you run an Opencast with multi-tenancy, please take the time to read this post and write a reply. In particular, please describe your use case and explain in detail why you use multi-tenancy as opposed to just having multiple Opencasts (potentially in a Kubernetes cluster). Knowing the exact reasons helps us find a good solution for everyone. Note: also consider the XY problem and try to identify your root reasons.
Problems & reasons to get rid of it
Reasons for multi-tenancy
And yet, some adopters use multi-tenancy: usually one tenant per department of your university/organization, or, as an "Opencast as a service" provider, one tenant per organization/university.
Possible paths forward
Containerization and Kubernetes are the obvious solution. There are some potential problems, though (which we need to actually dig into instead of just saying "sounds about right" and giving up on the idea):
But if we don't want to go the full way and lift that logic completely out of Opencast, we could still improve Opencast's implementation of it and fix lots of problems by lifting the multi-tenancy logic up a bit. The core problem is that multi-tenancy is handled at the very bottom, by every piece of code. For example, events have an `organization` column in the DB, and all code handling events potentially has to check that field to filter for events of only a specific tenant. And that's easy to forget.

We could instead have one database (not DBMS!) per tenant, and also one separate folder in storage for each tenant. Then almost all code can pretend there is only one tenant, avoiding multi-tenancy logic scattered everywhere and also vastly improving the isolation between tenants.
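To illustrate the difference, here is a small self-contained Python sketch (using sqlite3 purely for illustration; none of this is Opencast code, and all names are made up) contrasting the shared `organization` column with one database per tenant:

```python
import sqlite3

# --- Status quo: one shared table; every query must remember the filter. ---
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE events (id TEXT, organization TEXT)")
shared.executemany("INSERT INTO events VALUES (?, ?)",
                   [("e1", "tenant_a"), ("e2", "tenant_b")])
# Forgetting "WHERE organization = ?" silently returns other tenants' events:
leaked = shared.execute("SELECT id FROM events").fetchall()

# --- Proposal: one database per tenant; code can pretend it's single-tenant. ---
def db_for(tenant: str) -> sqlite3.Connection:
    """Hypothetical router: hand out a separate database per tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT)")  # no organization column needed
    return conn

dbs = {t: db_for(t) for t in ("tenant_a", "tenant_b")}
dbs["tenant_a"].execute("INSERT INTO events VALUES ('e1')")
dbs["tenant_b"].execute("INSERT INTO events VALUES ('e2')")
# No filter needed; isolation comes from the connection itself:
only_a = dbs["tenant_a"].execute("SELECT id FROM events").fetchall()
print(len(leaked), only_a)
```

In the first half a single forgotten `WHERE` clause leaks data across tenants; in the second half there is simply nothing to forget, which is the isolation argument made above.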