Cluster split implementation expects cached data to be fully replicated. #61

@guusdk

Description

This plugin introduces an implementation of org.jivesoftware.util.cache.Cache&lt;K,V&gt; that is backed by a Hazelcast-provided distributed data structure. Much of the clustering-related functionality depends on the data in caches of this implementation being shared amongst cluster nodes. For example: when one node puts something in the cache, another node can access that data from a locally instantiated cache with the same name. The data is synchronized behind the scenes by Hazelcast.

Pretty much all caches, as well as the default, as defined in https://github.com/igniterealtime/openfire-hazelcast-plugin/blob/master/classes/hazelcast-cache-config.xml, use a Hazelcast Map as the data structure backing our Cache implementation. All of them seem to use a backup-count of 1, which means that data added by a node is replicated to exactly one other node. This is largely hidden from users of the Cache, as the data will be available on all nodes, even if the cluster is larger than 2 nodes: nodes on which the data is accessed, but which don't hold it locally, will obtain it through Hazelcast-provided magic.
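For reference, a Hazelcast map with a backup-count of 1 is configured roughly as follows (an illustrative fragment with a made-up cache name, not a verbatim copy of the linked file):

```xml
<hazelcast>
    <map name="example-cache">
        <!-- Each entry is held by one owner node and replicated to one backup node. -->
        <backup-count>1</backup-count>
    </map>
</hazelcast>
```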

Although during normal run-time the data is accessible to all cluster nodes (as described above), there does not seem to be a guarantee that all data is, at all times, readily available on all cluster nodes when the cluster is larger than two nodes. In larger clusters it is almost guaranteed not to be: by default, the data lives on one node and is backed up to one other. I'm not sure if there are at times more copies, but it's probably safe to assume that Hazelcast will eventually retain data on only two nodes.
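To make the consequence concrete, here is a toy simulation (hypothetical code, not Hazelcast itself) of owner-plus-one-backup placement: a suddenly isolated node retains only the entries it happened to own or back up, which in a three-node cluster is roughly two thirds of the data.

```java
import java.util.*;

/** Simulation sketch (hypothetical, not Hazelcast code): with backup-count 1,
 *  each entry lives on an owner node and one backup node. A node that is
 *  suddenly isolated keeps only the entries it happens to own or back up. */
public class BackupCountSim {
    /** Returns the entries still visible on 'isolatedNode' after it loses
     *  contact with the rest of an n-node cluster. */
    static Set<Integer> visibleAfterIsolation(int entries, int nodes, int isolatedNode) {
        Set<Integer> visible = new HashSet<>();
        for (int key = 0; key < entries; key++) {
            int owner = key % nodes;            // simplistic partition assignment
            int backup = (owner + 1) % nodes;   // backup-count = 1: one extra copy
            if (owner == isolatedNode || backup == isolatedNode) {
                visible.add(key);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Two-node cluster: every entry is on both nodes, nothing is lost.
        System.out.println(visibleAfterIsolation(100, 2, 0).size()); // 100
        // Three-node cluster: the isolated node only sees ~2/3 of the data.
        System.out.println(visibleAfterIsolation(99, 3, 0).size());  // 66
    }
}
```

This also illustrates why the problem is invisible in a two-node test setup: with only two nodes, owner plus backup covers the whole cluster.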

Much of the code in Openfire that is executed when a cluster node drops out of the cluster (notably, implementations of org.jivesoftware.openfire.cluster.ClusterEventListener) is written with the expectation that all data is available on the local cluster node, for at least some (and possibly all) caches. This seems to be an error.

To illustrate, the following was observed (repeatedly) during testing. On a three-node cluster, where each node had a connected client/user that is subscribed to the presence of all other users, the senior node was disconnected from the cluster (I'm unsure if seniority is important). It is important to realize that at that point, Hazelcast can no longer look up cache entries 'online': whatever it holds on the local node is all it can work with - all other data is lost. The expected behavior would be that the client connected to the local node receives a presence unavailable for each of its two contacts, by virtue of the routing table being cleaned up after recognizing that those two routes are no longer available. In practice, we did not always see this happen. Often, we'd get a presence unavailable for only one contact, instead of both.

We believe that what's going on here is that the disconnected node iterates over (or otherwise depends on - see Remark A below) the routing table to send the presence unavailable stanzas for the clients on the now unreachable cluster nodes. As there is no guarantee that all data exists on all cluster nodes, this can go wrong.
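A hedged sketch of the suspected failure mode (hypothetical types and names, not Openfire's actual RoutingTableImpl): cleanup that iterates over a locally held copy of the routing table can only emit a presence unavailable for routes that copy actually contains.

```java
import java.util.*;

/** Hypothetical illustration of the suspected failure mode. */
public class CleanupSketch {
    /** Given the routes the local node actually holds (JID -> hosting node),
     *  return the JIDs for which a presence unavailable would be generated
     *  when 'lostNode' drops out of the cluster. */
    static List<String> unavailableFor(Map<String, Integer> localRoutes, int lostNode) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> route : localRoutes.entrySet()) {
            if (route.getValue() == lostNode) {
                result.add(route.getKey()); // only routes we can see are cleaned up
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Full local view: both contacts hosted on node 2 get a presence unavailable.
        Map<String, Integer> full = Map.of("alice@example.org", 2, "bob@example.org", 2);
        System.out.println(unavailableFor(full, 2).size()); // 2
        // Partial view (one route was never replicated locally): one is missed.
        Map<String, Integer> partial = Map.of("alice@example.org", 2);
        System.out.println(unavailableFor(partial, 2).size()); // 1
    }
}
```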

(The above is a simplified description of the scenario used during testing: the actual test scenario involved MUC. The 'offline' presence that is expected would be picked up by the conference service, to be broadcast to all local occupants. I believe that this nuance is not important.)

A confusing characteristic of the implementation is that there seems to be overlap in the implementation of org.jivesoftware.openfire.plugin.util.cache.ClusterListener in the Hazelcast plugin (notably its cleanup routines) and the implementation of the org.jivesoftware.openfire.cluster.ClusterEventListener interface in various parts of the Openfire code base (such as RoutingTableImpl).

The issue described here will probably not be very apparent in a test environment unless that environment consists of at least three cluster nodes. The default backup count of 1 effectively causes replication "to the entire cluster" when the cluster consists of no more than two nodes.

Remark A: While penning this text, I have started to wonder if the problem described above (not being guaranteed to have cluster data after a sudden disconnect) is what the lookupJIDList method in the Hazelcast plugin's ClusterListener class is trying to work around. That implementation uses a Hazelcast EntryListener (as implemented in the S2SCacheListener in that same ClusterListener definition) to keep track of all JIDs in many caches. If these EntryListeners fire whenever data is modified on any cluster node, then this could ensure that at least some data modifications (it only seems to store JIDs) are guaranteed to be registered on all nodes as soon as they occur on any node. If that's done synchronously, even better, but I have not checked yet. With this, and having identified the apparent duplication of code between the Hazelcast plugin's ClusterListener and some Openfire implementations of ClusterEventListener, combined with the fact that for some Caches it'd make sense to update them 'last' (or at the very least in a predictable order), I wonder if a sensible course of action would be to remove code from Openfire, and have (only) the Hazelcast plugin be responsible for things like maintaining the RoutingTable state.
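The workaround hinted at above could conceptually look like the following sketch (plain Java with made-up names, not the plugin's actual EntryListener code): every node mirrors the keys of a shared cache into a node-local set as entry events arrive, so that after a sudden disconnect the node at least still knows which JIDs existed.

```java
import java.util.*;

/** Conceptual sketch (hypothetical names, not the plugin's actual code):
 *  mirror the keys of a shared cache into a node-local set via entry-event
 *  callbacks, so the local node still knows which JIDs were present in the
 *  cluster-wide cache even after a sudden disconnect. */
public class LocalKeyMirror {
    private final Set<String> localJids = new HashSet<>();

    // Callbacks analogous to an EntryListener's entryAdded/entryRemoved.
    void onEntryAdded(String jid)   { localJids.add(jid); }
    void onEntryRemoved(String jid) { localJids.remove(jid); }

    Set<String> knownJids() { return Collections.unmodifiableSet(localJids); }

    public static void main(String[] args) {
        LocalKeyMirror mirror = new LocalKeyMirror();
        mirror.onEntryAdded("alice@example.org");
        mirror.onEntryAdded("bob@example.org");
        mirror.onEntryRemoved("alice@example.org");
        System.out.println(mirror.knownJids()); // [bob@example.org]
    }
}
```

Note that this only helps if the events are delivered to every node (and ideally synchronously); whether Hazelcast guarantees that for the listener registration the plugin uses is exactly the open question raised above.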
