Cluster split implementation expects cached data to be fully replicated. #61

@guusdk

Description

This plugin introduces an implementation of org.jivesoftware.util.cache.Cache&lt;K,V&gt; that is backed by a Hazelcast-provided distributed data structure. Much of the clustering-related functionality depends on the data in caches of this implementation being shared amongst cluster nodes. For example: when one node puts something in the cache, another node can access that data from a locally instantiated cache with the same name. The data is synchronized behind the scenes by Hazelcast.

Pretty much all caches, as well as the default, as defined in https://github.com/igniterealtime/openfire-hazelcast-plugin/blob/master/classes/hazelcast-cache-config.xml, use a Hazelcast Map as the data structure backing our Cache implementation. All of them seem to use a backup-count of 1, which means that data added by a node is replicated to exactly one other node. This is largely hidden from users of the Cache, as the data will be available on all nodes, even if the cluster is larger than 2 nodes: nodes on which the data is accessed, but which don't hold it locally, will obtain it through Hazelcast-provided magic.
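For reference, a Hazelcast map with a backup-count of 1 is configured roughly as follows (an illustrative fragment with a made-up cache name, not a verbatim copy of the linked file):

```xml
<hazelcast>
    <map name="example-cache">
        <!-- Each entry is held by one owner node and replicated to one backup node. -->
        <backup-count>1</backup-count>
    </map>
</hazelcast>
```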

Although during normal run-time the data is accessible to all cluster nodes (as described above), there does not seem to be a guarantee that all data is, at all times, readily available on all cluster nodes when the cluster is larger than two nodes. In larger clusters it is almost guaranteed not to be: by default, the data lives on one node and is backed up to one other. I'm not sure if there are at times more copies, but it's probably safe to assume that Hazelcast will eventually retain data on only two nodes.
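To make the consequence concrete, here is a toy simulation (hypothetical code, not Hazelcast itself) of owner-plus-one-backup placement: a suddenly isolated node retains only the entries it happened to own or back up, which in a three-node cluster is roughly two thirds of the data.

```java
import java.util.*;

/** Simulation sketch (hypothetical, not Hazelcast code): with backup-count 1,
 *  each entry lives on an owner node and one backup node. A node that is
 *  suddenly isolated keeps only the entries it happens to own or back up. */
public class BackupCountSim {
    /** Returns the entries still visible on 'isolatedNode' after it loses
     *  contact with the rest of an n-node cluster. */
    static Set<Integer> visibleAfterIsolation(int entries, int nodes, int isolatedNode) {
        Set<Integer> visible = new HashSet<>();
        for (int key = 0; key < entries; key++) {
            int owner = key % nodes;            // simplistic partition assignment
            int backup = (owner + 1) % nodes;   // backup-count = 1: one extra copy
            if (owner == isolatedNode || backup == isolatedNode) {
                visible.add(key);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Two-node cluster: every entry is on both nodes, nothing is lost.
        System.out.println(visibleAfterIsolation(100, 2, 0).size()); // 100
        // Three-node cluster: the isolated node only sees ~2/3 of the data.
        System.out.println(visibleAfterIsolation(99, 3, 0).size());  // 66
    }
}
```

This also illustrates why the problem is invisible in a two-node test setup: with only two nodes, owner plus backup covers the whole cluster.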

Much of the code in Openfire that is executed when a cluster node drops out of the cluster (notably, implementations of org.jivesoftware.openfire.cluster.ClusterEventListener) is written with the expectation that all data is available on the local cluster node, for at least some (and possibly all) caches. This seems to be an error.

To illustrate, the following was observed (repeatedly) during testing. On a three-node cluster, where each node had a connected client/user that is subscribed to the presence of all other users, the senior node was disconnected from the cluster (I'm unsure if seniority is important). It is important to realize that at that point, Hazelcast can no longer look up cache entries 'online': whatever it holds on the local node is all it can work with - all other data is lost. The expected behavior would be that the client connected to the local node receives a presence unavailable for each of its two contacts, by virtue of the routing table being cleaned up after recognizing that those two routes are no longer available. In practice, we did not always see this happen. Often, we'd get a presence unavailable for only one contact, instead of both.

We believe that what's going on here is that the disconnected node iterates over (or otherwise depends on - see Remark A below) the routing table to send the presence unavailable stanzas for the clients on the now unreachable cluster nodes. As there is no guarantee that all data exists on all cluster nodes, this can go wrong.
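A hedged sketch of the suspected failure mode (hypothetical types and names, not Openfire's actual RoutingTableImpl): cleanup that iterates over a locally held copy of the routing table can only emit a presence unavailable for routes that copy actually contains.

```java
import java.util.*;

/** Hypothetical illustration of the suspected failure mode. */
public class CleanupSketch {
    /** Given the routes the local node actually holds (JID -> hosting node),
     *  return the JIDs for which a presence unavailable would be generated
     *  when 'lostNode' drops out of the cluster. */
    static List<String> unavailableFor(Map<String, Integer> localRoutes, int lostNode) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> route : localRoutes.entrySet()) {
            if (route.getValue() == lostNode) {
                result.add(route.getKey()); // only routes we can see are cleaned up
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Full local view: both contacts hosted on node 2 get a presence unavailable.
        Map<String, Integer> full = Map.of("alice@example.org", 2, "bob@example.org", 2);
        System.out.println(unavailableFor(full, 2).size()); // 2
        // Partial view (one route was never replicated locally): one is missed.
        Map<String, Integer> partial = Map.of("alice@example.org", 2);
        System.out.println(unavailableFor(partial, 2).size()); // 1
    }
}
```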

(The above is a simplified description of the scenario used during testing: the actual test scenario involved MUC. The 'offline' presence that is expected would be picked up by the conference service, to be broadcast to all local occupants. I believe that this nuance is not important.)

A confusing characteristic of the implementation is that there seems to be overlap in the implementation of org.jivesoftware.openfire.plugin.util.cache.ClusterListener in the Hazelcast plugin (notably its cleanup routines) and the implementation of the org.jivesoftware.openfire.cluster.ClusterEventListener interface in various parts of the Openfire code base (such as RoutingTableImpl).

The issue described here will probably not be very apparent in a test environment unless that environment consists of at least three cluster nodes. The default backup count of 1 effectively causes replication "to the entire cluster" when the cluster consists of no more than two nodes.

Remark A: While penning this text, I have started to wonder if the problem described above (not being guaranteed to have cluster data after a sudden disconnect) is what the lookupJIDList method in the Hazelcast plugin's ClusterListener class is trying to work around. That implementation uses a Hazelcast EntryListener (as implemented in the S2SCacheListener in that same ClusterListener definition) to keep track of all JIDs in many caches. If these EntryListeners fire whenever data is modified on any cluster node, then this could ensure that at least some data modifications (it only seems to store JIDs) are guaranteed to be registered on all nodes as soon as they occur on any node. If that's done synchronously, even better, but I have not checked yet. With this, and having identified the apparent duplication of code between the Hazelcast plugin's ClusterListener and some Openfire implementations of ClusterEventListener, combined with the fact that for some Caches it'd make sense to update them 'last' (or at the very least in a predictable order), I wonder if a sensible course of action would be to remove code from Openfire, and have (only) the Hazelcast plugin be responsible for things like maintaining the RoutingTable state.
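The workaround hinted at above could conceptually look like the following sketch (plain Java with made-up names, not the plugin's actual EntryListener code): every node mirrors the keys of a shared cache into a node-local set as entry events arrive, so that after a sudden disconnect the node at least still knows which JIDs existed.

```java
import java.util.*;

/** Conceptual sketch (hypothetical names, not the plugin's actual code):
 *  mirror the keys of a shared cache into a node-local set via entry-event
 *  callbacks, so the local node still knows which JIDs were present in the
 *  cluster-wide cache even after a sudden disconnect. */
public class LocalKeyMirror {
    private final Set<String> localJids = new HashSet<>();

    // Callbacks analogous to an EntryListener's entryAdded/entryRemoved.
    void onEntryAdded(String jid)   { localJids.add(jid); }
    void onEntryRemoved(String jid) { localJids.remove(jid); }

    Set<String> knownJids() { return Collections.unmodifiableSet(localJids); }

    public static void main(String[] args) {
        LocalKeyMirror mirror = new LocalKeyMirror();
        mirror.onEntryAdded("alice@example.org");
        mirror.onEntryAdded("bob@example.org");
        mirror.onEntryRemoved("alice@example.org");
        System.out.println(mirror.knownJids()); // [bob@example.org]
    }
}
```

Note that this only helps if the events are delivered to every node (and ideally synchronously); whether Hazelcast guarantees that for the listener registration the plugin uses is exactly the open question raised above.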
