Setting up Cluster with Multiple Nodes - Segmentation Fault #25
Hi,
Setup
I am trying to set up a simple cluster with 2 nodes. These are the network interfaces of each node:
- Node 1
  - eno33: 128.110.219.19
  - enp65s0f0: 10.10.1.2
- Node 2
  - eno33: 128.110.219.27
  - enp65s0f0: 10.10.1.3
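To double-check which interface actually carries a given address on each node, a small helper along the following lines could be used. This is only a sketch, assuming Linux with glibc's `getifaddrs`; the function name `find_iface_for_ip` is hypothetical and not part of the project.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of the project): look up the interface
 * that owns an IPv4 address.
 * Returns 1 and copies the interface name into `name` if found,
 * 0 if no interface has the address,
 * -1 on invalid input or if getifaddrs() fails. */
int find_iface_for_ip(const char *ip, char *name, size_t name_len)
{
    struct in_addr want;
    struct ifaddrs *list, *ifa;
    int found = 0;

    if (!ip || !name || name_len == 0 || inet_pton(AF_INET, ip, &want) != 1)
        return -1;
    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 entries. */
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            snprintf(name, name_len, "%s", ifa->ifa_name);
            found = 1;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}
```

If the listing above is accurate, calling `find_iface_for_ip("10.10.1.2", buf, sizeof buf)` on Node 1 should return 1 and report `enp65s0f0`.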
On each of these nodes, I set g_n_hot_rep to 2 and configured the RPC interface as follows:
static struct peer_id hot_replicas[g_n_hot_rep] = {
{ .ip = "10.10.1.2", .role = HOT_REPLICA, .type = KERNFS_PEER},
{ .ip = "10.10.1.3", .role = HOT_REPLICA, .type = KERNFS_PEER},
};
I ran KernFS, starting with the node whose interface has the address 10.10.1.3.
Result
I received a segmentation fault:
initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 8192 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 4013
ip address on interface 'ib0' is 10.10.1.2
cluster settings:
--- node 0 - ip:10.10.1.2
--- node 1 - ip:10.10.1.3
Connecting to KernFS instance 1 [ip: 10.10.1.3]
./run.sh: line 15: 4013 Segmentation fault LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@
Debugging
After debugging, the segmentation fault appears to originate in libfs/lib/rdma/agent.c at lines 96 and 130: the rdma_cm_id struct is NULL after the call to rdma_create_id.
I also ran the file system as a local file system, with g_n_hot_rep = 1 and the RPC interface set to localhost, and that works.
Do you mind helping me with this problem? Thank you very much!