Setting up Cluster with Multiple Nodes - Segmentation Fault #25

@agnesnatasya

Hi,

Setup

I am trying to set up a simple cluster with 2 nodes. These are the network interfaces of each node:

  1. Node 1

eno33: 128.110.219.19
enp65s0f0: 10.10.1.2

  2. Node 2

eno33: 128.110.219.27
enp65s0f0: 10.10.1.3
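
To confirm which interface actually carries a given cluster address on each node, a check along these lines might help. This is only a minimal sketch built on the standard getifaddrs call; find_ipv4_interface is a hypothetical helper name and is not part of the project.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the project): find the local interface
 * that carries the IPv4 address `ip`.
 * On success, copies the interface name into `ifname` and returns 0.
 * Returns -1 if `ip` is malformed, `len` is 0, or no interface has it. */
int find_ipv4_interface(const char *ip, char *ifname, size_t len)
{
    struct in_addr want;
    struct ifaddrs *list, *it;
    int rc = -1;

    if (len == 0 || inet_pton(AF_INET, ip, &want) != 1)
        return -1;
    if (getifaddrs(&list) != 0)
        return -1;

    for (it = list; it != NULL; it = it->ifa_next) {
        /* Skip entries without an address and non-IPv4 entries. */
        if (it->ifa_addr == NULL || it->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)it->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            snprintf(ifname, len, "%s", it->ifa_name);
            rc = 0;
            break;
        }
    }
    freeifaddrs(list);
    return rc;
}
```

On Node 1, for example, calling find_ipv4_interface("10.10.1.2", name, sizeof name) should fill name with enp65s0f0 if the address is assigned as listed above.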

On each of these nodes, I set g_n_hot_rep to 2 and the RPC interface to

static struct peer_id hot_replicas[g_n_hot_rep] = {
 { .ip = "10.10.1.2", .role = HOT_REPLICA, .type = KERNFS_PEER},
 { .ip = "10.10.1.3", .role = HOT_REPLICA, .type = KERNFS_PEER},
};
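
A quick sanity check for this kind of table is to verify that the address a node detects for itself appears in the list, and at which index. The following is a minimal sketch under assumptions: struct peer_entry is a simplified stand-in that models only the ip field of the table above, and find_self is a hypothetical helper, not a function from the project.

```c
#include <string.h>

/* Simplified stand-in for the replica table entries; only the ip field
 * is modeled here. The project's real struct has more fields. */
struct peer_entry {
    const char *ip;
};

/* Hypothetical helper: return the index of `local_ip` in `peers`,
 * or -1 if the address is not listed. */
int find_self(const struct peer_entry *peers, int n, const char *local_ip)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(peers[i].ip, local_ip) == 0)
            return i;
    }
    return -1;
}
```

With the two entries above, looking up 10.10.1.3 would return index 1, while a public address such as 128.110.219.19 would return -1, meaning the node would not find itself in the cluster settings.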

I started KernFS, beginning with the node whose interface address is 10.10.1.3.

Result

I received a segmentation fault:

initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 8192 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 4013
ip address on interface 'ib0' is 10.10.1.2
cluster settings:
--- node 0 - ip:10.10.1.2
--- node 1 - ip:10.10.1.3
Connecting to KernFS instance 1 [ip: 10.10.1.3]
./run.sh: line 15:  4013 Segmentation fault      LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

Debugging

After debugging, the segmentation fault appears to originate in libfs/lib/rdma/agent.c at lines 96 and 130: the rdma_cm_id struct is NULL after the call to rdma_create_id.
I also ran the file system as a local file system, with g_n_hot_rep = 1 and the RPC interface set to localhost, and that works.

Do you mind helping me with this problem? Thank you very much!
