
Gateway segfault when stressing the system #5

@suraj44


I have four machines, each with 12 CPU cores and 64 GB of RAM. I deploy the Nightcore gateway on one of them, and on each of the other three I deploy an instance of the engine and a launcher for a hello-world function.

I have three other machines that act as clients and invoke the hello-world function by sending HTTP POST requests to the gateway. The segfault occurs only when there is a large number of client threads (10 or 14 client threads on each client machine). Partway through the experiment, the gateway writes the following error to its log:

When I first encountered the problem, the segfault occurred in uv__count_bufs, but in my latest attempt to reproduce the error, gdb reported the following:

Thread 3 "gateway" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7228700 (LWP 275299)]
__GI___libc_free (mem=0xffffffff00000000) at malloc.c:3102

At the end of the log, there was this message:

3102    malloc.c: No such file or directory.

Any ideas as to what causes this problem and how to resolve it?

Some more information that might be useful: when I deploy the engine and launcher on only two other machines (instead of three), the error does not occur, regardless of how many client threads I use to stress the system. The minWorkers and maxWorkers config parameters for the function are 20 and 80, respectively.
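For anyone trying to reproduce this, the client-side load described above can be approximated with the Python sketch below. It is only an approximation of the setup, not the tool used in the original experiment: the gateway URL, port 8080, the /function/helloWorld path, and the request payload are hypothetical placeholders. Each client machine would run its own copy with num_threads set to 10 or 14.

```python
import threading
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical endpoint: adjust host, port, and function path to your deployment.
GATEWAY_URL = "http://gateway-host:8080/function/helloWorld"


def post_once(url: str, payload: bytes, timeout: float = 5.0) -> int:
    """Send one HTTP POST and return the status code (0 on a network error)."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx response from the gateway: record its status code.
        return err.code
    except OSError:
        # Connection refused/reset or timeout, e.g. after the gateway crashes.
        return 0


def run_load(url, num_threads, requests_per_thread, send=post_once):
    """Start num_threads client threads, each issuing requests_per_thread POSTs.

    Returns a Counter mapping status code -> number of responses.
    """
    results = Counter()
    lock = threading.Lock()

    def worker(tid):
        local = Counter()
        for i in range(requests_per_thread):
            status = send(url, f"hello from thread {tid}, req {i}".encode())
            local[status] += 1
        # Merge per-thread counts under a lock to avoid racing on the shared Counter.
        with lock:
            results.update(local)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, run_load(GATEWAY_URL, num_threads=14, requests_per_thread=1000) on each of the three client machines roughly matches the failing configuration; a growing count under status 0 indicates the point at which the gateway stopped accepting connections.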
