Gateway segfault when stressing the system #5
Description
I have 4 machines, each with 12 CPU cores and 64 GB of RAM. I deploy the Nightcore gateway on one of them, and on each of the other three I deploy an instance of the engine and a launcher for a hello-world function.
I have 3 other machines which act as clients and invoke the hello-world function by sending HTTP POST requests to the gateway. The segfault occurs only when there are a large number of client threads (10 or 14 client threads on each client machine). What happens is that, in the middle of the experiment, the gateway reports the following error in the log:
When I first encountered the problem, the segfault happened at uv__count_bufs, but in my latest attempt to reproduce the error I got the following message from gdb:
Thread 3 "gateway" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7228700 (LWP 275299)]
__GI___libc_free (mem=0xffffffff00000000) at malloc.c:3102
and in the tail of the log, there was this message:
3102 malloc.c: No such file or directory.
Any ideas as to what causes this problem and how to resolve it?
Some more info that might be useful: when I deploy the engine and launcher on only 2 other machines (instead of 3), this error does not show up, regardless of the number of client threads I use to stress the system. The minWorkers and maxWorkers config parameters for the function are 20 and 80, respectively.
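
For anyone trying to reproduce this, the client load described above can be approximated with a sketch like the following. It is not the exact client used in the experiment: the `invoke` and `stress` helpers, the empty JSON payload, and the request count per thread are all assumptions made for illustration, and the gateway URL has to be supplied by the caller.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def invoke(url, payload=b"{}", timeout=5.0):
    """Send one HTTP POST and return the status code, or None on a transport error."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with a non-2xx status.
        return e.code
    except (OSError, http.client.HTTPException):
        # Connection refused, reset, or timed out (e.g. the gateway crashed).
        return None


def stress(url, num_threads, requests_per_thread):
    """Run num_threads client threads, each sending requests back to back.

    Returns (ok, failed), where ok counts responses with status 200.
    """
    def worker(_):
        return [invoke(url) for _ in range(requests_per_thread)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        batches = list(pool.map(worker, range(num_threads)))

    statuses = [s for batch in batches for s in batch]
    ok = sum(1 for s in statuses if s == 200)
    return ok, len(statuses) - ok
```

Running something like `stress(gateway_url, 14, 10000)` on each of the three client machines at the same time would roughly match the thread counts mentioned above. A sudden jump in the `failed` count partway through a run would line up with the gateway process dying.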