
Launching dotnet using mono on macOS will hang if dotnet launches processes #55645

@rolfbjarne

Description


Repro:

Here's a test case: signalbug-521a1b6.zip

  • Download & extract
  • Run make mono to repro: this will execute csharp which will execute dotnet test (this will hang)
$ make mono  
mono --version
Mono JIT compiler version 6.12.0.140 (2020-02/51d876a041e Thu Apr 29 10:44:55 EDT 2021)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	Interpreter:   yes
	LLVM:          yes(610)
	Suspend:       hybrid
	GC:            sgen (concurrent by default)
csharp -e 'System.Diagnostics.Process.Start ("dotnet", "test tests.csproj").WaitForExit ();'
  Determining projects to restore...
  Restored /Users/rolf/test/dotnet/signalbug/tests.csproj (in 950 ms).
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  tests -> /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll
Test run for /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210712-03
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
No test is available in /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll. Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.

Additionally, path to test adapters can be specified using /TestAdapterPath command. Example  /TestAdapterPath:<pathToCustomAdapters>.
[... and it hangs here, nothing else happens...]
  • Run make dotnet to run dotnet test directly (which works just fine)
$ make dotnet
dotnet --version
6.0.100-preview.7.21364.4
dotnet test
  Determining projects to restore...
  Restored /Users/rolf/test/dotnet/signalbug/tests.csproj (in 965 ms).
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  tests -> /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll
Test run for /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210712-03
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
No test is available in /Users/rolf/test/dotnet/signalbug/bin/Debug/net6.0/tests.dll. Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.

Additionally, path to test adapters can be specified using /TestAdapterPath command. Example  /TestAdapterPath:<pathToCustomAdapters>.
[this command completes successfully]

I did some debugging: the initial signal handler for SIGCHLD in the dotnet process is different when dotnet is launched from a mono process.

Soon after launch, this is what I get when checking the signal handler for SIGCHLD in the dotnet process:

# allocate some memory
(lldb) p (void *) malloc (40)
(void *) $0 = 0x00007fed89c04080
# set a dummy value for that memory
(lldb) expr ((void**)$0)[0] = (void*) 0xdeadf00ddeadf00d
(void *) $1 = 0xdeadf00ddeadf00d
# call sigaction to get the existing signal handler (20 is SIGCHLD on macOS)
(lldb) p (int) sigaction (20, 0, $0)
(int) $2 = 0
# inspect the result
(lldb) x/4wx $0
0x7fed89c04080: 0x00000000 0x00000000 0x00000000 0x00000042

The sigaction struct is 16 bytes, where the first 8 bytes are sa_handler, the next 4 bytes are sa_mask, and the final 4 bytes are sa_flags.

This means that:

sa_handler: 0x0 (SIG_DFL)
sa_mask: 0
sa_flags: 0x42 (SA_SIGINFO | SA_RESTART)
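
The breakdown above can be reproduced mechanically from the four words that `x/4wx` prints. This is a minimal sketch, assuming the 16-byte macOS x86_64 layout described above and the flag values from the macOS `<sys/signal.h>`; `decode_words`, `print_decoded` and the `DARWIN_*` names are made up for illustration and are not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag values as defined in the macOS <sys/signal.h>.
   Assumption: other platforms use different numeric values. */
#define DARWIN_SA_RESTART   0x0002
#define DARWIN_SA_NOCLDSTOP 0x0008
#define DARWIN_SA_SIGINFO   0x0040

/* Decoded view of the 16-byte struct sigaction described above. */
struct decoded_sigaction {
    uint64_t handler; /* sa_handler / sa_sigaction, 8 bytes */
    uint32_t mask;    /* sa_mask, 4 bytes */
    uint32_t flags;   /* sa_flags, 4 bytes */
};

/* Combine the four little-endian 32-bit words shown by `x/4wx`. */
static struct decoded_sigaction decode_words(const uint32_t w[4])
{
    struct decoded_sigaction d;
    d.handler = ((uint64_t)w[1] << 32) | w[0]; /* low word is printed first */
    d.mask = w[2];
    d.flags = w[3];
    return d;
}

/* Print the fields together with the names of the flags that are set. */
static void print_decoded(struct decoded_sigaction d)
{
    printf("sa_handler: 0x%llx\n", (unsigned long long)d.handler);
    printf("sa_mask:    0x%x\n", (unsigned)d.mask);
    printf("sa_flags:   0x%x%s%s%s\n", (unsigned)d.flags,
           (d.flags & DARWIN_SA_SIGINFO) ? " SA_SIGINFO" : "",
           (d.flags & DARWIN_SA_NOCLDSTOP) ? " SA_NOCLDSTOP" : "",
           (d.flags & DARWIN_SA_RESTART) ? " SA_RESTART" : "");
}
```

Passing the words `{0x00000000, 0x00000000, 0x00000000, 0x00000042}` to `decode_words` yields a null handler with SA_SIGINFO and SA_RESTART set, which matches the values listed above.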

man sigaction says this about SA_SIGINFO: "This bit should not be set when assigning SIG_DFL or SIG_IGN.", so the behavior I'm seeing does not follow the spec. That said, if I attach to the mono process, the SIGCHLD handler is very different:

(lldb) p (void *) malloc (40)
(void *) $0 = 0x00007fd756468d50
(lldb) expr ((void**)$0)[0] = (void*) 0xdeadf00ddeadf00d
$1 = 0xdeadf00ddeadf00d
(lldb) p (int) sigaction (20, 0, $0)
(int) $2 = 0
(lldb) x/4wx $0
0x7fd756468d50: 0x06377930 0x00000001 0x00000000 0x0000004a

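Here sa_handler is 0x0000000106377930 (a real function) and sa_flags is 0x4a (SA_SIGINFO | SA_NOCLDSTOP | SA_RESTART). I have no idea why the initial SIGCHLD handler in dotnet is different when dotnet is launched from mono.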

The end result is that this will crash, because sa_sigaction is null (SIG_DFL) even though SA_SIGINFO is set:

assert(origHandler->sa_sigaction);
origHandler->sa_sigaction(sig, siginfo, context);

and things will go badly from there:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00000001067e17a1 libcoreclr.dylib`sigsegv_handler(int, __siginfo*, void*) + 49
    frame #2: 0x00007fff205d5d7d libsystem_platform.dylib`_sigtramp + 29
    frame #3: 0x0000000000000001
    frame #4: 0x000000010767c4d5 libSystem.Native.dylib`SignalHandler + 101
    frame #5: 0x00007fff205d5d7d libsystem_platform.dylib`_sigtramp + 29
    frame #6: 0x00007fff2055dcdf libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #7: 0x00007fff20590e49 libsystem_pthread.dylib`_pthread_cond_wait + 1298
    frame #8: 0x000000010680e05b libcoreclr.dylib`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 315
    frame #9: 0x000000010680dd2a libcoreclr.dylib`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 458
    frame #10: 0x00000001068125aa libcoreclr.dylib`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1946
    frame #11: 0x0000000106812892 libcoreclr.dylib`WaitForMultipleObjectsEx + 82
    frame #12: 0x000000010691871e libcoreclr.dylib`Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode) + 734
    frame #13: 0x00000001069139d0 libcoreclr.dylib`Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 48
    frame #14: 0x000000010697b013 libcoreclr.dylib`WaitHandleNative::CorWaitOneNative(void*, int) + 179
    frame #15: 0x000000010ce9242b
    frame #16: 0x000000010ce9308c
    frame #17: 0x000000010f374718
    frame #18: 0x000000010f373e24
    frame #19: 0x000000010f3727d3
    frame #20: 0x000000010f371d79
    frame #21: 0x000000010db2a873
    frame #22: 0x000000010d8455c8
    frame #23: 0x000000010d8256b7
    frame #24: 0x000000010d83ad71
    frame #25: 0x000000010d867cb5
    frame #26: 0x000000010d867191
    frame #27: 0x0000000106b02f29 libcoreclr.dylib`CallDescrWorkerInternal + 124
    frame #28: 0x0000000106954cc8 libcoreclr.dylib`MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 1496
    frame #29: 0x0000000106837ad8 libcoreclr.dylib`RunMain(MethodDesc*, short, int*, PtrArray**) + 776
    frame #30: 0x0000000106837dfb libcoreclr.dylib`Assembly::ExecuteMainMethod(PtrArray**, int) + 395
    frame #31: 0x000000010686aaec libcoreclr.dylib`CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 508
    frame #32: 0x0000000106821164 libcoreclr.dylib`coreclr_execute_assembly + 196
    frame #33: 0x000000010677f5f1 libhostpolicy.dylib`run_app_for_context(hostpolicy_context_t const&, int, char const**) + 1313
    frame #34: 0x0000000106780511 libhostpolicy.dylib`corehost_main + 241
    frame #35: 0x000000010670d42e libhostfxr.dylib`fx_muxer_t::handle_exec_host_command(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, host_startup_info_t const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unordered_map<known_options, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, known_options_hash, std::__1::equal_to<known_options>, std::__1::allocator<std::__1::pair<known_options const, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, char*, int, int*) + 1550
    frame #36: 0x000000010670ca01 libhostfxr.dylib`fx_muxer_t::handle_cli(host_startup_info_t const&, int, char const**, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 1457
    frame #37: 0x000000010670c2c6 libhostfxr.dylib`fx_muxer_t::execute(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 646
    frame #38: 0x0000000106708d38 libhostfxr.dylib`hostfxr_main_startupinfo + 152
    frame #39: 0x00000001066b9c17 dotnet`exe_start(int, char const**) + 1191
    frame #40: 0x00000001066b9ddf dotnet`main + 143
    frame #41: 0x00007fff205abf5d libdyld.dylib`start + 1
    frame #42: 0x00007fff205abf5d libdyld.dylib`start + 1
