-
Notifications
You must be signed in to change notification settings - Fork 31
run_actor_network hangs when a failing actor is not connected to all others #931
Copy link
Copy link
Open
Labels
bugSomething isn't workingSomething isn't workingstreamingPart of the ongoing streaming effort.Part of the ongoing streaming effort.
Description
When one actor in the network fails and shuts down its own channels, any actor that is not connected to those channels has no way to learn that the network is done. It stays blocked on its next channel->receive() or channel->send() waiting for a message or a consumer that will never come.
Reproducer
--- a/cpp/tests/streaming/test_error_handling.cpp
+++ b/cpp/tests/streaming/test_error_handling.cpp
+TEST_F(StreamingErrorHandling, UnconnectedActorCancelledOnFailure) {
+ // Actor A fails immediately.
+ auto ch_b = ctx->create_channel();
+ std::vector<Actor> actors;
+
+ // Actor A: fails immediately.
+ actors.push_back([](Context& ctx) -> Actor {
+ co_await ctx.executor()->schedule();
+ throw std::runtime_error("actor_a_failure");
+ }(*ctx));
+
+ // Actor B is unconnected to A but blocks
+ actors.push_back(
+ [](std::shared_ptr<Channel> ch_b) -> Actor {
+ std::ignore = co_await ch_b->receive();
+ }(ch_b)
+ );
+
+ try {
+ run_actor_network(std::move(actors), ctx);
+ FAIL() << "Expected std::runtime_error";
+ } catch (std::runtime_error const& e) {
+ EXPECT_THAT(e.what(), ::testing::HasSubstr("actor_a_failure"));
+ }
+}One approach to a fix is to add a method to Context: cancel_network (for example) that broadcasts a shutdown to every channel and memory reservation the context has ever created, and to call it automatically whenever any actor in the network fails.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingstreamingPart of the ongoing streaming effort.Part of the ongoing streaming effort.