Skip to content

run_actor_network hangs when a failing actor is not connected to all others #931

@Matt711

Description

@Matt711

When one actor in the network fails and shuts down its own channels, any actor that is not connected to those channels has no way to learn that the network is done. It stays blocked on its next channel->receive() or channel->send() waiting for a message or a consumer that will never come.

Reproducer

--- a/cpp/tests/streaming/test_error_handling.cpp
+++ b/cpp/tests/streaming/test_error_handling.cpp
+TEST_F(StreamingErrorHandling, UnconnectedActorCancelledOnFailure) {
+    // Actor A fails immediately.
+    auto ch_b = ctx->create_channel();
+    std::vector<Actor> actors;
+
+    // Actor A: fails immediately.
+    actors.push_back([](Context& ctx) -> Actor {
+        co_await ctx.executor()->schedule();
+        throw std::runtime_error("actor_a_failure");
+    }(*ctx));
+
+    // Actor B is unconnected to A but blocks
+    actors.push_back(
+        [](std::shared_ptr<Channel> ch_b) -> Actor {
+            std::ignore = co_await ch_b->receive();
+        }(ch_b)
+    );
+
+    try {
+        run_actor_network(std::move(actors), ctx);
+        FAIL() << "Expected std::runtime_error";
+    } catch (std::runtime_error const& e) {
+        EXPECT_THAT(e.what(), ::testing::HasSubstr("actor_a_failure"));
+    }
+}

One approach to a fix is to add a method to Context: cancel_network (for example) that broadcasts a shutdown to every channel and memory reservation the context has ever created, and to call it automatically whenever any actor in the network fails.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingstreamingPart of the ongoing streaming effort.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions