[BUG] cudf::left_anti_join fails with a signal error (SIGABRT) instead of throwing an exception when there is an OOM condition #16059

@aocsa

Describe the bug

When an out-of-memory (OOM) condition occurs, cudf::left_anti_join fails with a signal error (SIGABRT) instead of throwing an appropriate exception (std::bad_alloc).

Steps/Code to reproduce bug

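#include <cudf/column/column.hpp>
#include <cudf/filling.hpp>
#include <cudf/join.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/table/table.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <gtest/gtest.h>
#include <iostream>
#include <memory>
#include <vector>

TEST(CudfTest, LeftAntiJoinOOM) {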
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource();
  auto pool_mr = std::make_shared<rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>>(mr, 256, 2560);
  rmm::mr::set_current_device_resource(pool_mr.get());

  auto make_table = [](int32_t size, int32_t start) -> std::unique_ptr<cudf::table> {
    auto sequence_column = cudf::sequence(size, cudf::numeric_scalar<int32_t>(start));

    std::vector<std::unique_ptr<cudf::column>> columns;
    columns.push_back(std::move(sequence_column));
    return std::make_unique<cudf::table>(std::move(columns));
  };

  try {
    auto left = make_table(64, 0);
    auto right = make_table(128, 50);

    std::cerr << "left size: " << left->num_rows() << ", right size: " << right->num_rows() << "\n";
    std::unique_ptr<rmm::device_uvector<cudf::size_type>> left_indices =
        cudf::left_anti_join(left->view(), right->view());

    std::cerr << "done left_anti_join " << "\n";

  } catch(const std::exception& e) {
    std::cerr << "Caught exception: " << e.what() << "\n";
  }
}

Running this test produces a SIGABRT (abort signal) instead of the std::bad_alloc exception being caught:

left size: 64, right size: 128
terminate called after throwing an instance of 'rmm::out_of_memory'
  what():  std::bad_alloc: out_of_memory: RMM failure at:/home/alexander/envs/theseus_dev/include/rmm/mr/device/pool_memory_resource.hpp:313: Maximum pool size exceeded
Aborted (core dumped)
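For context on why such a small input exhausts the pool, here is a rough budget for the repro. It is only a sketch: it assumes allocations are rounded up to 256 bytes, counts only the two non-nullable INT32 data buffers, and ignores scalars and temporaries. The helper names (align_up, int32_column_bytes) are made up for illustration and are not cuDF or RMM APIs.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the pool hands out memory in multiples of 256 bytes.
constexpr std::size_t kAlignment = 256;

// Round a byte count up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t bytes, std::size_t alignment = kAlignment) {
  return (bytes + alignment - 1) / alignment * alignment;
}

// Data-buffer size of a non-nullable INT32 column with `rows` rows (hypothetical helper).
constexpr std::size_t int32_column_bytes(std::size_t rows) {
  return align_up(rows * sizeof(std::int32_t));
}

constexpr std::size_t kMaxPoolBytes = 2560;                     // pool maximum from the repro
constexpr std::size_t kLeftBytes    = int32_column_bytes(64);   // 256
constexpr std::size_t kRightBytes   = int32_column_bytes(128);  // 512

// What is left for the join's internal allocations once both inputs exist.
constexpr std::size_t kHeadroom = kMaxPoolBytes - kLeftBytes - kRightBytes;  // 1792
```

Under these assumptions both input tables fit, which matches the "left size / right size" line being printed, and the join has less than 2 KiB left for its own work, so an OOM inside cudf::left_anti_join is expected. The bug is only in how that OOM surfaces.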

Expected behavior

The function should throw a std::bad_alloc exception which can be caught and handled gracefully, instead of terminating the program with a signal error.
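To make the expected behavior concrete, here is a minimal host-only sketch that uses no cuDF or RMM code. The types out_of_memory_error and bounded_pool and the function try_reserve are hypothetical stand-ins: the first mimics an OOM error type that derives from std::bad_alloc, the second a pool with a hard byte limit. The point is only that a generic catch of std::exception (or std::bad_alloc) should be enough for the caller to recover.

```cpp
#include <cstddef>
#include <exception>
#include <new>

// Hypothetical stand-in for an OOM error type derived from std::bad_alloc.
class out_of_memory_error : public std::bad_alloc {
 public:
  explicit out_of_memory_error(const char* msg) noexcept : msg_(msg) {}
  const char* what() const noexcept override { return msg_; }

 private:
  const char* msg_;
};

// Hypothetical pool with a hard limit; it only tracks byte counts.
class bounded_pool {
 public:
  explicit bounded_pool(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Throws out_of_memory_error if the request would exceed the limit.
  void reserve(std::size_t bytes) {
    if (bytes > max_bytes_ - used_) {
      throw out_of_memory_error("Maximum pool size exceeded");
    }
    used_ += bytes;
  }

  std::size_t used() const noexcept { return used_; }

 private:
  std::size_t max_bytes_;
  std::size_t used_{0};
};

// Returns true on success and false if an OOM was caught and handled.
bool try_reserve(bounded_pool& pool, std::size_t bytes) {
  try {
    pool.reserve(bytes);
    return true;
  } catch (const std::exception&) {
    // The caller can log, free memory, or retry here instead of aborting.
    return false;
  }
}
```

With a 2560-byte bounded_pool, try_reserve(pool, 256) and try_reserve(pool, 512) succeed, while try_reserve(pool, 4096) returns false and leaves the process running. That is the behavior the try/catch in the repro was expected to show.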

Environment details

Method of cuDF install: built from source
Version: v24.06.00 release branch

Additional context

After debugging the internal functions used by cudf::left_anti_join, I determined that the call to cudf::detail::contains is the one that fails. The call site and the callee's declaration begin as follows:

auto const flagged = cudf::detail::contains(right_keys,

rmm::device_uvector<bool> contains(table_view const& haystack,
