Perf: use GPU-only searchsorted instead of numpy.repeat #253

Open
eriknw wants to merge 2 commits into rapidsai:main from eriknw:cp_repeat_workaround

Conversation

eriknw commented Apr 2, 2026

Expanding a CSR `indptr` array into COO row indices with a GPU-only `searchsorted` (which eliminates D2H transfers) is faster than using `numpy.repeat`.

`cupy.repeat` does not yet support an ndarray as its `repeats` argument, so `searchsorted` is the appropriate recipe for most data. For very large data, a "cumsum+scatter" approach may be faster. `cp.repeat` is being updated in cupy/cupy#9828.
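For readers unfamiliar with the recipe, here is a minimal sketch of both approaches mentioned above. It is not the code in this PR: the function names and the `xp` module parameter are hypothetical, and the example runs on NumPy so it works without a GPU. Passing the `cupy` module as `xp` should keep the work on the device, assuming `indptr` is already a CuPy array. The "cumsum+scatter" variant is one possible reading of that phrase, using `bincount` as the scatter step.

```python
import numpy as np


def rows_via_searchsorted(indptr, nnz, xp=np):
    """Expand CSR indptr into COO row indices (hypothetical helper).

    Entry k belongs to the last row whose start offset is <= k, which is
    what side="right" minus one gives. Empty rows are skipped naturally.
    """
    positions = xp.arange(nnz, dtype=indptr.dtype)
    return xp.searchsorted(indptr, positions, side="right") - 1


def rows_via_cumsum(indptr, nnz, xp=np):
    """Same expansion via a "cumsum + scatter" style (one interpretation).

    Mark each interior row boundary with +1 (bincount handles repeated
    boundaries from empty rows), then a running sum yields the row index.
    The extra slot at index nnz absorbs trailing empty rows.
    """
    marks = xp.bincount(indptr[1:-1], minlength=nnz + 1)
    return xp.cumsum(marks)[:nnz]


# Small example: 3 rows with 2, 0, and 3 entries.
indptr = np.array([0, 2, 2, 5])
nnz = 5  # known from len(indices) in practice, so no device sync is needed
reference = np.repeat(np.arange(3), np.diff(indptr))  # the numpy.repeat way
rows_a = rows_via_searchsorted(indptr, nnz)
rows_b = rows_via_cumsum(indptr, nnz)
```

All three arrays here are `[0, 0, 2, 2, 2]`. Taking `nnz` as an argument rather than reading `indptr[-1]` is a deliberate choice in this sketch: on a GPU, reading that scalar would itself force a device-to-host copy.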

Example speed improvement:

  • from_csr (1M nodes, 20M edges): 2.4 ms with this change vs 72 ms before (30x faster)

This PR was motivated by work I am doing in cupy/cupy#9825.

eriknw added 2 commits April 2, 2026 14:56
eriknw added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Apr 3, 2026