cpd: Add Go/rpk based throughput benchmark#29772
cpd: Add Go/rpk based throughput benchmark#29772StephanDollberg wants to merge 5 commits intodevfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new rpk benchmark CLI command (Go/franz-go based) and integrates it into the ducktape/CPD test framework via a new RpkBenchmarkService, plus a perf test and a service smoke test.
Changes:
- Add
rpk benchmarkcommand to create a topic, optionally wait for balanced leadership, run a producer workload, and emit JSON metrics. - Add
RpkBenchmarkServiceducktape wrapper that runs the benchmark remotely and writes aresult.jsonfor CPD. - Add a perf test and a basic service self-test for the new benchmark wrapper.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/rptest/tests/rpk_benchmark_service_test.py | New smoke test validating the ducktape benchmark wrapper produces usable metrics. |
| tests/rptest/services/rpk_benchmark_service.py | New ducktape service wrapper to execute rpk benchmark and collect metrics. |
| tests/rptest/services/redpanda.py | Add return type annotation to find_binary. |
| tests/rptest/perf/rpk_benchmark_test.py | New perf/CPD test that runs the benchmark and writes result metrics. |
| src/go/rpk/pkg/cli/root.go | Registers the new benchmark subcommand with the root CLI. |
| src/go/rpk/pkg/cli/benchmark/benchmark.go | Implements the benchmark CLI command, topic lifecycle, leadership check, workload, and metrics output. |
| src/go/rpk/pkg/cli/benchmark/BUILD | Bazel target for the new benchmark command package. |
| src/go/rpk/pkg/cli/BUILD | Wires the new benchmark package into the CLI build deps. |
d5dbb9c to
39e9161
Compare
CI test resultstest results on build#81480
test results on build#81485
test results on build#81486
|
34cfa88 to
6e3351b
Compare
graham-rp
left a comment
There was a problem hiding this comment.
Couple of ergonomic concerns, but I think the in-the-weeds logic looks good. Back a few steps though, have you thought about a more interactive version of this? I'm thinking it could be a pretty good use of timers, spinner, etc
| warmup_s=20, | ||
| duration_s=60, |
There was a problem hiding this comment.
do we need the full 80s during tests?
There was a problem hiding this comment.
Yes, smoothes out some of the noise.
|
|
||
| cmd := &cobra.Command{ | ||
| Use: "benchmark", | ||
| Short: "Run a Kafka benchmark", |
There was a problem hiding this comment.
we could probably use a Long: set of instructions here
| cmd.Flags().Int32VarP(&partitions, "partitions", "p", 18, "Number of partitions for benchmark topic creation") | ||
| cmd.Flags().Int16VarP(&replicas, "replicas", "r", 3, "Replication factor for benchmark topic creation") | ||
| cmd.Flags().IntVar(&clients, "clients", 16, "Number of producer client connections") | ||
| cmd.Flags().IntVar(&recordSize, "record-size", 100, "Record payload size in bytes") |
There was a problem hiding this comment.
out of curiosity, why these numbers?
There was a problem hiding this comment.
It's a small batch size high RPS workload which generally stresses brokers the most.
| cmd.Flags().IntVar(&recordSize, "record-size", 100, "Record payload size in bytes") | ||
| cmd.Flags().IntVar(&warmupS, "warmup", 10, "Warmup duration in seconds") | ||
| cmd.Flags().IntVar(&durationS, "duration", 60, "Measurement duration in seconds") | ||
| cmd.Flags().StringVar(&metricsJSON, "metrics-json", "", "Optional path to write final metrics JSON") |
There was a problem hiding this comment.
I'm wondering if there's a way to get this into the regular --format type structure. Maybe suppress output if --format=json?
There was a problem hiding this comment.
Let me see what that does. Having output would still be useful for debugging purposes. Only the final "stats" are really needed in json for tool usage.
There was a problem hiding this comment.
Right, yeah had a look and I am not sure it's 100% fit. I prefer still having status output.
| }(cl) | ||
| } | ||
|
|
||
| ticker := time.NewTicker(time.Second) |
There was a problem hiding this comment.
Instead of having the ticker and printStats switching between phases, I'm thinking that using time.After to start the second phase may be more direct
There was a problem hiding this comment.
I am not entirely following. The ticker here in the end is really just for periodic printing.
The second/after-warmup phase is started like you suggest using time.After/Before in both the printer and metrics collection in the produceloop.
| if final { | ||
| prefix = "final " | ||
| } | ||
| fmt.Printf("%srequests/s=%.2f MB/s=%.2f errors=%d\n", prefix, m.RequestsPerSec, m.MBPerSec, m.Errors) |
There was a problem hiding this comment.
I think the canonical way to do this would be out.NewTable
| topic string, | ||
| payload []byte, | ||
| measureStart time.Time, | ||
| stats *stats, |
There was a problem hiding this comment.
$0.02: I think this sort of violates the ideas of CSP. It'd probably be more idiomatic to send results down a channel, but I think it's probably fine here?
There was a problem hiding this comment.
Channels are quite expensive for what is basically two atomnic increments.
For now this mostly an internal (hidden) tool with the benefit of it being in rpk that it's basically everywhere available where PR is deployed. We are still considering making it a separate app. Later we might decided to make it generally available at which point we might be able to tune visuals. |
Adds a (hidden) benchmark subcommand. For now it's only a simple produce RPS throughput benchmark. Right now we mostly use OMB. However OMB has the disadvantage that it's basically just latency benchmark while we are really missing a throughput benchmark. Equally rdkafka_performance has the disadvantage that the code is a bit rough but more importantly it's single connection and librdkafka suffers from kafka client C10K problem. Using the go client (as is done rpk) avoids this issue.
Adds a service class for the new rpk subcommand. Note this command potentially runs for longer so can't be part of the normal rpk commands (this would run into DT timeouts otherwise).
Add a CPD produce RPS throughput test. We keep the OMB result.json result file which means the results get automatically ingested into perf db. Mostly optimized for noise. NB. reactor util is not at 100% here.
6e3351b to
51a0920
Compare
Adds a produce throughput benchmark to rpk and adds a CPD test for it.
Note this still uses two shards (which introduces some inherent noise due to unlucky connection shard placement). To fix that we might need to add a custom load balancer thingy to redpanda (to shuffle produce/consume connections correctly).
Backports Required
Release Notes