
cpd: Add Go/rpk based throughput benchmark #29772

Open
StephanDollberg wants to merge 5 commits into dev from stephan/bencher

Conversation

@StephanDollberg
Member

Adds a produce throughput benchmark to rpk and adds a CPD test for it.

Note this still uses two shards (which introduces some inherent noise due to unlucky connection shard placement). To fix that we might need to add a custom load balancer thingy to redpanda (to shuffle produce/consume connections correctly).

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new rpk benchmark CLI command (Go/franz-go based) and integrates it into the ducktape/CPD test framework via a new RpkBenchmarkService, plus a perf test and a service smoke test.

Changes:

  • Add rpk benchmark command to create a topic, optionally wait for balanced leadership, run a producer workload, and emit JSON metrics.
  • Add RpkBenchmarkService ducktape wrapper that runs the benchmark remotely and writes a result.json for CPD.
  • Add a perf test and a basic service self-test for the new benchmark wrapper.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/rptest/tests/rpk_benchmark_service_test.py — New smoke test validating the ducktape benchmark wrapper produces usable metrics.
  • tests/rptest/services/rpk_benchmark_service.py — New ducktape service wrapper to execute rpk benchmark and collect metrics.
  • tests/rptest/services/redpanda.py — Add return type annotation to find_binary.
  • tests/rptest/perf/rpk_benchmark_test.py — New perf/CPD test that runs the benchmark and writes result metrics.
  • src/go/rpk/pkg/cli/root.go — Registers the new benchmark subcommand with the root CLI.
  • src/go/rpk/pkg/cli/benchmark/benchmark.go — Implements the benchmark CLI command, topic lifecycle, leadership check, workload, and metrics output.
  • src/go/rpk/pkg/cli/benchmark/BUILD — Bazel target for the new benchmark command package.
  • src/go/rpk/pkg/cli/BUILD — Wires the new benchmark package into the CLI build deps.

@StephanDollberg force-pushed the stephan/bencher branch 2 times, most recently from d5dbb9c to 39e9161 on March 7, 2026 at 23:35
@vbotbuildovich
Collaborator

vbotbuildovich commented Mar 8, 2026

CI test results

Test results on build #81480:

  • WriteCachingFailureInjectionE2ETest.test_crash_all ({"use_transactions": false}, integration) — FLAKY, passed 10/11. Test PASSES after retries. No significant increase in flaky rate (baseline=0.0907, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.2482, p1=0.0577, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/81480#019ccaaf-eb27-4742-b388-81011d061513
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
Test results on build #81485:

  • TestReadReplicaService.test_identical_lwms_after_delete_records ({"cloud_storage_type": 1, "partition_count": 5}, integration) — FLAKY, passed 18/21. Test PASSES after retries. No significant increase in flaky rate (baseline=0.0537, p0=0.2920, reject_threshold=0.0100; adj_baseline=0.1526, p1=0.3925, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/81485#019ccd97-c2ae-4348-a6a3-2e4e6ca2d6b4
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TestReadReplicaService&test_method=test_identical_lwms_after_delete_records
Test results on build #81486:

  • DirectConsumerVerifierTest.test_basic_consuming_from_topic ({"failure_mode": "RANDOM"}, integration) — FLAKY, passed 10/11. Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/81486#019cce21-3d2a-48fa-9fdf-733cbfdf31b9
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DirectConsumerVerifierTest&test_method=test_basic_consuming_from_topic
  • CloudTopicsL0GCNodeFailureTest.test_node_failure_mid_gc ({"cloud_storage_type": 2}, integration) — FLAKY, passed 10/11. Test PASSES after retries. No significant increase in flaky rate (baseline=0.0269, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/81486#019cce21-3d2d-4a09-8dc5-564a34f63a12
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=CloudTopicsL0GCNodeFailureTest&test_method=test_node_failure_mid_gc

@StephanDollberg force-pushed the stephan/bencher branch 2 times, most recently from 34cfa88 to 6e3351b on March 8, 2026 at 15:37
@graham-rp (Contributor) left a comment


Couple of ergonomic concerns, but I think the in-the-weeds logic looks good. Taking a few steps back, though, have you thought about a more interactive version of this? I'm thinking it could be a pretty good use of timers, spinners, etc.

Comment on lines +40 to +41
warmup_s=20,
duration_s=60,
Contributor

do we need the full 80s during tests?

Member Author

Yes, it smooths out some of the noise.


cmd := &cobra.Command{
Use: "benchmark",
Short: "Run a Kafka benchmark",
Contributor

we could probably use a Long: set of instructions here

Member Author

done

Comment on lines +422 to +425
cmd.Flags().Int32VarP(&partitions, "partitions", "p", 18, "Number of partitions for benchmark topic creation")
cmd.Flags().Int16VarP(&replicas, "replicas", "r", 3, "Replication factor for benchmark topic creation")
cmd.Flags().IntVar(&clients, "clients", 16, "Number of producer client connections")
cmd.Flags().IntVar(&recordSize, "record-size", 100, "Record payload size in bytes")
Contributor

out of curiosity, why these numbers?

Member Author

It's a small-batch, high-RPS workload, which generally stresses brokers the most.

cmd.Flags().IntVar(&recordSize, "record-size", 100, "Record payload size in bytes")
cmd.Flags().IntVar(&warmupS, "warmup", 10, "Warmup duration in seconds")
cmd.Flags().IntVar(&durationS, "duration", 60, "Measurement duration in seconds")
cmd.Flags().StringVar(&metricsJSON, "metrics-json", "", "Optional path to write final metrics JSON")
Contributor

I'm wondering if there's a way to get this into the regular --format type structure. Maybe suppress output if --format=json?

Member Author

Let me see what that does. Having output would still be useful for debugging purposes. Only the final "stats" are really needed in JSON for tool usage.

Member Author

Right, yeah, I had a look and I'm not sure it's a 100% fit. I'd prefer to still have status output.

}(cl)
}

ticker := time.NewTicker(time.Second)
Contributor

Instead of having the ticker and printStats switching between phases, I'm thinking that using time.After to start the second phase may be more direct

Member Author

I'm not entirely following. The ticker here is really just for periodic printing.

The second/after-warmup phase is started as you suggest, using time.After/Before in both the printer and the metrics collection in the produce loop.

if final {
prefix = "final "
}
fmt.Printf("%srequests/s=%.2f MB/s=%.2f errors=%d\n", prefix, m.RequestsPerSec, m.MBPerSec, m.Errors)
Contributor

I think the canonical way to do this would be out.NewTable

Member Author

switched

topic string,
payload []byte,
measureStart time.Time,
stats *stats,
Contributor

$0.02: I think this sort of violates the ideas of CSP. It'd probably be more idiomatic to send results down a channel, but I think it's probably fine here?

Member Author

Channels are quite expensive for what is basically two atomic increments.

@StephanDollberg
Member Author

> Couple of ergonomic concerns, but I think the in-the-weeds logic looks good. Taking a few steps back, though, have you thought about a more interactive version of this? I'm thinking it could be a pretty good use of timers, spinners, etc.

For now this is mostly an internal (hidden) tool; the benefit of it living in rpk is that it's available basically everywhere Redpanda is deployed. We are still considering making it a separate app. Later we might decide to make it generally available, at which point we can tune the visuals.

Adds a (hidden) benchmark subcommand.

For now it's only a simple produce RPS throughput benchmark.

Right now we mostly use OMB. However, OMB has the disadvantage that it's
basically just a latency benchmark, while what we are really missing is a
throughput benchmark.

Equally, rdkafka_performance has the disadvantage that the code is a bit
rough, but more importantly it's single-connection, and librdkafka suffers
from the Kafka client C10K problem.

Using the Go client (as is done in rpk) avoids this issue.
Adds a service class for the new rpk subcommand.

Note this command potentially runs for a long time, so it can't be invoked
like the normal rpk commands (it would run into DT timeouts otherwise).
Add a CPD produce RPS throughput test.

We keep the OMB result.json result-file format, which means the results get
automatically ingested into the perf DB.

Mostly optimized for low noise. NB: reactor util is not at 100% here.