fix: (shard-distributor) use one transaction for assignShardsInCurren…#7687
Conversation
…tHeartbeat Signed-off-by: edigregorio <edigregorio@uber.com>
0050b1d to
ac414a3
Compare
🔍 CI failure analysis for ac414a3: Integration test failed during shutdown phase due to SQLite database locking and context deadline exceeded errors, completely unrelated to the shard distributor code changes in this PR.IssueThe CI failure in "Golang integration test with sqlite" is caused by test infrastructure issues during shutdown, specifically context deadline exceeded errors and SQLite database locking issues. Root CauseThe integration test run failed with multiple error patterns during test shutdown:
The final test result was: DetailsWhy this is unrelated to PR changes:
Test Infrastructure Problem:
Evidence this is pre-existing/flaky:
Code Review 👍 Approved with suggestions 0 resolved / 1 findingsSolid fix replacing the two-step Delete+Assign with a single atomic update using ModRevision for optimistic concurrency. The race condition described in the PR is properly addressed. One previous minor finding about test coverage of ModRevision propagation remains unresolved. 💡 Edge Case: Test doesn't verify ModRevision propagation to AssignShards📄 service/sharddistributor/handler/executor_test.go:224 The existing success-path test (Test Case 7) creates an Since the core improvement of this PR is preserving
This would ensure a future refactor doesn't accidentally lose the Rules ❌ No requirements metRepository Rules
OptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
…tHeartbeat
What changed?
When the assignment is from the heartbeat, instead of deleting the info stored in the executor key and repopulating them, just update theassignment.
Why?
When the change was introduced it was only the executor that was calling get shard-owner if the shard was not assigned to the current executor, so the following scenatio was not possible.
Now also spectators are calling GetShardOwner and the following happens:
To avoid this we can simply update the executor info instead of deleting the key.
How did you test it?
added unit tests and:
go test github.com/uber/cadence/service/sharddistributor/handler
Potential risks
shard manager is in development phase, no production integration is recomanded.
Release notes
Documentation Changes
Reviewer Validation
PR Description Quality (check these before reviewing code):
go testinvocation)