Fix the wrong backoff computation when retrying #296
### Motivation

All retryable operations share the same `Backoff` object in `RetryableLookupService`, so after a few reconnections the retry delay stays at the maximum value (30 seconds).

### Modifications

Refactor the design of `RetryableLookupService`:

- Add a `RetryableOperation` class to represent a retryable operation. Each instance has its own `Backoff` object, and an operation can only be executed once.
- Add a `RetryableOperationCache` class that maps a specific name to its associated operation. This is an optimization: if an operation (e.g. looking up the owner of topic A) has not completed when the same operation is requested again, the pending future is reused.
- In `RetryableLookupService`, just maintain one cache per kind of operation.
- Add `RetryableOperationCacheTest` to verify the behavior.
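To illustrate the motivation, here is a minimal sketch of why a shared backoff gets stuck at the maximum delay while a per-operation backoff does not. `SimpleBackoff` and both helper functions are hypothetical and are not the client's actual `Backoff` API; the 100 ms initial delay and the doubling policy are assumptions, and only the 30 second cap comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Hypothetical simplified backoff; not the real pulsar::Backoff interface.
class SimpleBackoff {
   public:
    SimpleBackoff(Millis initial, Millis max) : max_(max), next_(initial) {}

    // Returns the delay for the upcoming retry, then doubles it up to the cap.
    Millis next() {
        const Millis current = next_;
        next_ = std::min(next_ * 2, max_);
        return current;
    }

   private:
    Millis max_;
    Millis next_;
};

// Old design: every operation draws its delays from one shared backoff, so a
// new operation inherits the delay reached by all earlier failures.
inline long long firstDelayWithSharedBackoff(int earlierFailures) {
    assert(earlierFailures >= 0);
    SimpleBackoff shared(Millis(100), Millis(30000));
    for (int i = 0; i < earlierFailures; ++i) {
        shared.next();  // retries of unrelated, earlier operations
    }
    return shared.next().count();  // first retry of the new operation
}

// New design: earlier operations advance only their own backoff, and the new
// operation starts from the initial delay.
inline long long firstDelayWithOwnBackoff(int earlierFailures) {
    assert(earlierFailures >= 0);
    SimpleBackoff other(Millis(100), Millis(30000));
    for (int i = 0; i < earlierFailures; ++i) {
        other.next();  // does not affect the new operation
    }
    SimpleBackoff own(Millis(100), Millis(30000));
    return own.next().count();
}
```

Under these assumptions, an operation that starts after 20 earlier failures waits 30000 ms for its first retry with the shared backoff, but only 100 ms with its own.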
lib/RetryableLookupService.h
    size_t getNumberOfPendingTasks() const {
        return lookupCache_->size() + partitionLookupCache_->size() + namespaceLookupCache_->size() +
               getSchemaCache_->size();
It's only used for testing. Do we need to expose it here?
I removed them. PTAL again.
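For readers following this thread, a minimal single-threaded sketch of a cache of pending operations may help show what a `size()` accessor would count. `PendingOperationCache`, `PendingResult`, `run`, and `complete` are hypothetical names, and `PendingResult` is only a stand-in for the future mentioned in the PR description; this is not the `RetryableOperationCache` implementation.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a future: callers share one result slot.
template <typename T>
struct PendingResult {
    bool ready = false;
    T value{};
};

// Hypothetical sketch of a name -> pending operation map (not thread-safe).
template <typename T>
class PendingOperationCache {
   public:
    using ResultPtr = std::shared_ptr<PendingResult<T>>;

    // Returns the shared result for `name`. If an operation with that name is
    // still pending, it is reused and `start` is not called again.
    ResultPtr run(const std::string& name, const std::function<void(const std::string&)>& start) {
        auto it = pending_.find(name);
        if (it != pending_.end()) {
            return it->second;
        }
        auto result = std::make_shared<PendingResult<T>>();
        pending_.emplace(name, result);
        start(name);
        return result;
    }

    // Publishes the value to every holder of the result and drops the entry,
    // so a later run() with the same name starts a fresh operation.
    void complete(const std::string& name, const T& value) {
        auto it = pending_.find(name);
        if (it == pending_.end()) {
            return;
        }
        it->second->value = value;
        it->second->ready = true;
        pending_.erase(it);
    }

    // Number of operations that have been started but not yet completed.
    std::size_t size() const { return pending_.size(); }

   private:
    std::map<std::string, ResultPtr> pending_;
};
```

In this sketch, two `run("topic-A", ...)` calls made before `complete("topic-A", ...)` return the same result object and `size()` reports 1; after completion `size()` drops back to 0.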
shibd
left a comment
Nice catch! Left some small comments.
It seems some tests failed after merging the main branch. Marking this PR as a draft for now.
@shibd @RobertIndie All tests pass now, PTAL again.
### Motivation

apache#296 introduced a regression for GCC <= 7.

> lib/RetryableOperation.h:109:66: error: 'pulsar::RetryableOperation<T>::runImpl(pulsar::TimeDuration)::<lambda(pulsar::Result, const T&)> [with T = pulsar::LookupService::LookupResult]::<lambda(const boost::system::error_code&)>' declared with greater visibility than the type of its field 'pulsar::RetryableOperation<T>::runImpl(pulsar::TimeDuration)::<lambda(pulsar::Result, const T&)> [with T = pulsar::LookupService::LookupResult]::<lambda(const boost::system::error_code&)>::<this capture>' [-Werror=attributes]

This seems to be a bug in GCC <= 7: the visibility of the lambda expression might not be affected by the `-fvisibility=hidden` option.

### Modifications

Add `__attribute__((visibility("hidden")))` to `RetryableOperation::runImpl` explicitly.
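As a rough illustration of the modification, the sketch below shows where such an attribute can be placed on a member function of a class template that returns a lambda capturing `this`. `SKETCH_HIDDEN` and `Operation` are hypothetical names, the body is not the real `RetryableOperation::runImpl`, and the snippet only demonstrates the attribute placement; it is not a reproduction of the GCC <= 7 error.

```cpp
#include <functional>

// Hypothetical helper macro: the attribute is a GCC/Clang extension, so it
// expands to nothing on other compilers.
#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define SKETCH_HIDDEN
#endif

template <typename T>
class Operation {
   public:
    explicit Operation(T value) : value_(value) {}

    // The member function that creates the lambda is marked hidden explicitly
    // instead of relying on -fvisibility=hidden alone.
    SKETCH_HIDDEN std::function<T()> runImpl() {
        return [this] { return value_; };
    }

   private:
    T value_;
};
```

The attribute changes symbol visibility only; calling `runImpl()` from within the same binary behaves as before.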
### Documentation

- `doc-required` (Your PR needs to update docs and you will update later)
- `doc-not-needed` (Please explain why)
- `doc` (Your PR contains doc changes)
- `doc-complete` (Docs have been already added)