Skip to content

feat: withRetry helper#343

Open
SilanHe wants to merge 6 commits intomainfrom
retryable-operation
Open

feat: withRetry helper#343
SilanHe wants to merge 6 commits intomainfrom
retryable-operation

Conversation

@SilanHe
Copy link
Copy Markdown
Contributor

@SilanHe SilanHe commented Apr 22, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue Link, if available

N/A

Description

Add a public WithRetry helper that retries any durable operation (waitForCallback, invoke, waitForCondition, etc.) end-to-end with configurable backoff. Today users have to hand-roll a while loop with manual context.wait() backoff calls. This helper provides a single, replay-safe, well-tested primitive for that pattern, reusing the exact same RetryStrategy shape developers already know from StepConfig.

New files:

  • WithRetry<T> (sdk/.../util/) — @FunctionalInterface that takes (DurableContext, int attempt) and returns T. Enables lambda syntax for the operation to retry.
  • withRetryConfig (sdk/.../config/) — Builder-pattern config holding the RetryStrategy (required, same type as StepConfig) and wrapInChildContext (defaults to true).
  • withRetryHelper (sdk/.../util/) — Static utility with two overloads:
    • Named form retryOperation(context, name, operation, config) — wraps in runInChildContext by default for clean execution history grouping.
    • Anonymous form retryOperation(context, operation, config) — runs directly in the caller's context, no wrapping.

Key behaviors:

  • Reuses the existing RetryStrategy / RetryDecision types — zero new retry concepts.
  • Replay-safe by construction: every side-effect in the loop is a durable operation (operation.execute() calls durable primitives, context.wait() handles backoff).
  • Correctly propagates SuspendExecutionException and UnrecoverableDurableExecutionException without retrying — these are internal SDK control flow signals.
  • Default 1-second backoff when RetryDecision.delay() is Duration.ZERO.

Demo/Screenshots

N/A — utility library change, no UI.

Checklist

  • I have filled out every section of the PR template
  • I have thoroughly tested this change

Testing

Unit Tests

Yes. 36 unit tests across 3 test classes in sdk/src/test/:

  • WithRetryConfigTest (8 tests) — builder validation, defaults, required fields, chaining.
  • WithRetryHelperTest (24 tests in 3 nested classes):
    • NamedForm (9) — child-context wrapping, opt-out, backoff wait naming, default delay, null validation.
    • AnonymousForm (7) — no wrapping, anonymous backoff naming, exception identity, null validation.
    • RetryBehavior (8) — attempt numbers, error forwarding, custom delays, context passthrough, SuspendExecutionException propagation, UnrecoverableDurableExecutionException propagation, retry exhaustion.
  • RetryableOperationTest (4 tests) — lambda implementation, context/attempt passthrough, exception propagation, null return.

Integration Tests

Yes. 12 integration tests across 2 test classes in sdk-integration-tests/src/test/:

  • RetryInvokeIntegrationTest (6 tests) — first-attempt success, retry after failure, all retries exhausted, custom backoff delays, composition with steps, original exception type preservation.
  • RetryWaitForCallbackIntegrationTest (6 tests) — first-attempt success, retry after callback failure, all retries exhausted, composition with steps, multiple failures then success, submitter re-execution on each retry.

Examples

Yes. 2 example handlers with 8 example tests in examples/:

  • RetryInvokeExample + RetryInvokeExampleTest (4 tests) — demonstrates withRetry wrapping context.invoke with retry on transient failures.
  • RetryWaitForCallbackExample + RetryWaitForCallbackExampleTest (4 tests) — demonstrates withRetry wrapping context.waitForCallback with retry on callback rejection.

@SilanHe SilanHe changed the title feat: Retryable operation feat: withRetry helper Apr 27, 2026
@SilanHe SilanHe marked this pull request as ready for review April 27, 2026 18:53
@SilanHe SilanHe requested a review from a team April 27, 2026 18:53

// Step 2: waitForCallback with retry — if the external system fails, try again with a fresh callback
var approvalResult = WithRetryHelper.withRetry(
context,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If withRetry requires a durable context parameter, why don't we put this method into DurableContext?

* @return the operation result
*/
@SuppressWarnings("unchecked")
public static <T> T withRetry(DurableContext context, String name, WithRetry<T> operation, WithRetryConfig config) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is async version withRetryAsync of it? withRetryAsync will return a DurableFuture and users can combine that with anyOf/allOf or just kick off an operation with retry in the background.

context,
(ctx, attempt) -> ctx.waitForCallback(
"approval-" + attempt, String.class, (callbackId, stepCtx) -> stepCtx.getLogger()
.info("Attempt {}: sending callback {} to approval system", attempt, callbackId)),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this supplier have to contain only one operation? How do we ensure and validate this?

*
* @return true if child-context wrapping is enabled
*/
public boolean wrapInChildContext() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which scenarios wrapInChildContext is mandate?

*/
public WithRetryConfig build() {
if (retryStrategy == null) {
throw new IllegalArgumentException("retryStrategy is required");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing configurations are optional. I think we should keep the convention for this new WithRetryConfig


if (config.wrapInChildContext()) {
return (T) context.runInChildContext(
name, new TypeToken<Object>() {}, childCtx -> executeRetryLoop(childCtx, name, operation, config));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a new subtype for this child context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants