Skip to content

fix(wecom): fall back to 7200s when access-token expires_in is missing or non-positive#976

Open
kevinWangSheng wants to merge 1 commit into
chenhg5:mainfrom
kevinWangSheng:fix/wecom-token-expiry-zero-fallback
Open

fix(wecom): fall back to 7200s when access-token expires_in is missing or non-positive#976
kevinWangSheng wants to merge 1 commit into
chenhg5:mainfrom
kevinWangSheng:fix/wecom-token-expiry-zero-fallback

Conversation

@kevinWangSheng
Copy link
Copy Markdown
Contributor

Summary

`getAccessToken` in `platform/wecom/wecom.go` computed the cache window as:

```go
p.tokenCache.expiresAt = time.Now().Add(time.Duration(result.ExpiresIn-60) * time.Second)
```

The hard-coded 60-second safety margin assumes `ExpiresIn` is large enough to absorb it. If WeCom (or a proxy / private-deployment `api_base_url` that strips the field) returns `expires_in = 0` (or omits it), the arithmetic lands at `-60s` — `time.Now().Add(-60s)` is 60 seconds in the past, so `p.tokenCache.expiresAt.Before(time.Now())` returns `true` on the very next call.

`tokenCache.mu` serializes all callers behind the same mutex, so every outbound API request (send message, download media, refresh typing ticket, etc.) falls through to a fresh `/cgi-bin/gettoken` round-trip in a tight loop — easily tripping WeCom's token rate limit and turning every message into a token refetch.

The sibling `platform/qqbot/qqbot.go` already guards this with the same pattern (`if expiresSec <= 0 { expiresSec = 7200 }`); WeCom was missing it.

Fix

When `expires_in` is missing or non-positive, fall back to WeCom's documented 7200-second default and log a warning so an operator can spot a backend regression. Skip the 60-second buffer subtraction when the (real) `expires_in` is itself `<= 60` to avoid a negative window on legitimately short-lived tokens.

```go
expires := result.ExpiresIn
if expires <= 0 {
slog.Warn("wecom: missing/invalid expires_in in token response, defaulting to 7200s", "got", result.ExpiresIn)
expires = 7200
}
if expires > 60 {
expires -= 60
}
p.tokenCache.expiresAt = time.Now().Add(time.Duration(expires) * time.Second)
```

Tests

  • `TestGetAccessToken_ZeroExpiresIn_FallsBackToDefault` — stubs the `/cgi-bin/gettoken` response with `{"expires_in":0}` and asserts the cache window is at least 1 hour. Stash-revert just `wecom.go` on this branch and the test fails with `tokenCache window = -59.99s`, confirming the regression guard catches the off-by-margin.
  • `TestGetAccessToken_NormalExpiresIn_AppliesBuffer` — pins the `7200 → ~7140s` buffer behaviour so the fallback can't drift.

Both use a small `http.RoundTripper` stub on `apiClient.Transport`; no network access required.

Test plan

  • `go test ./platform/wecom/ -count=1 -timeout 60s` — all wecom tests pass.
  • Stash-revert just `wecom.go` while keeping new tests — zero-expires_in case fails with `-59.99s` window, confirming bug.
  • `go build ./...` (excluding `web/embed.go`'s `all:dist` pattern which needs the JS bundle).

…g/zero

getAccessToken computed the cache window as:

    p.tokenCache.expiresAt = time.Now().Add(time.Duration(result.ExpiresIn-60) * time.Second)

The hard-coded 60s safety margin assumes ExpiresIn is large enough to
absorb it. If WeCom (or a proxy in front of it / a private-deployment
api_base_url that mangles the response) returns expires_in = 0 or
omits the field, the arithmetic lands at -60s — `time.Now().Add(-60s)`
is 60 seconds in the past, so the cache is stale on the very next
call. With tokenMu serializing all callers behind the same mutex,
every outbound API request (send message, download media, refresh
typing ticket…) immediately falls through to a fresh /cgi-bin/gettoken
round-trip, easily tripping WeCom's token rate limit.

Adopt the same pattern qqbot already uses: when expires_in is missing
or non-positive, fall back to WeCom's documented 7200s default and
log a warning so an operator can notice a backend regression. Skip the
60s buffer when the (real) expires_in is itself <= 60 to avoid a
negative window in legitimate short-lived tokens.

Tests:
  - TestGetAccessToken_ZeroExpiresIn_FallsBackToDefault: stubs the
    /cgi-bin/gettoken response with `{"expires_in":0}` and asserts the
    cache window is at least 1h. Stash-revert just wecom.go on this
    branch and the test fails with `tokenCache window = -59.99s`,
    confirming the regression guard catches the off-by-margin.
  - TestGetAccessToken_NormalExpiresIn_AppliesBuffer: pins the
    7200 -> ~7140s buffer behaviour so the fallback can't drift.

Both use a small http.RoundTripper stub on apiClient.Transport; no
network access required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant