Feature Request / Issue: Global State Support for ai-rate-limiting (Redis Integration) #13077
Closed · mohamedDev started this conversation in Ideas · Replies: 1 comment, 1 reply
This feature is not currently supported; however, it is being added in the PR at #12751.
The Problem (Context)
Currently, the ai-rate-limiting plugin stores token usage and quotas in local shared memory (SHM). In a modern Kubernetes deployment with multiple APISIX replicas, the token counters are isolated per Pod.
Why this is breaking AI Gateway features:
Inconsistent Fallback: If a user makes a request that consumes 2500 tokens on Pod A (where the limit is 50), the next request routed to Pod B or Pod C bypasses the fallback logic because those Pods' local counters are still at 0.
Quota Multiplier: The effective limit becomes limit * number_of_replicas, making fine-grained AI cost control impossible.
Header Drift: The X-Rate-Limit-Remaining headers change inconsistently depending on which Pod the Load Balancer hits.
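The quota-multiplier effect above can be demonstrated with a small simulation (hypothetical numbers: 3 replicas, a per-instance limit of 50 tokens, 10 tokens per request):

```python
# Simulate N APISIX replicas, each keeping its own local SHM token counter.
# Because each Pod checks only its own counter, the cluster as a whole
# admits roughly LIMIT * REPLICAS tokens before every Pod starts rejecting.
LIMIT = 50
REPLICAS = 3
pods = [0] * REPLICAS  # one local counter per Pod

admitted = 0
for i in range(1000):
    pod = i % REPLICAS           # round-robin load balancing
    if pods[pod] < LIMIT:        # each Pod consults only its local counter
        pods[pod] += 10          # this request consumes 10 tokens
        admitted += 10

print(admitted)  # 150 tokens admitted cluster-wide, i.e. 3x the intended limit
```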
Comparison with existing plugins
Standard plugins like limit-count already support a policy: redis option to synchronize state across a cluster. The ai-rate-limiting plugin lacks this critical "Cloud Native" feature.
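For reference, this is roughly how limit-count is configured for cluster-wide counting today (the Redis host shown is a placeholder for an in-cluster service):

```json
{
  "plugins": {
    "limit-count": {
      "count": 100,
      "time_window": 60,
      "policy": "redis",
      "redis_host": "redis.default.svc.cluster.local",
      "redis_port": 6379,
      "redis_database": 0,
      "redis_timeout": 1000
    }
  }
}
```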
Proposed Solution
Add a policy field to the ai-rate-limiting plugin configuration, allowing users to point to a Redis instance for global token counting instead of per-Pod local counters.
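A possible configuration shape, assuming the new policy field mirrors limit-count's Redis options (the field names below are illustrative, not the final schema from #12751):

```json
{
  "plugins": {
    "ai-rate-limiting": {
      "limit": 5000,
      "time_window": 60,
      "limit_strategy": "total_tokens",
      "policy": "redis",
      "redis_host": "redis.default.svc.cluster.local",
      "redis_port": 6379
    }
  }
}
```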
Current Workaround (Suboptimal)
Currently, users must hand-write serverless functions that increment Redis keys in the log phase and check them in the access phase, which defeats the purpose of having a dedicated AI plugin.
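For completeness, the workaround looks roughly like the following sketch (not production code: error handling is omitted, the key naming is arbitrary, and reading the consumed tokens from the llm_total_tokens variable is an assumption about how usage is exposed). Note that cosockets are unavailable in the log phase, so the increment has to be deferred via ngx.timer.at:

```json
{
  "plugins": {
    "serverless-pre-function": {
      "phase": "access",
      "functions": [
        "return function(conf, ctx) local red = require('resty.redis'):new() red:connect('127.0.0.1', 6379) local used = tonumber(red:get('ai_tokens:global')) or 0 if used >= 5000 then ngx.exit(429) end end"
      ]
    },
    "serverless-post-function": {
      "phase": "log",
      "functions": [
        "return function(conf, ctx) local tokens = tonumber(ctx.var.llm_total_tokens) or 0 ngx.timer.at(0, function() local red = require('resty.redis'):new() red:connect('127.0.0.1', 6379) red:incrby('ai_tokens:global', tokens) end) end"
      ]
    }
  }
}
```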