Description
Library name and version
Microsoft.Azure.WebJobs.Extensions.Storage.Queues, 5.4.0-beta.1
Describe the bug
Description
QueueProcessor.BeginProcessingMessageAsync has a max dequeue count check that applies unconditionally, even when the function's target queue is itself a poison queue (e.g., myqueue-poison). When a message on such a queue exceeds MaxDequeueCount, it is silently dropped on every dequeue cycle and never processed.
The documentation for poison message handling does not say whether MaxDequeueCount applies to the poison queue. However, the intent of poison queue handling is to ensure the user eventually deals with the message by processing it successfully, which allows it to be deleted.
Root Cause
When a function targets a queue ending in -poison, QueueListenerFactory.CreatePoisonQueueReference correctly returns null (no poison-of-poison queue). However, BeginProcessingMessageAsync still checks message.DequeueCount > MaxDequeueCount and calls HandlePoisonMessageAsync, which no-ops because _poisonQueue is null. The method then returns false, preventing the function from being invoked.
Relevant code path:
- QueueListenerFactory.CreatePoisonQueueReference: returns null for -poison suffixed queues
- QueueProcessor.BeginProcessingMessageAsync: unconditionally checks MaxDequeueCount
- QueueProcessor.HandlePoisonMessageAsync: no-ops when _poisonQueue is null
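The code path above can be modeled with a small, self-contained sketch. This is not the SDK source: ModelMessage and ModelQueueProcessor are hypothetical stand-ins, the poison queue is modeled as an in-memory list, and only the control flow described in this issue is reproduced.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a queue message; not the SDK's QueueMessage type.
public sealed class ModelMessage
{
    public string Id { get; set; } = "";
    public long DequeueCount { get; set; }
}

// Hypothetical stand-in that mirrors only the control flow described above.
public sealed class ModelQueueProcessor
{
    // null models a function whose target queue already ends in "-poison".
    private readonly List<ModelMessage> _poisonQueue;
    private readonly int _maxDequeueCount;

    public ModelQueueProcessor(List<ModelMessage> poisonQueue, int maxDequeueCount = 5)
    {
        _poisonQueue = poisonQueue;
        _maxDequeueCount = maxDequeueCount;
    }

    // Returns true when the function should be invoked for this message.
    public bool BeginProcessingMessage(ModelMessage message)
    {
        // The check runs regardless of whether a poison queue exists.
        if (message.DequeueCount > _maxDequeueCount)
        {
            HandlePoisonMessage(message);
            return false; // the function is never invoked
        }
        return true;
    }

    private void HandlePoisonMessage(ModelMessage message)
    {
        // With no poison queue, this does nothing and the message stays put.
        if (_poisonQueue != null)
        {
            _poisonQueue.Add(message);
        }
    }
}
```

In this model, a processor constructed with a null poison queue returns false for a message whose DequeueCount is 6, and the message is neither moved nor handed to the function.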
Impact
- Messages on poison queues are never processed after exceeding MaxDequeueCount (default 5)
- Messages remain in the queue indefinitely, getting dequeued and silently ignored each cycle
- Target-based scaling sees these unactionable messages and makes incorrect scale-out decisions
Suggested Fix
Guard the dequeue count check in BeginProcessingMessageAsync with a _poisonQueue != null check:
    if (_poisonQueue != null && message.DequeueCount > QueuesOptions.MaxDequeueCount)
If there is no poison queue to move the message to, the function should always be invoked.
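To make the intended decision explicit, the guarded condition can be expressed as a pure function. This is a minimal sketch: DequeueGuard and ShouldInvokeFunction are hypothetical names chosen for illustration and do not exist in the SDK.

```csharp
// Hypothetical helper for illustration only; not part of the SDK.
public static class DequeueGuard
{
    // Returns true when the function should be invoked for the message.
    public static bool ShouldInvokeFunction(bool hasPoisonQueue, long dequeueCount, int maxDequeueCount)
    {
        // A message is treated as poison only if there is somewhere to move it.
        bool treatAsPoison = hasPoisonQueue && dequeueCount > maxDequeueCount;
        return !treatAsPoison;
    }
}
```

With the guard in place, a message with a dequeue count of 6 and a limit of 5 is still handed to the function when no poison queue exists, and is diverted only when one does.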
Expected behavior
When a function targets a poison queue (e.g., myqueue-poison), messages should always be processed by the function regardless of their dequeue count.
Actual behavior
Messages exceeding MaxDequeueCount (default 5) are silently skipped on every dequeue cycle — never processed, never deleted — causing them to accumulate in the queue indefinitely and inflate scaling metrics.
Reproduction Steps
Set up 2 queue trigger functions. One to process messages from test-queue, and one to process messages from test-queue-poison. Configure both functions to fail message processing every time. After the first function fails 5 times, the message will be moved to the poison queue. The poison queue function will then be invoked 5 times, then after that will no longer be invoked, and the message will sit in the queue until its TTL expires (7 days by default).
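The two functions described above might look like the following. This sketch assumes the in-process (Microsoft.Azure.WebJobs) programming model and the default storage connection setting; the class and function names are hypothetical, and the log lines are there only so invocations can be counted.

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical repro functions; both fail on every invocation.
public static class ReproFunctions
{
    [FunctionName("ProcessTestQueue")]
    public static void ProcessTestQueue(
        [QueueTrigger("test-queue")] string message,
        ILogger log)
    {
        log.LogInformation("test-queue handler received: {Message}", message);
        throw new InvalidOperationException("Simulated failure on test-queue.");
    }

    [FunctionName("ProcessTestQueuePoison")]
    public static void ProcessTestQueuePoison(
        [QueueTrigger("test-queue-poison")] string message,
        ILogger log)
    {
        // Per this report, this line stops appearing once the dequeue count exceeds MaxDequeueCount.
        log.LogInformation("test-queue-poison handler received: {Message}", message);
        throw new InvalidOperationException("Simulated failure on test-queue-poison.");
    }
}
```

Adding a single message to test-queue and watching the logs should be enough to observe the sequence described in the steps above.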
Additional Notes
The proposed fix will be a behavior change that customers may notice, but I'd argue the current behavior is completely broken - it's not really by design that we let a queue message sit in the poison queue and repeatedly dequeue it and noop for a week until TTL expiry.
Poison queue handling is designed to ensure the customer processes these messages. If they opt in to processing that queue, they're signing up to handle those messages successfully.
Importantly, our documentation currently states that MaxDequeueCount doesn't apply to poison queues:
Line 147 in fc23a89
| /// Some queues do not have corresponding poison queues, and this property does not apply to them. Specifically, |