[BUG] QueueProcessor silently drops messages when function targets a poison queue #56759

@mathewc

Library name and version

Microsoft.Azure.WebJobs.Extensions.Storage.Queues, 5.4.0-beta.1

Describe the bug

Description

QueueProcessor.BeginProcessingMessageAsync has a max dequeue count check that applies unconditionally, even when the function's target queue is itself a poison queue (e.g., myqueue-poison). When a message on such a queue exceeds MaxDequeueCount, it is silently dropped on every dequeue cycle and never processed.

The documentation for poison message handling isn't specific about whether MaxDequeueCount applies to the poison queue itself. However, the intent of poison queue handling is to ensure the user eventually deals with the message by processing it successfully and allowing it to be deleted.

Root Cause

When a function targets a queue ending in -poison, QueueListenerFactory.CreatePoisonQueueReference correctly returns null (no poison-of-poison queue). However, BeginProcessingMessageAsync still checks message.DequeueCount > MaxDequeueCount and calls HandlePoisonMessageAsync, which no-ops because _poisonQueue is null. The method then returns false, preventing the function from being invoked.

Relevant code path:

  1. QueueListenerFactory.CreatePoisonQueueReference — returns null for -poison suffixed queues
  2. QueueProcessor.BeginProcessingMessageAsync — unconditionally checks MaxDequeueCount
  3. QueueProcessor.HandlePoisonMessageAsync — no-ops when _poisonQueue is null
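The code path above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the SDK source: `ProcessorModel`, `BeginProcessing`, and the case-insensitive suffix check are hypothetical stand-ins for the real types, and the default of 5 is the MaxDequeueCount default quoted in this issue.

```csharp
#nullable enable
using System;

// Drive the model with a queue that is itself a poison queue.
var processor = new ProcessorModel("myqueue-poison");
for (long dequeueCount = 4; dequeueCount <= 7; dequeueCount++)
{
    bool invoked = processor.BeginProcessing(dequeueCount);
    Console.WriteLine($"DequeueCount={dequeueCount}, function invoked: {invoked}");
}

// Hypothetical, simplified stand-in for the processor described in this issue.
public sealed class ProcessorModel
{
    private readonly string? _poisonQueue; // null when the source queue ends in "-poison"
    private readonly int _maxDequeueCount;

    public ProcessorModel(string queueName, int maxDequeueCount = 5)
    {
        // Step 1: no poison-of-poison queue is created (suffix check is an assumption).
        _poisonQueue = queueName.EndsWith("-poison", StringComparison.OrdinalIgnoreCase)
            ? null
            : queueName + "-poison";
        _maxDequeueCount = maxDequeueCount;
    }

    // Returns true when the function should be invoked for the message.
    public bool BeginProcessing(long dequeueCount)
    {
        // Step 2: the check runs even when there is no poison queue.
        if (dequeueCount > _maxDequeueCount)
        {
            HandlePoisonMessage();
            return false; // the function is never invoked for this message
        }
        return true;
    }

    private void HandlePoisonMessage()
    {
        // Step 3: with no poison queue, nothing happens and the message stays put.
        if (_poisonQueue == null)
        {
            return;
        }
        Console.WriteLine($"Message moved to {_poisonQueue}");
    }
}
```

In this model, every dequeue count above 5 on `myqueue-poison` returns false without moving or deleting anything, which mirrors the silent drop described above.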

Impact

  • Messages on poison queues are never processed after exceeding MaxDequeueCount (default 5)
  • Messages remain in the queue indefinitely, getting dequeued and silently ignored each cycle
  • Target-based scaling sees these unactionable messages and makes incorrect scale-out decisions

Suggested Fix

Guard the dequeue count check in BeginProcessingMessageAsync with a _poisonQueue != null check:

if (_poisonQueue != null && message.DequeueCount > QueuesOptions.MaxDequeueCount)

If there is no poison queue to move the message to, the function should always be invoked.
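To make the effect of the guard concrete, here is a minimal comparison of the two conditions. It is an illustrative sketch only: `GuardModel` and its method names are hypothetical, and `hasPoisonQueue` stands in for the `_poisonQueue != null` test in the suggested fix.

```csharp
using System;

var cases = new[] { (true, 6L), (false, 6L), (false, 3L) };
foreach (var (hasPoisonQueue, dequeueCount) in cases)
{
    Console.WriteLine(
        $"hasPoisonQueue={hasPoisonQueue}, DequeueCount={dequeueCount}: " +
        $"skipped today={GuardModel.SkipsToday(dequeueCount)}, " +
        $"skipped with fix={GuardModel.SkipsWithFix(hasPoisonQueue, dequeueCount)}");
}

// Hypothetical helpers that model only the condition, not the real method.
public static class GuardModel
{
    public const int MaxDequeueCount = 5; // default quoted in this issue

    // Current condition: the dequeue count alone decides.
    public static bool SkipsToday(long dequeueCount) =>
        dequeueCount > MaxDequeueCount;

    // Proposed condition: only skip when there is a poison queue to move the message to.
    public static bool SkipsWithFix(bool hasPoisonQueue, long dequeueCount) =>
        hasPoisonQueue && dequeueCount > MaxDequeueCount;
}
```

The two conditions differ only in the case where there is no poison queue and the dequeue count exceeds the maximum, which is exactly the case this issue reports.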

Expected behavior

When a function targets a poison queue (e.g., myqueue-poison), messages should always be processed by the function regardless of their dequeue count.

Actual behavior

Messages exceeding MaxDequeueCount (default 5) are silently skipped on every dequeue cycle — never processed, never deleted — causing them to accumulate in the queue indefinitely and inflate scaling metrics.

Reproduction Steps

  1. Set up two queue trigger functions: one that processes messages from test-queue and one that processes messages from test-queue-poison.
  2. Configure both functions to fail message processing every time.
  3. Add a message to test-queue. After the first function fails 5 times, the message is moved to the poison queue.
  4. The poison queue function is then invoked 5 times. After that it is no longer invoked, and the message sits in the queue until its TTL expires (7 days by default).
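The two functions for this repro could look roughly like the following. This is a sketch that assumes the in-process WebJobs/Functions programming model and the default storage connection; the class and function names are made up for illustration, and it needs a Functions host and a storage account to run.

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical repro functions; both always fail so the message is never completed.
public static class PoisonQueueRepro
{
    [FunctionName("ProcessTestQueue")]
    public static void ProcessTestQueue(
        [QueueTrigger("test-queue")] string message,
        ILogger log)
    {
        log.LogInformation("test-queue received: {Message}", message);
        throw new InvalidOperationException("Simulated failure on test-queue");
    }

    [FunctionName("ProcessTestQueuePoison")]
    public static void ProcessTestQueuePoison(
        [QueueTrigger("test-queue-poison")] string message,
        ILogger log)
    {
        // Watch for this log line to stop appearing once the dequeue count passes the maximum.
        log.LogInformation("test-queue-poison received: {Message}", message);
        throw new InvalidOperationException("Simulated failure on test-queue-poison");
    }
}
```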

Additional Notes

The proposed fix is a behavior change that customers may notice, but I'd argue the current behavior is completely broken. It's not really by design that we let a message sit in the poison queue, dequeuing it repeatedly and no-oping for a week until TTL expiry.

Poison queue handling is designed to ensure the customer processes these messages. If they opt in to processing that queue, they're signing up to handle those messages successfully.

Importantly, our documentation currently states that MaxDequeueCount doesn't apply to poison queues:

/// Some queues do not have corresponding poison queues, and this property does not apply to them. Specifically,

Metadata

Labels

  • Client: This issue is related to a non-management package
  • Service Attention: Workflow: This issue is responsible by Azure service team.
  • Storage: Storage Service (Queues, Blobs, Files)
  • needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
