Disconnects seem to be quite ungraceful #342

@karalabe

I've been playing around with NSQ a lot lately and I keep hitting walls when trying to write test suites that assemble various network topologies. Most of the issues seem to stem from NSQD not properly handling consumer disconnects (I'm using go-nsq). I don't even know where to start describing the strange things:

  • When stopping a consumer, sometimes the CLS message is sent to NSQD, sometimes it is not.
  • Even if the CLS does get to NSQD, sometimes it seems to not respond with CLOSE_WAIT, rather nukes the stream.
  • The logs are full of error messages on both consumer and broker side during shutdowns that one side or another tries to read/write but the stream is already dead (no graceful disconnect).
  • Disconnecting the last consumer doesn't seem to decrement the client count of a topic/channel.
  • Disconnecting a consumer doesn't seem to abort/reschedule the in-flight messages for that consumer.
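
For reference, the exchange I would expect on a clean disconnect can be sketched as below. This is a toy model only, not go-nsq or nsqd code: `toyBroker`, `gracefulClose` and `runHandshake` are hypothetical names, the two ends talk over an in-memory `net.Pipe`, and the client closes as soon as it sees `CLOSE_WAIT` (a real consumer would presumably finish its in-flight messages first). It assumes the usual NSQ framing of a 4-byte big-endian size, a 4-byte frame type and then the payload.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

// frameTypeResponse is the frame type assumed here for ordinary responses.
const frameTypeResponse int32 = 0

// writeFrame writes size (4 bytes), frame type (4 bytes), then the payload.
func writeFrame(w io.Writer, frameType int32, data []byte) error {
	buf := make([]byte, 8+len(data))
	binary.BigEndian.PutUint32(buf[0:4], uint32(4+len(data)))
	binary.BigEndian.PutUint32(buf[4:8], uint32(frameType))
	copy(buf[8:], data)
	_, err := w.Write(buf)
	return err
}

// readFrame reads one frame and returns its type and payload.
func readFrame(r io.Reader) (int32, []byte, error) {
	var size int32
	if err := binary.Read(r, binary.BigEndian, &size); err != nil {
		return 0, nil, err
	}
	if size < 4 {
		return 0, nil, fmt.Errorf("short frame: %d bytes", size)
	}
	body := make([]byte, size)
	if _, err := io.ReadFull(r, body); err != nil {
		return 0, nil, err
	}
	return int32(binary.BigEndian.Uint32(body[:4])), body[4:], nil
}

// toyBroker (hypothetical) answers CLS with CLOSE_WAIT and then waits for the
// client to close. It reports an error if the stream dies without a CLS.
func toyBroker(conn net.Conn) error {
	defer conn.Close()
	r := bufio.NewReader(conn)
	closing := false
	for {
		line, err := r.ReadString('\n')
		if err == io.EOF {
			if closing {
				return nil // clean shutdown: CLS seen, then the client closed
			}
			return fmt.Errorf("client vanished without sending CLS")
		}
		if err != nil {
			return err
		}
		if line == "CLS\n" {
			closing = true
			if err := writeFrame(conn, frameTypeResponse, []byte("CLOSE_WAIT")); err != nil {
				return err
			}
		}
	}
}

// gracefulClose (hypothetical) sends CLS, waits for the reply, then closes.
func gracefulClose(conn net.Conn) (string, error) {
	defer conn.Close()
	if _, err := conn.Write([]byte("CLS\n")); err != nil {
		return "", err
	}
	frameType, data, err := readFrame(conn)
	if err != nil {
		return "", err
	}
	if frameType != frameTypeResponse {
		return "", fmt.Errorf("unexpected frame type %d", frameType)
	}
	return string(data), nil
}

// runHandshake wires both ends together over an in-memory pipe.
func runHandshake() (reply string, clientErr, brokerErr error) {
	clientSide, brokerSide := net.Pipe()
	done := make(chan error, 1)
	go func() { done <- toyBroker(brokerSide) }()
	reply, clientErr = gracefulClose(clientSide)
	return reply, clientErr, <-done
}

func main() {
	reply, clientErr, brokerErr := runHandshake()
	fmt.Println(reply, clientErr, brokerErr)
}
```

In this model neither side ever reads from or writes to a dead stream, which is the property that seems to be missing in the behaviour listed above.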

It seems to me that the entire shutdown pathway is very wrong, and that various timeouts merely hack around the root cause. E.g. the client heartbeats (or rather their absence after a disconnect) are what trigger the cleanup of leftover client counts, and the in-flight timeout is what reschedules messages not processed by a disconnected client.
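
To make the expectation concrete, here is a minimal sketch of disconnect-driven cleanup, where removing a client immediately drops it from the client count and requeues whatever it still had in flight, instead of waiting for a heartbeat or in-flight timeout. This is not how nsqd is implemented; `toyChannel` and all of its methods are hypothetical names made up for illustration.

```go
package main

import "fmt"

// toyChannel is a hypothetical, heavily simplified stand-in for a channel.
type toyChannel struct {
	queue    []string            // messages waiting for delivery
	inFlight map[string][]string // client ID -> delivered but unfinished messages
}

func newToyChannel(msgs ...string) *toyChannel {
	return &toyChannel{
		queue:    append([]string(nil), msgs...),
		inFlight: make(map[string][]string),
	}
}

// addClient registers a client with no in-flight messages.
func (c *toyChannel) addClient(id string) {
	if _, ok := c.inFlight[id]; !ok {
		c.inFlight[id] = nil
	}
}

// clientCount reports how many clients are currently registered.
func (c *toyChannel) clientCount() int {
	return len(c.inFlight)
}

// deliver hands the next queued message to a registered client and marks it in flight.
func (c *toyChannel) deliver(id string) (string, bool) {
	if _, ok := c.inFlight[id]; !ok || len(c.queue) == 0 {
		return "", false
	}
	msg := c.queue[0]
	c.queue = c.queue[1:]
	c.inFlight[id] = append(c.inFlight[id], msg)
	return msg, true
}

// removeClient is called on disconnect: it unregisters the client right away
// and requeues its in-flight messages, returning how many were requeued.
func (c *toyChannel) removeClient(id string) int {
	pending := c.inFlight[id]
	delete(c.inFlight, id)
	c.queue = append(c.queue, pending...)
	return len(pending)
}

func main() {
	ch := newToyChannel("m1", "m2", "m3")
	ch.addClient("c1")
	ch.deliver("c1")
	ch.deliver("c1")
	requeued := ch.removeClient("c1")
	fmt.Println("requeued:", requeued, "clients:", ch.clientCount(), "queued:", len(ch.queue))
}
```

With cleanup tied to the disconnect itself, the timeouts would only be a fallback for clients that vanish without the broker noticing.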

I'm unsure if I'm doing something weird here, but it seems that NSQ is very prone to weird behavior when I have very short-lived connections.
