@joshk (Collaborator) commented Oct 2, 2024

This PR is for discussion.

When NervesHub hosts are shut down (usually for updating), the device connection telemetry and database updates may need some extra time to complete.

This previously wasn't an issue as we weren't hooking into clean shutdowns, and in the case of Fly.io, a hard kill was being used by default.

I think having these dials available is useful, and I think it's worth discussing this approach.

@oestrich (Contributor) left a comment

As far as I can tell, we've been using the default drainer config, and it works but tends to spike the Ecto queue, as expected. So I don't think it's a bad idea to allow configuration. I think the smaller batch size is good, but should we keep DEVICE_SOCKET_DRAINER_BATCH_INTERVAL at 2000? I don't have strong feelings either way.

@joshk (Collaborator, Author) commented Oct 2, 2024

I thought 4 secs, as a max cutoff (I believe it acts as a timeout), would give all the sockets in the group time to close and update the DB.

If you are ok with keeping this PR as is, I'll merge it in and we can reevaluate after it has had some testing?
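
For reference, the dials discussed here map onto the `:drainer` option that Phoenix accepts on a socket declaration. The following is only a minimal sketch, not the code merged in this PR: the module names `ExampleWeb.DeviceEndpoint` and `ExampleWeb.DeviceSocket`, the socket path, and the batch size of 500 are hypothetical, and only the 4-second batch interval comes from this thread.

```elixir
# Sketch only: module names, path, and batch size are assumptions for illustration.
defmodule ExampleWeb.DeviceEndpoint do
  use Phoenix.Endpoint, otp_app: :example

  socket "/device-socket", ExampleWeb.DeviceSocket,
    websocket: true,
    drainer: [
      # How many connected clients are notified per batch (Phoenix default: 10_000).
      batch_size: 500,
      # Milliseconds given for each batch to terminate (Phoenix default: 2_000).
      batch_interval: 4_000,
      # Upper bound in milliseconds for draining all batches (Phoenix default: 30_000).
      shutdown: 30_000
    ]
end
```

A smaller batch size with a longer interval spreads the disconnect-time database writes over more time, which is the trade-off against total shutdown duration being weighed in this discussion.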

@joshk joshk merged commit 40b434e into main Oct 2, 2024
@joshk joshk deleted the drainer-discussion branch October 2, 2024 21:56
@oestrich (Contributor) commented Oct 3, 2024

For posterity: we chatted in Slack and I was 💯 on merging since it's configurable. If it's a timeout, then I'm totally cool with 4s.
