Skip to content

Comments

refactor run events; emit them to sync service#31

Merged
nonsense merged 25 commits intomasterfrom
peek
Oct 13, 2020
Merged

refactor run events; emit them to sync service#31
nonsense merged 25 commits intomasterfrom
peek

Conversation

@nonsense
Copy link
Member

@nonsense nonsense commented Sep 29, 2020

This PR is introducing a notification package in the SDK - a way for the daemon to receive notifications from the sync service with respect to events on the test instances.

Notifications' goal is to convey information to the Testground daemon with respect to the:

  1. Outcomes of test plan instances
  2. Stages passed

When test plans have multiple stages, this functionality will make it easier to determine at which stage a test run blocked and why.


TODO:

  • update Event struct to support Stage events
  • update Event struct to support pointers to concrete types (ala Kubernetes - helps with deserialisation)
  • add godoc

FOLLOW UP ISSUE:

  • get rid of mapstructure on sdk-go and testground
  • get rid of zap logger from sdk-go
  • add SubscribeEvents function to WatchClient, so that we can monitor incoming events as they happen (useful for the Stage events).
  • refactor FetchAllEvents and SubscribeEvents and maybe unify in one method.

Copy link
Member

@hacdias hacdias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

}
c.runenv.RecordMessage(InitialisationSuccessful)

err = c.syncClient.SignalEvent(ctx, runtime.NewStageNotification(c.runenv.TestGroupID, "network-initialized", "exit"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we record exit:err or exit:ok depending on the outcome. Also, should notifications be able to carry a message? It seems like propagating the error could save us from having to look into logs.

Copy link
Member Author

@nonsense nonsense Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind notifications is for them to be aggregated - I don't really have a use-case for them to propagate specific messages.

exit:err and exit:ok do not belong here as far as I can tell, this is specifically the place that emits a notification on one of the barriers we care about (and that we've had issues in the past, once networking goes down)... so I am not sure I follow the question :)

Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and we should probably add tests for this stuff ;-)

@nonsense nonsense marked this pull request as draft October 2, 2020 15:42
@nonsense nonsense force-pushed the peek branch 3 times, most recently from 60b22ab to ba89012 Compare October 5, 2020 13:39
}, nil
}

func (w *WatchClient) FetchAllEvents(rp *runtime.RunParams) ([]*runtime.Event, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method implicitly assumes that this will be called at the end. If that's the case, we should document it (as it's part of the contract), and instead of doing this undeterministic pagination, we should do an XLen and consume that many elements (paginating, if necessary), to make this method deterministic.


Why not move this to a channel-based subscription? We are going to need real-time consumption pretty soon. And we are already doing a funny dance here to paginate, anyway. We don't need to implement indefinite Block and the unblock flow (like the sync service topic subscription does).

Just passing in a context, keeping the block time of 1 second, and looping until the context is closed is enough.

We can also pass a closeOnTerminalEvt bool flag that closes the channel when we find a terminal event (crash, success, failure). We can add Event#IsTerminal() bool to abstract over that.

That would allow the daemon to do:

wcl := NewWatchClient()
for evt := range wcl.FetchEvents(ctx) {
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - this method is not guaranteed to exit, if the producer keeps on spamming events.

We should either refactor and just use the Subscribe method, or use XLen as you suggest.

@raulk raulk changed the title introduce notifications refactor run events; emit them to sync service Oct 13, 2020
Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants