Skip to content

Nat to nat#146

Merged
squat merged 7 commits intosquat:mainfrom
JulienVdG:nat2nat
Apr 21, 2021
Merged

Nat to nat#146
squat merged 7 commits intosquat:mainfrom
JulienVdG:nat2nat

Conversation

@JulienVdG
Copy link
Copy Markdown
Contributor

My approach is highly inspired from @squat comments in #109, with a difference on the annotation usage:
The annotation is note used for a node to present it's own endpoint detected by other, but the other way around: a node published the NAT endpoints it detected.

This has 2 advantages:

  • the node detecting the endpoint will only update it's own annotation (and can use the existing code for this)
  • if several nodes detect different values, they won't toggle the same annotation but instead publish each their own view.

Then I updated the topology code to resolve the detected endpoint so that the local node will use :

  • the endpoint it detected itself (to avoid changing a working configuration)
  • the endpoint detected by a node in the topology (looking segments and nodes in sorted order to get a stable output)

I tested this on a cluster with public nodes and 2 NAT regions and the NAT regions could then see each-other.

The code probably still needs polishing.

@JulienVdG
Copy link
Copy Markdown
Contributor Author

@remyleone if you want to have a look.

Comment thread pkg/wireguard/conf.go Outdated
Copy link
Copy Markdown
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JulienVdG 😍 thank you for taking a look at this issue. I definitely love the approach. I think that keeping a map of discovered endpoints in each node is great and will indeed help maintain stability in the cluster!

Also, thank you for adding the tests! I know these are very verbose and daunting.

One question: don't we need to remove the requirement in node.Ready for the node to have an endpoint?
Edit: as commented in #109 (comment), we don't need to modify node.Ready

Comment thread pkg/wireguard/conf.go Outdated
Comment thread pkg/mesh/topology.go Outdated
Comment thread pkg/mesh/topology.go
Comment thread pkg/mesh/mesh.go Outdated
Comment thread pkg/mesh/mesh.go Outdated
Comment thread pkg/mesh/mesh.go Outdated
updateNATEndpoints(nodes, peers, oldConf)
natEndpoints := updateNATEndpoints(nodes, peers, oldConf, m.logger)
nodes[m.hostname].DiscoveredEndpoints = natEndpoints
m.nodes[m.hostname].DiscoveredEndpoints = natEndpoints
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodes is a map of pointers to Nodes, so I think updating the nodes map should suffice and we should not need to update m.nodes as well

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The applyTopology function starts by doing shallow copies of the Ready Nodes ...
Updating nodes will provide the updated endpoint to NewTopology.
Updating m.nodes ensure that next call to checkIn will publish the DiscoveredEndpoints to k8s.

There are others way to do it, but this looked easy enough.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I definitely agree that we want both nodes[m.hostname] and m.nodes[m.hostname] to include the updated endpoints so that the checkIn method can update the endpoints annotation.

What I don't see is why we need to explicitly set .DiscoveredEndpoints = natEndpoints for both of them. Since nodes is a shallow copy, updating nodes[m.hostname] will also update m.nodes[m.hostname], no?

e.g. https://play.golang.com/p/JXCE1jnUN-c

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the copy is not a pointer copy but a shallow copy so more like:
https://play.golang.com/p/istIdrj__0_P

for it to work I need to make the endpoint list a pointer so that the copy points to the same map
eg: https://play.golang.com/p/M2uunAQXRyI

I can do that if you think it's better that way.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're totally right 🤦

// Make a shallow copy of the node.

This lgtm then

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this some more, it occurs to me that the only reason we really had to do the shallow copies was to prevent the endpoints of the nodes to be overwritten by the updateNATEndpoints func, which used to mutate the parameters. Now that the func no longer mutates data we could probably switch from shallow copies to just copying the pointers. We can do this in a follow up however

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread pkg/mesh/mesh.go
n.Endpoint = peer.Endpoint
level.Debug(logger).Log("msg", "WireGuard Update NAT Endpoint", "node", n.Name, "endpoint", peer.Endpoint, "former-endpoint", n.Endpoint, "same", n.Endpoint.Equal(peer.Endpoint))
// Only public ip ? (should check location leader but only in topology ... or have topology handle that list)
// Better check wg latest-handshake
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is to only set the endpoint, if the last handshake is not older than some constant, instead of how it is now?

So nodes that have an outdated endpoint will eventually forget about it and get the updated endpoint from another nodes' annotation?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a great feature to add eventually

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare minimum is when latest-handshake is 0 wireguard didn't see any connection yet, so better not advertise it (anyway in that cas it's still equal to n.Endpoint). Then yes we could detect outdated endpoint.

Also in topology instead of a static ordering we could sort by latest-handshake, the fact that we give priority to the current detected endpoint already ensure stability, but having more noise would allow to try the different endpoints one node could have (in case of really complicated network)

The cost however in another call to wg (ie wg show kilo0 latest-handshakes) as the wg showconf call will not return the latest-handshake

Comment thread Makefile Outdated
PKG := github.com/squat/$(PROJECT)
REGISTRY ?= index.docker.io
IMAGE ?= squat/$(PROJECT)
ifneq ($(REGISTRY),index.docker.io)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind splitting this up into a new PR so we can keep each chunk of work separate and still squash the commits related to NAT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done #147

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the drop from this PR, sorry...

Now that updateNATEndpoints was updated to discoverNATEndpoints and that
the endpoints are overridden by topology instead of mutating the nodes and
peers object, we can safely drop this copy.
@JulienVdG
Copy link
Copy Markdown
Contributor Author

This one should be fine now, do you want me to squash the commits ?

Copy link
Copy Markdown
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Copy Markdown
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to squash in the PR, I'll squash now while merging 🛸✈️🚀🚢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants