Conversation
@remyleone if you want to have a look.
@JulienVdG 😍 thank you for taking a look at this issue. I definitely love the approach. I think that keeping a map of discovered endpoints in each node is great and will indeed help maintain stability in the cluster!
Also, thank you for adding the tests! I know these are very verbose and daunting.
One question: don't we need to remove the requirement in node.Ready for the node to have an endpoint?
Edit: as commented in #109 (comment), we don't need to modify node.Ready
```diff
-	updateNATEndpoints(nodes, peers, oldConf)
+	natEndpoints := updateNATEndpoints(nodes, peers, oldConf, m.logger)
+	nodes[m.hostname].DiscoveredEndpoints = natEndpoints
+	m.nodes[m.hostname].DiscoveredEndpoints = natEndpoints
```
nodes is a map of pointers to Nodes, so I think updating the nodes map should suffice and we should not need to update m.nodes as well
The applyTopology function starts by doing shallow copies of the Ready Nodes ...
Updating nodes will provide the updated endpoint to NewTopology.
Updating m.nodes ensures that the next call to checkIn will publish the DiscoveredEndpoints to k8s.
There are other ways to do it, but this looked easy enough.
Yes I definitely agree that we want both nodes[m.hostname] and m.nodes[m.hostname] to include the updated endpoints so that the checkIn method can update the endpoints annotation.
What I don't see is why we need to explicitly set .DiscoveredEndpoints = natEndpoints for both of them. Since nodes is a shallow copy, updating nodes[m.hostname] will also update m.nodes[m.hostname], no?
Well, the copy is not a pointer copy but a shallow copy, so it is more like:
https://play.golang.com/p/istIdrj__0_P
For it to work, I would need to make the endpoint list a pointer so that the copy points to the same map, e.g.:
https://play.golang.com/p/M2uunAQXRyI
I can do that if you think it's better that way.
Thinking about this some more, it occurs to me that the only reason we really had to do the shallow copies was to prevent the endpoints of the nodes from being overwritten by the updateNATEndpoints func, which used to mutate its parameters. Now that the func no longer mutates data, we could probably switch from shallow copies to just copying the pointers. We can do this in a follow-up, however.
```go
n.Endpoint = peer.Endpoint
level.Debug(logger).Log("msg", "WireGuard Update NAT Endpoint", "node", n.Name, "endpoint", peer.Endpoint, "former-endpoint", n.Endpoint, "same", n.Endpoint.Equal(peer.Endpoint))
// Only public ip ? (should check location leader but only in topology ... or have topology handle that list)
// Better check wg latest-handshake
```
So the idea is to only set the endpoint if the last handshake is not older than some constant, instead of how it is now?
So nodes that have an outdated endpoint will eventually forget about it and get the updated endpoint from another node's annotation?
That would be a great feature to add eventually
The bare minimum is: when latest-handshake is 0, WireGuard hasn't seen any connection yet, so it's better not to advertise the endpoint (in that case it's still equal to n.Endpoint anyway). Then yes, we could also detect outdated endpoints.
Also, in topology, instead of a static ordering we could sort by latest-handshake. The fact that we give priority to the currently detected endpoint already ensures stability, but adding more noise would allow trying the different endpoints one node could have (in case of a really complicated network).
The cost, however, is another call to wg (i.e. wg show kilo0 latest-handshakes), as the wg showconf call does not return the latest handshake.
```makefile
PKG := github.com/squat/$(PROJECT)
REGISTRY ?= index.docker.io
IMAGE ?= squat/$(PROJECT)
ifneq ($(REGISTRY),index.docker.io)
```
Would you mind splitting this up into a new PR so we can keep each chunk of work separate and still squash the commits related to NAT?
I missed the drop from this PR, sorry...
Compare IP first by default and compare DNS name first when we know the Endpoint was resolved.
Now that updateNATEndpoints was renamed to discoverNATEndpoints, and the endpoints are overridden by the topology instead of mutating the nodes and peers objects, we can safely drop this copy.
This one should be fine now, do you want me to squash the commits?
My approach is highly inspired by @squat's comments in #109, with one difference in the annotation usage:
The annotation is not used for a node to present its own endpoint as detected by others, but the other way around: a node publishes the NAT endpoints it has detected.
This has 2 advantages:
Then I updated the topology code to resolve the detected endpoint so that the local node will use:
I tested this on a cluster with public nodes and 2 NAT regions, and the NAT regions could then see each other.
The code probably still needs polishing.