fix: reuse existing client endpoint when configuring server interface#3157
fix: reuse existing client endpoint when configuring server interface#3157xDev789 wants to merge 1 commit intoliqotech:masterfrom
Conversation
|
Hi @xDev789. Thanks for your PR! I am @adamjensenbot.
Make sure this PR appears in the liqo changelog, adding one of the following labels:
|
|
/rebase |
312ef30 to
b76e8e0
Compare
|
/rebase test=true |
|
Hi @xDev789, I've checked your PR and something is not clear to me. You are forcing the wg interface to use always the same peer's endpoint, but what happens if it changes? I tested it with cilium and I've noticed that this field is populated with an IP which is the one assigned to the |
|
/rebase test=true |
Reuse existing client endpoint when configuring server interface.
b76e8e0 to
1800b80
Compare
|
I tried to move pods from one node to another, keeping the wrong IP in the server, and it seems that everything is working. It seems that the peer's endpoint is set in server mode but it's balue is ignored. I need to take some additional tests |
|
Hi @cheina97! Thank you for reviewing my PR. You are right to be sceptical about reusing the same peer endpoint but it is only being explicitly set when configureDevice function gets called (i.e. when PublicKey custom resource gets reconciled). In case the peer endpoint changes, wireguard will update the endpoint due to its automatic peer discovery. It is the same mechanism Liqo currently relies on, except without this PR, reconciling PublicKey CR results in connection interruption because discovered peer endpoint gets erased. As for the testing part, I've tested it on two Kind clusters with two nodes each and it worked as expected. We also use the stable version of Liqo with these patches applied to establish tunnels between the control and worker clusters. |
I've tried and it seems that wireguard was not able to reconfigure automatically the IP since it is set forcefully by the controller. Instead, without explicit setup, that IP is updated correctly. I'm not against this change, but I would prefer to wait a little bit and test it properly. Can you share all the scenarios you have tried? I would like to know what provider have you tried and which CNI were you using (also with Kind clusters) |
I totally agree with you, I strongly believe in shipping quality product and I'm not trying to rush this change either. We use RKE2 with Cilium in Kube-Proxy replacement mode for both server and client clusters. With Kind clusters, I also used Cilium. I've tried rescheduling the gateway client pod on another node. I've also tried the HA scenario when another pod obtains the lease and becomes the active gateway. Both scenarios resulted in connection being reestablished as expected. Could you share more details on the problematic case? You said it worked initially but some other tests helped you spot an issue. |
Thanks for your help, we just need to test it properly with other CNIs. I just need some time to test it and to process |
|
Hi @xDev789 , sorry for the delay! We chatted with the other maintainers about this feature, and we’re leaning towards keeping it under a feature flag for now. Would you mind adding an if statement around your feature and including a flag in the helm chart to enable or disable it? |
Reuse existing client endpoint when configuring server interface.
Description
Wireguard container now preserves peer's endpoint when configuring the server interface, which allows for uninterrupted connectivity in an event of PublicKey resource reconciliation.
Fixes #3090
How Has This Been Tested?