feat: Direct connection support between provider clusters #3115

Open
Gabbe64 wants to merge 12 commits into liqotech:master from Gabbe64:no-fcc-implementation
Gabbe64 commented Aug 28, 2025

Description

Fixes #3011

This PR makes it possible for pods deployed on different provider clusters to communicate directly over an established network connection (liqoctl network connect).

This feature is opt-in: users can enable it by adding the annotation key use-direct-connections: true to the Service exposing the pods that should communicate directly. This way, the default behavior remains unchanged, while also giving users the possibility to choose whether to route traffic through the consumer cluster or directly.

Implementation

As discussed during the last community meetings, the modified components are the virtual-kubelet and the ShadowEndpointslice controller.

Transmission - virtual-kubelet

When an EndpointSlice is reflected and the corresponding Service carries the annotation, some data is collected from the informers (without direct API server calls), serialized as JSON, and stored in the annotations of the ShadowEndpointSlice.

This data is:

  1. The clusterID where the endpoint is deployed,
  2. The local IPs contained in the Endpointslice of the consumer,
  3. The same IPs, remapped by the consumer cluster.
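
As a rough illustration of the three items above, the sketch below serializes one entry with the Info struct quoted later in this review thread (JSON keys ID, IPs, rIPs). The encodeInfo helper and the sample values used with it are hypothetical and are not taken from the PR.

```go
package main

import "encoding/json"

// Info mirrors the struct quoted in the review thread; the field comments
// follow the three points listed above.
type Info struct {
	ClusterID   string   `json:"ID"`   // cluster where the endpoints are deployed
	IPs         []string `json:"IPs"`  // local IPs from the consumer's EndpointSlice
	RemappedIPs []string `json:"rIPs"` // the same IPs as remapped by the consumer
}

// encodeInfo is a hypothetical helper: it returns the JSON string that
// would be stored as the annotation value.
func encodeInfo(info Info) (string, error) {
	raw, err := json.Marshal(info)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}
```

For an entry with cluster ID "provider-b", local IP 10.200.1.37 and remapped IP 10.71.1.37, this yields {"ID":"provider-b","IPs":["10.200.1.37"],"rIPs":["10.71.1.37"]}.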

Reception

The ShadowEndpointSlice controller (in the provider cluster) receives this data and uses it to remap the addresses so that traffic uses the direct connections towards other provider clusters (if available).

To achieve this:

  • Endpoints leading to pods on a different provider are identified using point 3.
  • They are replaced with direct addresses, derived from the remote cluster’s podCIDR (point 1) and the host part extracted from the local IP (point 2); this required the implementation of a "forced mapping".
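
The "forced mapping" in the second bullet can be pictured as keeping the host bits of the local IP and swapping in the network bits of the target CIDR. The following is a minimal sketch of that idea using only the standard library; forceMapIP is a hypothetical name, and the actual helper in the PR may differ in signature and in which CIDR it receives.

```go
package main

import (
	"fmt"
	"net/netip"
)

// forceMapIP is a hypothetical helper: it keeps the host bits of ip and
// replaces the network bits with those of cidr.
func forceMapIP(ip, cidr string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", fmt.Errorf("invalid IP %q: %w", ip, err)
	}
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	if addr.BitLen() != prefix.Addr().BitLen() {
		return "", fmt.Errorf("IP %s and CIDR %s belong to different address families", addr, prefix)
	}

	host := addr.AsSlice()
	network := prefix.Masked().Addr().AsSlice()
	remaining := prefix.Bits() // network bits still to be copied

	out := make([]byte, len(host))
	for i := range out {
		var mask byte // bits of this byte that belong to the network part
		switch {
		case remaining >= 8:
			mask = 0xff
		case remaining > 0:
			mask = byte(0xff) << (8 - remaining)
		}
		out[i] = (network[i] & mask) | (host[i] &^ mask)
		remaining -= 8
	}

	mapped, ok := netip.AddrFromSlice(out)
	if !ok {
		return "", fmt.Errorf("cannot build an address from %v", out)
	}
	return mapped.String(), nil
}
```

With these made-up values, forceMapIP("10.200.1.37", "10.71.0.0/16") returns "10.71.1.37": the first 16 bits come from the CIDR and the last 16 from the original IP.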

@adamjensenbot
Collaborator

Hi @Gabbe64. Thanks for your PR!

I am @adamjensenbot.
You can interact with me issuing a slash command in the first line of a comment.
Currently, I understand the following commands:

  • /rebase: Rebase this PR onto the master branch (You can add the option test=true to launch the tests
    when the rebase operation is completed)
  • /merge: Merge this PR into the master branch
  • /build Build Liqo components
  • /test Launch the E2E and Unit tests
  • /hold, /unhold Add/remove the hold label to prevent merging with /merge

Make sure this PR appears in the liqo changelog, adding one of the following labels:

  • feat: 🚀 New Feature
  • fix: 🐛 Bug Fix
  • refactor: 🧹 Code Refactoring
  • docs: 📝 Documentation
  • style: 💄 Code Style
  • perf: 🐎 Performance Improvement
  • test: ✅ Tests
  • chore: 🚚 Dependencies Management
  • build: 📦 Builds Management
  • ci: 👷 CI/CD
  • revert: ⏪ Reverts Previous Changes

github-actions bot added the feat label (Adds a new feature to the codebase) on Aug 28, 2025
Gabbe64 marked this pull request as ready for review on August 29, 2025 11:47
Gabbe64 force-pushed the no-fcc-implementation branch from 07a8051 to 0c46130 on August 30, 2025 18:01
github-actions bot added the refactor label (Reorganizes or optimizes code without changing its behavior) on Aug 30, 2025
@cheina97
Member

cheina97 commented Sep 2, 2025

/test

@aleoli
Member

aleoli commented Sep 10, 2025

/test

frisso added the release-1.1 label (Feature that is planned to be integrated within a given Liqo release) on Sep 10, 2025
)

// MapEndpointsWithConfiguration maps the endpoints of the shadowendpointslice.
func MapEndpointsWithConfiguration(ctx context.Context, cl client.Client,
Member

This function has a parameter called clusterID, which is overwritten by the value returned from GetConnectionDataByIP. There are two possibilities:

  • the parameter is useless
  • the GetConnectionDataByIP is shadowing the value

Author

Yes, that was a mistake.

The parameter has been renamed and now stores the clusterID of the local cluster.

Comment on lines 54 to 55
if addrHasBeenRemapped {
	break
Member

The break statement exits the inner address loop (for j := range endpoints[i].Addresses) after the first address is remapped. If an endpoint has multiple addresses, only the first one gets processed.
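
One way to avoid the early exit is to let the inner loop run over every address. The sketch below shows only the loop shape: the endpoint type is a simplified stand-in for the Kubernetes discovery/v1 Endpoint, and remapAll and its remap callback are hypothetical names, not code from the PR.

```go
package main

// endpoint is a simplified stand-in for the Kubernetes discovery/v1
// Endpoint type; only the Addresses field matters here.
type endpoint struct {
	Addresses []string
}

// remapAll applies remap to every address of every endpoint, without
// breaking out after the first hit, and returns how many addresses were
// rewritten.
func remapAll(endpoints []endpoint, remap func(string) (string, bool)) int {
	remapped := 0
	for i := range endpoints {
		for j := range endpoints[i].Addresses {
			if newAddr, ok := remap(endpoints[i].Addresses[j]); ok {
				endpoints[i].Addresses[j] = newAddr
				remapped++
			}
		}
	}
	return remapped
}
```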

Comment on lines 47 to 56
func (l *InfoList) GetConnectionDataByIP(ip string) (clusterID, originalIP string, found bool) {
	for _, entry := range l.Items {
		for i, remappedIP := range entry.RemappedIPs {
			if ip == remappedIP {
				return entry.ClusterID, entry.IPs[i], true
			}
		}
	}
	return "", "", false
}
Member

Assumes len(entry.IPs) == len(entry.RemappedIPs) without validation. If these slices have different lengths (which is possible via the Add() method), accessing entry.IPs[i] will cause a panic.
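
A possible guard is to check the index against len(entry.IPs) before using it. The sketch below redeclares Info as quoted in this thread and assumes InfoList is a plain wrapper around a slice of Info (the Items field name comes from the snippet above; its element type and JSON tag are assumptions). Whether a mismatched entry should be skipped, as done here, or reported as an error is a design choice left open.

```go
package main

// Info mirrors the struct quoted in this review thread.
type Info struct {
	ClusterID   string   `json:"ID"`
	IPs         []string `json:"IPs"`
	RemappedIPs []string `json:"rIPs"`
}

// InfoList is assumed to wrap a slice of Info.
type InfoList struct {
	Items []Info `json:"items"`
}

// GetConnectionDataByIP looks up a remapped IP and returns the owning
// cluster and the original IP. A match whose index has no counterpart in
// entry.IPs is skipped instead of causing an out-of-range panic.
func (l *InfoList) GetConnectionDataByIP(ip string) (clusterID, originalIP string, found bool) {
	for _, entry := range l.Items {
		for i, remappedIP := range entry.RemappedIPs {
			if ip == remappedIP && i < len(entry.IPs) {
				return entry.ClusterID, entry.IPs[i], true
			}
		}
	}
	return "", "", false
}
```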

continue
}

objectName := endpoint.TargetRef.Name
Member

endpoint.TargetRef is an optional pointer (*v1.ObjectReference). According to the Kubernetes API specification, it can be nil. This code will panic if an endpoint has no TargetRef.
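
A nil check before dereferencing would be enough to avoid the panic. The sketch below uses simplified stand-ins for the Kubernetes Endpoint and ObjectReference types so that it compiles without external dependencies; targetName is a hypothetical helper, and the caller decides whether to skip the endpoint or report an error.

```go
package main

// objectReference and endpoint are simplified stand-ins for the Kubernetes
// ObjectReference and discovery/v1 Endpoint types.
type objectReference struct {
	Kind string
	Name string
}

type endpoint struct {
	Addresses []string
	TargetRef *objectReference // optional, may be nil
}

// targetName returns the name of the referenced object, or false when the
// endpoint carries no TargetRef.
func targetName(ep endpoint) (string, bool) {
	if ep.TargetRef == nil {
		return "", false
	}
	return ep.TargetRef.Name, true
}
```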

Author

err := remoteConnectionsData.FromJSON([]byte(val))

if err != nil {
	klog.Errorf("failed to unmarshal direct connection data for shadowendpointslice %q: %v", nsName, err)
Member

Why not return an error?

Author

In the previous implementation, the error was not returned because, if unmarshalling failed, the controller could simply skip the direct-address remapping and use the standard external CIDR address (of the consumer) provided by the ShadowEndpointSlice.
I thought of it as a sort of fallback mechanism.

With the new implementation, since the address in the ShadowEPS is not the one from the external CIDR, it's mandatory to return an error, because there is no "usable" endpoint in case the unmarshalling goes wrong.
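
Propagating the failure could look roughly like the sketch below, which wraps the unmarshalling error with %w so that the caller can return it from the reconcile step. It uses the Info struct as quoted in this thread and assumes the annotation value is a JSON array; parseConnectionData is a hypothetical name, and the refactored payload may have a different shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Info mirrors the struct quoted in this review thread.
type Info struct {
	ClusterID   string   `json:"ID"`
	IPs         []string `json:"IPs"`
	RemappedIPs []string `json:"rIPs"`
}

// parseConnectionData is a hypothetical helper: it decodes the annotation
// value and returns a wrapped error instead of only logging the failure.
func parseConnectionData(val, nsName string) ([]Info, error) {
	var items []Info
	if err := json.Unmarshal([]byte(val), &items); err != nil {
		return nil, fmt.Errorf("failed to unmarshal direct connection data for shadowendpointslice %q: %w", nsName, err)
	}
	return items, nil
}
```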

Comment on lines 27 to 31
type Info struct {
	ClusterID   string   `json:"ID"`
	IPs         []string `json:"IPs"`
	RemappedIPs []string `json:"rIPs"`
}
Member

It's not clear to me why you need all this information in the annotation. I think an annotation with just the clusterID should be enough.
In theory, you should just reflect the EndpointSlice as it is on the remote cluster; the remote cluster will then reconcile the ShadowEndpointSlice and use the "configuration" resource related to the clusterID contained in the annotation to remap the IPs.

Member

Am I missing something?

	klog.Errorf("failed to unmarshal direct connection data for shadowendpointslice %q: %v", nsName, err)
}
// JSON is not propagated to the EndpointSlice
delete(shadowEps.Annotations, directConnectionAnnotationLabel)
Member

Can you remove it in a more explicit way? For example, create a function that "forges" the EndpointSlice annotations starting from the ShadowEndpointSlice.
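
A forging helper along these lines might copy the annotations and leave out the direct-connection payload, so the ShadowEndpointSlice object itself is never mutated. This is only a sketch: forgeEndpointSliceAnnotations is a hypothetical name, and the key below is a placeholder because the value of directConnectionAnnotationLabel is not shown in the diff.

```go
package main

// directConnectionAnnotation is a placeholder key used for illustration;
// the real value of directConnectionAnnotationLabel is not shown here.
const directConnectionAnnotation = "example.invalid/direct-connection-data"

// forgeEndpointSliceAnnotations returns a new map holding every annotation
// of the ShadowEndpointSlice except the direct-connection payload. The
// input map is left untouched.
func forgeEndpointSliceAnnotations(shadowAnnotations map[string]string) map[string]string {
	forged := make(map[string]string, len(shadowAnnotations))
	for key, value := range shadowAnnotations {
		if key == directConnectionAnnotation {
			continue
		}
		forged[key] = value
	}
	return forged
}
```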

Author

@Gabbe64
Author

Gabbe64 commented Feb 15, 2026

Given your feedback, I decided to refactor this feature. The core logic remains the same, but the exchanged data has changed slightly.

Changes:

  • EndpointSlice Reflection: When an endpoint belongs to a cluster other than the local or remote one, we no longer send the address on the consumer's external CIDR. Instead, the address is sent as is (as seen by the consumer cluster).

  • Annotations: The data payload in the annotation is now smaller since it no longer includes the external CIDR address. This required an update to the data structure used for marshalling/unmarshalling.

I also added a few utilities, including a helper for annotation forging and an optimized lookup for the data structure embedded within the annotations.

github-actions bot added the fix label (Fixes a bug in the codebase.) on Feb 16, 2026

Labels

  • feat: Adds a new feature to the codebase
  • fix: Fixes a bug in the codebase.
  • refactor: Reorganizes or optimizes code without changing its behavior
  • release-1.1: Feature that is planned to be integrated within a given Liqo release
  • size/L

Development

Successfully merging this pull request may close these issues.

[Feature] Enhance inter-cluster networking: direct path support between 'leaf' clusters

5 participants