Consider combining the xds-translator and the xds-server runners into 1 runner #3980

arkodg · 2024-07-31T20:24:36Z

Description:

Describe the issue.

The xds-translator runners subscribes to Xds IR and translates it to Xds https://github.com/envoyproxy/gateway/blob/main/internal/xds/translator/runner/runner.go
The xds-server runner subscribes to Xds and services Xds requests from Envoy clients https://github.com/envoyproxy/gateway/blob/main/internal/xds/server/runner/runner.go

these two runners can be combined into 1 to reduce the extra layer of Xds memory saved in watchable

gateway/internal/message/types.go

Line 126 in 847d554

// Xds message

[optional Relevant Links:]

Any extra documentation required to understand the issue.

arkodg · 2024-07-31T20:26:03Z

wdyt @envoyproxy/gateway-maintainers ?

zhaohuabing · 2024-08-01T13:26:15Z

@arkodg the intention is to save the memory for the xds as we don't have to publish and subscript to the generated xds? Do you have an estimate of how much memory we can save by merging these two runners? I suggest finding the memory bottle first before making an optimizing decision.

arkodg · 2024-08-01T15:57:18Z

yes, the goal is to eliminate the Xds watchable layer. the optimization will be visible in the benchmarking report, the effort here is minimal

qicz · 2024-09-08T13:36:19Z

IMO needs to split the resources too. some resources for subscription like gateway API resources and EG APIs, but others are just kept in memory. like follows:

for subscription:

type WatchableResources struct {
	// This field is only used for marshalling/unmarshalling purposes and is not used by
	// the translator

	// EnvoyProxyForGatewayClass holds EnvoyProxy attached to GatewayClass
	EnvoyProxyForGatewayClass *egv1a1.EnvoyProxy `json:"envoyProxyForGatewayClass,omitempty" yaml:"envoyProxyForGatewayClass,omitempty"`
	// EnvoyProxiesForGateways holds EnvoyProxiesForGateways attached to Gateways
	EnvoyProxiesForGateways []*egv1a1.EnvoyProxy `json:"envoyProxiesForGateways,omitempty" yaml:"envoyProxiesForGateways,omitempty"`

	GatewayClass            *gwapiv1.GatewayClass          `json:"gatewayClass,omitempty" yaml:"gatewayClass,omitempty"`
	Gateways                []*gwapiv1.Gateway             `json:"gateways,omitempty" yaml:"gateways,omitempty"`
	HTTPRoutes              []*gwapiv1.HTTPRoute           `json:"httpRoutes,omitempty" yaml:"httpRoutes,omitempty"`
	GRPCRoutes              []*gwapiv1.GRPCRoute           `json:"grpcRoutes,omitempty" yaml:"grpcRoutes,omitempty"`
	TLSRoutes               []*gwapiv1a2.TLSRoute          `json:"tlsRoutes,omitempty" yaml:"tlsRoutes,omitempty"`
	TCPRoutes               []*gwapiv1a2.TCPRoute          `json:"tcpRoutes,omitempty" yaml:"tcpRoutes,omitempty"`
	UDPRoutes               []*gwapiv1a2.UDPRoute          `json:"udpRoutes,omitempty" yaml:"udpRoutes,omitempty"`
	ReferenceGrants         []*gwapiv1b1.ReferenceGrant    `json:"referenceGrants,omitempty" yaml:"referenceGrants,omitempty"`
	
	ExtensionRefFilters     []unstructured.Unstructured    `json:"extensionRefFilters,omitempty" yaml:"extensionRefFilters,omitempty"`
	EnvoyPatchPolicies      []*egv1a1.EnvoyPatchPolicy     `json:"envoyPatchPolicies,omitempty" yaml:"envoyPatchPolicies,omitempty"`
	ClientTrafficPolicies   []*egv1a1.ClientTrafficPolicy  `json:"clientTrafficPolicies,omitempty" yaml:"clientTrafficPolicies,omitempty"`
	BackendTrafficPolicies  []*egv1a1.BackendTrafficPolicy `json:"backendTrafficPolicies,omitempty" yaml:"backendTrafficPolicies,omitempty"`
	SecurityPolicies        []*egv1a1.SecurityPolicy       `json:"securityPolicies,omitempty" yaml:"securityPolicies,omitempty"`
	BackendTLSPolicies      []*gwapiv1a3.BackendTLSPolicy  `json:"backendTLSPolicies,omitempty" yaml:"backendTLSPolicies,omitempty"`
	EnvoyExtensionPolicies  []*egv1a1.EnvoyExtensionPolicy `json:"envoyExtensionPolicies,omitempty" yaml:"envoyExtensionPolicies,omitempty"`
	ExtensionServerPolicies []unstructured.Unstructured    `json:"extensionServerPolicies,omitempty" yaml:"extensionServerPolicies,omitempty"`
	Backends                []*egv1a1.Backend              `json:"backends,omitempty" yaml:"backends,omitempty"`
}

just in memory:(the Secret, EndpointSlice that too large bad for watable compare)

type InMemoryResources struct {
	
	Namespaces              []*corev1.Namespace            `json:"namespaces,omitempty" yaml:"namespaces,omitempty"`
	Services                []*corev1.Service              `json:"services,omitempty" yaml:"services,omitempty"`
	ServiceImports          []*mcsapiv1a1.ServiceImport    `json:"serviceImports,omitempty" yaml:"serviceImports,omitempty"`
	EndpointSlices          []*discoveryv1.EndpointSlice   `json:"endpointSlices,omitempty" yaml:"endpointSlices,omitempty"`
	Secrets                 []*corev1.Secret               `json:"secrets,omitempty" yaml:"secrets,omitempty"`
	ConfigMaps              []*corev1.ConfigMap            `json:"configMaps,omitempty" yaml:"configMaps,omitempty"`
}

cc @arkodg

shawnh2 · 2024-09-12T14:10:13Z

Like @qicz 's idea.

One possible way to reduce memory is by triming unused field for these resources / CRD. And this achievable via the Transform func defined in client-go,

https://github.com/kubernetes/client-go/blob/v0.31.1/informers/factory.go#L104

for controller-runtime is in

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/cache/cache.go#L207

arkodg · 2024-09-12T16:00:30Z

@shawnh2 @qicz can we use a separate GH issue to discuss this additional design that could increase memory savings ?

shawnh2 · 2024-09-17T05:13:22Z

+1 on this!

From heap flame graph in #3698, combining the xds-translator and the xds-server runners into 1 runner will prevent lots of deepcopy for message.Xds, which will result in saving ~30% memory.

* The combined runner is called `xds` * This should eliminate one layer of caching resulting in mem savings Fixes: envoyproxy#3980 Signed-off-by: Arko Dasgupta <arko@tetrate.io>

arkodg added the triage label Jul 31, 2024

arkodg added help wanted Extra attention is needed and removed triage labels Aug 1, 2024

arkodg added this to the v1.2.0-rc1 milestone Aug 1, 2024

arkodg mentioned this issue Aug 12, 2024

Why the memory is not increase linearly when the HTTPRoutes scales up? #3698

Closed

arkodg mentioned this issue Sep 24, 2024

feat: combine xds-translator and xds-server runners #4326

Closed

arkodg modified the milestones: v1.2.0-rc1, Backlog Oct 15, 2024

arkodg self-assigned this Oct 15, 2024

qicz mentioned this issue Oct 20, 2024

bugfix: filter repeated resources and optimize memory usage. #4454

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider combining the xds-translator and the xds-server runners into 1 runner #3980

Consider combining the xds-translator and the xds-server runners into 1 runner #3980

arkodg commented Jul 31, 2024

arkodg commented Jul 31, 2024

zhaohuabing commented Aug 1, 2024 •

edited

Loading

arkodg commented Aug 1, 2024

qicz commented Sep 8, 2024 •

edited

Loading

shawnh2 commented Sep 12, 2024

arkodg commented Sep 12, 2024

shawnh2 commented Sep 17, 2024

Consider combining the xds-translator and the xds-server runners into 1 runner #3980

Consider combining the xds-translator and the xds-server runners into 1 runner #3980

Comments

arkodg commented Jul 31, 2024

arkodg commented Jul 31, 2024

zhaohuabing commented Aug 1, 2024 • edited Loading

arkodg commented Aug 1, 2024

qicz commented Sep 8, 2024 • edited Loading

shawnh2 commented Sep 12, 2024

arkodg commented Sep 12, 2024

shawnh2 commented Sep 17, 2024

zhaohuabing commented Aug 1, 2024 •

edited

Loading

qicz commented Sep 8, 2024 •

edited

Loading