Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid panic applying TProxy Envoy extensions #17537

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

zalimeni
Copy link
Member

@zalimeni zalimeni commented May 31, 2023

When UpstreamEnvoyExtender was introduced, some code was left duplicated between it and BasicEnvoyExtender. One path in that code panics when a TProxy listener patch is attempted due to no upstream data in RuntimeConfig matching the local service (which would only happen in rare cases).

Instead, we can remove the special handling of upstream VIPs from BasicEnvoyExtender entirely, greatly simplifying the listener filter patch code and avoiding the panic. UpstreamEnvoyExtender, which needs this code to function, is modified to ensure a panic does not occur.

This also fixes a second regression in which the Lua extension was not applied to TProxy outbound listeners.

Description

This fixes two regressions caused by #17415.

Testing & Reproduction steps

I've hand-tested this fix in addition to adding tests to guard against the regressions.

PR Checklist

  • updated test coverage
  • external facing docs updated - N/A
  • appropriate backport labels added
  • not a security concern

@github-actions github-actions bot added the theme/envoy/xds Related to Envoy support label May 31, 2023
@zalimeni zalimeni added pr/no-changelog PR does not need a corresponding .changelog entry pr/no-metrics-test pr/no-docs PR does not include docs and should not trigger reminder for cherrypicking them. backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. labels May 31, 2023
@zalimeni zalimeni force-pushed the zalimeni/net-3900-fix-tproxy-extension-panic branch from f8b8034 to 8c30455 Compare May 31, 2023 23:45
Comment on lines 289 to 328
switch config.Kind {
case api.ServiceKindTerminatingGateway:
return b.patchTerminatingGatewayListenerFilterChains(config, l, nameOrSNI)
case api.ServiceKindConnectProxy:
return b.patchConnectProxyListenerFilterChains(config, l, nameOrSNI)
}
return l, nil
}

func (b *BasicEnvoyExtender) patchTerminatingGatewayListenerFilterChains(config *RuntimeConfig, l *envoy_listener_v3.Listener, nameOrSNI string) (*envoy_listener_v3.Listener, error) {
var resultErr error
for idx, filterChain := range l.FilterChains {
if patchedFilterChain, err := b.patchFilterChain(config, filterChain, l); err == nil {
l.FilterChains[idx] = patchedFilterChain
} else {
resultErr = multierror.Append(resultErr, fmt.Errorf("error patching filter chain of terminating gateway listener %q: %w", nameOrSNI, err))
}
}

return l, resultErr
}

func (b *BasicEnvoyExtender) patchConnectProxyListenerFilterChains(config *RuntimeConfig, l *envoy_listener_v3.Listener, nameOrSNI string) (*envoy_listener_v3.Listener, error) {
if IsOutboundTProxyListener(l) {
patchedListener, err := b.patchTProxyListenerFilterChains(config, l)
if err == nil {
return patchedListener, nil
} else {
return l, fmt.Errorf("error patching filter chain of TProxy listener %q: %w", nameOrSNI, err)
}
} else {

patchedListener, err := b.patchNonTProxyConnectProxyListenerFilterChains(config, l)
if err == nil {
return patchedListener, nil
} else {
return l, fmt.Errorf("error patching filter chain of connect proxy listener %q: %w", nameOrSNI, err)
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the filter chain patching code turned out to be identical except for the removed check for VIPs below (which is not necessary for the BasicEnvoyExtender, because UpstreamEnvoyExtender now handles extensions - Lambda and Validate - configured via an upstream of the local proxy).

func (b *BasicEnvoyExtender) patchTProxyListenerFilterChains(config *RuntimeConfig, l *envoy_listener_v3.Listener) (*envoy_listener_v3.Listener, error) {
var resultErr error

vip := config.Upstreams[config.ServiceName].VIP
Copy link
Member Author

@zalimeni zalimeni May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was the source of the panic. We wouldn't expect config.Upstreams[config.ServiceName] to exist except if the local service has itself configured as an upstream; regardless, it's not necessary here.

Removing this check fixes the second regression in which TProxy filters were not patched by BasicEnvoyExtender (Lua).

Comment on lines +217 to +221
upstream := config.Upstreams[config.ServiceName]
if upstream == nil {
return l, false, nil
}
vip := upstream.VIP
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guards against the same bug without removing the VIP check, which is necessary in this case.

@@ -728,7 +728,7 @@ func TestConfigSnapshotTransparentProxyDestination(t testing.T) *ConfigSnapshot
})
}

func TestConfigSnapshotTransparentProxyDestinationHTTP(t testing.T) *ConfigSnapshot {
func TestConfigSnapshotTransparentProxyDestinationHTTP(t testing.T, nsFn func(ns *structs.NodeService)) *ConfigSnapshot {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to configure the Lua extension for the local service, which will target TProxy upstreams.

Comment on lines +58 to +62
"name": "envoy.filters.http.lua",
"typedConfig": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua",
"inlineCode": "\nfunction envoy_on_request(request_handle)\n request_handle:headers():add(\"test\", \"test\")\nend"
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patched HTTP upstream 1/2

Comment on lines +97 to +101
"name": "envoy.filters.http.lua",
"typedConfig": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua",
"inlineCode": "\nfunction envoy_on_request(request_handle)\n request_handle:headers():add(\"test\", \"test\")\nend"
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patched HTTP upstream 2/2

Copy link
Member

@hashi-derek hashi-derek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me if you're certain that the removed code in the basic envoy extender wasn't needed for anything.

Copy link
Contributor

@cthain cthain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I'm super glad that you caught this.

When UpstreamEnvoyExtender was introduced, some code was left duplicated
between it and BasicEnvoyExtender. One path in that code panics when a
TProxy listener patch is attempted due to no upstream data in
RuntimeConfig matching the local service (which would only happen in
rare cases).

Instead, we can remove the special handling of upstream VIPs from
BasicEnvoyExtender entirely, greatly simplifying the listener filter
patch code and avoiding the panic. UpstreamEnvoyExtender, which needs
this code to function, is modified to ensure a panic does not occur.

This also fixes a second regression in which the Lua extension was not
applied to TProxy outbound listeners.
@zalimeni zalimeni force-pushed the zalimeni/net-3900-fix-tproxy-extension-panic branch from 8c30455 to c66703f Compare June 1, 2023 15:59
Comment on lines -288 to +291
func (b *BasicEnvoyExtender) patchListenerFilterChains(config *RuntimeConfig, l *envoy_listener_v3.Listener, nameOrSNI string) (*envoy_listener_v3.Listener, error) {
func (b *BasicEnvoyExtender) patchSupportedListenerFilterChains(config *RuntimeConfig, l *envoy_listener_v3.Listener, nameOrSNI string) (*envoy_listener_v3.Listener, error) {
switch config.Kind {
case api.ServiceKindTerminatingGateway:
return b.patchTerminatingGatewayListenerFilterChains(config, l, nameOrSNI)
case api.ServiceKindConnectProxy:
return b.patchConnectProxyListenerFilterChains(config, l, nameOrSNI)
case api.ServiceKindTerminatingGateway, api.ServiceKindConnectProxy:
return b.patchListenerFilterChains(config, l, nameOrSNI)
Copy link
Member Author

@zalimeni zalimeni Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After double checking the original change that introduced this switch and the matching one above, I restored it to align with that and reduce changes here (I'd originally thought it was more copy-pasta). This could prevent a future bug if we were to support more proxy types for extensions in general without aligning that support with listeners specifically.

No other changes since the last review. I've also re-tested by hand w/ this change.

@zalimeni zalimeni merged commit ad03a5d into main Jun 1, 2023
@zalimeni zalimeni deleted the zalimeni/net-3900-fix-tproxy-extension-panic branch June 1, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. pr/no-changelog PR does not need a corresponding .changelog entry pr/no-docs PR does not include docs and should not trigger reminder for cherrypicking them. pr/no-metrics-test theme/envoy/xds Related to Envoy support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants