Add support for idle_connection_timeout to elasticsearch output #36843

Merged: 4 commits into elastic:main from 35615_idle_timeout, Oct 25, 2023

Conversation

leehinman
Contributor

Proposed commit message

Add support for idle_connection_timeout to the Elasticsearch output. This allows connections to be closed when they are not being used.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Add idle_connection_timeout to the elasticsearch output config:
     output.elasticsearch:
       idle_connection_timeout: 10s
  2. Start the Beat and send some events to Elasticsearch.
  3. Observe the network connection with netstat -an.
  4. Wait for the timeout to elapse, and observe that the network connection is closed.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 13, 2023
@leehinman leehinman added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 13, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 13, 2023

mergify bot commented Oct 13, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @leehinman? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch. \d is the digit


elasticmachine commented Oct 13, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-10-25T16:45:34.408+0000

  • Duration: 101 min 39 sec

Test stats 🧪

Test Results
Failed 0
Passed 28618
Skipped 2015
Total 30633

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@leehinman leehinman marked this pull request as ready for review October 13, 2023 19:46
@leehinman leehinman requested review from a team as code owners October 13, 2023 19:46
Comment on lines 84 to 87
# The maximum amount of time an idle connection will remain idle
# before closing itself. Zero means no limit. The format is a Go
# language duration (example 60s is 60 seconds). The default is 0.
#idle_connection_timeout: 0
Member

@cmacknz cmacknz Oct 16, 2023

Is the default being 0 actually the behavior from before? There are a lot of layers to follow in elastic-agent-libs, but I see when we create an HTTP round tripper we clone the default transport:

https://github.com/elastic/elastic-agent-libs/blob/9ec47704eb509f0944507fa11eb48a580711bbba/transport/httpcommon/httpcommon.go#L254-L259

func (settings *HTTPTransportSettings) httpRoundTripper(
	tls *tlscommon.TLSConfig,
	dialer, tlsDialer transport.Dialer,
	opts ...TransportOption,
) *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()

The default transport uses a 90-second idle connection timeout:

https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/net/http/transport.go;l=38-54

// DefaultTransport is the default implementation of Transport and is
// used by DefaultClient. It establishes network connections as needed
// and caches them for reuse by subsequent calls. It uses HTTP proxies
// as directed by the environment variables HTTP_PROXY, HTTPS_PROXY
// and NO_PROXY (or the lowercase versions thereof).
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: defaultTransportDialContext(&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}),
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

Member

To be clear, 0 might be what this configuration was set to, but I am not sure the resulting IdleConnTimeout was 0. I think it was 90s, because IdleConnTimeout is only written if it is non-zero, which means the underlying Go default is used:

https://github.com/elastic/elastic-agent-libs/blob/9ec47704eb509f0944507fa11eb48a580711bbba/transport/httpcommon/httpcommon.go#L309-L313

func (opts WithKeepaliveSettings) applyTransport(_ *HTTPTransportSettings, t *http.Transport) {
	t.DisableKeepAlives = opts.Disable
	if opts.IdleConnTimeout != 0 {
		t.IdleConnTimeout = opts.IdleConnTimeout
	}

Contributor Author

We were both wrong. The eslegclient sets it to 60s by default.

if s.IdleConnTimeout == 0 {
	s.IdleConnTimeout = 1 * time.Minute
}

This also matches testing I did with Wireshark where I observed the default at 60s.

mergify bot commented Oct 18, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 35615_idle_timeout upstream/35615_idle_timeout
git merge upstream/main
git push upstream 35615_idle_timeout

@leehinman leehinman enabled auto-merge (squash) October 20, 2023 15:54
@rdner rdner removed their request for review October 23, 2023 12:50
@@ -81,6 +81,11 @@ output.elasticsearch:
# Elasticsearch after a network error. The default is 60s.
#backoff.max: 60s

# The maximum amount of time an idle connection will remain idle
# before closing itself. Zero means no limit. The format is a Go
Member

@ebeahan ebeahan Oct 23, 2023

The wording here states that zero means no limit. But going by the conversation and snippet here, won't setting idle_connection_timeout to 0 still leave the client using a 60s timeout?

Contributor Author

Thanks, should be fixed now.

Contributor

@andrewvc andrewvc left a comment

LGTM

@leehinman leehinman force-pushed the 35615_idle_timeout branch 2 times, most recently from 77102a3 to 17bc768 Compare October 25, 2023 13:34
@leehinman leehinman merged commit 55df09f into elastic:main Oct 25, 2023
22 checks passed
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
…tic#36843)

* Add support for idle_connection_timeout to elasticsearch output