Add a TIMEOUT environment variable for es rollover #2938

Merged

ediezh
Contributor

@ediezh ediezh commented Apr 15, 2021

Signed-off-by: Edie Zhang <edie.zhang@o8t.com>

Which problem is this PR solving?

  • The es-rollover script keeps timing out during rollover:
Rollover jaeger-span-write, based on conditions {'max_age': '1d', 'max_docs': '10000000'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py", line 245, in perform_request
    response = self.pool.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 637, in urlopen
    retries = retries.increment(method, url, error=e, _pool=self,
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 344, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 597, in urlopen
    httplib_response = self._make_request(conn, method, url,
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='elasticsearch-master', port=9200): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/curator/actions.py", line 1088, in do_action
    self.log_result(self.doit())
  File "/usr/local/lib/python3.9/site-packages/curator/actions.py", line 1067, in doit
    return self.client.indices.rollover(
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/client/indices.py", line 1223, in rollover
    return self.transport.perform_request(
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/transport.py", line 392, in perform_request
    raise e
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/transport.py", line 358, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py", line 257, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-master', port=9200): Read timed out. (read timeout=10))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/es-rollover/esRollover.py", line 232, in <module>
    main()
  File "/es-rollover/esRollover.py", line 64, in main
    perform_action(action, client, write_alias, read_alias, prefix+'jaeger-span', 'jaeger-span')
  File "/es-rollover/esRollover.py", line 89, in perform_action
    rollover(client, write_alias, read_alias, cond)
  File "/es-rollover/esRollover.py", line 146, in rollover
    roll.do_action()
  File "/usr/local/lib/python3.9/site-packages/curator/actions.py", line 1090, in do_action
    utils.report_failure(e)
  File "/usr/local/lib/python3.9/site-packages/curator/utils.py", line 175, in report_failure
    raise exceptions.FailedExecution(
curator.exceptions.FailedExecution: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-master', port=9200): Read timed out. (read timeout=10))

The default 10s client timeout is too short.

Short description of the changes

  • Added a TIMEOUT environment variable (in seconds) that defaults to 120s.
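
As a rough illustration of the change described above, here is a minimal sketch of reading a TIMEOUT environment variable with a 120-second default. It is not the code from this PR: `read_timeout`, `client_kwargs`, and `DEFAULT_TIMEOUT` are hypothetical names, and the validation rules are an assumption.

```python
import os

DEFAULT_TIMEOUT = 120  # seconds; the default stated in this PR


def read_timeout(env=None, default=DEFAULT_TIMEOUT):
    """Return the client timeout in seconds from the TIMEOUT variable.

    Falls back to `default` when TIMEOUT is unset or empty, and raises
    ValueError when it is not a positive integer.
    """
    env = os.environ if env is None else env
    raw = env.get("TIMEOUT", "")
    if raw.strip() == "":
        return default
    seconds = int(raw)  # raises ValueError on non-numeric input
    if seconds <= 0:
        raise ValueError("TIMEOUT must be a positive number of seconds")
    return seconds


def client_kwargs(env=None):
    # Hypothetical helper: keyword arguments a caller could pass on to
    # elasticsearch.Elasticsearch(hosts, **kwargs). In elasticsearch-py 7.x,
    # `timeout` is the per-request timeout in seconds (library default: 10,
    # which matches the "read timeout=10" in the traceback above).
    return {"timeout": read_timeout(env)}
```

With this sketch, running the script with `TIMEOUT=300` in the environment would yield `{"timeout": 300}`, and leaving the variable unset would yield `{"timeout": 120}`.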

Signed-off-by: Edie Zhang <edie.zhang@o8t.com>
@ediezh ediezh requested a review from a team as a code owner April 15, 2021 05:51
@ediezh ediezh requested a review from vprithvi April 15, 2021 05:51
@codecov

codecov bot commented Apr 15, 2021

Codecov Report

Merging #2938 (866befb) into master (ae47c0e) will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2938      +/-   ##
==========================================
+ Coverage   95.96%   95.98%   +0.02%     
==========================================
  Files         224      224              
  Lines        9731     9731              
==========================================
+ Hits         9338     9340       +2     
+ Misses        324      323       -1     
+ Partials       69       68       -1     
Impacted Files Coverage Δ
cmd/query/app/static_handler.go 96.77% <0.00%> (+1.61%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae47c0e...866befb.

Signed-off-by: Edie Zhang <edie.zhang@o8t.com>
albertteoh
albertteoh previously approved these changes Apr 19, 2021
Contributor

@albertteoh albertteoh left a comment

lgtm

@@ -13,7 +13,7 @@ def main():
 print('USAGE: [INDEX_PREFIX=(default "")] [ARCHIVE=(default false)] ... {} NUM_OF_DAYS http://HOSTNAME[:PORT]'.format(sys.argv[0]))
 print('NUM_OF_DAYS ... delete indices that are older than the given number of days.')
 print('HOSTNAME ... specifies which Elasticsearch hosts URL to search and delete indices from.')
-print('TIMEOUT ... number of seconds to wait for master node response.'.format(TIMEOUT))
+print('TIMEOUT ... number of seconds to wait for master node response(default {}).'.format(TIMEOUT))

Suggested change
-print('TIMEOUT ... number of seconds to wait for master node response(default {}).'.format(TIMEOUT))
+print('TIMEOUT ... number of seconds to wait for master node response (default {}).'.format(TIMEOUT))

@@ -53,11 +54,14 @@ def main():
 '\tUNIT ... used with lookback to remove indices from read alias e.g. ..., days, weeks, months, years (default {}).'.format(
 UNIT))
 print('\tUNIT_COUNT ... count of UNITs (default {}).'.format(UNIT_COUNT))
+print('TIMEOUT ... number of seconds to wait for master node response(default {}).'.format(TIMEOUT))

Suggested change
-print('TIMEOUT ... number of seconds to wait for master node response(default {}).'.format(TIMEOUT))
+print('TIMEOUT ... number of seconds to wait for master node response (default {}).'.format(TIMEOUT))
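
For context on the usage-text change in the first hunk above: the line it replaces called `.format(TIMEOUT)` on a string with no `{}` placeholder, so the default never appeared in the help output. The snippet below is a small illustration of that behavior, assuming `TIMEOUT = 120` as stated in the PR description; the variable names `old_line` and `new_line` are hypothetical.

```python
TIMEOUT = 120  # default stated in the PR description

# No placeholder: str.format() silently ignores the unused positional argument.
old_line = 'TIMEOUT ... number of seconds to wait for master node response.'.format(TIMEOUT)

# With a placeholder (and the space suggested in review), the default is shown.
new_line = 'TIMEOUT ... number of seconds to wait for master node response (default {}).'.format(TIMEOUT)

print(old_line)
print(new_line)
```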

@yurishkuro
Member

@albertteoh sometimes it's easier to make the minor tweaks ourselves and push to the author's branch (unless they explicitly disabled that permission). It looks like this is ready to merge otherwise but the author didn't come back in a month.

Signed-off-by: albertteoh <albert.teoh@logz.io>
@yurishkuro yurishkuro enabled auto-merge (squash) May 16, 2021 15:29
@yurishkuro yurishkuro closed this May 16, 2021
auto-merge was automatically disabled May 16, 2021 17:49

Pull request was closed

@yurishkuro yurishkuro reopened this May 16, 2021
@yurishkuro yurishkuro merged commit 7636530 into jaegertracing:master May 16, 2021
@jpkrohling jpkrohling added this to the Release 1.23.0 milestone Jun 4, 2021