
Random timeouts in api calls to bigquery.googleapis.com #40

Closed
peku33 opened this issue Feb 19, 2020 · 2 comments · Fixed by #43
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


peku33 commented Feb 19, 2020

After updating google-cloud-bigquery from version 1.19.0 to 1.24.0,
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=11.0) started to pop up randomly.

This is likely related to #34

Environment details

  • OS type and version: Ubuntu 19.10 x64
  • Python version: 3.7.5
  • pip version: pip 20.0.2
  • google-cloud-bigquery version: 1.24.0

Steps to reproduce

The error is not deterministic; however, we have observed it both in production environments (compute instances in GCP) and in local development environments.

Out of roughly 100 queries, at least 1-2 fail with this error, which makes it fairly easy to reproduce. That means something like 1-2% of all requests fail.

If I understand correctly, the BigQuery API endpoint behind the result() method blocks for at most 10 seconds, and the client adds a 1 second margin on top to absorb network lag etc. No retry mechanism covers this transport timeout, so if the response is delayed by more than that 1 second, the whole request fails.

In my opinion, a non-configurable 1.0 s margin is not safe. Non-mutating endpoints (such as the "is the job finished" poll) should also retry automatically on timeout, but that is not implemented at the moment.
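
As an illustration of what callers currently have to do themselves (a sketch only; the query and the retry limit are placeholders, and this is not something the library provides), the transport timeout can be caught and the job simply polled again, since the query keeps running server-side:

    import requests.exceptions
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query("SELECT ...")  # placeholder for any long-running query

    attempts_left = 5  # arbitrary illustrative limit
    while True:
        try:
            rows = job.result()
            break
        except requests.exceptions.ReadTimeout:
            # The job is still running server-side; calling result() again
            # simply resumes polling the same job.
            attempts_left -= 1
            if attempts_left == 0:
                raise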

We had to roll back to 1.19.0 to make everything stable again.
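
For reference, the rollback is just a version pin:

    pip install "google-cloud-bigquery==1.19.0"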

Code example

There is no single snippet that is worth pasting here.

The easiest way to reproduce the error is to run a query that takes more than 10 seconds, about 100 times in a loop, as sketched below.
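
A minimal sketch of that loop (the query text is a placeholder; substitute anything that runs for more than about 10 seconds):

    from google.cloud import bigquery

    client = bigquery.Client()
    slow_query = "SELECT ..."  # placeholder: any query taking > 10 s

    for _ in range(100):
        job = client.query(slow_query)
        job.result()  # 1-2 iterations typically fail with requests.exceptions.ReadTimeout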

Stack trace

The stack trace is always the same:

Traceback (most recent call last):
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<proj>/venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "<proj>/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=11.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3196, in result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 818, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 122, in result
    self._blocking_poll(timeout=timeout)
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3098, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout)
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 101, in _blocking_poll
    retry_(self._done_or_raise)()
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 289, in retry_wrapped_func
    return retry_wrapped_func
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 80, in _done_or_raise
    if not self.done():
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3085, in done
    timeout=timeout,
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 1287, in _get_query_results
    retry, method="GET", path=path, query_params=extra_params, timeout=timeout
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 556, in _call_api
    return call()
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 289, in retry_wrapped_func
    return retry_wrapped_func
  File "<proj>/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/_http.py", line 419, in api_request
    timeout=timeout,
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/_http.py", line 277, in _make_request
    method, url, headers, data, target_object, timeout=timeout
  File "<proj>/venv/lib/python3.7/site-packages/google/cloud/_http.py", line 315, in _do_request
    url=url, method=method, headers=headers, data=data, timeout=timeout
  File "<proj>/venv/lib/python3.7/site-packages/google/auth/transport/requests.py", line 317, in request
    **kwargs
  File "<proj>/venv/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "<proj>/venv/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "<proj>/venv/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=11.0)
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 19, 2020
@peku33 peku33 changed the title Spurious timeouts in api calls to bigquery.googleapis.com Random timeouts in api calls to bigquery.googleapis.com Feb 19, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Feb 20, 2020

plamut commented Feb 21, 2020

In past versions there was a problem with requests sometimes stalling indefinitely, because no timeouts were applied at the transport layer by default. After default timeouts were introduced, they sometimes kick in too aggressively, because the transport timeout is not independent of the timeoutMs used when polling the server to check whether a query job has completed.

The latter timeout has a maximum of 10 seconds; if the job is not done by then, an internal _OperationNotComplete() error is raised, which is automatically retried until the job is done (or the retry's termination condition is met).

Now, using that same value as the timeout for the request itself (at the transport layer) does not always work well, and the two should probably be separated. However, there still needs to be some transport timeout (so that requests cannot block indefinitely).
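
To illustrate the distinction with a raw REST call (the project, job ID, token, and numbers below are placeholders, not a proposed fix): the server-side wait is controlled by the timeoutMs query parameter, while the HTTP read timeout is a separate knob that should comfortably exceed it.

    import requests

    SERVER_WAIT_MS = 10000   # how long jobs.getQueryResults may hold the request open
    TRANSPORT_MARGIN_S = 30  # illustrative margin, much larger than the current 1 s

    response = requests.get(
        "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/queries/JOB_ID",
        params={"timeoutMs": SERVER_WAIT_MS},
        headers={"Authorization": "Bearer ACCESS_TOKEN"},  # placeholder credentials
        timeout=SERVER_WAIT_MS / 1000 + TRANSPORT_MARGIN_S,  # transport-level read timeout
    )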

I'll probably have time to take a closer look at this next week.

@plamut plamut self-assigned this Feb 24, 2020
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Feb 24, 2020
@plamut plamut added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 25, 2020
@plamut plamut closed this as completed in #43 Mar 9, 2020
@shegokarm

Hi, I couldn't find the solution, but I'm facing the same error. Could anyone provide a solution for this error?
