caching athena client in partition refresh function to help with throttling #815

Merged: 1 commit, merged on Sep 14, 2018


ryandeivert (Contributor)

to: @chunyong-lin
cc: @airbnb/streamalert-maintainers
size: small

Background

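Currently, the Athena partition refresh function can be throttled if too many queries are issued in a short time. Every time the function is invoked, it first runs one query to check whether the database exists, and then runs a second query to perform the partition refresh. By caching the Athena client across invocations, and only checking for the database when the client is first created, we can cut our number of queries roughly in half. AWS Lambda reuses warm execution environments between invocations, so the existence check only runs again after a cold start (hence "most of the time").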
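To make the "in half (most of the time)" claim concrete, here is a minimal Python sketch of the query arithmetic. It assumes exactly two queries per uncached invocation (existence check plus refresh) and, with caching, one extra existence check per cold start. The function name `athena_queries` is hypothetical and not part of StreamAlert.

```python
def athena_queries(invocations, cold_starts, cached):
    """Count Athena queries issued under a simplified model (hypothetical helper).

    Without caching, every invocation runs the existence check plus the
    refresh query (2 queries). With caching, the existence check runs only
    when a new client is created, i.e. once per cold start.
    """
    # The first invocation is always a cold start, so at least one is required.
    if not 0 < cold_starts <= invocations:
        raise ValueError('need 1 <= cold_starts <= invocations')
    if cached:
        return invocations + cold_starts
    return 2 * invocations
```

For example, 100 invocations served by a single warm execution environment issue 101 queries with caching versus 200 without; the ratio only drifts above one half as cold starts become more frequent, and it reaches parity only if every invocation is a cold start.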

Example of error:

Athena query failed:
An error occurred (ThrottlingException) when calling the StartQueryExecution operation (reached max retries: 4): Rate exceeded: AthenaQueryExecutionError
Traceback (most recent call last):
  File "/var/task/stream_alert/athena_partition_refresh/main.py", line 212, in handler
    AthenaRefresher().run(event)
  File "/var/task/stream_alert/athena_partition_refresh/main.py", line 172, in run
    if not self._athena_client.check_database_exists():
  File "/var/task/stream_alert/shared/athena.py", line 296, in check_database_exists
    response = self.run_query_for_results('SHOW DATABASES LIKE \'{}\';'.format(self.database))
  File "/var/task/stream_alert/shared/athena.py", line 281, in run_query_for_results
    execution_id = self._execute_and_wait(query)
  File "/var/task/stream_alert/shared/athena.py", line 96, in _execute_and_wait
    response = self._execute_query(query)
  File "/var/task/stream_alert/shared/athena.py", line 127, in _execute_query
    raise AthenaQueryExecutionError('Athena query failed:\n{}'.format(err))
AthenaQueryExecutionError: Athena query failed:
An error occurred (ThrottlingException) when calling the StartQueryExecution operation (reached max retries: 4): Rate exceeded

Changes

  • Cache the Athena client across invocations, and check whether the database exists only when the client is first created.
  • This could result in an error if someone deletes the backing database while the cached client is still in use, but that error would be no different from the one that already occurs if someone deletes the database today.
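The change above follows the common Lambda pattern of keeping an expensive client in a module-level variable. The sketch below is a minimal, self-contained illustration of that pattern, not StreamAlert's actual code: `FakeAthenaClient`, `get_athena_client`, `handler`, and the `streamalert` database name are hypothetical, and the stub records queries instead of calling Athena.

```python
class FakeAthenaClient:
    """Hypothetical stand-in for an Athena client that records every query."""

    def __init__(self, database):
        self.database = database
        self.queries = []

    def run_query(self, query):
        # A real client would call StartQueryExecution here; the stub just records it.
        self.queries.append(query)
        return True

    def check_database_exists(self):
        return self.run_query("SHOW DATABASES LIKE '{}';".format(self.database))


# Module-level cache: it survives between invocations while the Lambda
# execution environment stays warm, and starts out empty on a cold start.
_CLIENT = None


def get_athena_client(database='streamalert'):
    """Return the cached client, creating and validating it only once."""
    global _CLIENT
    if _CLIENT is None:
        client = FakeAthenaClient(database)
        # The existence check runs only when the client is first created.
        if not client.check_database_exists():
            raise RuntimeError('database {} does not exist'.format(database))
        _CLIENT = client
    return _CLIENT


def handler(event, context=None):
    """Hypothetical handler: reuses the cached client for the refresh query."""
    client = get_athena_client()
    # Placeholder for the real partition refresh query.
    client.run_query('-- partition refresh query (placeholder)')
    return len(client.queries)
```

Calling `handler` three times in the same warm process issues one `SHOW DATABASES` query and three refresh queries (4 in total) instead of 6. The trade-off is the one noted in the second bullet: if the database is deleted after the client is cached, the failure surfaces on the next refresh query rather than at the existence check.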

ryandeivert changed the title from "caching athena client" to "caching athena client in partition refresh function to help with throttling" on Sep 14, 2018
ryandeivert added this to the 2.0.0 milestone on Sep 14, 2018
coveralls

Coverage increased (+0.004%) to 96.874% when pulling 6f28d4e on ryandeivert-athena-improvement into 9e86f70 on master.

ryandeivert merged commit 777f423 into master on Sep 14, 2018
ryandeivert deleted the ryandeivert-athena-improvement branch on Sep 14, 2018 at 23:30