dbt compile fails against redshift when using multi-threading #2756

Closed
1 of 5 tasks
jweibel22 opened this issue Sep 15, 2020 · 3 comments · Fixed by #2766
Labels: bug, good_first_issue, redshift

@jweibel22
Contributor

Describe the bug

When I run dbt compile against our Redshift data warehouse, the command fails with the following error:

KeyError: 'endpoint_resolver'

The error only occurs when threads > 1 and method: iam is used.

From what I can gather, this happens because the boto session object is not thread-safe and is accessed from multiple threads without any protection. The unprotected access occurs during the call to get_tmp_iam_cluster_credentials.
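One way to picture "protected" access is a lock around client creation, so that only one thread at a time touches the shared session. The following is a minimal sketch, not necessarily how dbt resolved the issue. The helper name `create_client_locked` and the injected `create_client` callable are assumptions made for illustration; with boto3 installed, the callable could be `boto3.client`.

```python
import threading

# One process-wide lock guarding client creation (hypothetical helper,
# not part of dbt or boto3).
_client_lock = threading.Lock()


def create_client_locked(create_client, service_name):
    """Call create_client(service_name) while holding a global lock.

    create_client is injected so the sketch runs without boto3; in a real
    project it could be boto3.client (an assumption, not taken from dbt).
    """
    with _client_lock:
        return create_client(service_name)


# Example usage, assuming boto3 is installed:
# import boto3
# redshift = create_client_locked(boto3.client, "redshift")
```

The lock serializes only the client creation step; queries issued through the resulting connections still run in parallel.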

Steps To Reproduce

  • Create a dbt project containing a significant number of models.
  • Configure the target in the profiles.yml file to point to a Redshift data warehouse, with threads > 1 and method: iam:
      type: redshift
      method: iam
      threads: 8
      host: xxxx.redshift.amazonaws.com
      cluster_id: xxx
      port: 5439
      user: xxx
      dbname: xxx
      schema: xxx
  • Run dbt compile

Expected behavior

dbt compile should succeed.

Screenshots and log output

2020-09-14 11:15:23.743840 (MainThread): Traceback (most recent call last):
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/postgres/connections.py", line 46, in exception_handler
    yield
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/sql/connections.py", line 76, in add_query
    cursor = connection.handle.cursor()
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/contracts/connection.py", line 69, in handle
    self._handle.resolve(self)
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/contracts/connection.py", line 90, in resolve
    return self.opener(connection)
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/postgres/connections.py", line 77, in open
    credentials = cls.get_credentials(connection.credentials)
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/redshift/connections.py", line 152, in get_credentials
    return cls.get_tmp_iam_cluster_credentials(credentials)
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/redshift/connections.py", line 128, in get_tmp_iam_cluster_credentials
    credentials.db_groups,
  File "/Users/xxx/venv/lib/python3.7/site-packages/dbt/adapters/redshift/connections.py", line 93, in fetch_cluster_credentials
    boto_client = boto3.client('redshift')
  File "/Users/xxx/venv/lib/python3.7/site-packages/boto3/__init__.py", line 91, in client
    return _get_default_session().client(*args, **kwargs)
  File "/Users/xxx/venv/lib/python3.7/site-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/Users/xxx/venv/lib/python3.7/site-packages/botocore/session.py", line 828, in create_client
    endpoint_resolver = self._get_internal_component('endpoint_resolver')
  File "/Users/xxx/venv/lib/python3.7/site-packages/botocore/session.py", line 695, in _get_internal_component
    return self._internal_components.get_component(name)
  File "/Users/xxx/venv/lib/python3.7/site-packages/botocore/session.py", line 907, in get_component
    del self._deferred[name]
KeyError: 'endpoint_resolver'

Sometimes the error returned is

KeyError: 'credential_provider'

but the stack trace is identical.
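The last frame of the traceback fails on `del self._deferred[name]`, which is what a check-then-delete race on a lazily initialised registry looks like. The sketch below is a simplified stand-in written for illustration, not botocore's actual code: `LazyRegistry`, `run_race`, and the barrier-based factory are all hypothetical. The barrier forces both threads past the membership check before either one deletes the key.

```python
import threading


class LazyRegistry:
    """Simplified model of a registry that builds components on first use."""

    def __init__(self):
        self._deferred = {}    # name -> factory, not yet built
        self._components = {}  # name -> built component

    def lazy_register(self, name, factory):
        self._deferred[name] = factory

    def get_component(self, name):
        # Check-then-act without a lock: two threads can both pass the check.
        if name in self._deferred:
            factory = self._deferred[name]
            self._components[name] = factory()
            # The thread that gets here second finds the key already gone.
            del self._deferred[name]
        return self._components[name]


def run_race(num_threads=2):
    """Drive num_threads through get_component at once; return the KeyErrors."""
    registry = LazyRegistry()
    barrier = threading.Barrier(num_threads)

    def slow_factory():
        # Hold every thread inside the factory so all have passed the check.
        barrier.wait(timeout=5)
        return object()

    registry.lazy_register("endpoint_resolver", slow_factory)
    errors = []

    def worker():
        try:
            registry.get_component("endpoint_resolver")
        except KeyError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    for err in run_race():
        print("KeyError:", err)
```

In this model the key that fails is simply whichever component is first requested concurrently, which would be consistent with the error sometimes naming 'credential_provider' instead.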

System information

Which database are you using dbt with?

  • postgres
  • redshift (checked)
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.17.2
   latest version: 0.18.0

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 0.17.2
  - snowflake: 0.17.2
  - redshift: 0.17.2
  - postgres: 0.17.2

The operating system you're using:
macOS Catalina

The output of python --version:
Python 3.7.3

Additional context

The error surfaced after I bumped dbt from version 0.14.2 to 0.17.2.

jweibel22 added the bug and triage labels on Sep 15, 2020
@jtcohen6
Contributor

Thanks for the detailed report @jweibel22. What you're saying seems in line with the boto3 docs. If I understand it right, this indicates that each dbt thread (open) should instantiate its own boto3 session, rather than instantiating the boto3 client once to get a set of temporary credentials.
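The per-thread session idea can be sketched with `threading.local`. This is only an illustration under stated assumptions, not the change that was merged: `get_thread_session` is a hypothetical helper, and the session factory is passed in so the sketch runs without boto3. With boto3 installed, the factory could be `boto3.Session`.

```python
import threading

# Per-thread storage: each thread sees its own "session" attribute.
_thread_local = threading.local()


def get_thread_session(session_factory):
    """Return this thread's session, creating it on first use.

    session_factory is injected for illustration; in a real project it
    could be boto3.Session (an assumption, not taken from the dbt source).
    """
    session = getattr(_thread_local, "session", None)
    if session is None:
        session = session_factory()
        _thread_local.session = session
    return session


# Example usage, assuming boto3 is installed:
# import boto3
# redshift = get_thread_session(boto3.Session).client("redshift")
```

Because no session object is shared between threads, no lock is needed around client creation in this variant.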

> The error surfaced after I bumped dbt from version 0.14.2 to 0.17.2

Can you confirm that you were able to connect to Redshift via IAM with multiple threads when running v0.14.2?

While this is tricky stuff, I think the change here should be fairly self-contained. Is this a fix you'd be interested in contributing?

jtcohen6 added the good_first_issue and redshift labels and removed the triage label on Sep 15, 2020
@jweibel22
Contributor Author

Yes, I can confirm that it was working with version 0.14.2.

Sure, I'll give it a shot and create a PR. Thank you so far :-)

@jtcohen6
Contributor

Closed by #2766
