Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Implement EXTERNAL_ACCESS_INTEGRATIONS and secrets in submit_python_job #925

Closed
3 tasks done
Lindblomsebastian opened this issue Mar 8, 2024 · 5 comments · Fixed by #955
Closed
3 tasks done
Labels

Comments

@Lindblomsebastian
Copy link
Contributor

Lindblomsebastian commented Mar 8, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Propagate external_access_integration and secrets from the dbt config into the procedure being created when submitting a python model.

Describe alternatives you've considered

Extracting external_access_integration and secrets from the dbt config and propagate it to the common_procedure_call in

common_procedure_code = f"""
IF they are passed in the config.

Who will this benefit?

This will solve #925

Anyone that wants to use Snowflake’s external access together with secrets to query external APIs within dbt Python models.

Examples

Are you interested in contributing this feature?

Yes

Anything else?

See current vs. expected behavior below.

Current Behavior

EXTERNAL_ACCESS_INTEGRATION and SECRETS not being propagated into the temporary procedure when running a dbt python model.

With a very simple python model like:

import pandas
import snowflake.snowpark as snowpark


def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        external_access_integration=["TEST_EXTERNAL_ACCESS_INTEGRATION"],
        secrets={"secret_key_1": "secret_name_1"}
        packages=["httpx==0.26.0"],
        python_version="3.11"
    )
    import httpx
    return session.create_dataframe(
        pandas.DataFrame(
            [{"result": httpx.get(url="https://www.google.com").stats_code}]
        )
    )

with model.yml like:

version: 2

models:
    - name: dummy_python_model
      config:
        packages:
          - snowflake-snowpark-python
        external_access_integration:
          - "TEST_EXTERNAL_ACCESS_INTEGRATION"
        secrets:
          - secret_key_1: secret_name_1

DBT generates this from this model:

WITH dummy_python_model__dbt_sp AS PROCEDURE ()

RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('httpx==0.26.0', 'snowflake-snowpark-python')
HANDLER = 'main'
EXECUTE AS CALLER
AS
$$

Expected Behavior

I would expect the generated procedure to look like:

WITH dummy_python_model__dbt_sp AS PROCEDURE ()

RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
EXTERNAL_ACCESS_INTEGRATION=(TEST_EXTERNAL_ACCESS_INTEGRATION)
SECRETS = ('secret_key_1' = secret_name_1)
PACKAGES = ('httpx==0.26.0', 'snowflake-snowpark-python')
HANDLER = 'main'
EXECUTE AS CALLER
AS
$$

containing the EXTERNAL_ACCESS_INTEGRATION=(TEST_EXTERNAL_ACCESS_INTEGRATION) and SECRETS = ('secret_key_1' = secret_name_1 parameter so that the function returns "200".

Steps To Reproduce

  1. Create network rule:
    CREATE OR REPLACE NETWORK RULE TEST_NETWORK_RULE
              TYPE = HOST_PORT
              MODE = EGRESS
              VALUE_LIST = ('www.google.com:443);
  1. Create the secret:
CREATE OR REPLACE SECRET SECRET_KEY_1
TYPE = GENERIC_STRING
SECRET_STRING='MY-KEY'; 
  1. Create the external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION TEST_INTEGRATION
ALLOWED_NETWORK_RULES = (TEST_NETWORK_RULE)
ALLOWED_AUTHENTICATION_SECRETS = (SECRET_KEY_1)
ENABLED=TRUE;

defined model.yml

version: 2

models:
    - name: dummy_python_model
      config:
        packages:
          - snowflake-snowpark-python
        external_access_integration:
          - "TEST_INTEGRATION"
         secrets:
           - secret_key_1: secret_name_1          

Run the DBT python model:

import pandas
import snowflake.snowpark as snowpark


def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        external_access_integration=["TEST_INTEGRATION"],
        secrets={"secret_key_1": "secret_name_1"}
        packages=["httpx==0.26.0"],
        python_version="3.11"
    )
    import httpx
    return session.create_dataframe(
        pandas.DataFrame(
            [{"result": httpx.get(url="https://www.google.com").stats_code}]
        )
    )

Relevant log output

Output from running the model:

15:12:38  1 of 1 ERROR creating python table model sebastian.dum_python_model ............ [ERROR in 9.13s]
15:12:38  
15:12:38  Finished running 1 table model, 1 hook in 0 hours 0 minutes and 16.14 seconds (16.14s).
15:12:39  
15:12:39  Completed with 1 error and 0 warnings:
15:12:39  
15:12:39    Database Error in model dum_python_model (models/dum_python_model.py)
  100357 (P0000): Python Interpreter Error:
  Traceback (most recent call last):
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
      yield
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 206, in connect_tcp
      sock = socket.create_connection(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/socket.py", line 827, in create_connection
      for res in getaddrinfo(host, port, 0, SOCK_STREAM):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/socket.py", line 962, in getaddrinfo
      for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  OSError: [Errno 16] Device or resource busy
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
      yield
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_transports/default.py", line 231, in handle_request
      resp = self._pool.handle_request(req)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 268, in handle_request
      raise exc
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 251, in handle_request
      response = connection.handle_request(request)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 99, in handle_request
      raise exc
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 76, in handle_request
      stream = self._connect(request)
               ^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 124, in _connect
      stream = self._network_backend.connect_tcp(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 205, in connect_tcp
      with map_exceptions(exc_map):
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/contextlib.py", line 158, in __exit__
      self.gen.throw(typ, value, traceback)
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
      raise to_exc(exc) from exc
  httpcore.ConnectError: [Errno 16] Device or resource busy
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "_udf_code.py", line 110, in main
    File "_udf_code.py", line 24, in model
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_api.py", line 196, in get
      return request(
             ^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_api.py", line 104, in request
      return client.request(
             ^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_client.py", line 828, in request
      return self.send(request, auth=auth, follow_redirects=follow_redirects)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_client.py", line 915, in send
      response = self._send_handling_auth(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_client.py", line 943, in _send_handling_auth
      response = self._send_handling_redirects(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_client.py", line 980, in _send_handling_redirects
      response = self._send_single_request(request)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_client.py", line 1016, in _send_single_request
      response = transport.handle_request(request)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_transports/default.py", line 230, in handle_request
      with map_httpcore_exceptions():
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/contextlib.py", line 158, in __exit__
      self.gen.throw(typ, value, traceback)
    File "/usr/lib/python_udf/295ff313ce212b6657fd014dfa3fbb8d02eff440266bbd0d39b024d3f36384a4/lib/python3.11/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
      raise mapped_exc(message) from exc
  httpx.ConnectError: [Errno 16] Device or resource busy

Environment

- OS: MacOS
- Python: 3.11
- dbt-core: 1.7.9
- dbt-snowflake: 1.7.2

Additional Context

This is very similar to the post https://discourse.getdbt.com/t/dbt-python-model-external-access-integrations-and-secrets/11066

@Lindblomsebastian Lindblomsebastian added bug Something isn't working triage labels Mar 8, 2024
@Lindblomsebastian
Copy link
Contributor Author

I might be able to solve this in https://github.com/dbt-labs/dbt-snowflake/blob/main/dbt/adapters/snowflake/impl.py#L200 but I'm unable to open up a PR.

@Lindblomsebastian
Copy link
Contributor Author

If you want me to I can open up a new feature request and get started on this?

@ernestoongaro
Copy link
Contributor

@Lindblomsebastian sure, you are welcome to open a FR and we can consider it for inclusion in a future version. As a shorter term workaround, you may just want to use something like a "Raw SQL materialization" as a workaround...effectively putting in the entire python SP code into the body of a model: https://github.com/randypitcherii/dbt_workspace/blob/production/dbt/macros/demos/materializations/raw_sql/raw_sql.sql

@Lindblomsebastian
Copy link
Contributor Author

Lindblomsebastian commented Mar 11, 2024

@Lindblomsebastian sure, you are welcome to open a FR and we can consider it for inclusion in a future version. As a shorter term workaround, you may just want to use something like a "Raw SQL materialization" as a workaround...effectively putting in the entire python SP code into the body of a model: https://github.com/randypitcherii/dbt_workspace/blob/production/dbt/macros/demos/materializations/raw_sql/raw_sql.sql

Cool, thanks! I opened a FR: #927. I can take a stab at implementing it too.
I'll checkout the Raw SQL materialization meanwhile.

@dbeatty10
Copy link
Contributor

Thanks for diving into this @Lindblomsebastian !

Being able to add EXTERNAL_ACCESS_INTEGRATIONS to the UDF for dbt models is crucial to use Snowflake’s external access to query external APIs.

But since it's not something we support yet, I'm going to recategorize this as a feature request (rather than a bug) and consolidate #925 and #927 into a single issue.

@dbeatty10 dbeatty10 added enhancement New feature or request and removed bug Something isn't working labels Mar 12, 2024
@dbeatty10 dbeatty10 changed the title [Bug] EXTERNAL_ACCESS_INTEGRATION not being propagated to the procedure [Feature] Implement EXTERNAL_ACCESS_INTEGRATION and secrets in submit_python_job Mar 12, 2024
@dbeatty10 dbeatty10 changed the title [Feature] Implement EXTERNAL_ACCESS_INTEGRATION and secrets in submit_python_job [Feature] Implement EXTERNAL_ACCESS_INTEGRATIONS and secrets in submit_python_job Mar 12, 2024
mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Apr 12, 2024
… & secrets (Snowpark concept) (#5262)

## What are you changing in this pull request and why?
dbt-labs/dbt-snowflake#925
dbt-labs/dbt-snowflake#955

## Checklist
Wait until PR is merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants