Not enough resources for query planning - BigQuery connection breaks with too many tests #232

Closed
alberto-maurel opened this issue Oct 27, 2022 · 6 comments
Labels: bug, priority

@alberto-maurel

Hi

We are facing the following problem when running dbt_artifacts.upload_results(results) in the on-run-end hook:

11:39:18  Running 2 on-run-end hooks
11:39:22  Uploading model executions
11:39:29  Uploading seed executions
11:39:31  Uploading snapshot executions
11:39:33  Uploading test executions
11:40:07  Database error while running on-run-end
11:40:07  Encountered an error:
Database Error
  Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
...
google.api_core.exceptions.BadRequest: 400 Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.

Location: us-east1
Job ID: 8eedde3d-33b8-4157-9dc8-c8130579f75b

...
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 186, in handle_error
    raise DatabaseException(error_msg)
dbt.exceptions.DatabaseException: Database Error
  Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
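For reference, the hook is wired up in dbt_project.yml along these lines (a minimal sketch; the upload_results call is the one referenced above, everything else in the file is omitted):

# dbt_project.yml (minimal sketch; only the relevant key shown)
on-run-end:
  - "{{ dbt_artifacts.upload_results(results) }}"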

Looking up that job ID, the query that was executed is the following:

insert into dataset.dbt_ci.artifacts_src__test_executions
    VALUES 
        ('run_id', 'test.dataset.test_1', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-7', 'pass', '2022-10-27 11:28:22.783830', '2022-10-27 11:28:23.692153', 0.9184236526489258, null, 0),
        ('run_id', 'test.dataset.test_2', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-9', 'pass', '2022-10-27 11:28:22.766246', '2022-10-27 11:28:23.897758', 1.1385383605957031, null, 0),
        ...
        ('run_id', 'test.dataset.test_1069', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-9', 'pass', '2022-10-27 11:28:22.766246', '2022-10-27 11:28:23.897758', 1.1385383605957031, null, 0)

Since we have 1050+ tests, BigQuery cannot process the query and returns that error. From some manual testing, it seems to start failing somewhere between 850 and 900 tests.
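One generic way to stay under that limit without changing the statement shape would be to chunk the rows into several smaller inserts. A hypothetical Jinja sketch (not the package's actual code; `rows` stands for a list of already-rendered value tuples, and 500 is an arbitrary batch size):

{# Hypothetical sketch: split the rendered value tuples into batches of
   500 via Jinja's built-in `batch` filter and run one insert per batch. #}
{% for chunk in rows | batch(500) %}
insert into dataset.dbt_ci.artifacts_src__test_executions
    VALUES
        {{ chunk | join(', ') }};
{% endfor %}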

Would it be possible to change the way the data is inserted into the BigQuery tables to something more efficient? As an alternative, I've considered replacing the statement above with something like this:

INSERT INTO capchase.dbt_ci.artifacts_src__test_executions
SELECT 
        tests_data.c1,
        tests_data.c2,
        CAST(tests_data.c3 AS TIMESTAMP),
        tests_data.c4,
        tests_data.c5,
        tests_data.c6,
        CAST(tests_data.c7 AS TIMESTAMP),
        CAST(tests_data.c8 AS TIMESTAMP),
        tests_data.c9,
        tests_data.c10,
        tests_data.c11
FROM UNNEST([STRUCT('run_id' AS c1, 'test.dataset.test_1' AS c2, '2022-10-27 11:25:36.300150+00:00' AS c3, False AS c4, 'Thread-7' AS c5, 'pass' AS c6, '2022-10-27 11:28:22.783830' AS c7, '2022-10-27 11:28:23.692153' AS c8, 0.9184236526489258 AS c9, null AS c10, 0 AS c11),
        ('run_id', 'test.dataset.test_2', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-9', 'pass', '2022-10-27 11:28:22.766246', '2022-10-27 11:28:23.897758', 1.1385383605957031, null, 0),
        ...
        ('run_id', 'test.dataset.test_1069', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-9', 'pass', '2022-10-27 11:28:22.766246', '2022-10-27 11:28:23.897758', 1.1385383605957031, null, 0)]) tests_data

This second approach handles the 1050 tests seamlessly and, as a side effect, runs about 7 times faster (current approach on the left, proposed on the right):
[Screenshot: query runtime comparison, current approach (left) vs. proposed approach (right)]
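For anyone who wants to try the pattern in isolation, here is a complete, runnable two-row version of the statement above (same table and example values; note that BigQuery infers INT64 for the untyped null struct field, so add a cast if the target column has a different type):

-- Complete two-row version of the proposed INSERT ... SELECT FROM UNNEST
-- pattern (values are the example rows quoted above)
INSERT INTO capchase.dbt_ci.artifacts_src__test_executions
SELECT
        tests_data.c1,
        tests_data.c2,
        CAST(tests_data.c3 AS TIMESTAMP),
        tests_data.c4,
        tests_data.c5,
        tests_data.c6,
        CAST(tests_data.c7 AS TIMESTAMP),
        CAST(tests_data.c8 AS TIMESTAMP),
        tests_data.c9,
        tests_data.c10,
        tests_data.c11
FROM UNNEST([
        STRUCT('run_id' AS c1, 'test.dataset.test_1' AS c2, '2022-10-27 11:25:36.300150+00:00' AS c3, False AS c4, 'Thread-7' AS c5, 'pass' AS c6, '2022-10-27 11:28:22.783830' AS c7, '2022-10-27 11:28:23.692153' AS c8, 0.9184236526489258 AS c9, null AS c10, 0 AS c11),
        ('run_id', 'test.dataset.test_2', '2022-10-27 11:25:36.300150+00:00', False, 'Thread-9', 'pass', '2022-10-27 11:28:22.766246', '2022-10-27 11:28:23.897758', 1.1385383605957031, null, 0)]) AS tests_data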

Thanks a lot!

@jens-koster

We're running into this as well.

@kieronellis

Same here. Is there any fix coming?

@ghost

ghost commented Apr 14, 2023

We have this issue as well, including with the upload_models step. If anyone knows a workaround besides skipping this step, I'll take it 😀

@samw430 (Contributor)

samw430 commented Apr 26, 2023

++

@glsdown (Contributor)

glsdown commented Apr 28, 2023

Hi @alberto-maurel.

Thanks for raising this, and I'm really sorry for how long it has taken for someone to get back to you. We will look into this and get a fix in place. Thank you for your suggested approach - this is really helpful, and will definitely speed up a resolution 🤞

@glsdown added the bug and priority labels on Apr 28, 2023
@glsdown (Contributor)

glsdown commented May 25, 2023

Hi @alberto-maurel

Thanks for your patience with this. We have just released 2.4.0, which should include a fix for this situation. Please let us know if it hasn't fixed it, and feel free to reopen the issue.

@glsdown closed this as completed on May 25, 2023