[CT-1015] job_retries not obeyed when job_execution_timeout_seconds is exceeded #263
Comments
We have been experiencing the same issue. It also affects dbt Cloud. Thank you @jars for doing the work to reproduce this issue!
Thanks @jars! There's a specific bug here: dbt should respect the behavior defined in `dbt-bigquery/dbt/adapters/bigquery/connections.py`, lines 51 to 56 (at commit `9c36aa2`).
At least, I think so — but after thinking a bit about it, I'm less sure! (If it does time out, and the query keeps going, and actually succeeds, but then dbt retries, don't we risk running the same query / performing the same operation twice? Can we lean on dbt's idempotent materializations here? Not to mention the expense...) Another option we could try taking here: … I'd prefer to avoid implementing more logic here than we need. As a larger point, we are looking to replace a lot of our current timeout/retry code in …
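The "treat a client-side timeout as retryable" option being discussed could be sketched as follows. This is a hypothetical illustration, not dbt's actual implementation; the `run_with_retries` wrapper, the `submit_job` callable, and the "retries mean additional attempts" semantics are all assumptions, and, as noted above, re-submitting a query that may still be running server-side risks doing the same work twice.

```python
import concurrent.futures

def run_with_retries(submit_job, job_retries):
    """Re-submit a job when a client-side timeout fires, up to
    job_retries additional attempts. Hypothetical sketch only."""
    for attempt in range(job_retries + 1):
        try:
            return submit_job()
        except concurrent.futures.TimeoutError:
            if attempt == job_retries:
                raise  # retries exhausted; surface the timeout

# Usage sketch: a stand-in job that times out twice, then succeeds.
attempts = []
def flaky_job():
    attempts.append(1)
    if len(attempts) < 3:
        raise concurrent.futures.TimeoutError(
            "Operation did not complete within the designated timeout.")
    return "done"

result = run_with_retries(flaky_job, job_retries=5)
```

With `job_retries=5`, the wrapper would tolerate up to five timeouts before surfacing the error; here the stand-in job succeeds on its third attempt.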
Thanks @jtcohen6, you've given us some ideas to chew on for the next couple of days. Do you have insight into the priority of #231, and whether it would address this specific issue? Or do you think #231 would swap out …
@jtcohen6, I totally agree! Do you know if anyone is working on #231, and its priority? This is an important issue for us. I am happy to contribute to the implementation.
Hi all! Wanted to check in on this issue: is a fix in the works? We're running into this problem as well -- so glad to see there's a thoughtful discussion about it!
@hui-zheng sorry for the delayed answer. If you think you can contribute, please do so on #231, even if it's to scope work or a proof of concept.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
@Fleid how do we get this issue back onto the team's backlog? Can we reopen this one?
Flagging that we at Superhuman are continuing to encounter this issue, as @lillikulak mentions above. Any update on progress toward solving this problem?
We are facing the same issue regularly. I would also be very happy if this problem is actually in scope.
That's a nice Christmas present 🎄 Thanks!
Describe the bug
The `job_retries` BigQuery job configuration setting is not used in conjunction with the `job_execution_timeout_seconds` setting. Instead of retrying a job that times out, dbt immediately fails with `Operation did not complete within the designated timeout.`
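For context, both settings live on the BigQuery target in `profiles.yml`. A minimal sketch along the lines described below (the profile name, project, dataset, and keyfile path are placeholders):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: my-gcp-project           # placeholder: fill in your project
      dataset: dbt_example
      keyfile: /root/keyfile.json       # placeholder path
      threads: 1
      job_retries: 5                    # expect five retries on failure
      job_execution_timeout_seconds: 1  # exaggeratedly low to force a timeout
```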
Steps To Reproduce
Sure. First use the official dbt Docker image, and run `dbt init` to initialise a project:

Make a `profiles.yml` file similar to this. Take special note of the `job_execution_timeout_seconds` value, which I've exaggerated to 1 second to force a query timeout to occur. Also note that `job_retries=5`. Fill in the `project` key.

Then, run the built Docker image, executing a `dbt run` on the example `my_first_dbt_model` model. Map your `profiles.yml` file and `keyfile.json` into the container:

Expected behavior
The model `my_first_dbt_model` would fail, and retry five times.

Screenshots and log output
After dbt errors/exits, the job continues running on BigQuery, and eventually succeeds:
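That behaviour can be mimicked in plain Python, as a loose analogy for what the report describes: a client-side wait limit raises a timeout, but the underlying work is not cancelled and still completes. The thread-pool stand-in below is an illustration only, not how dbt or BigQuery actually work internally.

```python
import concurrent.futures
import time

# Stand-in for a slow BigQuery job: the client gives up waiting,
# but the work itself is not cancelled.
def long_running_query():
    time.sleep(1.0)
    return "query succeeded"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_running_query)

timed_out = False
try:
    # analogous to job_execution_timeout_seconds: a client-side wait limit
    future.result(timeout=0.1)
except concurrent.futures.TimeoutError:
    timed_out = True  # dbt errors out here...

# ...yet the underlying work keeps running and eventually succeeds
final_result = future.result()
executor.shutdown()
```

The client-side `TimeoutError` fires after 0.1 s, while the "job" finishes a moment later with its result intact, mirroring the log output above.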
Additional Context
The retries are obeyed in the event of other BigQuery failures. For example, if you introduce an intentional syntax error in your model, it will retry up to `job_retries` times.

System information
The output of `dbt --version`:

The operating system you're using: Using the official Docker image...

```
root@7fbf851eb622:/# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
```

The output of `python --version`: Python 3.10.3