vine: check the availability of staged inputs #3994
Hmm, I don't think this is the right solution.
Regular files should not be "lost" because they should exist at the manager once the original is done.
The problem is evident in the log describing the problem: an output transfer is interrupted, and the corresponding task is (incorrectly) declared as complete.
The correct solution is for that original task to go back into the queue.
Thanks for the clarification, this sounds like a more feasible approach.
@dthain Thanks for helping me find this bug!
Aha, that definitely looks like a bug.
Probably left over from some old code that used the `&=` pattern.
Good catch!
@btovar can you double check this key bit of logic?
@JinZhou5042 nice catch!
* vine: check the availability of staged inputs
* check for created files
* revert changes
* revert changes
* vine: change &= to =
* set t->output_received on success
* break when result is not success
* lint
* remove redund code
* always do vine_manager_get_output_files
Proposed Changes
Fix #3993
The problem was that when a worker crashes while sending back permanent outputs, all child tasks fail because their inputs are missing.

It turned out that the return values of `retrieve_output` and `vine_manager_get_output_files` were not correctly received and handled.

In the original version, `result` is initialized as `vine_result_code_t result = VINE_SUCCESS;`, and `VINE_SUCCESS` is `0` as an integer.

Because `0 & x` is `0` for any `x`, `result &= xxx` never changes that initial value. As a result, the manager assumes that the outputs have always been successfully retrieved.

I was able to reproduce the stated error, and with these changes the issue disappeared.
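
To make the `&=` problem concrete, here is a minimal, standalone sketch. It does not use the real TaskVine types: `demo_result_t`, `accumulate_with_and`, and `accumulate_with_assign` are hypothetical names invented for illustration, and the only fact taken from this PR is that the success code is `0`. The second function mirrors the spirit of the "change &= to =" and "break when result is not success" commits, not their exact code.

```c
#include <stdio.h>

/* Hypothetical stand-in for vine_result_code_t.
 * Assumption carried over from the PR: success is 0. */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_WORKER_FAILURE = 1
} demo_result_t;

/* Buggy pattern: starting from 0, "result &= x" can never
 * become non-zero, so every failure is silently dropped. */
int accumulate_with_and(const demo_result_t *steps, int n)
{
    int result = DEMO_SUCCESS;
    for (int i = 0; i < n; i++) {
        result &= steps[i]; /* 0 & anything == 0 */
    }
    return result;
}

/* Fixed pattern: assign the latest result and stop at the
 * first failure so that it is reported to the caller. */
int accumulate_with_assign(const demo_result_t *steps, int n)
{
    int result = DEMO_SUCCESS;
    for (int i = 0; i < n; i++) {
        result = steps[i];
        if (result != DEMO_SUCCESS) {
            break;
        }
    }
    return result;
}
```

Given a sequence of per-file results that contains a failure, `accumulate_with_and` still returns `DEMO_SUCCESS`, while `accumulate_with_assign` returns the failure code.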
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
* `make test`: Run local tests prior to pushing.
* `make format`: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
* `make lint`: Run lint on source code prior to pushing.