-
Notifications
You must be signed in to change notification settings - Fork 848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
version: every will skip versions if a parallel upstream job's latest build finishes before an older one #736
Comments
I'm also seeing this behavior on my end on the pullrequest-resource, currently running Concourse 2.4.0 |
So this is a tricky one. The semantics for tl;dr: The scheduling for the second job would have to know that a build is in-flight for the ideal next version, and know to wait for it to finish before determining the inputs. |
Our team also makes heavy use of the pullrequest-resource, and reliably running all |
@oppegard That's right. Until we figure out whether there's something better we could do. I'm a little wary of the changes this kind of implies; having scheduling so dependent on ephemeral job state feels wrong. |
Possible workaround: Have each job do a e.g.: resources:
- name: my-pr
type: pr
source: {...}
- name: dummy-time
type: time
source: {interval: 24h} # doesn't matter, but something has to be there
jobs:
- name: job-1
plan:
- get: my-pr
version: every
trigger: true
- task: test-pr
# ...
- put: dummy-time
- name: job-2
plan:
- get: my-pr
trigger: true
passed: [job-1]
- get: dummy-time
version: every
trigger: true
passed: [job-1]
- # ...
- put: dummy-time
- name: job-3
plan:
- get: my-pr
trigger: true
passed: [job-2]
- get: dummy-time
version: every
trigger: true
passed: [job-2]
- # ...
- put: dummy-time Note that |
Hi @vito, Is still an issue (and is the above workaround is still required) when using the pullrequest-resource, even though the mentioned problems with using If so, could you please explain the details, and why the problem seems to be specific to pullrequest-resource? My guess is that it is because the pullrequest-resource conflates two independent "versions" (PR numbers and SHAs) into a single "version", which breaks some of the concourse assumptions about how resources and versioning work. It would be great to have you confirm or deny that, and if so, possibly explain why from your perspective. Thanks! |
The problem is not specific to the PR resource. This issue is still open. The linked comment explains why this issue exists; it's a fundamental issue with |
@vito OK, thanks. EDIT
END EDIT But, with respect to the pullrequest-resource specifically:
Correct? If not, then please explain in more detail, because I'm still confused :) Thanks, |
@vito Also, I tried out the workaround above. It doesn't appear to work if you ever have more than one entry in a E.g., if you try this,
Is this expected (and thus the workaround doesn't work in this situation), or do I have something wrong? Thanks, |
The problem is in concourse/concourse#736. This is the workaround advised by the concourse folks. :)
Hey @vito we have the same setup as @thewoolleyman and, while the workaround seems to accomplish what it is meant for, we also experience Is this meant to work when there are multiple entries in the |
@nazrhom (and, well, everyone) To be honest we're starting to see It seems like what y'all are really trying to do is have a full pipeline run for every version. I think that's actually a challenge that can be met by a concept similar to spaces (#1707). We've got a new plan laid out in concourse/rfcs#1 (comment) that should make this kind of thing possible. At that point we might deprecate configuring both |
@vito thanks, will look into the rfcs. I did some more digging and the issue I am experiencing does not seem to be related to fan-in/out as I can reproduce the issue even with a setup like:
I have a question on the (current) expected semantics here: |
@nazrhom Yeah, that may be what's happening. |
Beep boop! This issue has been idle for long enough that it's time to check If it is, what is blocking it? Would anyone be interested in submitting a If no activity is observed within the next week, this issue will be |
Would ask that this issue and/or #1298 remain open as it's a common problem and the proposed fix (spaces) is a large and uncertain features (concourse/rfcs#24). |
I'll leave this issue open as long as the issue still stands. The resolution will probably be deprecating/disallowing I'll keep the stale bot at bay by placing this in our Spatial Resources epic as a long-term goal to close out once the epic is complete. |
We've been hitting this same issue. So, I've been trying to understand how the scheduling algorithm is currently implemented. I'd like to check my understanding, and see if it inspires any reasonable fixes. My understanding is,
If my understanding is correct about this process, the cause of this bug is pretty straightforward - When Given this, it seems like there might also be edge cases in which inputs that were A naiive suggestion might be, in cases where an |
@YenTheFirst Impressive sleuthing! We're actually near the end of a complete re-design and re-implementation of the scheduling algorithm, currently living on the For inputs with However, at least at some level, the problem with Neither of these options really feel compelling to me. Even |
I don't think the concepts of In our particular case, the pipeline is something like:
I suspect we could work around this by removing the initial "build a docker image", and instead having unit-tests, integration-tests do their own building, but that duplicates work and code. [In this particular case, yes, we're using I'll take a closer look at those Based only on your description,
At first glance, this sounds like it would have the exact same problem. If a downstream build for D completes before an upstream build for C, the downstream build for C will never start, since "D" will be the version that was previously used.
|
To be clear, I don't doubt the validity of your use case at all. I just don't see us improving this specific approach any time soon, because it will take a nontrivial amount of work, and it will just delay the real fix which we've recently gained a pretty clear picture of. The problem is that, while
This is why I think it's a fundamental issue. Concourse pipelines go forwards, not backwards. Changing this would have a ton of implications, and it isn't a clean fix anyway because now there's a question of when does We already know this feature is a pain in the butt to support, and doubling down on it to get it working correctly would require re-designing significant portions of the product around it, both internally and in the UI. So at this point I'm not on board with supporting Your use case of building every commit of every PR is totally fine. I think a model that better fits how Concourse works would be to configure a pipeline for each commit. We just need to improve the automation and navigation of pipelines. This is all broken down into many small features, laid out in the v10 roadmap post. Implementing most of these features will be easier or at least more beneficial in the long-term than doubling down on
Hmm good point, not yet. We should just open a PR for it. 🤔 I'll link to it here once we do (or just keep an eye out if I forget!) edit: PR is open #4721 |
we introduced max-in-flight hoping to limit the number of clusters we run at the same time. It turns out it doesn't work that way. As soon as a deploy job is done the next one will be queued although no cluster has been deleted. This will allow for as many clusters as we have time to deploy until the first cluster is deleteted. We also add the "version: every" setting to make sure we don't skip versions if jobs complete in random order. See also here: concourse/concourse#736 Signed-off-by: Dimitris Karakasilis <DKarakasilis@suse.com>
we introduced max-in-flight hoping to limit the number of clusters we run at the same time. It turns out it doesn't work that way. As soon as a deploy job is done the next one will be queued although no cluster has been deleted. This will allow for as many clusters as we have time to deploy until the first cluster is deleteted. We also add the "version: every" setting to make sure we don't skip versions if jobs complete in random order. See also here: concourse/concourse#736 Signed-off-by: Dimitris Karakasilis <DKarakasilis@suse.com>
because this together with `passed:` is confusing to Concourse and doesn't let the step trigger when it should. See also: concourse/concourse#736 (comment)
because that blocks automatic triggering of jobs concourse/concourse#736 This combination of settting was introduced when we tried to force Concourse to run later jobs on every version that passed the previous one. E.g. ver1 passed job 1, ver2 passed job 1 job2 should trigger both for ver1 and ver2 otherwise ver1 would never be tested on job2. The new scheduling algorighm may have solved this: https://concourse-ci.org/scheduler.html#scheduling-behavior Given our pipeline needs us to manually trigger jobs all the time, this worths a shot.
Bug Report
Follow up to #666 and #563
We're running 2.4.0.
What I saw this morning, Job starts multiple copies of itself; A, B, C, and D. In that order.
The jobs finished in A, C, D, B order. The next job that depends on that resource (which is
version: every
) did not run with resource version B.The text was updated successfully, but these errors were encountered: