Stepchain: sort out duplicate outputModules #6787
How can different steps have exactly the same output?
Not the same output, but the same output module (e.g. RAWSIMoutput). This template simulates the problem:
Ah, so it's a naming issue in the CMSSW configuration?
Well, WMCore reuses the names internally. Can we somehow attach the step name to the identifiers WMCore uses? The only other solution I see is to tell people that configs in StepChains need unique output identifiers across all steps, which I don't think will fly.
No, it's a problem on our end. I still have to investigate it further, but here is a quick summary of what happened in my tests.
OK, so the agent only used the name of the output module, as I expected. The solutions still stand: either we change that to also take the step name into account, or we enforce unique output identifiers across all steps.
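A minimal sketch of the second option (rejecting a StepChain whose steps reuse an output module name), assuming the output modules of each step are available as a plain dict of step name to module names. The function name and input shape are hypothetical, not WMCore's API.

```python
from collections import defaultdict


def find_duplicate_output_modules(steps):
    """Return output module names that more than one step uses.

    `steps` is a hypothetical mapping of step name -> list of output
    module names, e.g. {"Step1": ["RAWSIMoutput"], ...}.
    """
    owners = defaultdict(list)
    for step_name, modules in steps.items():
        for module in modules:
            owners[module].append(step_name)
    # Keep only module names claimed by two or more steps.
    return {module: names for module, names in owners.items() if len(names) > 1}


steps = {
    "Step1": ["RAWSIMoutput"],
    "Step2": ["RAWSIMoutput"],
    "Step3": ["AODSIMoutput"],
}
print(find_duplicate_output_modules(steps))
# {'RAWSIMoutput': ['Step1', 'Step2']}
```

A non-empty result would be the trigger for rejecting the request at validation time; an empty dict means every output module name is unique across steps.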
If I could answer this, I would have a fix, which I don't yet :-)
Nope, it did not fly! Folks have already tagged StepChain as incomplete...
Maybe you want to lower this from high priority, since it was agreed that we will not get this working.
And, for your information: the reason jobs from different data tiers ran in the same merge job is that they were associated with the same fileset, since the fileset follows the
where we, of course, cannot create two. Thus the only way forward is to change how fileset names are created. I have an "almost working" version that also adds the data tier to the fileset name. Another possibility would be adding the cmsRun step (cmsRun1, cmsRun2); adding the step name would be complicated because it is only available for StepChains... @ticoann @ericvaandering if you have a strong opinion about any of these, please let me know before it's too late :)
Fileset names are just arbitrary placeholders in WMBS, right? From what I've seen in what I'm working on, if you don't name them they just get a UUID. I don't understand how adding the data tier helps. It's already there in GENSIM, right?
Not really: filesets are named after workflow + task name + (merged or unmerged) + output module. This fileset is then mapped back in JobAccountant for proper file accounting. GENSIM in the example above is just my task name (actually, the StepName of the first step, Step1).
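The naming rule above can be illustrated with a small sketch. The thread only says which parts go into the name, so the "/" separators, the "unmerged" label, the workflow name and the data tiers below are illustrative assumptions, and fileset_name is a hypothetical helper rather than WMCore code. Because all steps of a StepChain run in one task, two steps writing through RAWSIMoutput resolve to the same name; appending the data tier or the cmsRun step (the two options discussed above) separates them.

```python
def fileset_name(workflow, task, merge_state, output_module, suffix=None):
    """Compose an illustrative fileset name (hypothetical format).

    `suffix` models the proposed extra component: a data tier or a
    cmsRun step label.
    """
    parts = [workflow, task, merge_state, output_module]
    if suffix is not None:
        parts.append(suffix)
    return "/" + "/".join(parts)


# Current scheme: Step1 and Step2 share the task and the output module.
current_1 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput")
current_2 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput")
print(current_1 == current_2)  # True: one fileset for both steps

# Proposed: append the data tier each step writes.
tier_1 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput", "GEN-SIM")
tier_2 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput", "GEN-SIM-RAW")
print(tier_1 == tier_2)  # False

# Alternative: append the cmsRun step instead.
run_1 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput", "cmsRun1")
run_2 = fileset_name("MyWorkflow", "GENSIM", "unmerged", "RAWSIMoutput", "cmsRun2")
print(run_1 == run_2)  # False
```

One difference between the two options: a data-tier suffix would still collide if two steps shared both the output module and the data tier, while a cmsRun-step suffix would not.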
As StepChain currently stands, it does not support KeepOutput: True for different steps that use exactly the same outputModule (no matter which data tier is used). The problem is that merge jobs are created for all files produced under a given outputModule, across all data tiers.
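To illustrate the consequence for merging, here is a hedged sketch that groups output files into candidate merge units by a key. The OutputFile record, the LFNs and the grouping helper are hypothetical stand-ins, not WMCore's merge splitting. Keying by output module alone puts both data tiers into one merge unit; adding the data tier to the key keeps them apart.

```python
from collections import defaultdict
from typing import NamedTuple


class OutputFile(NamedTuple):
    """Hypothetical record for one unmerged output file."""
    step: str
    output_module: str
    data_tier: str
    lfn: str


files = [
    OutputFile("cmsRun1", "RAWSIMoutput", "GEN-SIM", "/store/a.root"),
    OutputFile("cmsRun2", "RAWSIMoutput", "GEN-SIM-RAW", "/store/b.root"),
    OutputFile("cmsRun2", "RAWSIMoutput", "GEN-SIM-RAW", "/store/c.root"),
]


def group_for_merge(files, key):
    """Group file LFNs into candidate merge units by key(file)."""
    groups = defaultdict(list)
    for f in files:
        groups[key(f)].append(f.lfn)
    return dict(groups)


# Keyed by output module only: one unit mixes both data tiers.
by_module = group_for_merge(files, key=lambda f: f.output_module)
# Keyed by (output module, data tier): each tier is merged separately.
by_module_tier = group_for_merge(files, key=lambda f: (f.output_module, f.data_tier))
print(len(by_module), len(by_module_tier))  # 1 2
```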