MSOutput: read RelVal output data policy from configuration #11024

Merged: 2 commits into dmwm:master, Mar 8, 2022

Conversation

amaltaro
Contributor

@amaltaro amaltaro commented Mar 7, 2022

Fixes #10106

Status

ready

Description

This PR provides the following:

  • replace the hard-coded RelVal output logic with a configurable schema (defined within the MSOutput service configuration)
  • validate the policy data structure against its data type, RSE names, and datatier names
  • each RSE destination will get a copy of the output data

In addition to this, it also removes an obsolete use of the tapePledges configuration attribute.
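
As a rough illustration of the validation and placement behaviour listed above, here is a minimal sketch. It is not the code from this PR: the `destinations` key, the function names, and the sample datatier and RSE values are assumptions made for the example. Only the list-of-dictionaries shape, the `datatier` key, and the `RelValPolicyException` name come from the review snippets further down this page.

```python
class RelValPolicyException(Exception):
    """Raised when the policy description fails validation."""


def validate_policy(policy_desc, valid_tiers, valid_rses):
    """Raise RelValPolicyException if the policy is malformed; return None otherwise."""
    if not isinstance(policy_desc, list):
        raise RelValPolicyException(
            "Policy must be a list, not {}".format(type(policy_desc)))
    for item in policy_desc:
        if not isinstance(item, dict):
            raise RelValPolicyException("Each policy item must be a dict")
        datatier = item.get("datatier")
        if not isinstance(datatier, str):
            raise RelValPolicyException(
                "The 'datatier' parameter must be a string, not {}".format(type(datatier)))
        if datatier not in valid_tiers:
            raise RelValPolicyException("Unknown datatier: {}".format(datatier))
        destinations = item.get("destinations")  # assumed key name
        if not isinstance(destinations, list) or not destinations:
            raise RelValPolicyException("'destinations' must be a non-empty list")
        for rse in destinations:
            if rse not in valid_rses:
                raise RelValPolicyException("Unknown RSE: {}".format(rse))


def placement_for(policy_desc, datatier):
    """Return (destinations, copies): one copy of the output per destination RSE."""
    for item in policy_desc:
        if item["datatier"] == datatier:
            destinations = list(item["destinations"])
            return destinations, len(destinations)
    return [], 0


# Hypothetical policy and reference lists, for illustration only
policy = [{"datatier": "GEN-SIM", "destinations": ["T2_CH_CERN"]}]
validate_policy(policy,
                valid_tiers=["GEN-SIM", "RECO"],
                valid_rses=["T2_CH_CERN", "T1_US_FNAL_Disk"])
print(placement_for(policy, "GEN-SIM"))
```

In the PR itself the list of valid datatiers comes from DBS and the list of disk RSEs from Rucio; the sketch simply takes both as plain Python lists.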

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

Deployment changes: dmwm/deployment#1130
services_config: https://gitlab.cern.ch/cmsweb-k8s/services_config/-/merge_requests/131 and https://gitlab.cern.ch/cmsweb-k8s/services_config/-/merge_requests/130

@amaltaro
Contributor Author

amaltaro commented Mar 7, 2022

@haozturk @bbilin could you please review the RelVal output data placement policy defined in this PR: dmwm/deployment#1130

I am planning to change the implementation of this PR, but I would appreciate it if you could confirm that the RelVal policy is sound. Thanks

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12848/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 9 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12851/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 10 tests added
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12853/artifact/artifacts/PullRequestReport.html

@bbilin

bbilin commented Mar 7, 2022

@amaltaro sorry for the latency in replying.

For the moment we are running all the RelVal chains at CERN, so there is no need to add FNAL to the rules. We would hence suggest that all the tiers you mention be stored only at CERN.

Many thanks,

B. for PdmV

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 10 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12854/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 10 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12855/artifact/artifacts/PullRequestReport.html

Contributor

@todor-ivanov todor-ivanov left a comment

Thanks for this PR @amaltaro. I do like the idea of separating the whole RelVal policy generation into its own class. I've left a few comments inline, which may be worth taking a look at. None of them is a showstopper; they are mostly requests to add a few lines of code documentation.

self.msConfig.setdefault("dbsUrl", "https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader")
allDBSDatatiers = getDataTiers(self.msConfig['dbsUrl'])
allDiskRSEs = self.rucio.evaluateRSEExpression("*", returnTape=False)
self.relvalPolicy = RelValPolicy(self.msConfig.get("relvalPolicy", []),
Contributor

I'd say this configuration check here:
self.msConfig.get("relvalPolicy", [])

would be better as a separate default assignment, like the rest of the msConfig fields.
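
A minimal sketch of that suggestion, assuming the configuration is a plain dictionary as the quoted `setdefault` call implies. The `apply_defaults` helper is hypothetical and only serves to make the snippet runnable on its own:

```python
def apply_defaults(ms_config):
    """Fill in missing configuration keys in place and return the same dict."""
    ms_config.setdefault("dbsUrl", "https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader")
    # Default to an empty policy, in the same style as the other fields
    ms_config.setdefault("relvalPolicy", [])
    return ms_config


config = apply_defaults({})
# Later code can then read the value directly, with no inline fallback
relval_policy = config["relvalPolicy"]
```

With the default assigned up front, every reader of `relvalPolicy` sees the same fallback value, and an explicitly configured policy is left untouched.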

Contributor

It would also be good to have a short note, either in the docstring or as an inline comment, about what this list should contain.

This module will contain the RelVal output data placement policy, where
destinations will be decided according to the dataset datatier.
"""
def __init__(self, policyDesc, listDatatiers, listRSEs, logger=None):
Contributor

Could you please add a docstring or a note explaining the expected input to the init method?

:param policyDesc: list of dictionaries with the policy definition
:param validDBSTiers: list with existent DBS datatiers
:param validDiskRSEs: list with existent Rucio Disk RSEs
:return: raise an exception if any validation fails
Contributor

If the method does not return a value, I would format this line as:
No return value. The method raises an exception if any validation fails instead
or something similar.

msg = "The RelVal output data placement policy is not in the expected data type. "
msg += "Type expected: list, while the current data type is: {}. ".format(type(policyDesc))
msg += "This critical ERROR must be fixed."
raise RelValPolicyException(msg) from None
Contributor

Once this is raised during the parameter validation here, is there a place in the MSOutput module code that handles this exception, or is it left to be treated as a general exception, with just the proper messages logged?

Contributor Author

If validation fails, it is supposed to fail the MSOutput object initialization, meaning the service will keep crashing until someone spots the problem.

Contributor

That's fair.
Thanks
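
The fail-fast behaviour discussed in this thread can be sketched as follows. The `OutputService` class is a hypothetical stand-in for MSOutput, and the `RelValPolicy` shown here is a heavily reduced version that only checks the top-level type; neither is the code under review.

```python
class RelValPolicyException(Exception):
    """Raised when the policy description fails validation."""


class RelValPolicy:
    def __init__(self, policy_desc):
        # Validation happens at construction time
        if not isinstance(policy_desc, list):
            msg = "Type expected: list, while the current data type is: {}. ".format(type(policy_desc))
            raise RelValPolicyException(msg) from None
        self.policy_desc = policy_desc


class OutputService:  # hypothetical stand-in for MSOutput
    def __init__(self, ms_config):
        # No try/except here: an invalid policy aborts the service construction
        self.relval_policy = RelValPolicy(ms_config.get("relvalPolicy", []))


try:
    OutputService({"relvalPolicy": "not-a-list"})
except RelValPolicyException as exc:
    print("Service failed to start:", exc)
```

Because the exception is not caught inside the constructor, a broken policy is noticed at startup rather than when the first RelVal workflow is processed.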

# validate the datatier
if not isinstance(item['datatier'], str):
    msg = "The 'datatier' parameter must be a string, not {}.".format(type(item['datatier']))
    raise RelValPolicyException(msg) from None
Contributor

Same question as above, and also for the few similar cases below.

validate the RSE name as well

Create RelValPolicy module; revert MSOutput changes

fix __str__ method to use json.dumps instead

set number of copies according to number of destinations

fix attribute name

apply Todors suggestions
remake unit tests

added stringification unit test
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 10 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12856/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

amaltaro commented Mar 8, 2022

Thanks for the review, Todor. I think I either answered or fixed the comments/requests you made.

If it looks good to you, could you please buy us some time and:

  • merge this PR
  • cut a new WMCore tag (I guess it's 2.0.1.pre5)
  • make a new cmsdist pull request with that tag
  • request that cmsdist PR - and all of the deployment/services_config - in the usual HG2203 requests gitlab ticket?

Contributor

@todor-ivanov todor-ivanov left a comment

This is looking pretty good.
Thanks @amaltaro

@todor-ivanov todor-ivanov merged commit a1da43b into dmwm:master Mar 8, 2022
@amaltaro
Contributor Author

@haozturk

perfect, thanks Alan!

Development

Successfully merging this pull request may close these issues.

Problems in RelVal Output Data Placement
5 participants