
MSTransferor: support input data placement with Rucio #9759

Merged: 3 commits, merged on Sep 2, 2020

@amaltaro (Contributor) commented Jun 19, 2020

Fixes #9725
Fixes #9461

Status

In development

Description

Summary of changes:

  • implemented a Rucio wrapper based on pycurl for concurrent requests; only a few essential APIs are implemented so far;
  • when listing Rucio replication rules, rules in the STUCK or SUSPENDED state may be disregarded, in which case a new rule can be created;
  • updated the MSCore __init__ method so that subclasses can provide the names of services they do not want initialized;
  • updated MSCore to initialize a Rucio object;
  • added support for updating RSE quotas based on Rucio (using the quota limit and usage);
  • added all the necessary logic in MSTransferor to deal with either PhEDEx-based or Rucio-based input data placement;
    • including a check on the RSE quota and usage;
    • relying on a new Rucio account: wmcore_transferor;
    • logic for the pileup: if the container is not locked anywhere within the SiteWhitelist (by that Rucio account), then we make a new rule with grouping=ALL against one specific RSE;
    • logic for the primary: find all the blocks and make a rule for all of them against all RSEs to be used (grouping=DATASET);
    • logic for the primary + parent: create chunks of primary + parent data and make a rule for each chunk against one specific RSE (other chunks go to other RSEs), using grouping=DATASET;
  • Rucio rules are created in asynchronous mode, with ask_approval always set to False;
  • Rucio rules created against a non-production server (or in non-production mode) have a lifetime of 24h;
  • RSE quotas and Rucio rules are handled with the Rucio Python client APIs, while container sizes, the list of blocks and their sizes, and the current rules locking data are retrieved via RESTful APIs;
  • the Rucio token is renewed if it is due to expire within 30 min;
  • Todor, Kenyi and I are notified of large data placements;
  • the Rucio wrapper supports the lifetime and asynchronous parameters for replication rule creation.
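The pileup placement decision and the quota check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MSTransferor implementation: rules are plain dicts carrying Rucio's `account`, `state` and `rse_expression` fields, each `rse_expression` is treated as a single RSE name, quotas are given as limit/usage pairs in bytes, and `pick_rse` applies a hypothetical "most remaining quota" policy. All function names are made up for the example.

```python
# Rule states that do not count as locking the data (per the PR description).
IGNORED_STATES = {"STUCK", "SUSPENDED"}


def active_rules(rules, account):
    """Rules owned by `account` that are not STUCK or SUSPENDED."""
    return [rule for rule in rules
            if rule["account"] == account and rule["state"] not in IGNORED_STATES]


def needs_pileup_rule(rules, whitelist_rses, account="wmcore_transferor"):
    """True if no active rule of `account` locks the container at a whitelisted RSE.

    Simplification: each rule's rse_expression is treated as one RSE name.
    """
    locked_rses = {rule["rse_expression"] for rule in active_rules(rules, account)}
    return not (locked_rses & set(whitelist_rses))


def pick_rse(candidates, quotas):
    """Hypothetical policy: pick the candidate RSE with the most remaining quota.

    `quotas` maps RSE name -> {"limit": bytes, "usage": bytes}.
    Returns None when there are no candidates or none has quota left.
    """
    if not candidates:
        return None
    remaining = {rse: quotas[rse]["limit"] - quotas[rse]["usage"] for rse in candidates}
    best = max(remaining, key=remaining.get)
    return best if remaining[best] > 0 else None


# Example: the only wmcore_transferor rule is STUCK, so a new pileup rule is needed.
rules = [
    {"account": "wmcore_transferor", "state": "STUCK", "rse_expression": "T1_A"},
    {"account": "other_account", "state": "OK", "rse_expression": "T2_B"},
]
quotas = {"T1_A": {"limit": 100, "usage": 90}, "T2_B": {"limit": 50, "usage": 10}}
if needs_pileup_rule(rules, ["T1_A", "T2_B"]):
    print("new rule with grouping=ALL at", pick_rse(["T1_A", "T2_B"], quotas))
```

With these inputs the STUCK rule is disregarded and the rule owned by another account does not count, so a new rule is suggested at T2_B, the candidate with the most quota left.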

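The token renewal item reduces to an expiry check with a 30-minute margin. A minimal sketch, assuming the token's expiry is known as a Unix timestamp; `token_needs_renewal`, `TokenCache` and the `fetch_token` callable are hypothetical names rather than WMCore or Rucio APIs.

```python
import time

# Renew when fewer than 30 minutes of validity are left (per the PR description).
RENEW_MARGIN_SECS = 30 * 60


def token_needs_renewal(expires_at, now=None, margin=RENEW_MARGIN_SECS):
    """True if a token expiring at `expires_at` (Unix time) expires within `margin`."""
    now = time.time() if now is None else now
    return expires_at - now < margin


class TokenCache:
    """Hypothetical cache around a token-fetching callable.

    `fetch_token` must return a (token, expires_at) tuple; how the token is
    actually obtained from Rucio is outside the scope of this sketch.
    """

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None):
        """Return the cached token, fetching a new one when it is close to expiry."""
        if self._token is None or token_needs_renewal(self._expires_at, now):
            self._token, self._expires_at = self._fetch_token()
        return self._token
```

Passing `now` explicitly keeps the check deterministic in unit tests; in normal use it defaults to the current time.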
In addition, this PR:

  • fixed a bug in MSMonitor when counting workflows that had to be skipped;
  • fixed inconsistent progress reporting between Rucio and PhEDEx transfers: both are now expressed as a percentage, i.e. between 0 and 100;
  • added a debug line to print some extra information about the rules being evaluated.
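For the percentage-based progress, one way to put a Rucio rule on the 0 to 100 scale is to use its lock counters. This is a sketch only, since the PR does not state which fields it uses: `locks_ok_cnt`, `locks_replicating_cnt` and `locks_stuck_cnt` are attributes of a Rucio rule, while `rucio_rule_progress` is a hypothetical helper.

```python
def rucio_rule_progress(rule):
    """Completion of a Rucio rule as a percentage between 0 and 100.

    Uses the rule's lock counters; a rule with no locks yet reports 0.
    """
    ok = rule.get("locks_ok_cnt", 0)
    total = (ok + rule.get("locks_replicating_cnt", 0)
             + rule.get("locks_stuck_cnt", 0))
    if total == 0:
        return 0.0
    return 100.0 * ok / total


print(rucio_rule_progress({"locks_ok_cnt": 3, "locks_replicating_cnt": 1,
                           "locks_stuck_cnt": 0}))  # 75.0
```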

It depends on the following new configuration parameters:

  • useRucio: to enable/disable Rucio as a DM service (if disabled, PhEDEx is used)
  • rulesLifetime: defines an expiration time for all rules created by MSTransferor. Production rules (against the production Rucio server) have no expiration time.
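A minimal sketch of how these two parameters might be read, assuming a config object that exposes them as attributes (`SimpleNamespace` stands in for it here). The False default for useRucio, the seconds unit for rulesLifetime, the `production` flag and the helper names are all assumptions; the option names mirror the PR description (grouping, lifetime, asynchronous, ask_approval) rather than a verified client signature.

```python
from types import SimpleNamespace

# Assumed default: 24 hours, expressed in seconds (the unit is an assumption).
DEFAULT_RULES_LIFETIME = 24 * 60 * 60


def use_rucio(config):
    """useRucio may be missing from the configuration; assume PhEDEx (False) then."""
    return bool(getattr(config, "useRucio", False))


def rule_lifetime(config, production):
    """Lifetime for new rules; None means the rule never expires."""
    if production:
        return None
    return getattr(config, "rulesLifetime", DEFAULT_RULES_LIFETIME)


def rule_options(grouping, config, production):
    """Options for a new rule: created asynchronously, never asking for approval."""
    return {
        "grouping": grouping,
        "lifetime": rule_lifetime(config, production),
        "asynchronous": True,
        "ask_approval": False,
    }


config = SimpleNamespace(useRucio=True, rulesLifetime=3600)
print(use_rucio(config), rule_options("DATASET", config, production=False))
```

Keeping the production check in one place means production rules never get an expiration time, regardless of what rulesLifetime is set to.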

Is it backward compatible (if not, which system does it affect?)

yes

Related PRs

none

External dependencies / deployment changes

Deployment changes:
dmwm/deployment#924
dmwm/deployment#942

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 4 new failures
  • Pylint check: failed
    • 1 warning or error that must be fixed
    • 5 warnings
    • 14 comments to review
  • Pycodestyle check: succeeded
    • 7 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10147/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 1 new failure
    • 27 tests deleted
    • 1 test added
  • Pylint check: failed
    • 13 warnings and errors that must be fixed
    • 13 warnings
    • 65 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10159/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 27 tests deleted
    • 1 test added
  • Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 13 warnings
    • 65 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10160/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 27 tests deleted
    • 1 test added
    • 1 change in unstable tests
  • Pylint check: failed
    • 13 warnings and errors that must be fixed
    • 13 warnings
    • 69 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10209/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 27 tests deleted
    • 1 test added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 13 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10210/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 27 tests deleted
    • 1 test added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 13 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10212/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 27 tests deleted
    • 1 test added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 13 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10215/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor Author) commented Jul 1, 2020

For some reason, importing cert and ckey from Common in PycurlRucio does not work; I get the same import error in my container too:

Traceback (most recent call last):
  File "/home/dmwm/unittestdeploy/wmagent/1.3.6.pre4/sw/slc7_amd64_gcc630/external/py2-nose/1.3.7-comp3/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/dmwm/unittestdeploy/wmagent/1.3.6.pre4/sw/slc7_amd64_gcc630/external/py2-nose/1.3.7-comp3/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/dmwm/unittestdeploy/wmagent/1.3.6.pre4/sw/slc7_amd64_gcc630/external/py2-nose/1.3.7-comp3/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/dmwm/wmcore_unittest/WMCore/test/python/WMCore_t/MicroService_t/DataStructs_t/Workflow_t.py", line 8, in <module>
    from WMCore.MicroService.DataStructs.Workflow import Workflow
  File "/home/dmwm/wmcore_unittest/WMCore/src/python/WMCore/MicroService/DataStructs/Workflow.py", line 10, in <module>
    from WMCore.MicroService.Unified.Common import getMSLogger, gigaBytes
  File "/home/dmwm/wmcore_unittest/WMCore/src/python/WMCore/MicroService/Unified/Common.py", line 18, in <module>
    from WMCore.MicroService.Tools.PycurlRucio import getPileupDatasetSizesRucio, getPileupSubscriptionsRucio, \
  File "/home/dmwm/wmcore_unittest/WMCore/src/python/WMCore/MicroService/Tools/PycurlRucio.py", line 20, in <module>
    from WMCore.MicroService.Unified.Common import cert, ckey
ImportError: cannot import name cert

I must be missing some subtle detail of the import behavior.
As a workaround, I copied those two functions over to the PycurlRucio module.
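Reading the traceback, a likely explanation is a circular import: Common.py imports PycurlRucio at module level (line 18), and PycurlRucio then runs `from WMCore.MicroService.Unified.Common import cert, ckey` (line 20) while Common is still only partially initialized, so `cert` does not exist yet. The sketch below reproduces that pattern with two throwaway modules; `demo_common` and `demo_pycurl_rucio` are hypothetical stand-ins, not the WMCore files, and the code assumes Python 3.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for Common.py: imports the Rucio module before defining cert/ckey.
COMMON_SRC = """
from demo_pycurl_rucio import get_paths
def cert():
    return "/path/to/cert"
def ckey():
    return "/path/to/key"
"""

# Stand-in for PycurlRucio.py: imports cert/ckey back from the common module.
PYCURL_RUCIO_SRC = """
from demo_common import cert, ckey
def get_paths():
    return cert(), ckey()
"""


def reproduce_circular_import():
    """Import demo_common and return the ImportError raised, or None."""
    names = ("demo_common", "demo_pycurl_rucio")
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_common.py").write_text(COMMON_SRC)
        Path(tmp, "demo_pycurl_rucio.py").write_text(PYCURL_RUCIO_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_common")
            return None
        except ImportError as exc:
            return exc
        finally:
            # Undo the side effects so the demo can be rerun.
            sys.path.remove(tmp)
            for name in names:
                sys.modules.pop(name, None)


print(reproduce_circular_import())
```

Copying the two functions into PycurlRucio, as done here, breaks the cycle. Other common ways out are moving cert/ckey into a small module that imports neither of the two, or deferring the import into the function body that needs it.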

@todor-ivanov I expect further changes to this PR, but if you could have a look at the current changes by tomorrow morning, that would be great! Thanks

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 4 new failures
    • 2 changes in unstable tests
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 13 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10216/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 13 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10218/artifact/artifacts/PullRequestReport.html

@todor-ivanov (Contributor) left a comment

Alan, I made a few minor comments in the code. Most of them could be skipped, but one or two may be worth addressing, or at least double-checking if left unchanged.

@amaltaro (Contributor Author) commented Jul 2, 2020

Todor, thanks for your review. Based on it, I made further changes in my last commit. Please have another look and, if you agree, mark those conversations as resolved.

@todor-ivanov (Contributor)

Thanks @amaltaro, I took a quick look at your new commit and I think you addressed everything I mentioned. The changes look good to go now.

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 1 test no longer failing
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 14 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10220/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 1 test no longer failing
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 14 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10221/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 1 test no longer failing
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 14 warnings
    • 72 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10222/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 1 test added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 26 warnings
    • 73 comments to review
  • Pycodestyle check: succeeded
    • 20 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10288/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 1 new failure
    • 4 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 26 warnings
    • 105 comments to review
  • Pycodestyle check: succeeded
    • 60 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10377/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 1 new failure
    • 4 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 29 warnings
    • 108 comments to review
  • Pycodestyle check: succeeded
    • 61 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10378/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 4 tests added
  • Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 29 warnings
    • 108 comments to review
  • Pycodestyle check: succeeded
    • 61 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10379/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 4 tests added
    • 1 change in unstable tests
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 29 warnings
    • 108 comments to review
  • Pycodestyle check: succeeded
    • 60 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10380/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 4 tests added
  • Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 29 warnings
    • 108 comments to review
  • Pycodestyle check: succeeded
    • 60 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10381/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 4 tests added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 41 warnings
    • 128 comments to review
  • Pycodestyle check: succeeded
    • 62 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10382/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 4 tests added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 41 warnings
    • 128 comments to review
  • Pycodestyle check: succeeded
    • 62 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10383/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor Author) commented Sep 2, 2020

@nsmith- and @ericvaandering, I think this PR has finally converged, and I have tested the most common use cases.
I'm not asking for your review because it became very big, but feel free to go through the initial description at the very top and ask about or suggest anything else you consider important.

I might merge it today, but feel free to leave questions/comments at any time and I can follow up on those in a different PR, if needed.

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 4 tests added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 41 warnings
    • 128 comments to review
  • Pycodestyle check: succeeded
    • 62 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10384/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 4 tests added
  • Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 41 warnings
    • 128 comments to review
  • Pycodestyle check: succeeded
    • 62 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10385/artifact/artifacts/PullRequestReport.html

Implement whole logic of input data placement with Rucio

fix import from PycurlRucio

clean init file

minor aesthetic changes

import absolute_import

one more absolute_import

copy cert/ckey methods over to pycurlRucio

useRucio flag might not be in the configuration file

apply Todor's suggestions

use getattr for self.phedex

set self.phedex instead

Rucio initialization message

complete RSE limits/usage logic

fix key name name -> rse

bytes_remaining is always 0, set it to quota

fix getRucioToken

getdata has no decode parameter

fix getPileupDatasetSizesRucio function

fix getDatasetBlocksRucio and getPileupSubscriptionsRucio

fix getBlockReplicasAndSizeRucio method

New pycurl function listReplicationRules

fix listReplicationRule fail case

API to fetch blocks and their sizes given a container

Make the correct calls for PhEDEx and Rucio

integrate pickRSE

update long=1 to long=True

fix logic for renewing the rucio token

RequestInfo child class of MSCore; skip some service initialization

fix keyword arguments

fix MSMonitor NoneType; add debug rule log

bugfixes to rule monitoring; rule creation; others

support lifetime parameter for rule replicas

fix MSOutput initialization

more fixes to the pycurl rucio function; print grouping of the rules

Make Rucio rule completion consistent with PhEDEx, percentage

stuck_at returns a string datetime format

minor logging fix
cleanup init.py module

more fixes to unit tests

unit test for blocks/sizes per container

fix unit tests

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 4 tests added
  • Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 41 warnings
    • 128 comments to review
  • Pycodestyle check: succeeded
    • 62 comments to review
  • Python3 compatibility checks: succeeded
    • there are suggested fixes for newer python3 idioms

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10386/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor Author) commented Sep 2, 2020

All right, it should be tested enough by now...

3 participants