
Unified Porting


NOTE: most of this content has been adapted and migrated to https://github.com/dmwm/WMCore/wiki/ReqMgr2-MicroService-Transferor, reflecting the latest discussions as of 2019. We will probably deprecate this wiki in the near future.

Transferor.py porting

  1. Provide an API for the information needed to determine which datasets to transfer:

    • Create "Hold" states before "Available" state for Global Workqueue elements. (don't replicate to local workqueue when it is "Hold" state
    • When request is assign-approved states, populates GQ elements with "Hold" state.
    • Provide the API to get jobs per dataset, site white list, cpu requirement, (Alan, Jean-Roch - which information need to determine the dataset to transfer? - do we have to consider block level transfer?)
  2. Trigger the transfer and keep track of the status.

    • The MicroService (daemon) periodically uses the GQ API and the Unified configuration to determine which datasets need to be transferred (see the sketch after this list).
    • The MicroService makes a PhEDEx subscription call for the selected datasets.
    • For requests whose datasets were not selected, update the request status in the request manager to "assigned" (provide an API that changes the request status and the WQE status at the same time, changing the WQE status first).
    • The MicroService keeps track of the subscriptions made until the data transfer is finished (in-memory queue or local database).
    • Update the request status in the request manager to "assigned" using the same API as above.
  3. Dataset deletion

    • The DDM service will query ReqMgr to check whether a given dataset is ready to be deleted.
    • DDM will combine internal locks from other sources with the ReqMgr query result to handle the deletion.
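
A minimal sketch of the transfer-trigger cycle from item 2 above. The Global WorkQueue, PhEDEx and ReqMgr client objects, their method names and the element fields are hypothetical placeholders, not the actual WMCore APIs.

```python
# Hypothetical sketch of the MicroService transfer cycle described in item 2.
# Client objects, method names, element fields and state names are assumptions
# for illustration only; they do not reflect the final WMCore implementation.

HOLD = "Hold"
AVAILABLE = "Available"


def decide_datasets(element, unified_config):
    """Placeholder for the selection logic (jobs per dataset, site whitelist, CPU)."""
    return element.get("InputDatasets", [])


def transferor_cycle(gq_client, phedex_client, reqmgr_client, unified_config, pending):
    """One polling cycle: decide what to transfer, subscribe it, track it."""
    for element in gq_client.getElements(status=HOLD):
        datasets = decide_datasets(element, unified_config)
        if not datasets:
            # Nothing to transfer: release the element and mark the request
            # "assigned", changing the WQE status before the request status.
            gq_client.updateElements([element["Id"]], status=AVAILABLE)
            reqmgr_client.updateRequestStatus(element["RequestName"], "assigned")
            continue
        for dataset in datasets:
            sub_id = phedex_client.subscribe(dataset, nodes=element["SiteWhitelist"])
            pending[sub_id] = element["RequestName"]  # remember what we wait for

    # Check previously made subscriptions; once the data is in place, move
    # the request to "assigned" with the same API as above.
    for sub_id, request in list(pending.items()):
        if phedex_client.subscriptionComplete(sub_id):
            reqmgr_client.updateRequestStatus(request, "assigned")
            del pending[sub_id]
```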

Output data placement from the Unified perspective:

NOTE: The following content is about to be removed soon. These are some findings about how output data placement is implemented in Unified so far. We are keeping it here until we have a solid design plan for the implementation in MicroServices.

Block level subscriptions:

+---------------+----------------------------------------+-----------------+
| Script        | workflow/request status                | comments        |
+---------------+----------------------------------------+-----------------+
| subscribor.py | 'assigned', 'acquired', 'running-open',| running on:     |
|               | 'running-closed', 'force-complete',    | vocms0268       |
|               | 'completed', 'closed-out'              | single threaded |
+---------------+----------------------------------------+-----------------+

General description:

Checks for dataset distributions per site and makes replica requests.

Information sources:

  • ReqMgr2 - workflow list
  • PhEDEx - blocks per site
  • Oracle - ??

Code flow:

  • Gets the workflow list from ReqMgr2
  • Iterates through the workflow list and collects information about all the blocks produced
  • Gets the block distribution per site from PhEDEx (losing the association between block and workflow)
  • Iterates through the list of all blocks per site and makes replica requests for all of them, without checking the previous replica status (auto-approves them); see the sketch below.
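
A minimal sketch of this block-level flow, assuming hypothetical ReqMgr2 and PhEDEx client objects; the method names are placeholders, not the actual Unified code.

```python
# Hypothetical sketch of the subscribor.py block-level flow described above.
# Client objects and their method names are assumptions for illustration only.

STATUSES = ['assigned', 'acquired', 'running-open',
            'running-closed', 'force-complete', 'completed', 'closed-out']


def block_level_cycle(reqmgr_client, phedex_client):
    # 1. Get the workflow list from ReqMgr2 for the relevant statuses.
    workflows = []
    for status in STATUSES:
        workflows.extend(reqmgr_client.getRequestsByStatus(status))

    # 2. Collect all blocks produced by those workflows.
    produced_blocks = set()
    for workflow in workflows:
        produced_blocks.update(reqmgr_client.getOutputBlocks(workflow))

    # 3. Ask PhEDEx where those blocks currently reside; at this point the
    #    association between block and workflow is lost.
    blocks_per_site = phedex_client.blockReplicas(sorted(produced_blocks))

    # 4. Make auto-approved replica requests for every block at its site,
    #    without checking the previous replica/subscription status.
    for site, blocks in blocks_per_site.items():
        phedex_client.subscribe(blocks, node=site, request_only=False)
```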

Dataset level subscriptions:

+--------------+----------------------------------------+------------------+
| Script       | workflow/request status                | comments         |
+--------------+----------------------------------------+------------------+
| closor.py    | 'close', 'announce', 'closed-out'      | running on:      |
|              |                                        | vocms0272        |
|              |                                        | multithreaded    |
|              |                                        | 2 run modes:     |
|              |                                        | announce & close |
+--------------+----------------------------------------+------------------+

General description:

Checks for workflows to be closed and moves their status in both ReqMgr and the Oracle database (Oracle reflects the workflow status with respect to the Unified state machinery). Makes output dataset subscriptions and dataset announcements.

Information sources:

  • MongoDB - closeoutInfo, campaignInfo
  • ReqMgr2 - workflowInfo, workflow status in general, builds a list of all workflows in status 'closed-out' (see the example query after this list), reads expected lumis
  • Oracle - workflow lists (it may also use direct SQL queries to DBS - not sure)
  • unifiedConfiguration.json
  • DDM (Dynamo Data Management system) - transfer subscriptions
  • DBS3 - dataset lumis, blocks list per dataset and sites
  • PhEDEx - checks block replica distribution per site
  • Checks the status of the following components/external services at startup: McM, wtc, JIRA (uses direct queries to the services)
  • There are a few hard coded references to Dima’s page like: https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?campaign=%
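
As an illustration of the ReqMgr2 source above, a minimal query sketch, assuming the standard ReqMgr2 REST endpoint on cmsweb; the certificate paths are placeholders and the response layout is an assumption.

```python
# Hedged example: list workflows in 'closed-out' status from ReqMgr2.
# The endpoint path follows the ReqMgr2 REST API as we understand it; the
# certificate paths are placeholders and the response layout is assumed.
import requests

CERT = ("/path/to/usercert.pem", "/path/to/userkey.pem")  # grid credentials (placeholder)
URL = "https://cmsweb.cern.ch/reqmgr2/data/request"


def closed_out_workflows():
    resp = requests.get(URL, params={"status": "closed-out"},
                        headers={"Accept": "application/json"},
                        cert=CERT, verify=False)
    resp.raise_for_status()
    data = resp.json()
    # Assumed layout: {"result": [{request_name: request_dict, ...}]}
    names = []
    for chunk in data.get("result", []):
        names.extend(chunk.keys())
    return names


if __name__ == "__main__":
    for name in closed_out_workflows():
        print(name)
```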

Code flow:

  • Builds workflow lists from the SQL/Oracle database.
  • For every workflow, tries to create the proper dataset transfer subscription during the final status transition (complete/closed-out/announced) and then updates the workflow status in both the Oracle database and ReqMgr.
  • Treats RelVal and non-RelVal workflows differently:
    • RelVal:
      • Assigns different destinations on a per-data-tier basis, using the following mapping (a Python rendering follows this list):
 ALCARECO:         T2_CH_CERN
 GEN-SIM:          T1_US_FNAL_Disk
 GEN-SIM-DIGI-RAW: T1_US_FNAL_Disk
 GEN-SIM-RECO:     T1_US_FNAL_Disk
 if "RelValTTBar" in dsn and "TkAlMinBias" in process_string and tier != "ALCARECO":       T2_CH_CERN
 if "MinimumBias" in dsn and "SiStripCalMinBias" in process_string and tier != "ALCARECO": T2_CH_CERN

      • Makes replica requests to PhEDEx with the following parameters: priority='normal', approve=True, group='RelVal'

    • Non-RelVal:
      • Takes the transfer parameters from the campaign configuration in MongoDB. The list of parameters is: `priority, group, number of copies, destination`
      • Makes subscriptions to DDM
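
For reference, a Python rendering of the RelVal destination logic quoted above; the function name and the fall-through behaviour for unlisted tiers are assumptions.

```python
# Hypothetical rendering of the RelVal destination mapping quoted above.
# The tier -> site table and the two special cases come from the text; the
# function name and the default (None) for unlisted tiers are assumptions.
RELVAL_TIER_DESTINATIONS = {
    "ALCARECO": "T2_CH_CERN",
    "GEN-SIM": "T1_US_FNAL_Disk",
    "GEN-SIM-DIGI-RAW": "T1_US_FNAL_Disk",
    "GEN-SIM-RECO": "T1_US_FNAL_Disk",
}


def relval_destination(dsn, process_string, tier):
    """Pick the destination site for a RelVal output dataset."""
    if "RelValTTBar" in dsn and "TkAlMinBias" in process_string and tier != "ALCARECO":
        return "T2_CH_CERN"
    if "MinimumBias" in dsn and "SiStripCalMinBias" in process_string and tier != "ALCARECO":
        return "T2_CH_CERN"
    return RELVAL_TIER_DESTINATIONS.get(tier)  # None -> no special placement
```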