ReqMgr2 MicroService Output
MSOutput is a microservice responsible for output data placement in CMS central production. Currently, MSOutput runs solely with the Rucio data management software. It processes requests whose states are `closed-out` and `announced` in ReqMgr2.
MSOutput performs disk and tape placements at the dataset level in CMS terminology, which corresponds to the container level in Rucio terminology. In other words, there is no block-level or file-level distribution of the data across the grid.
Here is an important note about the usage of Unified configuration and campaign configuration in WMCore, which will be useful in reading this document:
- P&R is in charge of setting Unified configuration and campaign configurations.
- Currently, WMCore fetches a copy of the Unified configuration every hour, and that is how the Unified configuration is used within the MicroServices. In the future, the plan is to have all these configurations available within WMCore.
- WMCore does not use Unified campaigns. However, whenever there is a change in the Unified campaigns, Unified also pushes the same changes to the WMCore maintained campaign configuration. So, any decisions based on campaign configuration are taken from the WMCore-based campaigns.
Note that the Tape and Disk RSE expressions are configured in the MSOutput service configuration. Their current values in production are:
data.rucioTapeExpression = "rse_type=TAPE\cms_type=test"
data.rucioDiskExpression = "(tier=2|tier=1)&cms_type=real&rse_type=DISK"
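To make the two expressions easier to read, here is a minimal sketch of their set semantics. In Rucio's RSE expression syntax, `|` is a union, `&` is an intersection and `\` is a complement. The RSE names and attribute values below are hypothetical and exist only for illustration; real expressions are resolved by Rucio itself, not by code like this.

```python
# Hypothetical RSEs and attributes, for illustration only.
RSES = {
    "T1_XX_Site_Tape": {"rse_type": "TAPE", "cms_type": "real", "tier": "1"},
    "T1_XX_Site_Tape_Test": {"rse_type": "TAPE", "cms_type": "test", "tier": "1"},
    "T1_XX_Site_Disk": {"rse_type": "DISK", "cms_type": "real", "tier": "1"},
    "T2_XX_Site": {"rse_type": "DISK", "cms_type": "real", "tier": "2"},
    "T3_XX_Site": {"rse_type": "DISK", "cms_type": "real", "tier": "3"},
}


def match(key, value):
    """Return the set of RSE names whose attribute `key` equals `value`."""
    return {name for name, attrs in RSES.items() if attrs.get(key) == value}


# rse_type=TAPE\cms_type=test : tape RSEs minus the test ones
tape_rses = match("rse_type", "TAPE") - match("cms_type", "test")

# (tier=2|tier=1)&cms_type=real&rse_type=DISK : real T1/T2 disk RSEs
disk_rses = (
    (match("tier", "2") | match("tier", "1"))
    & match("cms_type", "real")
    & match("rse_type", "DISK")
)

print(sorted(tape_rses))
print(sorted(disk_rses))
```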
For each produced output dataset, MSOutput determines whether it is going to be placed on disk, as well as the parameters required for the placement, such as the destination, lifetime and number of copies.
MSOutput decides whether a dataset is going to be sent to disk according to three configurations:
- MSOutput Configuration:
  - If the data tier of the dataset is blacklisted in the MSOutput configuration file, then that dataset is not placed on disk. The `excludeDataTier` parameter specifies the blacklisted data tiers. Currently, MSOutput does not apply any restriction to any data tier.
- Campaign Configuration:
  - If the data tier is whitelisted in the campaign configuration, then it is placed on disk. The `toDDM` parameter in the Unified campaign configuration (`TiersToDM` in WMCore) specifies the whitelisted data tiers. Currently, only the `GEN` tiers of some campaigns are whitelisted.
- Unified Configuration:
  - If the data tier is whitelisted in the Unified configuration, then it is placed on disk. The `tiers_to_DDM` parameter in the Unified configuration specifies these data tiers.
  - If the data tier is blacklisted in the Unified configuration, then it is not placed on disk. The `tiers_no_DDM` parameter in the Unified configuration specifies these data tiers.
  - Note that `tiers_to_DDM` takes precedence over `tiers_no_DDM`.
It is important to note that MSOutput performs these checks in the order listed above, and once a data tier falls into a category, the later checks are skipped. For instance, if the `GEN` tier is whitelisted in the campaign configuration, then it goes to disk even if it is blacklisted in the Unified configuration.
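The ordered checks above can be sketched as a small function. This is an illustration of the precedence only, not the MSOutput implementation: the function name and argument names are hypothetical, and since the text does not say what happens when a data tier matches none of the lists, the sketch returns `None` in that case rather than guessing.

```python
from typing import Iterable, Optional


def should_place_on_disk(
    datatier: str,
    exclude_data_tier: Iterable[str],   # MSOutput excludeDataTier
    campaign_tiers: Iterable[str],      # campaign toDDM / TiersToDM
    tiers_to_ddm: Iterable[str],        # Unified tiers_to_DDM
    tiers_no_ddm: Iterable[str],        # Unified tiers_no_DDM
) -> Optional[bool]:
    """Apply the checks in order; the first match decides the outcome."""
    if datatier in exclude_data_tier:
        return False  # blacklisted in the MSOutput configuration
    if datatier in campaign_tiers:
        return True   # whitelisted in the campaign configuration
    if datatier in tiers_to_ddm:
        return True   # whitelisted in Unified (checked before the blacklist)
    if datatier in tiers_no_ddm:
        return False  # blacklisted in Unified
    return None       # no rule matched; behaviour not covered by this sketch


# GEN is whitelisted by the campaign, so the Unified blacklist is never reached.
print(should_place_on_disk("GEN", [], ["GEN"], [], ["GEN"]))
```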
- Destination:
  - RelVal:
    - The destination is decided based on the data tier of the output dataset. The policy dictionary is defined in the MSOutput service configuration. For each final destination, a new copy of the output dataset is made.
  - Non-RelVal:
    - MSOutput gives Rucio an RSE expression specifying that the dataset can be placed on any T1 or T2 disk, and Rucio handles the rest. This is the RSE expression: `(tier=2|tier=1)&cms_type=real&rse_type=DISK`
- Lifetime:
  - RelVal:
    - Determined by the `rulesLifetimeRelVal` parameter of the MSOutput configuration.
    - It is currently configured as 12 months.
  - Non-RelVal:
    - Determined by the `rulesLifetime` parameter of the MSOutput configuration.
    - It is currently configured as 1 month.
- Number of Copies:
  - Resubmission workflows:
    - The number of copies is set to 0. In other words, MSOutput does not make an output data placement for ACDC workflows, in order to avoid duplicate placements. The original workflow handles it.
  - Original workflows:
    - The number of copies is set to the value of the `maxcopies` parameter of the campaign configuration (`MaxCopies` in WMCore). If no such parameter is defined, it is set to 1. Note that currently all campaigns except one are configured with `maxcopies` set to 1. The exception: https://github.com/CMSCompOps/WmAgentScripts/blob/master/campaigns.json#L928
- Weight:
  - This attribute specifies the disk quota for CMS. It is set to `ddm_quota`.
  - With this attribute, Rucio makes sure that a given RSE is not overloaded and that the space is used properly.
- Grouping:
  - It is set to `ALL`.
- Activity:
  - It is set to `Production Output`.
- Account:
  - This is the Rucio account name used when creating rules in Rucio.
  - It is currently set to `wmcore_output`.
  - It is configurable through the `rucioAccount` parameter in the MSOutput configuration file.
- Comment:
  - It is set to `WMCore MSOutput output data placement`.
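Putting the disk parameters above together, the following sketch assembles them for a non-RelVal output. It is an illustration under stated assumptions, not the MSOutput code: the function name, the dictionary layout and the `"Resubmission"` request-type string are hypothetical, and the unit of `rulesLifetime` is not stated here, so the value is passed through unchanged. RelVal outputs are left out because their destination and number of copies come from a separate policy.

```python
def non_relval_disk_rule(request_type, campaign_config, ms_config):
    """Collect the disk rule attributes for a non-RelVal output dataset."""
    if request_type == "Resubmission":
        copies = 0  # the original workflow handles the placement
    else:
        copies = campaign_config.get("MaxCopies", 1)  # defaults to 1 if undefined
    return {
        "copies": copies,
        "rse_expression": ms_config["rucioDiskExpression"],
        "lifetime": ms_config["rulesLifetime"],  # unit assumed to be set by the config
        "weight": "ddm_quota",
        "grouping": "ALL",
        "activity": "Production Output",
        "account": ms_config["rucioAccount"],
        "comment": "WMCore MSOutput output data placement",
    }


# Example configuration; the lifetime value is a placeholder, not the production one.
example_ms_config = {
    "rucioDiskExpression": "(tier=2|tier=1)&cms_type=real&rse_type=DISK",
    "rulesLifetime": 30 * 24 * 3600,
    "rucioAccount": "wmcore_output",
}
print(non_relval_disk_rule("StepChain", {}, example_ms_config))
```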
- RelVal Check:
  - The `enableRelValCustodial` parameter of the MSOutput configuration determines whether tape placements are made for RelVal outputs. Currently, this parameter is set to `False`, i.e. RelVal outputs do not go to tape.
- Resubmission Check:
  - MSOutput does not make tape placements for “Resubmission” workflows. The original workflows handle it.
- MSOutput Configuration:
  - If the data tier of the dataset is blacklisted in the MSOutput configuration file, then that dataset is not placed on disk. The `excludeDataTier` parameter specifies the blacklisted data tiers. Currently, MSOutput does not apply any restriction to any data tier.
- Unified Configuration:
  - If the data tier of the dataset is blacklisted for tape in the Unified configuration, then no tape placement is done for this dataset. The `tiers_with_no_custodial` parameter specifies these data tiers.
- Number of Copies:
  - The number of copies is always 1.
- Destination:
  - Firstly, note that all allowed outputs of a workflow go to the same destination. MSOutput sums the sizes of all output datasets of a workflow, and the total is used when choosing the tape destination.
  - MSOutput gives the following RSE expression to Rucio and fetches a list of RSEs: `rse_type=TAPE\cms_type=test\rse=T0_CH_CERN_Tape`
  - Then, MSOutput fetches `ddm_quota` for each RSE and eliminates the RSEs whose quota is smaller than the size of the output data.
  - Then, MSOutput makes a weighted random selection from the RSEs whose quota is sufficient, where the weight is the quota of the RSE. In other words, a tape RSE is more likely to be chosen as the destination if its quota is larger than that of the others.
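The tape destination steps above (filter by quota, then a quota-weighted random pick) can be sketched with the standard library. This is a minimal illustration, not the MSOutput or Rucio implementation: the function name, the RSE names and the quota numbers are hypothetical, and the quotas are supplied as a plain dictionary instead of being fetched from Rucio.

```python
import random


def pick_tape_rse(rse_quotas, needed, rng=random):
    """Pick one RSE at random, weighted by quota, among those with enough quota.

    rse_quotas: mapping of RSE name -> ddm_quota (same unit as `needed`)
    needed:     total size of the workflow's output datasets
    rng:        the `random` module or a `random.Random` instance
    """
    # Drop RSEs whose quota is too small; zero-quota RSEs are dropped as well,
    # because random.choices requires a positive total weight.
    candidates = {
        name: quota
        for name, quota in rse_quotas.items()
        if quota >= needed and quota > 0
    }
    if not candidates:
        return None  # no RSE can host the data
    names = list(candidates)
    weights = [candidates[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical quotas: T1_B_Tape is filtered out, T1_C_Tape is five times
# more likely to be picked than T1_A_Tape.
quotas = {"T1_A_Tape": 100, "T1_B_Tape": 10, "T1_C_Tape": 500}
print(pick_tape_rse(quotas, needed=50))
```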
- Lifetime:
  - Note that MSOutput does not specify a `lifetime` parameter for tape placements, unlike disk placements. Tape placements are therefore made with the intention that the data stays there indefinitely, unless someone deliberately deletes it.
- Ask for approval:
  - Each RSE has an attribute which specifies whether approval is required for a placement.
  - Note that if a tape placement is not approved, then CMS might lose data. So, tape placements are made on the assumption that every tape placement will be approved.
  - Asking for approval is a necessity, since some sites need to create the tape libraries before they can receive data.
- Grouping:
- Activity:
  - It is set to `Production Output`.
- Account:
  - This is the Rucio account name used when creating rules in Rucio.
  - It is currently set to `wmcore_output`.
  - It is configurable through the `rucioAccount` parameter in the MSOutput configuration file.
- Comment:
  - It is set to `WMCore MSOutput output data placement`.
Starting in March 2022, with PR https://github.com/dmwm/WMCore/pull/11024, RelVal disk output data placement was redesigned so that the output data policy can be configured per datatier in the MSOutput service configuration, as a Python object (list/dict). The current policy defined in production is:
data.relvalPolicy = [{"datatier": "GEN-SIM", "destinations": ["T2_CH_CERN"]},
{"datatier": "ALCARECO", "destinations": ["T2_CH_CERN"]},
{"datatier": "default", "destinations": ["T2_CH_CERN"]}]
where all RelVal output datasets are placed under T2_CH_CERN, with a single copy. If the dataset has a datatier that is not defined in the policy, then the destination is set according to the `default` value, thus T2_CH_CERN.
If a given datatier is defined with more than one destination, then the Rucio rule is modified to have more than one copy as well: it has one copy for each destination.
Note that this policy is validated during the startup of MSOutput, including the validation of the datatier and destination names. In case the policy is updated, we need to push the new configuration to the CMSWEB MSOutput production system and restart the service.
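The policy lookup described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and joining multiple destinations with `|` into a single RSE expression is an assumption made for this sketch, not something confirmed by the text.

```python
RELVAL_POLICY = [
    {"datatier": "GEN-SIM", "destinations": ["T2_CH_CERN"]},
    {"datatier": "ALCARECO", "destinations": ["T2_CH_CERN"]},
    {"datatier": "default", "destinations": ["T2_CH_CERN"]},
]


def resolve_relval_placement(datatier, policy):
    """Return the destinations and number of copies for a RelVal datatier."""
    by_tier = {entry["datatier"]: entry["destinations"] for entry in policy}
    # Datatiers missing from the policy fall back to the "default" entry.
    destinations = by_tier.get(datatier, by_tier["default"])
    return {
        "destinations": destinations,
        # Assumption: destinations are combined as a union expression.
        "rse_expression": "|".join(destinations),
        # One copy per destination.
        "copies": len(destinations),
    }


print(resolve_relval_placement("GEN-SIM", RELVAL_POLICY))
print(resolve_relval_placement("NANOAOD", RELVAL_POLICY))  # uses "default"
```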
- Data-tier selection for disk placement:
  - As discussed above, whether a data tier is placed on disk is determined by the Unified configuration (`tiers_to_DDM` and `tiers_no_DDM`). This configuration does not seem to have a strong justification behind it and should be revisited.
- Lifetimes of RelVal and non-RelVal outputs:
  - Lifetimes are set to 12 months and 1 month respectively, and this does not have a strong justification. This should be discussed with PPD.
- Data-tier selection for tape placement:
  - As discussed above, whether a data tier is banned from tape placement is determined by the Unified configuration (`tiers_with_no_custodial`). This configuration does not seem to have a strong justification behind it and should be revisited.
- Asking for approval for tape selections:
  - If a tape placement is not approved, then CMS might lose that data. I guess all tape requests are approved, but it would be good to think about the case where they are not for some reason.
- Considering remaining space:
  - Disk and tape selections are performed according to the `weight=ddm_quota` parameter, which specifies the CMS quota for each RSE. If I am not wrong, this parameter is static and does not take the remaining space into account. This might be an obstacle to distributing the outputs in a balanced manner and should be revisited.