Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev: parse: cli_to_xml: populate advised monitor/start/stop operations values #1038

Merged

Conversation

liangxin1300
Copy link
Collaborator

@liangxin1300 liangxin1300 commented Oct 17, 2022

Changes include:

  • Set the advised operation values if monitor/start/stop not input in command line
crm(live/15sp4-1)configure# primitive vip IPaddr2 params ip=10.10.10.176
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
primitive vip IPaddr2 \
        params ip=10.10.10.176 \
        op monitor timeout=20s interval=10s \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s
  • Complete advised operation values if miss interval or timeout attributes
crm(live/15sp4-1)configure# primitive vip IPaddr2 params ip=10.10.10.176 op monitor timeout=22s op start timeout=23s
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
primitive vip IPaddr2 \
        params ip=10.10.10.176 \
        op monitor timeout=22s interval=10s \
        op start timeout=23s interval=0s \
        op stop timeout=20s interval=0s
  • Keep origin configured RA unchanged
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
node 2: 15sp4-2
primitive d Dummy
primitive d Dummy \
        op monitor timeout=20s interval=10s \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s

d was original configured
d2 was configured after applied this PR

  • And for sure, the sbd ra will be configured like this after bootstraping:
primitive stonith-sbd stonith:external/sbd \
        op monitor timeout=20 interval=3600 \
        op start timeout=20 interval=0s \
        op stop timeout=15 interval=0s
  • For the Promoted/Unpromoted resource
crm(live/15sp4-1)configure# primitive st ocf:pacemaker:Stateful
crm(live/15sp4-1)configure# show
primitive st ocf:pacemaker:Stateful \
        op monitor timeout=20s interval=10s role=Promoted \
        op monitor timeout=20s interval=11s role=Unpromoted \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s \
        op promote timeout=10s interval=0s \
        op demote timeout=10s interval=0s

@zzhou1
Copy link
Contributor

zzhou1 commented Oct 17, 2022

From my impression, I prefer to change the commit title to make the use case it a little more clear, something like,

From

Dev: parse: Populate some default operations when configuring primitive resource

To

Dev: parse: cli_to_xml: populate default monitor/start/stop operations

@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch from be11111 to 3604f76 Compare October 17, 2022 09:32
@liangxin1300 liangxin1300 changed the title Dev: parse: Populate some default operations when configuring primitive resource Dev: parse: cli_to_xml: populate default monitor/start/stop operations Oct 17, 2022
@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch from 3604f76 to 2e773e6 Compare October 17, 2022 09:42
crmsh/ra.py Outdated Show resolved Hide resolved
@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch 10 times, most recently from fa77847 to 26fd13f Compare October 21, 2022 08:59
@@ -523,4 +523,6 @@
REJOIN_COUNT = 60
REJOIN_INTERVAL = 10
DC_DEADTIME_DEFAULT = 20

DEFAULT_ACTION_LIST = ['monitor', 'start', 'stop']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from these, technically promotable resources also support promote and demote actions.

crmsh/parse.py Outdated
@@ -653,6 +655,51 @@ def match_arguments(self, out, name_map, implicit_initial=None, terminator=None)
else:
break

self.complete_op_action_default_value(out)

def complete_op_action_default_value(self, out):
Copy link
Member

@gao-yan gao-yan Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call it something like complete_advised_ops? Which is more precise. Basically because they are not actually "default". They are just "advised" by the authors of the RAs, and need to be explicitly configured to be applied anyways.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense
I will change related places and commit logs

crmsh/parse.py Outdated
action_default_attr_dict[action] = {}
if action in ra_actions_dict:
_dict = ra_actions_dict[action][0] if isinstance(ra_actions_dict[action], list) else ra_actions_dict[action]
action_default_attr_dict[action] = {k:v for k, v in _dict.items() if k in ['interval', 'timeout']}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if/how these cases are currently handled:

  • For promotable resources, they usually have two "monitor" actions specified with respective "role", which we should probably like to populate both for users. And be careful with the role names we put into the configuration for the different OCF standards, "Master"/"Slave" vs "Promoted"/"Unpromoted".

  • Some resource agents might have multiple "monitor" actions specified with different "depth". We might want to populate all of them as well and transform "depth" to "OCF_CHECK_LEVEL".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is not considered yet
Good point. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crm(live/15sp4-1)configure# primitive st ocf:pacemaker:Stateful
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
node 2: 15sp4-2
primitive st ocf:pacemaker:Stateful \
        op monitor timeout=20s interval=10s role=Promoted \
        op_params depth=0 \
        op monitor timeout=20s interval=11s role=Unpromoted \
        op_params depth=0 \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s \
        op promote timeout=10s interval=0s \
        op demote timeout=10s interval=0s

@@ -830,14 +830,13 @@ def parse_cli_to_xml(cli, oldnode=None):
for s in lines2cli(cli):
node = parse.parse(s, comments=comments)
else: # should be a pre-tokenized list
node = parse.parse(cli, comments=comments, ignore_empty=False)
node = parse.parse(cli, comments=comments, ignore_empty=False, complete_op_default=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure for what cases the "automatic population" feature is applied by this.

I think it's quite straightforward that it's suitable upon creating a new primitive, especially with interactive crm (live/...)configure# primitive commands. Should we do it for non-interactive crm configure primitive commands at all? And we probably shouldn't do this for already existing primitives, or the cases where there's already existing configuration for example crm configure load, otherwise it could be surprising/unexpected for users?

How does pcs behave in such different use cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And/or a command-line option that can be specified to disable/enable the "automatic population of operations" would be useful AFAICT.

Copy link
Collaborator Author

@liangxin1300 liangxin1300 Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already considered this

primitive d Dummy
primitive d2 Dummy \
        op monitor timeout=20s interval=10s \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s

d is from a xml file added by running crm configure load xml replace <xml_file>
d2 is from a command line

Since the line 830

node = parse.parse(s, comments=comments)

Which not contain the complete_op_default parameter, will handle all existing configurations

@liangxin1300 liangxin1300 changed the title Dev: parse: cli_to_xml: populate default monitor/start/stop operations Dev: parse: cli_to_xml: populate advised monitor/start/stop operations values Oct 24, 2022
@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch 2 times, most recently from d0e4517 to d548aa7 Compare October 24, 2022 15:32
@liangxin1300
Copy link
Collaborator Author

liangxin1300 commented Oct 28, 2022

After updated, it looks like now:

crm(live/15sp4-1)configure# primitive vip IPaddr2 params ip=10.10.10.176
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
primitive vip IPaddr2 \
        params ip=10.10.10.176 \
        op monitor timeout=20s interval=10s \
        op_params depth=0 \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s

crm(live/15sp4-1)configure# primitive vip IPaddr2 params ip=10.10.10.176 op monitor timeout=22s op start timeout=23s
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
primitive vip IPaddr2 \
        params ip=10.10.10.176 \
        op monitor timeout=22s interval=10s \
        op_params depth=0 \
        op start timeout=23s interval=0s \
        op stop timeout=20s interval=0s

crm(live/15sp4-1)configure# primitive st ocf:pacemaker:Stateful
crm(live/15sp4-1)configure# show
node 1: 15sp4-1
primitive st ocf:pacemaker:Stateful \
        op monitor timeout=20s interval=10s role=Promoted \
        op_params depth=0 \
        op monitor timeout=20s interval=11s role=Unpromoted \
        op_params depth=0 \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s \
        op promote timeout=10s interval=0s \
        op demote timeout=10s interval=0s

I'm wondering if just skip depth since it seems rarely used

@liangxin1300
Copy link
Collaborator Author

Hi @fmherschel, @lpinne
Any comment about this PR?

Thanks!

@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch 6 times, most recently from 1e56956 to cd0daf2 Compare November 7, 2022 15:01
@liangxin1300 liangxin1300 force-pushed the 20221012_default_option branch from cd0daf2 to 1455848 Compare November 7, 2022 15:17
…s values

Changes include:
- Set the advised operation values if monitor/start/stop not input in command line
- Keep origin configured RA unchanged
for k, v in v_dict.items():
# set normal attributes
if k in constants.ADVISED_KEY_LIST:
op_node.set(k, v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's occurring to me, depending on whether the schema version of CIB XML supports the new terminology Promoted/Unpromoted, we need to handle the role here accordingly like other places where handle_role_for_ocf_1_1() is called: https://github.com/ClusterLabs/crmsh/blob/master/crmsh/utils.py#L3065

# complete advised value if interval or timeout not configured
if action == "monitor":
adv_interval = extract_advised_monitor_value(action_advised_attr_dict, 'interval', op_node.get('role')) or \
constants.DEFAULT_INTERVAL_IN_ACTION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the schema for the cib XML, interval is a required attribute. It makes sense to define a default interval value to be added for a monitor op in case it's neither specified by the user nor advised in the metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I defined in constants.DEFAULT_INTERVAL_IN_ACTION

adv_interval = extract_advised_monitor_value(action_advised_attr_dict, 'interval', op_node.get('role')) or \
constants.DEFAULT_INTERVAL_IN_ACTION
adv_timeout = extract_advised_monitor_value(action_advised_attr_dict, 'timeout', op_node.get('role')) or \
constants.DEFAULT_TIMEOUT_IN_ACTION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, timeout is an optional attribute, meaning pacemaker has an internal default (20s), or users could have their own global default defined in op_defaults section. So we don't really need yet another crmsh defined default here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's no harm to keep this?:)

Copy link
Member

@gao-yan gao-yan Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case timeout is not set by the user meanwhile no value is advised in the metadata, we'd like to omit the setting rather than taking a crmsh-default DEFAULT_TIMEOUT_IN_ACTION.

constants.DEFAULT_TIMEOUT_IN_ACTION
if op_node.get('interval') is None:
op_node.set('interval', adv_interval)
if op_node.get('timeout') is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the logic here could rather be, if timeout is not set by the user meanwhile there's a timeout value explicitly advised in the metadata, we set with the advised one.

for op_node in op_nodes_list:
action = op_node.get('name')
# complete advised value if interval or timeout not configured
if action == "monitor":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what about other operations than monitor?

I assume interval=0 is automatically added if it's missing for those cases? What if timeout is missing while it's value advised in the metadata?

Copy link
Collaborator Author

@liangxin1300 liangxin1300 Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
Changed in 957e606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants