
Add kubernetes labels #1236

Merged · 23 commits · Apr 17, 2023

Conversation

@dhpollack
Contributor

@dhpollack commented Jan 16, 2023

This adds support for attaching kubernetes labels to pods, either via the kubernetes decorator with labels={"label": "value", "label2": "value2"}, via an env var METAFLOW_KUBERNETES_LABELS="label=value,label2=value2", or on the command line with python hello_world.py run --with kubernetes:labels="cli_label=cli_value". I also created an issue for this:

#1235
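
For reference, a minimal sketch pulling the three options together (flow, step, and label names here are illustrative, not taken from this PR):

from metaflow import FlowSpec, step, kubernetes


class HelloWorldFlow(FlowSpec):
    # Option 1: per-step labels via the decorator.
    @kubernetes(labels={"label": "value", "label2": "value2"})
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    HelloWorldFlow()

# Option 2: labels for all kubernetes steps via an environment variable:
#   export METAFLOW_KUBERNETES_LABELS="label=value,label2=value2"
# Option 3: labels supplied on the command line:
#   python hello_world.py run --with kubernetes:labels="cli_label=cli_value"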

@savingoyal self-requested a review January 16, 2023 15:18
@shrinandj
Contributor

Hey @dhpollack, thanks for looking into this. Metaflow also has the concept of tags (https://docs.metaflow.org/scaling/tagging).

Is there any overlap between tags and labels? And could there be some option to simply use one to avoid confusion? WDYT?

@dhpollack
Contributor Author

dhpollack commented Jan 16, 2023

Hey @dhpollack, thanks for looking into this. Metaflow also has the concept of tags (https://docs.metaflow.org/scaling/tagging).

Is there any overlap between tags and labels? And could there be some option to simply use one to avoid confusion? WDYT?

We use kubernetes labels everywhere in our organization for cost tracking and logging. Some of these labels are not very useful as metaflow tags.

@dhpollack
Contributor Author

Hey @dhpollack, thanks for looking into this. Metaflow also has the concept of tags (https://docs.metaflow.org/scaling/tagging).

Is there any overlap between tags and labels? And could there be some option to simply use one to avoid confusion? WDYT?

But specifically about overlap... there is a little, but they are not the same, nor do I think one should try to use them in the same way.

For one, your tags are strings, while kubernetes labels are key-value pairs. But even from a user perspective, metaflow tags seem designed for data scientists to track experiments in metaflow, while kubernetes labels are what much of the kubernetes ecosystem uses for filtering and organizing things that happen on kubernetes...

If you changed metaflow tags to mirror kubernetes labels (key-value, restrictions on length and valid characters, etc.) then I think you could simply use one of them. But that's a lot more work for you, and I think people who want to add kubernetes labels to their pods generally know what kubernetes labels are and wouldn't be too bothered by the additional complexity.

@dhpollack
Contributor Author

@shrinandj can you re-run the failed CI test? It looks like it failed but shouldn't have. Probably just a hiccup on the machine.

@dhpollack force-pushed the add-kubernetes-labels branch 4 times, most recently from ea76440 to 2503653 on January 18, 2023 10:50
@shrinandj
Contributor

Generally looks good. Thanks for doing this @dhpollack! Can you add some details of the tests you ran for this PR? Especially, did you create argo workflows with this code and ensure that the pods created by Argo have the correct labels?

@dhpollack
Contributor Author

dhpollack commented Jan 24, 2023

So to test this I used something like the following flow:

from metaflow import FlowSpec, step, kubernetes

class TestFlow(FlowSpec):
    @step
    def start(self):
        self.my_var = "hello world"
        self.next(self.a)

    @kubernetes(labels={"my_label": "my_value", "label2": None, "label3": "val"})
    @step
    def a(self):
        print("the data artifact is: %s" % self.my_var)
        self.next(self.b)

    @kubernetes(
        labels={
            "my_label": "superlonglabelthatshouldgethashedbytheautomatedhashingfunctionorsomethinglikethat",  # noqa: E501
            "my_label2": "ein bißchen falsch",
        }
    )
    @step
    def b(self):
        pass
        self.next(self.end)

    @step
    def end(self):
        print("the data artifact is still: %s" % self.my_var)

if __name__ == "__main__":
    TestFlow()

Then I ran the following (note: each blank line below starts a new terminal session):

export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
python test_workflow.py run --with kubernetes

python test_workflow.py run

python test_workflow.py run --with kubernetes:labels="label1=val1,label2=val2,label3,label4=val4"

# remove kubernetes decorators from the flow (this is actually how we generally would use this)
export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
python test_workflow.py run --with kubernetes

export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
python test_workflow.py argo-workflows create && python test_workflow.py argo-workflows trigger

There are a few quirks.

  1. If you set an env var and use --with kubernetes, the steps decorated with kubernetes labels overwrite the env var labels completely.
  2. The argo labels seem to work, but it feels a bit like magic, because metaflow automatically imports the kubernetes decorator and that adds the labels. If you look at the argo template that gets generated, the labels are not in the argo metadata. But I feel like that is ok: if you wanted to set those, you would probably want a separate parameter on the argo decorator, which is a flow decorator rather than a step decorator.

from metaflow.plugins.kubernetes.kubernetes_decorator import KubernetesDecorator


@pytest.mark.parametrize(
Contributor

This is awesome! Thank you so much for doing this!

@shrinandj
Contributor

Thanks for the details about testing. It definitely increases the reviewer's confidence in the commits.

Just a couple of other tests:

  • Can you test with K8s labels AND nodeSelectors since this code touches that functionality as well.
  • Can you test original @kubernetes and --with kubernetes functionality without labels. Basically, to ensure that nothing has regressed because of this change.

@dhpollack
Contributor Author

Thanks for the details about testing. It definitely increases the reviewer's confidence in the commits.

Just a couple of other tests:

* Can you test with K8s labels AND nodeSelectors since this code touches that functionality as well.

* Can you test original `@kubernetes` and `--with kubernetes` functionality without labels. Basically, to ensure that nothing has regressed because of this change.

I forgot to mention that in my comment about the tests. I also tested without labels and with node selectors in different combinations (alone, via the decorator, via env vars, and using --with kubernetes:...). The main change was in runtime_step_cli: previously, if you had more than one node selector, it would create a tuple with a single item like ('label1=val1,label2=val2',), and now it produces ('label1=val1', 'label2=val2'), which I think fixes a bug in the previous code.
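
As a rough sketch of the kind of generic key=value parsing this implies (hypothetical helper, not the code in this PR):

def parse_key_value_pairs(items):
    # Accepts either an iterable of "key=value" strings, e.g. ("label1=val1", "label2=val2"),
    # or a single comma-separated string such as "label1=val1,label2=val2".
    if isinstance(items, str):
        items = items.split(",")
    ret = {}
    for item in items:
        parts = item.split("=", 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else None  # bare keys get a None value
        if key in ret:
            raise ValueError("Duplicate key found: %s" % key)
        ret[key] = value
    return ret


# parse_key_value_pairs(("label1=val1", "label2=val2"))
# -> {"label1": "val1", "label2": "val2"}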

shrinandj previously approved these changes Jan 25, 2023
@dhpollack
Contributor Author

@savingoyal Anything you would like done here?

@shrinandj
Contributor

I downloaded the source code locally and tested this change.

  • Ran a flow with Kubernetes and a single label. This worked just fine.
$ python3 /tmp/a.py run --with kubernetes:labels="abc=pqr"
...
  • Ran a flow with the Kubernetes decorator and added a label there. This worked just fine.
    @kubernetes(labels={"abc":"pqr"})
  • Ran a flow with the Kubernetes decorator and added two labels. This worked just fine too.
    @kubernetes(labels={"abc":"pqr", "lmn":"xyz"})
  • Ran a flow with Kubernetes on the CLI with two labels. THIS FAILED with the following error:
$ python3 a.py run --with kubernetes:labels="abc=pqr,lmn=xyz"
Metaflow 2.7.19 executing StarterFlow for user:shri
    Unknown decorator attribute:
    Decorator 'kubernetes' does not support the attribute 'lmn'. These attributes are supported: cpu, memory, disk, image, service_account, secrets, node_selector, labels, namespace, gpu, gpu_vendor, tolerations.
  • Exported the env variable METAFLOW_KUBERNETES_LABELS with one label and ran a flow. This worked just fine.

  • Exported the env variable METAFLOW_KUBERNETES_LABELS with two labels and ran a flow. This worked just fine too.

  • Ran a few other flows with Kubernetes specifying cpu, memory, etc. on the cli. They worked just fine.

@dhpollack
Contributor Author

dhpollack commented Jan 31, 2023

@shrinandj I looked at how metaflow creates the --with kubernetes string for argo; it actually does something like --with kubernetes:labels='{"abc":"pqr", "lmn":"xyz"}'. I am not sure whether this is undesired behavior or bad documentation (on my part). The way I had it before was based on the node selector, but that also did not work correctly with multiple node selectors.

#1236 (comment)

It's related to this issue which I brought up earlier.

@dhpollack
Contributor Author

dhpollack commented Feb 1, 2023

The following code splits "labels=l1=v1,l2=v2" into [("labels", "l1=v1"), ("l2", "v2")]. It looks like the only way to pass multiple key-value pairs in this format is to use a JSON object.

https://github.com/Netflix/metaflow/blob/master/metaflow/decorators.py#L132

    @classmethod
    def _parse_decorator_spec(cls, deco_spec):
        if len(deco_spec) == 0:
            return cls()

        attrs = {}
        # TODO: Do we really want to allow spaces in the names of attributes?!?
        for a in re.split(""",(?=[\s\w]+=)""", deco_spec):
            name, val = a.split("=", 1)
            ...
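
For illustration, reproducing that split on the problematic input:

import re

deco_spec = "labels=l1=v1,l2=v2"
parts = re.split(r",(?=[\s\w]+=)", deco_spec)
# parts == ['labels=l1=v1', 'l2=v2']
pairs = [tuple(a.split("=", 1)) for a in parts]
# pairs == [('labels', 'l1=v1'), ('l2', 'v2')] -- 'l2' is treated as its own attribute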

Update:

I added a preprocessing step to the original _parse_decorator_spec method to turn labels=l1=v1,l2=v2 into JSON that the original parser handles correctly. I tested your case above, as well as an additional valid parameter, an invalid parameter, and a valid+invalid parameter (a sketch of the preprocessing idea follows the list of invocations below):

run --with kubernetes:labels=l1=v1,l2=v2

run --with kubernetes:labels=l1=v1,l2=v2,cpu=2

run --with kubernetes:labels=l1=v1,l2=v2,invalid=something

run --with kubernetes:labels=l1=v1,l2=v2,cpu=2,invalid=something

run --with kubernetes:labels='{"l1":"v1","l2":"v2"}'

METAFLOW_KUBERNETES_LABELS=l1=v1,l2=v2 ... run --with kubernetes
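
A rough sketch of the preprocessing idea, under the assumption that any key=value chunk following labels= that is not a known decorator attribute is treated as another label; this is an illustration only, not the code that was merged, and it only covers the plain key=value form (the JSON form above is a separate path):

import json

# Attributes the kubernetes decorator understands (subset, for illustration).
KNOWN_ATTRS = {"cpu", "memory", "disk", "image", "service_account", "secrets",
               "node_selector", "labels", "namespace", "gpu", "gpu_vendor",
               "tolerations"}


def preprocess_labels(deco_spec):
    # Rewrite 'labels=l1=v1,l2=v2,cpu=2' as 'labels={"l1": "v1", "l2": "v2"},cpu=2'
    # so the original comma-splitting parser sees a single labels attribute.
    out, labels, in_labels = [], {}, False
    for chunk in deco_spec.split(","):
        name = chunk.split("=", 1)[0]
        if name == "labels":
            in_labels = True
            kv = chunk.split("=", 2)  # ['labels', 'l1', 'v1']
            labels[kv[1]] = kv[2] if len(kv) > 2 else None
        elif in_labels and name not in KNOWN_ATTRS:
            kv = chunk.split("=", 1)  # another label pair, e.g. 'l2=v2'
            labels[kv[0]] = kv[1] if len(kv) > 1 else None
        else:
            in_labels = False
            out.append(chunk)
    if labels:
        out.insert(0, "labels=%s" % json.dumps(labels))
    return ",".join(out)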

@shrinandj
Contributor

There's some quirkiness related to the shell and the OS when multiple labels are specified on the CLI. Using the JSON-style input works just fine in that case.

Verified this with:

python a.py run --with kubernetes:labels='{"l1":"v1","l2":"v2"}'

This change should be good to go!

shrinandj previously approved these changes Feb 3, 2023
* Allow dictionaries in decorator
* Convert strings to dictionaries
* Make parse node selector function more generic
shrinandj previously approved these changes Mar 13, 2023
shrinandj previously approved these changes Mar 15, 2023
if requires_both:
    item[1]  # raise IndexError
if str(item[0]) in ret:
    raise KubernetesException("Duplicate key found: %s" % str(item[0]))
Collaborator

In its current form the exception does not convey where the duplicate key was found, as it makes no mention of 'labels' when I set duplicate labels.

As this is used for checks other than labels as well, a more suitable place to perform the duplicate label check might be in validate_kube_labels instead. That way the exception can be more specific as well (Duplicate label found).

Contributor Author

This function is used on both labels and node selectors and putting the check here ensures that we catch duplicate keys in both sets.
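
The thread above mentions validate_kube_labels; for context, here is a rough sketch of what a labels-specific check could look like, using the standard Kubernetes label-value rules (hypothetical code, not necessarily what this PR implements):

import re

# Kubernetes label values: at most 63 characters, alphanumerics plus '-', '_', '.',
# beginning and ending with an alphanumeric; empty values are allowed.
LABEL_VALUE_RE = re.compile(r"^[a-zA-Z0-9]([a-zA-Z0-9\-_.]{0,61}[a-zA-Z0-9])?$")


def validate_kube_labels(labels):
    # 'labels' is the dict passed to @kubernetes(labels=...); None/empty values pass.
    for name, value in (labels or {}).items():
        if value and not LABEL_VALUE_RE.match(value):
            raise ValueError("Invalid label value %r for label %r" % (value, name))
    return True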

@savingoyal merged commit a992dde into Netflix:master Apr 17, 2023
savingoyal added a commit that referenced this pull request Apr 17, 2023
@savingoyal
Collaborator

@dhpollack looks like this PR is backward incompatible -

[screenshot of the failure, taken Apr 17, 2023]

I have created #1359 which reverts this PR so that releases are unblocked. Can you please address this issue and resubmit the PR?

savingoyal added a commit that referenced this pull request Apr 17, 2023
@dhpollack
Contributor Author

@dhpollack looks like this PR is backward incompatible -

[screenshot of the failure, taken Apr 17, 2023]

I have created #1359 which reverts this PR so that releases are unblocked. Can you please address this issue and resubmit the PR?

Yea, looks like it should be a quick fix. I'll work on it tomorrow.

dhpollack added a commit to dhpollack/metaflow that referenced this pull request Apr 18, 2023
savingoyal pushed a commit that referenced this pull request May 23, 2023
* Revert "Revert "Add kubernetes labels (#1236)" (#1359)"

This reverts commit e68d63f.

* Fix null labels value in argo

* Only allow env vars to add labels

* Fix empty label case

* Adjustments from PR comments

* refactor argo kubernetes label getter. add labels to argo-workflow sensors

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>