Add kubernetes labels #1236
Hey @dhpollack, thanks for looking into this. Metaflow also has the concept of Is there any overlap between
We use kubernetes labels everywhere in our organization for cost tracking and logging. Some of these labels are not very useful as metaflow tags.
But specifically about overlap: there is a little, but they are not the same, nor do I think one should try to use them in the same way. For one, your tags are strings, while kubernetes labels are key-value pairs. But even from a user perspective, metaflow tags seem designed for data scientists to track experiments in metaflow, while kubernetes labels are what a lot of the kubernetes universe uses for filtering and organizing things that happen on kubernetes. If you changed metaflow tags to mirror kubernetes labels (key-value pairs, restrictions on length and valid characters, etc.), then I think you could simply use one of them. But that's a lot more work for you, and I think people that want to add kubernetes labels to their pods generally know what kubernetes labels are and wouldn't be too bothered by the additional complexity.
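To make the "restrictions on length and valid characters" concrete, here is a minimal sketch of a check for Kubernetes label values. It follows the documented Kubernetes rule (empty, or at most 63 characters, starting and ending with an alphanumeric, with `-`, `_`, `.` allowed in between). The function name `is_valid_label_value` is made up for illustration and is not part of Metaflow or of this PR.

```python
import re

# Kubernetes label values: empty, or at most 63 characters that begin and end
# with an alphanumeric and otherwise contain only alphanumerics, '-', '_', '.'.
_LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` is acceptable as a Kubernetes label value.

    Hypothetical helper for illustration only.
    """
    return len(value) <= 63 and _LABEL_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["my_value", "ein bißchen falsch", ""]:
        print(repr(candidate), is_valid_label_value(candidate))
```

A free-form metaflow tag can easily fail this check (spaces, non-ASCII characters, length), which is one reason the two concepts do not map onto each other directly.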
@shrinandj can you re-run the failed CI test? It looks like it failed but shouldn't have. Probably just a hiccup on the machine.
ea76440 to 2503653
Generally looks good. Thanks for doing this @dhpollack! Can you add some details of the tests that were run for testing this PR? Especially, did you create argo workflows with this code and ensure that the pods created by Argo have the correct labels?
So to test this I used something like the following flow:

    from metaflow import FlowSpec, step, kubernetes


    class TestFlow(FlowSpec):
        @step
        def start(self):
            self.my_var = "hello world"
            self.next(self.a)

        @kubernetes(labels={"my_label": "my_value", "label2": None, "label3": "val"})
        @step
        def a(self):
            print("the data artifact is: %s" % self.my_var)
            self.next(self.b)

        @kubernetes(
            labels={
                "my_label": "superlonglabelthatshouldgethashedbytheautomatedhashingfunctionorsomethinglikethat",  # noqa: E501
                "my_label2": "ein bißchen falsch",
            }
        )
        @step
        def b(self):
            pass
            self.next(self.end)

        @step
        def end(self):
            print("the data artifact is still: %s" % self.my_var)


    if __name__ == "__main__":
        TestFlow()

Then I ran the following (note: imagine each blank line is a new terminal):

    export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
    python test_workflow.py run --with kubernetes

    python test_workflow.py run

    python test_workflow.py run --with kubernetes:labels="label1=val1,label2=val2,label3,label4=val4"

    # remove kubernetes decorators from the flow (this is actually how we generally would use this)
    export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
    python test_workflow.py run --with kubernetes

    export METAFLOW_KUBERNETES_LABELS='env_label=env_var'
    python test_workflow.py argo-workflows create && python test_workflow.py argo-workflows trigger

There are a few quirks.
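The comma-separated label string used in the commands above (`label1=val1,label2=val2,label3,label4=val4`) can be read as a mapping in which a bare key such as `label3` has no value. The following is only a sketch of that reading, not the parsing code from this PR; `parse_label_string` is a hypothetical name.

```python
from typing import Dict, Optional


def parse_label_string(raw: str) -> Dict[str, Optional[str]]:
    """Parse 'k1=v1,k2,k3=v3' into a dict; a bare key maps to None.

    Hypothetical helper, written only to illustrate the string format.
    """
    labels: Dict[str, Optional[str]] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")
        labels[key] = value if sep else None
    return labels


if __name__ == "__main__":
    print(parse_label_string("label1=val1,label2=val2,label3,label4=val4"))
```

The same format is what `METAFLOW_KUBERNETES_LABELS='env_label=env_var'` carries, so under this reading the environment variable and the `--with kubernetes:labels=...` string could share one parser.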
from metaflow.plugins.kubernetes.kubernetes_decorator import KubernetesDecorator

@pytest.mark.parametrize(
This is awesome! Thank you so much for doing this!
Thanks for the details about testing. It definitely increases the reviewers' confidence in the commits. Just a couple of other tests:
I forgot to mention that in my comment about the tests. I also tested without labels and with node selectors in different combinations as well (alone, via the decorator, env vars, and using
@savingoyal Anything you would like done here?
I downloaded the source code locally and tried to test this change.
@shrinandj looking at how metaflow creates the It's related to this issue which I brought up earlier.
The following code will split https://github.com/Netflix/metaflow/blob/master/metaflow/decorators.py#L132
Update: I do a preprocessing step on the original
There's some quirkiness related to the shell and the OS when multiple labels are specified on the CLI. Using the JSON-style input works just fine in that case. Verified this with:
This change should be good to go!
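As a rough illustration of why JSON-style input sidesteps the comma and quoting quirks, a label argument can be accepted in either form and normalized to a dictionary. This is a sketch under assumptions, not the preprocessing step from this PR; the helper name `labels_from_cli` is hypothetical.

```python
import json
from typing import Dict, Optional


def labels_from_cli(raw: str) -> Dict[str, Optional[str]]:
    """Accept a JSON object or a 'k=v,k2=v2' string and return a dict.

    Hypothetical helper; the real decorator may handle this differently.
    """
    text = raw.strip()
    if text.startswith("{"):
        # JSON input keeps commas and '=' inside one unambiguous value.
        parsed = json.loads(text)
        if not isinstance(parsed, dict):
            raise ValueError("labels JSON must be an object")
        return {str(k): (None if v is None else str(v)) for k, v in parsed.items()}
    result: Dict[str, Optional[str]] = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        key, sep, value = item.partition("=")
        result[key] = value if sep else None
    return result


if __name__ == "__main__":
    print(labels_from_cli('{"label1": "val1", "label2": "val2"}'))
    print(labels_from_cli("label1=val1,label2=val2"))
```

Both calls produce the same dictionary, so downstream validation only has to deal with one shape.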
* Allow dictionaries in decorator
* Convert strings to dictionaries
* Make parse node selector function more generic
    if requires_both:
        item[1]  # raise IndexError
    if str(item[0]) in ret:
        raise KubernetesException("Duplicate key found: %s" % str(item[0]))
In its current form the exception does not convey where the duplicate key was found, as it makes no mention of 'labels' when I set duplicate labels.
As this is used for checks other than the labels as well, a more suitable place to perform the duplicate label check might be in validate_kube_labels instead. That way the exception can be more specific as well (Duplicate label found).
This function is used on both labels and node selectors and putting the check here ensures that we catch duplicate keys in both sets.
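One way to reconcile the two positions, sketched here purely as an illustration, is to keep the duplicate check in the shared parser but let the caller name what is being parsed, so the message can say "label" or "node selector". Everything below is hypothetical: `parse_kv_items`, the `kind` parameter, and `DuplicateKeyError` (a stand-in for the `KubernetesException` in the diff) are not taken from the PR.

```python
from typing import Dict, Iterable, Optional


class DuplicateKeyError(ValueError):
    """Stand-in for the KubernetesException raised in the PR."""


def parse_kv_items(
    items: Iterable[str], requires_both: bool = True, kind: str = "key"
) -> Dict[str, Optional[str]]:
    """Parse 'k=v' items shared by labels and node selectors (sketch only)."""
    ret: Dict[str, Optional[str]] = {}
    for raw in items:
        key, sep, value = raw.partition("=")
        if requires_both and not sep:
            raise ValueError("%s %r needs a value" % (kind, key))
        if key in ret:
            # The caller-supplied `kind` makes the error point at its source.
            raise DuplicateKeyError("Duplicate %s found: %s" % (kind, key))
        ret[key] = value if sep else None
    return ret


if __name__ == "__main__":
    print(parse_kv_items(["team=ml", "env=dev"], kind="label"))
```

With this shape the check still covers both labels and node selectors, while the error text tells the user which of the two contained the duplicate.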
This reverts commit a992dde.
@dhpollack looks like this PR is backward incompatible - I have created #1359 which reverts this PR so that releases are unblocked. Can you please address this issue and resubmit the PR?
Yea, looks like it should be a quick fix. I'll work on it tomorrow.
This reverts commit e68d63f.
* Revert "Revert "Add kubernetes labels (#1236)" (#1359)"
  This reverts commit e68d63f.
* Fix null labels value in argo
* Only allow env vars to add labels
* Fix empty label case
* Adjustments from PR comments
* refactor argo kubernetes label getter. add labels to argo-workflow sensors

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>
This adds support for adding kubernetes labels to pods, either via the kubernetes decorator with labels={"label": "value", "label2": "value2"}, with an env var METAFLOW_KUBERNETES_LABELS="label=value,label2=value2", or by adding them to the with command: python hello_world.py run --with kubernetes:labels="cli_label=cli_value". I also created an issue for this: #1235