Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question About Dsl.ContainerOp Deprecation #4644

Closed
munagekar opened this issue Oct 20, 2020 · 5 comments
Closed

Question About Dsl.ContainerOp Deprecation #4644

munagekar opened this issue Oct 20, 2020 · 5 comments
Assignees

Comments

@munagekar
Copy link
Contributor

munagekar commented Oct 20, 2020

Deprecation

I see that dsl.ContainerOp is being deprecated in favor of reusable components.

I would also like to point out that the Deprecation is not really visible. When you use it to create a pipeline. You can try out one compiling one of the pipelines using dsl.ContainerOp. No there would be no warnings thrown.

Usecase

I find components like the following really useful !

def gcs_download_op(url):
return dsl.ContainerOp(
name='GCS - Download',
image='google/cloud-sdk:279.0.0',
command=['sh', '-c'],
arguments=['gsutil cat $0 | tee $1', url, '/tmp/results.txt'],
file_outputs={
'data': '/tmp/results.txt',
}
)

Alternatives

  • Reusable Components : Too much boilerplate
  • Component from python function: Unnecessary python wrapper. The image must have python installed.

Solutions

  • Can the deprecation be avoided ?
  • Is there any other alternative way to do this?

I have thought of an alternative, which involves using a python function and generating the text for component.yaml and then load_component_from_text. I would happy to contribute this in. Do let me know your thoughts.

def docker_container(image:str, command:str):
    # Generate a Component.yaml text
    text = gen_comp(image,command)
    return load_component_from_text(text)
@munagekar munagekar changed the title Question About Dsl.ContainerOp Question About Dsl.ContainerOp Deprecation Oct 20, 2020
@munagekar
Copy link
Contributor Author

munagekar commented Oct 20, 2020

/assign @Ark-kun

@munagekar
Copy link
Contributor Author

/kind question

@Ark-kun
Copy link
Contributor

Ark-kun commented Oct 22, 2020

I see that dsl.ContainerOp is being deprecated in favor of reusable components.

Note that only the ContainerOp constructor is being deprecated.
ContainerOp objects have too many limitations and cannot be reused. They lose information about and types, default values and even the inputs and their names. Also, hard-coded output paths do not play nice with alternative orchestrators and data passing systems.
Our component library has more than a hundred of components and the users are building more and more of them.

I would also like to point out that the Deprecation is not really visible.

Thank you for reporting this. We'll investigate.

Reusable Components : Too much boilerplate

I wonder whether this is the case. The component.yaml format was designed to be as minimal as possible. This component does not look bigger than your example. And the conversion seems straightforward.

name: GCS - Download
inputs:
- {name: Url}
outputs:
- {name: Data}
implementation:
  container:
    image: 'google/cloud-sdk:279.0.0'
    command: ['gsutil', 'cp', {inputValue: Url}, {outputPath: Data}]

When actively developing command-line components I usually inline them like this:

gcs_download_op = load_component_from_text('''
name: GCS - Download
inputs:
- {name: Url}
outputs:
- {name: Data}
implementation:
  container:
    image: 'google/cloud-sdk:279.0.0'
    command: ['gsutil', 'cp', {inputValue: Url}, {outputPath: Data}]
''')

I have thought of an alternative, which involves using a python function and generating the text for component.yaml

I'd be interested in seeing a small PoC. There are several problems involved, mostly related to inferring inputs and outputs.

@munagekar
Copy link
Contributor Author

Thank you for your detailed explanation.

I'd be interested in seeing a small PoC. There are several problems involved, mostly related to inferring inputs and outputs.

Under the proposed wrapper I was going to hardcode the commands without any support for specifying outputs or inputs.

ContainerOp objects have too many limitations and cannot be reused. They lose information about and types, default values and even the inputs and their names. Also, hard-coded output paths do not play nice with alternative orchestrators and data passing systems.

This proposed wrapper would have had the exact same limitations as the ContainerOp constructor. Using component.yaml seem to be the way forward.

@munagekar
Copy link
Contributor Author

I have sent in #4658 for making the deprecation visible to end-users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants