Volume artifacts #1227

Closed · wants to merge 9 commits into from

Conversation

@dtaniwaki (Member) commented Feb 22, 2019

I know this is not a great practice in ML on Kubernetes, but saving/loading files to/from NFS is very useful for ML researchers who depend heavily on NFS in their daily tasks. To provide NFS access, we restrict users to running their containers under their own UID, because a container running as root can access any file in the mounted NFS volume. So I made the following changes in this PR:

  • Implemented volume artifacts
  • Added a security context config for executors (a rough sketch of the idea is below)
  • Ensured write permission on the /argo directory in executors
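
To illustrate the UID restriction mentioned above, here is a minimal sketch (my illustration, not taken from this PR's diff) of pinning a task container to a fixed non-root UID with a standard Kubernetes securityContext; the UID, volume name, and mount path are placeholders:

  - name: whalesay
    container:
      image: docker/whalesay:latest
      securityContext:
        runAsUser: 1000     # run as the researcher's own UID instead of root
        runAsNonRoot: true
      volumeMounts:
      - name: nfs-vol       # placeholder for an NFS-backed volume
        mountPath: /data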

The reason I don't just use the mounted NFS volume in the main container is the performance overhead. In legacy deep learning jobs we interact with millions of files, but we want to avoid making too many requests over the network, so loading gzipped files from NFS to local storage and saving gzipped files from local storage back to NFS makes sense. Honestly speaking, this way of handling files in deep learning jobs is still not elegant, and there are newer methods such as MXNet's RecordIO, but many researchers are not used to them yet.

Do you think this artifact is useful for others too?

@dtaniwaki (Member, Author)

I think the idle timeout of make lint is too short; I sometimes see it cause timeout errors in CI.

@Ark-kun (Member) commented Mar 12, 2019

I'm not sure I understand what value this provides over a volumeMount:

Current:

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/hello-art.txt
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /argo_outputs/hello-art.txt"]
      volumeMounts:
      - name: vol01
        mountPath: /argo_outputs

Proposed:

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/hello-art.txt
        volume:
          name: vol01
          path: /argo_outputs
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /argo_outputs/hello-art.txt"]

If someone wants to store many files as a single zip, then they can just work with files in a temp directory and then zip it to the mounted volume. With this strategy the user has the flexibility to do what they want:

  • Write the files directly to the output volume vs. work in a temp dir and then copy final results
  • Pack (any format) vs. not pack
  • Any directory structure.

I'm all for a volume-based artifact-passing solution, but I do not see this PR adding much value over volumeMounts.

Compare it with a system like this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volume-artifact-passing-
spec:
  entrypoint: volume-artifact-example
  artifactStorage:
    volume: #Will automatically add volume mounts to all task pods
      persistentVolumeClaim:
        claimName: vol01
      subPath: /artifact_storage/{{workflow.uid}}/{{pod.name}}/{{io.name}}/
      inputMountPath: /argo/inputs/{{io.name}}/
      outputMountPath: /argo/outputs/{{io.name}}/
 
  templates:

  - name: producer
    outputs:
      artifacts:
      - name: out-art
        generatePath: True #The local output path will be auto-generated following the "outputMountPath: /argo/outputs/{{io.name}}/data" template. We can make this the default when the `path` is missing
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee {{outputs.artifacts.out-art.path}}"]
      #{{outputs.artifacts.out-art.path}} will resolve to the path prepared for the out-art output. E.g. /argo/outputs/out-art/data

  - name: consumer
    inputs:
      artifacts:
      - name: in-art
        generatePath: True #The local input path will be auto-generated following the "inputMountPath: /argo/inputs/{{io.name}}/data" template. We can make this the default when the `path` is missing
    container:
      image: busybox
      command: [sh, -c]
      args: ["cat {{inputs.artifacts.in-art.path}}"]
      #{{inputs.artifacts.in-art.path}} will resolve to the path prepared for the in-art input. E.g. /argo/inputs/in-art/data

This system would set up volume-based artifact passing for you and relieve the user from manually managing the volumeMounts and paths.
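
As a rough illustration on top of that sketch (mine, not part of the proposal above), the producer and consumer templates could be wired together with standard Argo DAG syntax, so the artifact would flow through the volume-backed storage transparently; the volume-artifact-example entrypoint below is assumed:

  - name: volume-artifact-example
    dag:
      tasks:
      - name: produce
        template: producer
      - name: consume
        template: consumer
        dependencies: [produce]
        arguments:
          artifacts:
          - name: in-art                                        # consumer's input artifact
            from: "{{tasks.produce.outputs.artifacts.out-art}}" # producer's output artifact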

@dtaniwaki (Member, Author)

I understand your solution also works.

My point is that if users have to write the exporting logic themselves, the usability of artifacts changes. In other words, they would need to change their workflow significantly if they later want to switch the artifacts to S3.
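
For context, here is roughly what the same output would look like when pointed at S3 with Argo's built-in S3 artifact type (my illustration, not from this PR; the endpoint, bucket, key, and secret names are placeholders):

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /tmp/hello-art.txt
        s3:
          endpoint: s3.amazonaws.com      # placeholder endpoint
          bucket: my-bucket               # placeholder bucket
          key: hello-art.txt
          accessKeySecret:
            name: my-s3-credentials       # placeholder secret
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello-art.txt"]

The step's command stays the same; only the artifact destination changes.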

@Ark-kun (Member) commented Mar 14, 2019

Just trying to understand the difference between the proposed approach and the current approach (volumeMount):

Current:

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/hello-art.txt
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /argo_outputs/hello-art.txt"]
      volumeMounts:
      - name: vol01
        mountPath: /argo_outputs

Proposed:

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/hello-art.txt
        volume:
          name: vol01
          path: /argo_outputs
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /argo_outputs/hello-art.txt"]

It looks like it just moves the volumeMounts section around a bit.

@dtaniwaki (Member, Author) commented Mar 15, 2019

As I wrote in the PR,

The reason I don't just use the mounted NFS volume in the main container is the performance overhead. In legacy deep learning jobs we interact with millions of files, but we want to avoid making too many requests over the network, so loading gzipped files from NFS to local storage and saving gzipped files from local storage back to NFS makes sense. Honestly speaking, this way of handling files in deep learning jobs is still not elegant, and there are newer methods such as MXNet's RecordIO, but many researchers are not used to them yet.

We don't want to touch the NFS volume too many times.

So your example above is not wrong, but it doesn't show the advantage of this feature.

The ideal comparison is...

Current (bad way):

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/tmp.tar.gz
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["for i in {1..10000}; do dd if=/dev/urandom of=/argo_outputs/sample-$i.txt bs=1K count=1; sleep 10; done; tar cvzf /argo_outputs/tmp.tar.gz /argo_outputs/sample-*.txt"]
      volumeMounts:
      - name: vol01
        mountPath: /argo_outputs

Current (better way):

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /argo_outputs/tmp.tar.gz
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["for i in {1..10000}; do dd if=/dev/urandom of=/tmp/sample-$i.txt bs=1K count=1; sleep 10; done; tar cvzf /argo_outputs/tmp.tar.gz /tmp"]
      volumeMounts:
      - name: vol01
        mountPath: /argo_outputs

Proposed:

  - name: whalesay
    outputs:
      artifacts:
      - name: hello-art
        path: /tmp
        volume:
          name: vol01
          path: /argo_outputs/tmp.tar.gz
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["for i in {1..10000}; do dd if=/dev/urandom of=/tmp/sample-$i.txt bs=1K count=1; sleep 10; done"]

@Ark-kun (Member) commented Apr 19, 2019

From my perspective, the "Current (better way)" is a sweet spot.
It requires just a few extra characters in the code (tar cvzf ...) and is very flexible.

AFAIK, in the "Proposed" variant there is some implicit archiving configuration that cannot be changed. And if we allow changing it, then the implementation becomes too complex for the number of characters saved (it's already quite complex).

@fj-sanchez

I'm all for a volume-based artifact-passing solution, but I do not see this PR adding much value over volumeMounts.

@Ark-kun I think it would be awesome to have something like what you described implemented. Is there another issue where this is captured? I don't think this should be lost in the comments...

@Ark-kun (Member) commented Apr 30, 2019

I'm all for a volume-based artifact-passing solution, but I do not see this PR adding much value over volumeMounts.

@Ark-kun I think it would be awesome to have something like what you described implemented. Is there another issue where this is captured? I don't think this should be lost in the comments...

I've created issue #1349. I'm working on implementing this.

@dtaniwaki (Member, Author)

@fj-sanchez Is it possible to achieve it by combining the change in this PR with ArchiveLocation?
https://github.com/argoproj/argo/blob/4e37a444bde2a034885d0db35f7b38684505063e/pkg/apis/workflow/v1alpha1/types.go#L226
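
For reference, a template-level archiveLocation would look roughly like this (my sketch, not from this PR; the S3 endpoint, bucket, and key format are placeholders):

  - name: whalesay
    archiveLocation:
      archiveLogs: true
      s3:
        endpoint: s3.amazonaws.com            # placeholder endpoint
        bucket: my-bucket                     # placeholder bucket
        key: "{{workflow.name}}/{{pod.name}}" # placeholder key format
    container:
      image: docker/whalesay:latest
      command: [cowsay, "hello world"]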

@Ark-kun (Member) commented Oct 16, 2019

Here is a draft version of my volume-based data passing rewriter script: https://github.com/Ark-kun/pipelines/blob/SDK---Compiler---Added-support-for-volume-based-data-passing/sdk/python/kfp/compiler/_data_passing_using_volume.py
It can be run as a command-line program to rewrite an Argo Workflow from artifact-based to volume-based data passing.

What does everyone think? Please comment at #1349

@dtaniwaki (Member, Author)

I'm closing this PR as I no longer need it and it doesn't seem to be desired by others either.

@dtaniwaki closed this Nov 22, 2019
@dtaniwaki deleted the volume-artifacts branch on Nov 22, 2019