Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement OpenStream func of OSS artifactdriver. Part of #8489 #12908

Merged
merged 1 commit into from
May 4, 2024

Conversation

AlbeeSo
Copy link
Contributor

@AlbeeSo AlbeeSo commented Apr 7, 2024

Part of #8489

Motivation

OSS Artifact Driver do not support OpenStream Func to avoid unnecessary file downloading.

Modifications

Implemented OpenStream Func of OSS Artifact Driver.

Verification

Get an artifact which storages in OSS from UI, check the logs of argo-server.
latest version:

time="2024-04-07T13:03:58.700Z" level=info msg="Efficient artifact streaming is not supported for type *oss.ArtifactDriver: see https://github.com/argoproj/argo-workflows/issues/8489"
time="2024-04-07T13:04:10.072Z" level=info msg="Efficient artifact streaming is not supported for type *oss.ArtifactDriver: see https://github.com/argoproj/argo-workflows/issues/8489"

submitted version:

time="2024-04-07T13:22:29.074Z" level=info msg="OSS OpenStream, key: argo-workflows/input-artifact-oss-k4htb/input-artifact-oss-k4htb/main.log"
time="2024-04-07T13:22:29.092Z" level=info msg="Stream artifact" artifactName=main-logs duration=18.139213ms error="<nil>" key=argo-workflows/input-artifact-oss-k4htb/input-artifact-oss-k4htb/main.log

Co-authored-by: AlbeeSo <suyashi1321@163.com>
Co-authored-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: AlbeeSo <suyashi1321@163.com>
@shuangkun shuangkun self-assigned this Apr 7, 2024
@shuangkun shuangkun added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Apr 7, 2024
@shuangkun
Copy link
Member

@shuangkun
Copy link
Member

Could you add some info in doc: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/#artifact-streaming

Maybe can add it after it is released.

return false, origErr
}
// directory case:
// todo: make a .tgz file which can be streamed to user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we implement this in the same PR?

Copy link
Contributor Author

@AlbeeSo AlbeeSo Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that OSS SDK (or S3 also) does not currently support operations like "GetObjectsWithPrefix" to a .tgz file. So the only way might be to list objects with prefix first, load, tar them then and stream the .tgz file to user finally, more like a "LoadToStream" way.
https://github.com/argoproj/argo-workflows/blob/34967127bfe304da0f72dc1dba63558a1609b2d0/server/artifacts/artifact_server.go#L184C1-L233C36
Maybe it would be better to still return an html page?

@terrytangyuan
Copy link
Member

Can you test if a relatively large artifact can be viewed successfully as well?

@AlbeeSo
Copy link
Contributor Author

AlbeeSo commented Apr 8, 2024

Can you test if a relatively large artifact can be viewed successfully as well?

I tested with workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - |
            mkdir -p /out
            dd if=/dev/random of=/out/testfile.txt bs=20M count=1024
            echo "Download!"
      outputs:
        artifacts:
          - name: out
            path: /out/testfile.txt
            oss:
              endpoint: http://oss-cn-zhangjiakou-internal.aliyuncs.com
              bucket: bucket
              key: argo-workflows/test/bigfile
              accessKeySecret:
                name: my-argo-workflow-credentials
                key: accessKey
              secretKeySecret:
                name: my-argo-workflow-credentials
                key: secretKey

to download a 20GB output file from UI successfully.
Then, I used a video around 3GB instead which could be played normally after downloading. During this period, the memory usage of argo server is stable.

@AlbeeSo
Copy link
Contributor Author

AlbeeSo commented Apr 8, 2024

Could you add some info in doc: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/#artifact-streaming

Maybe can add it after it is released.

Sorry Im little bit confused about the support version, is it v3.5+?

@shuangkun
Copy link
Member

Could you add some info in doc: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/#artifact-streaming

Maybe can add it after it is released.

Sorry Im little bit confused about the support version, is it v3.5+?

No need to add it now. Please add it when a new version is released.

@shuangkun
Copy link
Member

you can change title reference #11823

@shuangkun shuangkun removed their assignment Apr 11, 2024
@terrytangyuan terrytangyuan merged commit ccb71bd into argoproj:main May 4, 2024
27 checks passed
@agilgur5
Copy link
Contributor

agilgur5 commented May 4, 2024

No need to add it now. Please add it when a new version is released.

We should try to get docs in ASAP with features, ideally within the same PR. The longer we wait the more likely that something is either undocumented or documented incorrectly.

Sorry Im little bit confused about the support version, is it v3.5+?

It would be v3.6+.

We have versioned documentation now (#11390) and already have various docs for new features coming to v3.6 with appropriate version specifiers (#12581)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/artifacts S3/GCP/OSS/Git/HDFS etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants