
downloading artifact from s3 in ui, timed out waiting for condition #2129

Closed
3 tasks done
haghabozorgi opened this issue Jan 31, 2020 · 44 comments · Fixed by #2143 or #2147
Assignees
Labels
type/bug type/regression Regression from previous behavior (a specific type of bug)
Milestone

Comments

@haghabozorgi

Checklist:

  • [x] I've included the version.
  • [x] I've included reproduction steps.
  • [ ] I've included the workflow YAML.
  • [x] I've included the logs.

What happened:
Installed the latest 2.5.0-rc7 via install.yaml on EKS 1.14, with the diff shown below applied to install.yaml so that archiveLogs and the s3 config are enabled in the workflow-controller-configmap:

336,344d322
< data:
<   config: |
<     artifactRepository:
<       archiveLogs: true
<       s3:
<         bucket: "example-argo"
<         keyPrefix: "example"
<         endpoint: "s3.amazonaws.com"
< 

To access the UI running in Kubernetes from localhost:
kubectl port-forward svc/argo-server 2746:2746 -n argo

Ran a basic hello-world workflow via the Argo CLI. The workflow completes as expected, and clicking the artifacts link in the UI shows the main-logs object as expected, but when you click to download the actual artifact in the UI, the browser eventually returns "timed out waiting for condition".

What you expected to happen:
I expect clicking on the link to download the requested artifact.

How to reproduce it (as minimally and precisely as possible):
Install via install.yaml with an s3 config similar to the above, run any workflow, then try to download the resulting main-logs artifact.

Logs
argo-server log shows:

time="2020-01-31T23:02:06Z" level=info msg="S3 Load path: artifact368826374, key: example/local-script-gd5zj/local-script-gd5zj/main.log"
time="2020-01-31T23:02:06Z" level=info msg="Creating minio client s3.amazonaws.com using IAM role"
time="2020-01-31T23:02:06Z" level=info msg="Getting from s3 (endpoint: s3.amazonaws.com, bucket: example-argo, key: example/local-script-gd5zj/local-script-gd5zj/main.log) to artifact368826374"
time="2020-01-31T23:02:06Z" level=warning msg="Failed get file: Get https://s3.amazonaws.com/example-argo/?location=: x509: certificate signed by unknown authority"

Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@alexec alexec added this to the v2.5 milestone Feb 1, 2020
@alexec alexec added the type/regression Regression from previous behavior (a specific type of bug) label Feb 1, 2020
@alexec
Contributor

alexec commented Feb 1, 2020

@sarabala1979 before I investigate - have you seen similar before please?

@ddseapy
Contributor

ddseapy commented Feb 1, 2020

Generally this kind of error occurs because the CA that signed s3.amazonaws.com isn't trusted by the client making the request. Depending on your situation and security requirements (https://serverfault.com/questions/444186/is-it-safe-to-use-s3-over-http-from-ec2-as-opposed-to-https), this can likely be fixed by using insecure: true under your s3: configuration section.
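
For reference, a sketch of where that flag would go, using the s3 config from the original report (untested):

artifactRepository:
  archiveLogs: true
  s3:
    bucket: "example-argo"
    keyPrefix: "example"
    endpoint: "s3.amazonaws.com"
    insecure: true  # talk plain HTTP to the endpoint, so no CA verification happens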

Otherwise, it would appear that argo needs to have the CA available when making the s3 request. I haven't looked at the code or the Docker image to know more.

@ddseapy
Contributor

ddseapy commented Feb 1, 2020

It looks like the image is built from scratch (https://github.com/argoproj/argo/blob/master/Dockerfile#L87-L91), which wouldn't have CAs by default. I also don't see them COPY'd in.

So unless the CA is built into the binary (which would seem odd), I'm guessing no certs exist in the image. You can always create a Secret, mount the certs into the container, and set the AWS_CA_BUNDLE env var, requiring only YAML changes.

All that said, I'm guessing this worked previously, so I'm probably missing how the certs got in. It's also possible they exist in the container but are out of date.
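
A minimal sketch of that workaround, assuming a Secret named argo-ca-bundle (a hypothetical name) that holds the CA bundle, patched into the argo-server Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-server
spec:
  template:
    spec:
      containers:
        - name: argo-server
          env:
            - name: AWS_CA_BUNDLE  # per the suggestion above; points at the mounted bundle
              value: /certs/ca-certificates.crt
          volumeMounts:
            - name: ca-bundle
              mountPath: /certs
              readOnly: true
      volumes:
        - name: ca-bundle
          secret:
            secretName: argo-ca-bundle  # hypothetical Secret containing ca-certificates.crt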

@haghabozorgi
Author

@ddseapy thanks for your suggestions. I tried with insecure: true and get "http: named cookie not present" in the browser window, and nothing seems to be logged on the argo-server or the workflow-controller. So I removed the insecure: true line to revert, and now I get the "http: named cookie not present" message in the browser even after I kubectl delete -f install.yaml, reinstall, submit a new workflow, and try to download that new artifact.

I assume the secret should have the contents of /etc/ssl/certs/ca-certificates.crt? I could give that a try, but I am starting to suspect there is something else going on (as well)? Thoughts welcome.
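
If you give it a try, something like this should populate that Secret from a machine that has the bundle (the Secret and namespace names are placeholders):

kubectl -n argo create secret generic argo-ca-bundle \
  --from-file=ca-certificates.crt=/etc/ssl/certs/ca-certificates.crt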

@ddseapy
Contributor

ddseapy commented Feb 1, 2020

@haghabozorgi I am hitting the same "http: named cookie not present" error with insecure: true on rc7. This is a bug/regression; it worked as recently as 2.4.3. @alexec let me know if you want me to create a separate ticket for this. The artifacts (both logs and other output artifacts) are in s3 and the workflow succeeds.

Yes, that's generally where CA certs live on machines. If AWS_CA_BUNDLE points to wherever that file is within the image, I'd suspect you wouldn't see that error. A ConfigMap might actually be a bit easier than a Secret for getting it into the container. But I agree this would be a quick hack to get past the error; one of the argo maintainers can probably point to where that cert is supposed to come from.

@haghabozorgi
Author

@ddseapy is there extra config needed to set up the artifact store in 2.4.3? I was trying to confirm whether this works in 2.4.3, but if I insert the same configmap snippet I see no artifacts listed in the archive tab.

@tcolgate
Contributor

tcolgate commented Feb 3, 2020

One quick way of doing this is to add the following to the argo-server deployment:

      containers:
      - name: argo-server  # add these fields to the existing container entry
        volumeMounts:
        - mountPath: /etc/ssl/certs/ca-certificates.crt
          name: ssl-certs
          readOnly: true

      volumes:
      - hostPath:
          path: /etc/ssl/certs/ca-certificates.crt
          type: ""
        name: ssl-certs

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

@haghabozorgi I don't think the config is different. Hopefully @tcolgate's fix works for you.

@tcolgate
Contributor

tcolgate commented Feb 3, 2020

The executor probably has a valid system CA as it's built on a Debian image; the argocli container is more stripped down.

@alexec
Contributor

alexec commented Feb 3, 2020

I think this is probably a bug and that the argo-server image may need to be built on the same base as the argocli. This is not straightforward; I'll own it.

@alexec alexec self-assigned this Feb 3, 2020
@ddseapy
Contributor

ddseapy commented Feb 3, 2020

@alexec are you also testing with insecure: false, or should I make a separate ticket? The certificate fix wouldn't address that.

@alexec
Contributor

alexec commented Feb 3, 2020

I'm working on a fix that changes the base image from scratch. What would be useful is for someone to test it for me.

@alexec
Contributor

alexec commented Feb 3, 2020

@tcolgate thoughts on a solution?

  • Option 1 - update the manifests to include the example you provided?
  • Option 2 - use the same debian slim image as the executor?
  • Option 3 - add certs during the Docker build (see the sketch below)

@jessesuen - thoughts on this please?
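
For option 3, a rough sketch of what the Dockerfile change could look like (untested; the builder stage name and binary path are assumptions, not the actual build):

# builder stage that carries up-to-date CA certificates
FROM debian:10-slim AS certs
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates

FROM scratch
# copy only the CA bundle into the otherwise-empty image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY argo /bin/argo
ENTRYPOINT [ "argo" ]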


alexec added a commit to alexec/argo-workflows that referenced this issue Feb 3, 2020
@haghabozorgi
Author

@ddseapy when I add the volumeMount and volume snippet per @tcolgate's suggestion, I still get "http: named cookie not present" when trying to download the artifact from the UI.

@alexec
Contributor

alexec commented Feb 3, 2020

I don't have access to S3 in a way that would let me reliably test this. I've created a PR; would you be able to use the Dockerfile there to try it out?

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

@haghabozorgi I'm afraid I'm not sure what the cause of that issue is. I made the other ticket to track it.

@haghabozorgi
Author

@alexec I am building from your repo now, but I'm not sure I can test properly given the issue I mentioned above. I assume I will see the same "http: named cookie not present" message?

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

@haghabozorgi yes. If you deploy without @tcolgate's fix and see "http: named cookie not present" instead of the original error "x509: certificate signed by unknown authority", then the PR works.

@haghabozorgi
Author

@alexec using the image from your Dockerfile results in the "http: named cookie not present" message.

@alexec
Contributor

alexec commented Feb 3, 2020

Ok, but we know that is a good fix for the certs.

@haghabozorgi
Author

@alexec based on @ddseapy's comments it seems the "http: named cookie not present" error is a separate issue, but it occurs whether the insecure flag is set to true or false. The end result is that the user is still not able to download artifacts from the UI.

@alexec
Contributor

alexec commented Feb 3, 2020

Ok - should we close this issue once the PR is merged?

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

I use minio over http as well.

@alexec
Contributor

alexec commented Feb 3, 2020

Can you please check you have the secrets set up?

---
apiVersion: v1
data:
  config: |
    artifactRepository:
      archiveLogs: true
      s3:
        bucket: my-bucket
        endpoint: minio:9000
        insecure: true
        accessKeySecret:
          name: my-minio-cred
          key: accesskey
        secretKeySecret:
          name: my-minio-cred
          key: secretkey
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    app: minio
  name: my-minio-cred
stringData:
  accesskey: admin
  secretkey: password
type: Opaque

@alexec alexec removed this from the v2.5 milestone Feb 3, 2020
@ddseapy
Contributor

ddseapy commented Feb 3, 2020

As far as I can tell it's OK.

apiVersion: v1
data:
  config: |
    containerRuntimeExecutor: docker
    artifactRepository:
      archiveLogs: true
      s3:
        accessKeySecret:
          key: accesskey
          name: ddseapy-minio
        secretKeySecret:
          key: secretkey
          name: ddseapy-minio
        bucket: ds-argo-artifacts
        endpoint: ddseapy-minio
        insecure: true
        region: us-east-1
    metricsConfig:
      enabled: true
      path: /metrics
      port: 8080
    persistence:
      archive: true
      connectionPool:
        maxIdleConns: 100
        maxOpenConns: 0
      nodeStatusOffLoad: true
      postgresql:
        host: ddseapy-postgresql
        port: 5432
        database: argo
        tableName: argo_workflows
        userNameSecret:
          name: ddseapy-argo-workflow-controller
          key: postgresqlUsername
        passwordSecret:
          name: ddseapy-argo-workflow-controller
          key: postgresqlPassword
kind: ConfigMap
metadata:
  name: ddseapy-argo-workflow-controller
  namespace: ddseapy
---
apiVersion: v1
data:
  accesskey: REDACTED
  secretkey: REDACTED
kind: Secret
metadata:
  labels:
    app: minio
  name: ddseapy-minio
  namespace: ddseapy
type: Opaque

@alexec
Contributor

alexec commented Feb 3, 2020

@ddseapy I redacted your paste as you shared unencrypted credentials. If this is a production system, you should immediately change your password.

@alexec
Contributor

alexec commented Feb 3, 2020

I'm wondering where this error is coming from - can you open your browser console and share the HTTP request and response please?

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

No, that is simply the admin password, base64-encoded.
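
Secret data values are just base64-encoded, not encrypted; anyone can reverse them:

$ echo -n admin | base64
YWRtaW4=
$ echo YWRtaW4= | base64 -d
admin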

@alexec
Contributor

alexec commented Feb 3, 2020

Oh - what auth mode are you using? Server?

@alexec
Contributor

alexec commented Feb 3, 2020

Ok. I've reproduced this.

@alexec alexec added this to the v2.5 milestone Feb 3, 2020
alexec added a commit to alexec/argo-workflows that referenced this issue Feb 3, 2020
@alexec
Contributor

alexec commented Feb 3, 2020

Fix implemented.

@haghabozorgi
Author

@alexec does your fix address downloading from S3, or is it just for the minio-over-HTTP use case?

@ddseapy
Contributor

ddseapy commented Feb 3, 2020

minio is an S3-compatible API. While I don't have the ability to test S3 (and it sounds like neither does @alexec), it almost certainly fixes the error for S3 as well.

@haghabozorgi
Author

@alexec can we please re-open? I am testing install.yaml from master with the same s3 snippet from my original comment on this issue, and I'm still seeing "http: named cookie not present" in the browser when I try to download main-logs, for example. The URL in the browser shows http://localhost:2746/artifacts/argo/local-script-j7wg9/local-script-j7wg9/main-logs?Authorization=null

@alexec
Contributor

alexec commented Feb 4, 2020

If the bug persists, please re-open.

@alexec alexec reopened this Feb 4, 2020
@ddseapy
Contributor

ddseapy commented Feb 4, 2020

@haghabozorgi master install.yaml points to the latest Docker tag. It might be that the image wasn't on Docker Hub yet. Do you mind setting the image tags to 2.5.0-rc8 in the YAML just to make sure?
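
For example (deployment, container, and image names here assume the standard install.yaml; adjust to match your manifests):

kubectl -n argo set image deployment/argo-server argo-server=argoproj/argocli:v2.5.0-rc8
kubectl -n argo set image deployment/workflow-controller workflow-controller=argoproj/workflow-controller:v2.5.0-rc8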

With rc8 I am able to download logs, though all other artifacts download as .gz when they are actually .tgz, so I have to rename them after downloading. I'll open a separate ticket.

@alexec
Contributor

alexec commented Feb 4, 2020

The "named cookie" error should be fixed in rc8.

@haghabozorgi
Author

@alexec @ddseapy thank you both for all your efforts, sincerely appreciated. Confirmed, it is working with rc8. I think we can close this.

@alexec
Contributor

alexec commented Feb 4, 2020

Yay!

@alexec alexec closed this as completed Feb 4, 2020
@xrafhue

xrafhue commented Jun 21, 2022

Hi,

I found another solution, if it can help.

This is my ConfigMap used by the controller, which allows the init container to push logs to s3:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
[....]
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      bucket: argoworkflow-logs
      endpoint: minio.minio.svc.cluster.local
      insecure: false
[....]
  executor: |
    resources:
[....]
    env:
      - name: SSL_CERT_FILE
        value: /run/secrets/kubernetes.io/serviceaccount/ca.crt

This is the argo-server Deployment config, which mounts the ca.crt and allows the UI to fetch logs from s3 after the pod is killed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-server
[....]
spec:
  template:
    spec:
      containers:
        - name: argo-server  # the existing container entry
          volumeMounts:
            - name: kube-root-ca
              mountPath: /etc/ssl/certs/kube-root-ca.crt
              subPath: ca.crt
[....]
      volumes:
        - name: kube-root-ca
          configMap:
            name: kube-root-ca.crt
            defaultMode: 0755
