elastic-agent: don't swallow download errors #23235

Merged
axw merged 2 commits into elastic:master from agent-download-copy-error
Dec 22, 2020

Conversation

@axw (Member) commented Dec 22, 2020

What does this PR do?

Stop swallowing the error from io.Copy when reading from response bodies in the http downloader.

Why is it important?

This prevents storing a partial artifact download, which leads to a permanent error state.
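As an illustration of the pattern described above, here is a minimal sketch that returns the error from io.Copy and removes the partially written file. It is not the elastic-agent downloader: the `download` and `demo` functions, the file names, and the test server paths are all hypothetical, chosen only to make the example self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// download fetches url into path. Hypothetical helper for illustration,
// not the actual elastic-agent http downloader.
func download(url, path string) (err error) {
	resp, err := http.Get(url)
	if err != nil {
		return fmt.Errorf("fetching %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("fetching %s: unexpected status %s", url, resp.Status)
	}

	f, err := os.Create(path)
	if err != nil {
		return fmt.Errorf("creating %s: %w", path, err)
	}
	// On any failure, remove the partial file so that a later attempt
	// does not mistake it for a finished download.
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			os.Remove(path)
		}
	}()

	// A connection dropped mid-body surfaces here, not in http.Get,
	// so this error must be returned rather than discarded.
	if _, err = io.Copy(f, resp.Body); err != nil {
		return fmt.Errorf("copying body of %s: %w", url, err)
	}
	return nil
}

// demo downloads one complete and one truncated response from a local
// test server and reports what happened.
func demo() (fullErr, truncErr error, partialLeft bool) {
	dir, err := os.MkdirTemp("", "download-demo")
	if err != nil {
		return err, err, false
	}
	defer os.RemoveAll(dir)

	mux := http.NewServeMux()
	mux.HandleFunc("/full", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "complete artifact")
	})
	mux.HandleFunc("/truncated", func(w http.ResponseWriter, r *http.Request) {
		// Promise more bytes than are sent; the server then closes
		// the connection, simulating a download that fails part way.
		w.Header().Set("Content-Length", "1024")
		io.WriteString(w, "only a few bytes")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	fullErr = download(srv.URL+"/full", filepath.Join(dir, "full.tar.gz"))
	partial := filepath.Join(dir, "partial.tar.gz")
	truncErr = download(srv.URL+"/truncated", partial)
	_, statErr := os.Stat(partial)
	partialLeft = statErr == nil
	return fullErr, truncErr, partialLeft
}

func main() {
	fullErr, truncErr, partialLeft := demo()
	fmt.Println("complete download error:", fullErr)
	fmt.Println("truncated download error:", truncErr)
	fmt.Println("partial file left behind:", partialLeft)
}
```

If the io.Copy error were discarded instead, the truncated file would stay on disk and look like a successful download to any later step that only checks for its existence.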

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Move to Australia, run elastic-agent with sourceURI pointed at the staging artifacts URL for 7.11.0 BC1 (https://staging.elastic.co/7.11.0-710164a0/summary-7.11.0.html) :)

(It seems there's no CDN on staging, and it's super slow from here. It's going to be a transient issue, but the last few attempts to fetch apm-server have failed part way through.)

More seriously, you would have to set up something (e.g. a reverse proxy) to inject faults into responses to elastic-agent.
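One possible shape for such a fault injector, sketched with Go's standard library only. The names `faultyBody`, `newFaultyProxy`, `fetch`, and `demo`, the 16 KiB cutoff, and the local test servers are assumptions made for illustration; nothing here comes from the PR itself. The proxy forwards requests to an upstream server and makes every response body fail after a fixed number of bytes.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// faultyBody passes through at most `remaining` bytes of the wrapped
// body and then fails every read with an injected error.
type faultyBody struct {
	io.ReadCloser
	remaining int64
}

func (b *faultyBody) Read(p []byte) (int, error) {
	if b.remaining <= 0 {
		return 0, errors.New("injected fault")
	}
	if int64(len(p)) > b.remaining {
		p = p[:b.remaining]
	}
	n, err := b.ReadCloser.Read(p)
	b.remaining -= int64(n)
	return n, err
}

// newFaultyProxy returns a reverse proxy to target that cuts every
// response body off after roughly failAfter bytes.
func newFaultyProxy(target *url.URL, failAfter int64) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.ModifyResponse = func(resp *http.Response) error {
		resp.Body = &faultyBody{ReadCloser: resp.Body, remaining: failAfter}
		return nil
	}
	return proxy
}

// fetch reads the whole body at rawURL and reports the number of bytes
// read together with any error.
func fetch(rawURL string) (int, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return len(body), err
}

// demo fetches the same payload directly and through the faulty proxy.
func demo() (directN int, directErr error, proxiedN int, proxiedErr error) {
	payload := strings.Repeat("x", 64*1024)
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, payload)
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		return 0, err, 0, err
	}
	proxy := httptest.NewServer(newFaultyProxy(target, 16*1024))
	defer proxy.Close()

	directN, directErr = fetch(upstream.URL)
	proxiedN, proxiedErr = fetch(proxy.URL)
	return directN, directErr, proxiedN, proxiedErr
}

func main() {
	directN, directErr, proxiedN, proxiedErr := demo()
	fmt.Println("direct: ", directN, "bytes, error:", directErr)
	fmt.Println("proxied:", proxiedN, "bytes, error:", proxiedErr)
}
```

To exercise elastic-agent with something like this, one would presumably run the proxy in front of the artifacts server and point sourceURI at the proxy's address; that setup has not been verified here.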

Related issues

None.

Logs

2020-12-22T11:03:32.446+0800    INFO    operation/operation_fetch.go:75 operation 'operation-fetch' downloaded apm-server.7.11.0 into /tmp/zinga/elastic-agent-7.11.0-linux-x86_64/data/elastic-agent-fc48a3/downloads/apm-server-7.11.0-linux-x86_64.tar.gz                      
...
2020-12-22T11:03:32.515+0800    ERROR   log/reporter.go:36      2020-12-22T11:03:32+08:00: type: 'ERROR': sub_type: 'FAILED' message: Application: apm-server--7.11.0[aa633810-4400-11eb-bc64-2b65e7226012]: State changed to FAILED: operation 'operation-verify' marked 'apm-server.7.11.0' corrupted: /go/src/github.com/elastic/beats/x-pack/elastic-agent/pkg/agent/operation/operation_verify.go[77]: unknown error
...
2020-12-22T11:03:37.978+0800    INFO    operation/operation_fetch.go:61 apm-server.7.11.0 already exists in /tmp/zinga/elastic-agent-7.11.0-linux-x86_64/data/elastic-agent-fc48a3/downloads/apm-server-7.11.0-linux-x86_64.tar.gz. Skipping operation operation-fetch
...
2020-12-22T11:03:38.043+0800    ERROR   application/fleet_gateway.go:168        failed to dispatch actions, error: operator: failed to execute step sc-run, error: operation 'operation-verify' marked 'apm-server.7.11.0' corrupted: /go/src/github.com/elastic/beats/x-pack/elastic-agent/pkg/agent/operation/operation_verify.go[77]: unknown error: operation 'operation-verify' marked 'apm-server.7.11.0' corrupted: /go/src/github.com/elastic/beats/x-pack/elastic-agent/pkg/agent/operation/operation_verify.go[77]: unknown error

Stop swallowing the error from io.Copy when reading
from response bodies in the http downloader. This
prevents storing a partial download, which leads to
a permanent error state.
@botelastic bot added the needs_team and Team:Ingest Management labels Dec 22, 2020
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@botelastic bot removed the needs_team label Dec 22, 2020
@axw axw added the v8.0.0 label Dec 22, 2020
@elasticmachine (Collaborator) commented Dec 22, 2020

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #23235 event

  • Start Time: 2020-12-22T04:06:40.749+0000

  • Duration: 26 min 37 sec

Test stats 🧪

Test Results
Failed 0
Passed 1422
Skipped 4
Total 1426

💚 Flaky test report

Tests succeeded.

@ruflin (Contributor) left a comment

Change LGTM, did not go through the full testing cycle ;-) I assume it is an oversight that we skipped that error.

@axw axw merged commit 3cb1aa3 into elastic:master Dec 22, 2020
@axw axw added the v7.12.0 label Dec 22, 2020
axw added 2 commits to axw/beats that referenced this pull request Dec 22, 2020, each with the message:
* elastic-agent: don't swallow download errors

Stop swallowing the error from io.Copy when reading
from response bodies in the http downloader. This
prevents storing a partial download, which leads to
a permanent error state.

(cherry picked from commit 3cb1aa3)
@axw axw deleted the agent-download-copy-error branch December 22, 2020 08:17
axw added 2 commits that referenced this pull request Dec 23, 2020, each with the message:
* elastic-agent: don't swallow download errors

Stop swallowing the error from io.Copy when reading
from response bodies in the http downloader. This
prevents storing a partial download, which leads to
a permanent error state.

(cherry picked from commit 3cb1aa3)