
If one gpg check is working, upgrade should continue #3426

Merged
merged 10 commits into main from gpg-unreachable-url
Sep 20, 2023

Conversation

pierrehilbert
Contributor

@pierrehilbert pierrehilbert commented Sep 17, 2023

What does this PR do?

This PR removes the break in the verify loop, so a failed check no longer stops the chain; verification continues through the remaining verifiers and succeeds if any single one of them succeeds.

Why is it important?

This change allows the upgrade to continue when the remote GPG key is unreachable, as long as another verifier (such as the embedded GPG key) still succeeds.
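
In other words, where the old loop stopped at the first checksum or signature failure, the new behavior keeps trying the remaining verifiers and only fails when all of them do. A minimal, self-contained sketch of that behavior (the type and function names here are illustrative stand-ins, not the actual elastic-agent code):

package main

import (
	"errors"
	"fmt"
)

// Verifier is an illustrative stand-in for the real verifier interface.
type Verifier interface {
	Verify(artifact string) error
}

type staticVerifier struct{ err error }

func (s staticVerifier) Verify(artifact string) error { return s.err }

// verifyAny no longer breaks out of the loop on a failure: it collects
// the errors and returns nil as soon as any single verifier succeeds.
func verifyAny(verifiers []Verifier, artifact string) error {
	var err error
	for _, v := range verifiers {
		e := v.Verify(artifact)
		if e == nil {
			return nil // one successful check is enough to continue the upgrade
		}
		err = errors.Join(err, e)
	}
	return err // non-nil only if every verifier failed
}

func main() {
	vs := []Verifier{
		staticVerifier{errors.New("remote gpg key unreachable")}, // remote check fails
		staticVerifier{nil},                                      // embedded key still works
	}
	fmt.Println(verifyAny(vs, "elastic-agent.tar.gz")) // prints <nil>: upgrade continues
}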

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Closes #3368

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pierrehilbert added the bug (Something isn't working) and Team:Elastic-Agent (Label for the Agent team) labels Sep 17, 2023
@pierrehilbert requested a review from a team as a code owner September 17, 2023 16:53
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@mergify
Contributor

mergify bot commented Sep 17, 2023

This pull request does not have a backport label. Could you fix it @pierrehilbert? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch (\d is a digit)

NOTE: backport-skip has been added to this pull request.

@pierrehilbert added the backport-v8.10.0 (Automated backport with mergify) label Sep 17, 2023
@mergify[bot] removed the backport-skip label Sep 17, 2023
@elasticmachine
Contributor

elasticmachine commented Sep 17, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-09-20T14:00:06.567+0000

  • Duration: 26 min 47 sec

Test stats 🧪

Test Results
  Failed    0
  Passed    6301
  Skipped   59
  Total     6360

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented Sep 17, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.78%  (81/82)             👍
Files         66.327% (195/294)           👍
Classes       65.751% (359/546)           👍
Methods       52.686% (1128/2141)         👎 -0.074
Lines         38.2%   (12800/33508)       👎 -0.031
Conditionals  100.0%  (0/0)               💚

Contributor

@michalpristas michalpristas left a comment


It's not fixing the issue, but we can push this in and have at least one strategy with verifiers to reduce the number of failures.

return "", errors.New(err, "failed verification of agent binary")
}

if err != nil {
Member

@cmacknz cmacknz Sep 19, 2023


This isn't really the standard way to do this; if the operation succeeded, err should be nil.

Now that I've seen this approach, I think a better solution would be to have the composed verifier log each verifier that failed but return a nil error if at least one of them succeeded.

If you wanted to keep doing it this way, you would want to return the details of what happened as a separate structure so it can be logged at this level, but I think it will be much simpler to handle the logging directly in this block of the composed verifier.

Something like the following should handle most of this; it already builds up a joined error from each failure and only returns it if all of the verifiers fail. We just need to add a log statement and a Name method to the verifier interface so you can log which one failed.

--- a/internal/pkg/agent/application/upgrade/artifact/download/composed/verifier.go
+++ b/internal/pkg/agent/application/upgrade/artifact/download/composed/verifier.go
@@ -5,7 +5,7 @@
 package composed

 import (
-       "github.com/hashicorp/go-multierror"
+       "errors"

        "github.com/elastic/elastic-agent/internal/pkg/agent/application/upgrade/artifact"
        "github.com/elastic/elastic-agent/internal/pkg/agent/application/upgrade/artifact/download"
@@ -43,11 +43,10 @@ func (e *Verifier) Verify(a artifact.Artifact, version string, skipDefaultPgp bo
                        return nil
                }

-               err = multierror.Append(err, e)
+               err = errors.Join(err, e)

                if errors.As(e, &checksumMismatchErr) || errors.As(err, &invalidSignatureErr) {
-                       // Stop verification chain on checksum/signature errors.
-                       break
+                       log.Warningw("Verifier failed!", "verifier", v.Name(), "error", err)
                }
        }
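
Putting the suggestion together, the composed Verify loop could end up looking roughly like the sketch below. This is an illustration of the proposed approach, not the merged code: the vv and log fields, the Name() method, and the Warnw call are assumptions about how the pieces would fit.

func (v *Verifier) Verify(a artifact.Artifact, version string, skipDefaultPgp bool, pgpBytes ...string) error {
	var err error
	for _, verifier := range v.vv {
		e := verifier.Verify(a, version, skipDefaultPgp, pgpBytes...)
		if e == nil {
			// One verifier succeeded, so the artifact is considered verified.
			return nil
		}
		err = errors.Join(err, e)
		// Non-fatal: record which verifier failed and try the next one.
		// Name() is the method proposed above; it is an assumed addition.
		v.log.Warnw("verification failed", "verifier", verifier.Name(), "error", e)
	}
	// Fatal: every verifier failed; the caller logs this at the Error level.
	return err
}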

Member


You would still want the logger here to log the case where they all fail, which should be logged at the Error level.

You should be able to pass the logger you have here into the function call.

Contributor Author


So, just to make sure I get it correctly: you would prefer me to log directly in composed/verifier.go instead of step_download.go, to avoid returning something in the error output when the upgrade ultimately succeeds (from a conceptual perspective, success = no error)?

Member


Yes, we need success = "no error".

The trouble you are running into is that you need to log twice: once inside the composed/verifier.go file, for the non-fatal errors that won't be returned (since final success = no error), and once where the verifier is called from the upgrade process, for the fatal error where all verifiers failed.
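
Concretely, the call site in the upgrade path would then only need to handle the fatal case, along the lines of this sketch (the surrounding names are assumptions; only the errors.New wrapping is taken from the snippet quoted above):

// Each individual failure has already been Warn-logged inside the composed
// verifier, so the caller only logs the fatal "all verifiers failed" case.
if err := verifier.Verify(agentArtifact, version, skipDefaultPgp); err != nil {
	log.Errorw("all verifiers failed", "error", err)
	return "", errors.New(err, "failed verification of agent binary")
}
// success = no error: the upgrade carries on with the verified artifact.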

@elastic-sonarqube

SonarQube Quality Gate

Quality Gate failed

Failed condition: Coverage on New Code is 26.7% (less than the required 40%)

See analysis details on SonarQube

@pierrehilbert merged commit 7be7f62 into main Sep 20, 2023
22 of 24 checks passed
@pierrehilbert deleted the gpg-unreachable-url branch September 20, 2023 16:07
mergify bot pushed a commit that referenced this pull request Sep 20, 2023
pierrehilbert added a commit that referenced this pull request Sep 20, 2023
(cherry picked from commit 7be7f62)

Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
Labels
  • backport-v8.10.0 (Automated backport with mergify)
  • bug (Something isn't working)
  • Team:Elastic-Agent (Label for the Agent team)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When the GPG URL is unreachable, the upgrade should continue if the embedded GPG key still works
4 participants