Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Support for UPGRADE_ACTION #21002

Merged
merged 31 commits into from
Sep 22, 2020

Conversation

michalpristas
Copy link
Contributor

@michalpristas michalpristas commented Sep 7, 2020

What does this PR do?

Implements UPGRADE_ACTION defined in elastic/kibana#73069

Once action is invoked from fleet agent will initiate an update procedure
this consists of few steps:

  • download: fetches specified binaries from source specified by fleet
  • verify: agent verifies package is created and signed by elastic
  • unpack: agent unpack bits to correct location
  • switch and mark: agent switches symlinks to point to correct version, updates paths and generates file describing upgrade
  • reexec: agent reexecs itself so it is restarted in a new version

this PR does not add rollback capability

Why is it important?

Sets the cornerstone for upgrade process initiated by fleet

How to test

  • build package
  • copy them to some other location e.g ~/server/beats/elastic-agent
  • make a small change and commit (so commit inside a package changes and it is a valid upgrade)
  • run mage package again
  • unpack *.tar.gz
  • run ES and KBN from branch in this PR [Ingest Manager] add upgrade action kibana#77412
  • enroll & run agent
  • run python3 -m http.server inside ~/server so you can fetch update capable agent
  • execute and replace AGENT_ID, this will trigger update
curl -X POST -H 'kbn-xsrf: true'  -u elastic:changeme http://localhost:5601/api/ingest_manager/fleet/agents/AGENT_ID/upgrade \
  -H 'Content-Type: application/json' \
 -d '{"source_uri":"http://localhost:8000","version":"8.0.0"}'
  • check you have new agent inside data, it is running and action is ACKed

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@michalpristas michalpristas self-assigned this Sep 7, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 7, 2020
@michalpristas michalpristas marked this pull request as draft September 7, 2020 08:34
@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 7, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #21002 updated]

  • Start Time: 2020-09-22T10:23:47.309+0000

  • Duration: 42 min 39 sec

Test stats 🧪

Test Results
Failed 0
Passed 2084
Skipped 6
Total 2090

@michalpristas michalpristas force-pushed the agent-update-action branch 2 times, most recently from 131ab80 to 9f1bbfd Compare September 8, 2020 08:06
@michalpristas michalpristas marked this pull request as ready for review September 21, 2020 08:45
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!

This gets the agent unpacked and re-exec'd, thats the first key piece. With this now we can add in the safe guards to rollback when upgrade is not successful.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test this locally. Two requests from my end:

  • It would be nice if the PR description would have more details on what this PR exactly does implemented. This would make reviewing easier and would also help as reference in the future to get back to to read more about the implementation.
  • Docs: We should doc somewhere as a follow up on how it exactly works also for our users.

settings.SourceURI = sourceURI
}

fetcher := downloader.NewDownloader(u.log, &settings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could be moved to line 26 so this does not have to be created if NewVerifier fails.


marker := updateMarker{
Hash: hash,
UpdatenOn: time.Now(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Updaten/Updated/ ?

Action: action,
}

markerPath := filepath.Join(paths.Data(), markerFilename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Move to line 66?


func updateHomePath(hash string) error {
if err := createPathsSymlink(hash); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if all these errors will be self explanatory if they bubble up or if we need to add more content to it. It applies to all errors in this code.


func createPathsSymlink(hash string) error {
// only on windows
// on other systems path is shared
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add some details on why?

var hash string
fileNamePrefix := strings.TrimSuffix(filepath.Base(archivePath), ".tar.gz") + "/" // omitting `elastic-agent-{version}-{os}-{arch}/` in filename

for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on what this loop does in more detail. It is more complex then I expected.

agentName = "elastic-agent"
hashLen = 6
agentCommitFile = ".elastic-agent.active.commit"
agentArtifactName = "beats/" + agentName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a beats prefix here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is path where agent is located on our download page. atm it is beats/elastic-agent/*.tar.gz

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see :-( @ph Perhaps something to change in the future ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 7.x when we move to a separate repository. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is really related to the repository change. Is the prefix tied to a repo or hardcoded somewhere in the build process?

@@ -75,6 +75,10 @@ func NewCommandWithArgs(args []string, streams *cli.IOStreams) *cobra.Command {
cmd.AddCommand(newEnrollCommandWithArgs(flags, args, streams))
cmd.AddCommand(newInspectCommandWithArgs(flags, args, streams))

// TODO: remove before merging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just to make testing simpler. if you dont want to setup action and fleet, you can invoke this using elastic-agent upgrade. this will be removed when i merge the PR so only way of invoking upgrade will be using fleet

"github.com/elastic/beats/v7/x-pack/elastic-agent/pkg/fleetapi"
)

// TODO: remove before merging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

ActionID string `json:"id" yaml:"id"`
ActionType string `json:"type" yaml:"type"`
Version string `json:"version" yaml:"version"`
SourceURI string `json:"source_uri" yaml:"source_uri"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the SourceURI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fleet can tell us where binary is located, maybe it is our official store but in case it is not, it can be kibana api or some fileserver within a company

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it simple for now and make the upgrade with and without fleet identical. The Agent knows where to pull data from. We can extend it later on if we actually support different sources. My hope is that this makes the API simpler as Fleet only needs to send down the version and does not need to also figure out how the url looks, Agent already has this built in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin I don't think we should remove this, in isolated environments in most cases I would believe that a mirror of the builds will be provided at a different sourceURI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerouse But can we make it optional for now to not need to build this knowledge also into Kibana?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it should be optional, if the field is not provided then Agent fallsback to the default artifacts API.

@michalpristas michalpristas merged commit f5cd53e into elastic:master Sep 22, 2020
michalpristas added a commit to michalpristas/beats that referenced this pull request Sep 22, 2020
v1v added a commit to v1v/beats that referenced this pull request Sep 24, 2020
…ne-2.0-arm

* upstream/master: (29 commits)
  Fix librpm installation in auditbeat build (elastic#21239)
  Fix prometheus default config (elastic#21253)
  Fix dev guide test command (elastic#21254)
  Move aws lambda metricset to GA (elastic#21255)
  [Docs] Typo in table syntax (elastic#20227)
  [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160)
  Add recursive split to httpjson (elastic#21214)
  [DOCS] Add beat specific start widgets (elastic#21217)
  Fix timestamp handling in remote_write (elastic#21166)
  Fix aws, azure and googlecloud compute dashboards (elastic#21098)
  Add acceptable event log keys to winlog (elastic#21205)
  Add elastic-agent to gitignore (elastic#21219)
  Add cloudfoundry tags to events (elastic#21177)
  [Ingest Manager] Agent includes pgp file (elastic#19480)
  Add compatibility note about ingress-controller-v0.34.1 (elastic#21209)
  [Ingest Manager] Support for UPGRADE_ACTION (elastic#21002)
  Fix libbeat.output.*.bytes metrics of Elasticsearch output (elastic#21197)
  [packaging] use docker.elastic.co/ubi8/ubi-minimal (elastic#21154)
  Add host inventory metrics to system module (elastic#20415)
  [Filebeat][Azure Module] Fixing event.outcome from result_type issue (elastic#20998)
  ...
v1v added a commit to v1v/beats that referenced this pull request Sep 24, 2020
…ne-2.0

* upstream/master: (33 commits)
  Stop running agent container as root by default (elastic#21213)
  Stop running auditbeat container as root by default (elastic#21202)
  Fix autodiscover flaky tests (elastic#21242)
  [Ingest Manager] Enabled dev builds (elastic#21241)
  Fix librpm installation in auditbeat build (elastic#21239)
  Fix prometheus default config (elastic#21253)
  Fix dev guide test command (elastic#21254)
  Move aws lambda metricset to GA (elastic#21255)
  [Docs] Typo in table syntax (elastic#20227)
  [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160)
  Add recursive split to httpjson (elastic#21214)
  [DOCS] Add beat specific start widgets (elastic#21217)
  Fix timestamp handling in remote_write (elastic#21166)
  Fix aws, azure and googlecloud compute dashboards (elastic#21098)
  Add acceptable event log keys to winlog (elastic#21205)
  Add elastic-agent to gitignore (elastic#21219)
  Add cloudfoundry tags to events (elastic#21177)
  [Ingest Manager] Agent includes pgp file (elastic#19480)
  Add compatibility note about ingress-controller-v0.34.1 (elastic#21209)
  [Ingest Manager] Support for UPGRADE_ACTION (elastic#21002)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Ingest Management:beta2 Group issues for ingest management beta2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants