[cmd/opampsupervisor]: Implement PackagesAvailable for upgrading agent #35503
base: main
Conversation
Thanks for taking this on, it's no small task. I'm still looking through this, but wanted to give early feedback on a few points.
```go
require.Equal(t, &protobufs.PackageStatuses{
	Packages: map[string]*protobufs.PackageStatus{
		"": {
			// TODO: Should initial version be filled in?
```
This should probably be the version/hash of the Collector binary, right? I think the version can be the one obtained during bootstrapping.
```go
return nil
}

// TODO: Certificate paths? The certificate can be specified via SIGSTORE_ROOT_FILE for now
```
Could we link to an issue here? It would make it easier to track and provides some context.
```go
agentHash := h.Sum(nil)

// Load persisted package state, if it exists
// TODO: use packagesStatusPath method somehow
```
I think this is okay given that it's the only other usage of this and is a very simple call to filepath.Join.

The other options I can think of aren't worth the additional complexity:

- Extract the `packagesStatusPath` functionality to a function and call the function in the method as well. Too much abstraction for a simple call.
- Instantiate `packageManager` above and call `packagesStatusPath` here, then fill in other details on the struct afterward. Splits the initialization too much for fairly minimal gains.
```go
return err
}

type packageState struct {
```
Would it make sense to instead use the persistent state file to store this? I see how the purpose of each file is distinct, but having two small-ish YAML files also feels a little odd.
This may also be premature, but I think abstracting state storage through the persistent state struct may keep things cleaner in the long run.
```diff
@@ -275,20 +297,20 @@ func (s *Supervisor) loadConfig(configFile string) error {
 // an OpAMP extension, obtains the agent description, then
 // shuts down the Collector. This only needs to happen
 // once per Collector binary.
-func (s *Supervisor) getBootstrapInfo() (err error) {
+func (s *Supervisor) getBootstrapInfo() (agentVersion string, err error) {
```
I think this is another place where consolidating our storage to the persistent state may make things cleaner. We have two different data flows for the instance ID vs. the agent version in this function, when I think we should probably store the two next to each other so all of the agent's information is recorded in the same place. What do you think?
```go
}

// overwrite agent process
startAgent, err := p.am.stopAgentProcess(ctx)
```
Do you think it would substantially complicate things if we limited agent process management to the Supervisor struct and only deal with managing the binary file here? My first impression is that we should keep those responsibilities separated if possible and not have the package manager know anything about the Collector process, even if the current implementation isn't too complicated.
My thought was that we need to stop the agent process to write the binary in the first place.
We could write the binary somewhere temporarily, and send a message to the agent goroutine to do the replacement, which could be cleaner (better separation of concerns). The package manager would still need to have some handle to signal that, but it wouldn't necessarily be tied to the collector.
Would we be able to accomplish this with synchronous, one-way communication? I was thinking maybe we could do something vaguely like this in the Supervisor struct:
```go
oldVersion := 0
newVersion := 1
s.commander.stopAgent()
s.packageManager.SwapInVersion(newVersion)
if err := s.commander.startAgent(); err != nil {
	s.packageManager.SwapInVersion(oldVersion)
	s.commander.startAgent()
	s.reportFailedInstallation()
	return
}
s.packageManager.DeleteVersion(oldVersion)
s.reportSuccessfulInstallation()
```
Yeah, I think we could do something like that. I think this also sets us up nicely for restoring the previous version if the new one fails. I'll look into making those changes.
```go
if err != nil {
	s.logger.Error("Failed to sync PackagesAvailable message", zap.Error(err))
}
// TODO: Should we wait for the sync to be done somehow? Should it be in a separate goroutine
```
From what I'm seeing, the `Done` channel only communicates that the PackageSyncer is done communicating package statuses to the server, and doesn't track whether the packages have been downloaded: https://github.com/open-telemetry/opamp-go/blob/main/client/internal/packagessyncer.go#L51. It may be worth waiting for this before communicating any additional updates about package statuses to the server.
I'm actually really confused reading that code. It looks like once Sync is done, that channel is closed, so waiting on Done here wouldn't actually add any extra synchronization.
Good call at looking at that impl, I definitely had a different idea of its purpose in my head.
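The point about the closed channel can be illustrated directly: receiving from a closed channel returns immediately, so waiting on a channel that `Sync` closes before returning adds no extra synchronization. A self-contained sketch:

```go
package main

import "fmt"

func main() {
	// Stand-in for the syncer's Done channel, which Sync closes
	// before it returns.
	done := make(chan struct{})
	close(done)

	// Receiving from a closed channel yields the zero value
	// immediately; no blocking, hence no added synchronization.
	<-done
	fmt.Println("no blocking occurred")
}
```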
@tigrannajaryan I'd appreciate you taking a look at this, even if just to review conformance to the spec.
I started reviewing but got confused by the many moving parts of signatures and verifications.
I think we need a design doc or some other form of documentation that explains all the involved parties (cosign, Collector releases, OpAMP Server, Supervisor) and how they dance together to make sure the downloaded binary is what it pretends to be and is safe to use.
Some sort of a sequence diagram that starts with the Collector build on opentelemetry-collector-releases and ends with Supervisor launching the new executable and shows all the steps in between would be great. If we number the steps we could even refer to them in the code.
```go
}

// TODO: Certificate paths? The certificate can be specified via SIGSTORE_ROOT_FILE for now
type AgentSignature struct {
```
Please document AgentSignature and AgentSignatureIdentity.
```go
return splitSignature[0], splitSignature[1], nil
}

// sig is the decoded signature of
```
Incomplete comment?
```go
}

func parsePackageSignature(signature []byte) (b64Cert, b64Signature []byte, err error) {
	splitSignature := bytes.SplitN(signature, []byte(" "), 2)
```
Is this signature format documented somewhere? This imposes a requirement on the OpAMP server to generate signatures in a specific way, right?
Yeah, good call out. I will work on this and get something in our spec that explains everything here.
Description:
Implements ReportPackageStatuses and handles PackagesAvailable for upgrading the agent.
The agent will only accept a top-level package with an empty name. The agent binary must be signed using cosign's keyless signing method (this is how the opentelemetry-collector-releases repository signs its releases). The signature field must be populated with the resulting base64-encoded cert and signature, separated by a single space (signature = b64_cert + " " + b64_signature).
This first implementation only allows online verification: in order to verify the certificate, it must reach out to the public Rekor transparency log instance, and it also fetches certificates from the internet. Some of this is configurable through environment variables, but I figure we can work out offline signature verification in a follow-up PR. This basic setup should allow signature verification for agents that have access to the internet.
This PR also does not revert the Collector if it is unhealthy. That will need to be done in a follow-up PR; I think we should do it after #34907 is merged, as I imagine the logic will overlap here.
Link to tracking Issue: Closes #34734, Partially solves #33947
Testing: