Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unit/Change]: trigger config change on unit config and revision changes #109

Merged
merged 4 commits into from
May 16, 2024

Conversation

pkoutsovasilis
Copy link
Contributor

This PR adds checks for the RevisionNumber and Streams of the UnitExpectedConfig to properly cause a TriggeredConfigChange.

@blakerouse I think you are the original implementor of the new control protocol, so any thoughts on that?

@pkoutsovasilis pkoutsovasilis requested a review from a team as a code owner May 10, 2024 07:34
@pkoutsovasilis pkoutsovasilis requested review from andrzej-stencel and removed request for a team May 10, 2024 07:34
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 10, 2024
@pierrehilbert pierrehilbert requested a review from a team May 10, 2024 09:14
@cmacknz
Copy link
Member

cmacknz commented May 13, 2024

This needs some test coverage :)

@pkoutsovasilis
Copy link
Contributor Author

pkoutsovasilis commented May 14, 2024

This needs some test coverage :)

This needs some understanding first @cmacknz; so since you are part of the elastic-agent-control-plane team I consider you more than an expert to help me understand.

As far as I can tell from the code elastic-agent-client in updateState, checks first the configIdx and only in the case this is different it checks the config.GetSource(). However, even when condigIdx has changed (which is something that elastic-agent guarantees I guess?!) if the config.GetSource() of the input, which AFAIK doesn't hold any special fields (especially for an input type unit maybe it holds use_output field?!), is not changed no TriggeredConfigChange is captured. However, config.GetRevision() and config.Streams are not checked at all.

So @cmacknz, how elastic-agent currently guarantees that elastic-agent-client will capture TriggeredConfigChange on updateState, which fields does it change? Does it change completely the ID of the input?

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is needed at all? configIdx should always be incremented by the Elastic Agent when the configuration changes. If configIdx is not incremented when the configuration is changed then there is a bug in the Elastic Agent not in the elastic-agent-client?

@pkoutsovasilis
Copy link
Contributor Author

pkoutsovasilis commented May 14, 2024

I don't understand why this is needed at all? configIdx should always be incremented by the Elastic Agent when the configuration changes. If configIdx is not incremented when the configuration is changed then there is a bug in the Elastic Agent not in the elastic-agent-client?

@blakerouse a sole configIdx increase is not enough, based on this config.Source needs to change as well to update the config right? Otherwise you update only the configIdx; can you help by answering my question here

how elastic-agent currently guarantees that elastic-agent-client will capture TriggeredConfigChange on updateState, which fields does it change? Does it change completely the ID of the input?

@cmacknz
Copy link
Member

cmacknz commented May 14, 2024

The configuration index is a mechanism used to avoid pointlessly copying the configuration back and forth when nothing has changed.

As Blake mentioned, when the expected configuration has changed the config_state_idx field will increment. This tells you that the configuration has changed, but it doesn't tell you what in it has changed.

The change triggers are not part of the control protocol, they are a feature of the client library that is attempting to save all users from having to manually inspect each part of the expected configuration to know which part of it changed.

The expectation here seems to be that there is a need to detect changes separate from the config index, with an assumption that they are completely detectable purely by doing comparisons on the source field, which is the YAML representation of the original configuration. That is the block below:

if u.configIdx != cfgIdx {
		u.configIdx = cfgIdx
		if !gproto.Equal(u.config.GetSource(), cfg.GetSource()) {
			u.config = cfg
			triggers |= TriggeredConfigChange
		}
	}

Looking at this, I see it appears possible that we do not set TriggeredConfigChange if the configuration index has changed but proto.Equal returns false for some reason. This should not be possible, but maybe there is some edge case not covered here.

The code this PRs adds seems to be looking at the protobuf representation of the Source field, that is the well known parts of it pulled out into an explicit schema. It should be the case that looking for changes in Source should always detect changes in the individual UnitExpectedConfig member fields.

Looking at the agent side of this, I think the comparison here is slightly out of sync, because the agent is using proto.Equal on the entire UnitExpectedConfig message and not just the source field.

https://github.com/elastic/elastic-agent/blob/fdba118cf3cb5ccff1dc6c9f1b747c4d1471248e/pkg/component/runtime/state.go#L154-L158

			if !gproto.Equal(expected.config, unit.Config) {
				expected.config = unit.Config
				expected.configStateIdx++
				changed = true
			}

I'm not even sure why we need the proto.Equal comparison in here and we can't just rely entirely on the value from agent, since it is doing the exact same thing on the other side of the protocol.

I suspect the right answer is probably to remove the gproto.Equal(u.config.GetSource(), cfg.GetSource()) here and rely entirely on the u.configIdx != cfgIdx comparison for change detection. If we still wanted to compare the proto fields again, it should match what the agent does and compare the entire UnitExpectedConfig message and not just the source.

@pkoutsovasilis
Copy link
Contributor Author

pkoutsovasilis commented May 14, 2024

I suspect the right answer is probably to remove the gproto.Equal(u.config.GetSource(), cfg.GetSource()) here and rely entirely on the u.configIdx != cfgIdx comparison for change detection.

@cmacknz I agree 100% with the above statement and I can even adjust this PR to handle this. I did expand on the checks in the initial code of this PR motivated by the check of the Source field. That said, the single check you mention makes sense to me; if we rely on elastic-agent to increase the configIdx then this should be the single one criteria to cause a TriggeredConfigChange

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why the gproto.Equal was added, probably just an optimization for not saying it changed if really it hadn't changed. Don't see any issue with removing it.

Looks good.

@pkoutsovasilis pkoutsovasilis merged commit dd64603 into main May 16, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants