[Heartbeat] Add managed status reporter at monitor factory level #41077

Merged
merged 2 commits into from
Oct 4, 2024

Conversation

emilioalvap
Collaborator

Proposed commit message

Add status reporting for monitors when running under elastic-agent. This will allow the Fleet UI to reflect that there is an issue with one or more heartbeat integrations.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Build agentbeat locally with:
     DEV=true SNAPSHOT=true PLATFORMS=linux/amd64 mage package
  2. Build elastic-agent locally with:
     DEV=true SNAPSHOT=true PLATFORMS=linux/amd64 PACKAGES=docker mage package
  3. Enroll a non-complete elastic-agent into a private location policy with a browser monitor assigned.
  4. Check that the agent status is eventually reported as degraded and the integration is marked as failed.

@emilioalvap emilioalvap added enhancement Team:obs-ds-hosted-services Label for the Observability Hosted Services team backport-skip Skip notification from the automated backport with mergify labels Oct 2, 2024
@emilioalvap emilioalvap requested a review from a team as a code owner October 2, 2024 14:59
@elasticmachine
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 2, 2024
@emilioalvap
Collaborator Author

cc @lucabelluccini

@@ -175,6 +182,9 @@ func newMonitorUnsafe(

		logp.L().Error(fullErr)
		p.Jobs = []jobs.Job{func(event *beat.Event) ([]jobs.Job, error) {
			// if statusReporter is set, as it is for running managed-mode, update the input status
			// to failed, specifying the error
			m.updateStatus(status.Failed, fmt.Sprintf("monitor could not be started: %s, err: %s", m.stdFields.ID, fullErr))
Member

Reading this

	// Failed is status describing unit is failed. This status should
	// only be used in the case the beat should stop running as the failure
	// cannot be recovered.

Could this cause Heartbeat to stop, and also stop other monitors from running? Is this intended?

Collaborator Author

Could this cause Heartbeat to stop, and also stop other monitors from running?

It probably won't (it doesn't, as of now). Even if that were the case, since the status is scoped at the monitor level, it should only filter out the failed integrations, but I'm speculating here. There are also multiple status layers; this change only affects the stream status (not even the integration status).
As for the status itself, either failed or degraded should achieve the same purpose; I'm open to discussing the implications. I leaned toward failed because the type of error caught in this part is generally not recoverable.
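
For readers following the thread, here is a minimal sketch of the pattern under discussion: a monitor holds an optional reporter and forwards a failed status only when one is set. It is not the code from this PR. `Status`, `StatusReporter`, and `recordingReporter` are local stand-ins written for illustration, and the real types in the beats codebase may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a local stand-in for the status values discussed in this thread
// (assumption: the real type lives in the beats codebase and has more states).
type Status int

const (
	Running Status = iota
	Degraded
	Failed
)

// StatusReporter is a hypothetical interface modelled on the reporter
// passed to SetStatusReporter in the diff.
type StatusReporter interface {
	UpdateStatus(s Status, msg string)
}

// Monitor carries an optional reporter; it stays nil outside managed mode.
type Monitor struct {
	ID             string
	mu             sync.Mutex
	statusReporter StatusReporter
}

// SetStatusReporter attaches the reporter for this monitor only.
func (m *Monitor) SetStatusReporter(r StatusReporter) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.statusReporter = r
}

// updateStatus forwards the status if a reporter is set and is a no-op otherwise.
func (m *Monitor) updateStatus(s Status, msg string) {
	m.mu.Lock()
	r := m.statusReporter
	m.mu.Unlock()
	if r != nil {
		r.UpdateStatus(s, msg)
	}
}

// recordingReporter is a test double that remembers the last update.
type recordingReporter struct {
	calls int
	last  Status
	msg   string
}

func (r *recordingReporter) UpdateStatus(s Status, msg string) {
	r.calls++
	r.last = s
	r.msg = msg
}

func main() {
	// Standalone monitor: no reporter, so the update is silently dropped.
	standalone := &Monitor{ID: "standalone"}
	standalone.updateStatus(Failed, "ignored")

	// Managed monitor: the failure reaches its own reporter.
	rep := &recordingReporter{}
	managed := &Monitor{ID: "browser-1"}
	managed.SetStatusReporter(rep)
	managed.updateStatus(Failed, fmt.Sprintf("monitor could not be started: %s, err: %s", managed.ID, "example error"))
	fmt.Println(rep.calls, rep.last == Failed, rep.msg)
}
```

The nil check is what would keep standalone Heartbeat unaffected in this sketch: without a reporter, a failed monitor only logs, as before.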

Member

I was worried that it could stop the other monitors. But if that's not the case, I am not strongly inclined to change this.

}

// SetStatusReporter
func (m *Monitor) SetStatusReporter(statusReporter status.StatusReporter) {
Member

Since it's set at the Monitor level, what happens if multiple monitors are configured? Do the errors accumulate, or is there an upper limit on how many are shown in the UI?

Collaborator Author

@emilioalvap emilioalvap Oct 3, 2024

Every monitor will map 1:1 to an agent integration, which Fleet UI already shows individually:
image
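
To illustrate the 1:1 mapping described above, here is a rough sketch in which every monitor gets its own reporter, so a failure in one monitor only changes that monitor's row. `statusTable`, `unitReporter`, and `newReporters` are hypothetical names made up for this example and do not come from the beats or elastic-agent code. The map is not safe for concurrent use, and the message argument is ignored for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Status is a local stand-in for the real status type (assumption).
type Status string

const (
	Running Status = "running"
	Failed  Status = "failed"
)

// statusTable is a hypothetical stand-in for whatever aggregates
// per-integration state on the agent side.
type statusTable map[string]Status

// unitReporter writes only to the row of the monitor it was created for.
type unitReporter struct {
	id    string
	table statusTable
}

// UpdateStatus records the status for this reporter's monitor.
func (r unitReporter) UpdateStatus(s Status, msg string) {
	r.table[r.id] = s
}

// newReporters creates one reporter per monitor ID, all starting as Running.
func newReporters(table statusTable, ids ...string) map[string]unitReporter {
	reporters := make(map[string]unitReporter, len(ids))
	for _, id := range ids {
		table[id] = Running
		reporters[id] = unitReporter{id: id, table: table}
	}
	return reporters
}

func main() {
	table := statusTable{}
	reporters := newReporters(table, "http-1", "browser-1")

	// Only the browser monitor fails to start.
	reporters["browser-1"].UpdateStatus(Failed, "monitor could not be started")

	// Print the rows in a stable order.
	ids := make([]string, 0, len(table))
	for id := range table {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s: %s\n", id, table[id])
	}
}
```

Under these assumptions nothing accumulates across monitors: each row holds the latest status of exactly one monitor.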

Member

@vigneshshanmugam vigneshshanmugam left a comment

LGTM

@vigneshshanmugam
Member

@emilioalvap Do you intend to add a changelog entry?

@emilioalvap emilioalvap enabled auto-merge (squash) October 4, 2024 15:22
@emilioalvap emilioalvap added backport-8.15 Automated backport to the 8.15 branch with mergify backport-8.x Automated backport to the 8.x branch with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Oct 4, 2024
@emilioalvap emilioalvap merged commit c70d2d8 into elastic:main Oct 4, 2024
29 of 31 checks passed
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
)

* [Heartbeat] Add status reporting for monitors when running under elastic-agent

(cherry picked from commit c70d2d8)
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
)

* [Heartbeat] Add status reporting for monitors when running under elastic-agent

(cherry picked from commit c70d2d8)
emilioalvap added a commit to emilioalvap/beats that referenced this pull request Oct 4, 2024
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go
emilioalvap added a commit to emilioalvap/beats that referenced this pull request Oct 4, 2024
emilioalvap added a commit that referenced this pull request Oct 10, 2024
…auto-merge #41077 (#41133)

* [Heartbeat] Fix linting issues introduced by auto-merge #41077 (#41128)

* Manual merge

* [Heartbeat] Add status reporter at monitor factory level (#41077)

---------

Co-authored-by: Emilio Alvarez Piñeiro <95703246+emilioalvap@users.noreply.github.com>
Co-authored-by: emilioalvap <emilio.alvarezpineiro@elastic.co>
emilioalvap added a commit that referenced this pull request Oct 16, 2024
…uto-merge #41077 (#41134)

* [Heartbeat] Fix linting issues introduced by auto-merge #41077 (#41128)

(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go

* Merge conflicts

* [Heartbeat] Add status reporter at monitor factory level

* Add unit test and changelog

---------

Co-authored-by: Emilio Alvarez Piñeiro <95703246+emilioalvap@users.noreply.github.com>
Co-authored-by: emilioalvap <emilio.alvarezpineiro@elastic.co>
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.15 Automated backport to the 8.15 branch with mergify enhancement Team:obs-ds-hosted-services Label for the Observability Hosted Services team
3 participants