Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix heartbeat races on event updates #6950

Merged
merged 1 commit into from
Apr 26, 2018

Conversation

urso
Copy link

@urso urso commented Apr 25, 2018

Resolves: #6616

Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

I did run some tests before and after the change, with the race detector enabled. No more race have been found with this change.

@urso urso force-pushed the heartbeat-event-update-races branch 2 times, most recently from bd65c6f to 67e542e Compare April 25, 2018 18:59
event = common.MapStr{}
event.DeepUpdate(fields)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this with the else if I asked myself what the else case in the end which is not covered here actually means and if it can happen. So event is nil and err is nil?

Copy link
Author

@urso urso Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A runner can return an event, but must not. It's fully optional. We have intermediate runners (resolve IP address), that do not return an event, but create a set of continuation runners. So event can be nil.

If a runner returns an error, no followup runners are created. => We need to create an error event. On error a runner can optionally return an event. The WithFields runner enforces the creation of an event on error, such that the already known fields (like IP resolve results) an can be merged into the final event

The WithFields wrapper adds shared fields only if an event will be returned.

To be more verbose, the conditions is:

if err != nil {
  if event == nil {
    event = common.MapStr{}
  } else {
    event = event.Clone()
  }
  event.DeepUpdate(fields)
} else { // no error
  if event != nil {
    event = event.Clone() // we need to copy the event to ensure no error are returned.
    event.DeepUpdate(fields)
  }
}

Due to the else if, the second else branch is equivalent to event == nil && err != nil.

The case event == nil && err == nil is perfectly valid -> we just don't report an event.

@ruflin ruflin added the needs_backport PR is waiting to be backported to other branches. label Apr 26, 2018
@ruflin
Copy link
Contributor

ruflin commented Apr 26, 2018

I added the needs_backport and review label.

@ruflin ruflin added the review label Apr 26, 2018
@ruflin
Copy link
Contributor

ruflin commented Apr 26, 2018

@urso Can you rebase this on master. This should fix the LS CI failure.

Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.
@urso urso force-pushed the heartbeat-event-update-races branch from 67e542e to 38c9c4a Compare April 26, 2018 11:52
@urso
Copy link
Author

urso commented Apr 26, 2018

@ruflin rebased.

@ruflin ruflin merged commit e92ed3e into elastic:master Apr 26, 2018
urso pushed a commit to urso/beats that referenced this pull request May 3, 2018
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit e92ed3e)
@urso urso added v6.2.5 and removed needs_backport PR is waiting to be backported to other branches. labels May 3, 2018
urso pushed a commit to urso/beats that referenced this pull request May 3, 2018
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit e92ed3e)
@urso urso added the v6.3.0 label May 3, 2018
ruflin pushed a commit that referenced this pull request May 4, 2018
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit e92ed3e)
ruflin pushed a commit that referenced this pull request May 4, 2018
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit e92ed3e)
@urso urso deleted the heartbeat-event-update-races branch February 19, 2019 18:31
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit bc225fb)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Heartbeat has a many components, each adding fields to an existing
event. As events are created per 'TaskRun', this is normally fine. But
in case the TaskRunner branches of into multiple Sub-tasks, we will see
races on shared event structures. This is the case with multiple ports
in the http/tcp configuration or the IPAll setting (ping all known IPs
of a given domain).

Namespaces in an event can be potentially populated via globally shared
structures. This can also lead to unwanted updates or races.

This fix clones the event structure before continuing event updates, so
to remove the chance of races.

(cherry picked from commit bc225fb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Heartbeat] Crashing on concurrent map write
2 participants