Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not close filestream harvester if an unexpected error is returned when close.on_state_change.* is enabled #26411

Merged

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Jun 22, 2021

What does this PR do?

This PR returns early if close.on_state_change.removed is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.

Why is it important?

Previously, a message has been logged on error level and the reader has been stopped if the Stat call returned an error. However, it was not correct because if close.on_state_change.renamed was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 22, 2021
@kvch kvch requested a review from urso June 22, 2021 14:55
@kvch kvch added the Team:Elastic-Agent Label for the Agent team label Jun 22, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 22, 2021
@kvch kvch added backport-v7.13.0 Automated backport with mergify backport-v7.14.0 Automated backport with mergify needs_team Indicates that the issue/PR needs a Team:* label labels Jun 22, 2021
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 22, 2021
@botelastic
Copy link

botelastic bot commented Jun 22, 2021

This pull request doesn't have a Team:<team> label.

@mergify
Copy link
Contributor

mergify bot commented Jun 22, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature-filebeat-filestream-decrease-log-level upstream/feature-filebeat-filestream-decrease-log-level
git merge upstream/master
git push upstream feature-filebeat-filestream-decrease-log-level

@kvch kvch force-pushed the feature-filebeat-filestream-decrease-log-level branch from 7973b7b to 8ab0707 Compare June 22, 2021 15:10
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 22, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #26411 updated

  • Start Time: 2021-06-24T11:49:52.843+0000

  • Duration: 99 min 11 sec

  • Commit: 8357c74

Test stats 🧪

Test Results
Failed 0
Passed 14125
Skipped 2311
Total 16436

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 14125
Skipped 2311
Total 16436

@mergify
Copy link
Contributor

mergify bot commented Jun 23, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature-filebeat-filestream-decrease-log-level upstream/feature-filebeat-filestream-decrease-log-level
git merge upstream/master
git push upstream feature-filebeat-filestream-decrease-log-level

return true
}

// If an unexpected error happens we keep the reader open hoping once everything will go back to normal.
f.log.Errorf("Unexpected error reading from %s; error: %s", f.file.Name(), statErr)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one still might produce a false-positive log message. The shouldBeClosed method is called by a separate go-routine independent from the reader go-routine. When the reader go-routine shuts down (cancelled event) while shouldBeClosed is run we will see an error telling us that f is invalid because it was already closed.

In order to prevent the false-positive message, we either need a mutex or some ref-counting on f + a wrapper on the reader that allows us to unblock a pending read call if the file gets closed async.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to solve the race in a separate PR.

@kvch kvch merged commit 1106449 into elastic:master Jun 24, 2021
mergify bot pushed a commit that referenced this pull request Jun 24, 2021
…when close.on_state_change.* is enabled (#26411)

## What does this PR do?

This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.

## Why is it important?

Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.

(cherry picked from commit 1106449)
mergify bot pushed a commit that referenced this pull request Jun 24, 2021
…when close.on_state_change.* is enabled (#26411)

## What does this PR do?

This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.

## Why is it important?

Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.

(cherry picked from commit 1106449)
kvch added a commit that referenced this pull request Jun 24, 2021
…when close.on_state_change.* is enabled (#26411) (#26477)

## What does this PR do?

This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.

## Why is it important?

Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.

(cherry picked from commit 1106449)

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
kvch added a commit that referenced this pull request Jun 24, 2021
…when close.on_state_change.* is enabled (#26411) (#26476)

## What does this PR do?

This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.

## Why is it important?

Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.

(cherry picked from commit 1106449)

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
mdelapenya added a commit to mdelapenya/beats that referenced this pull request Jun 28, 2021
* master: (32 commits)
  [Metricbeat] Change Account ID to Project ID in `gcp.billing` module (elastic#26412)
  update libbeat fields.ecs.yml file and ecsVersion to 1.10.0 (elastic#26121)
  [Filebeat] Update AWS ELB ingest pipeline (elastic#26441)
  [FIlebeat] add strict_date_optional_time_nanos date format to PanOS module (elastic#26158)
  Fix the irregular and typo on prometheus module. (elastic#25726)
  [Filebeat] Parse additonal debug data fields for Okta module (elastic#25818)
  fix: update MSSQL Server linux image's Docker registry (elastic#26440)
  Update indexing.go godocs (elastic#26408)
  Do not close filestream harvester if an unexpected error is returned when close.on_state_change.* is enabled (elastic#26411)
  Add support for copytruncate method when rotating input logs with an external tool in `filestream` input (elastic#23457)
  Allow fields with ip_range datatype (elastic#26444)
  Add Anomali ThreatStream support to threatintel module (elastic#26350)
  fix: use the right param type (elastic#26469)
  [Automation] Update elastic stack version to 8.0.0-7640093f for testing (elastic#26460)
  Set SM Filebeat modules as GA (elastic#26226)
  Fix rfc5464 date parsing in the syslog input (elastic#26419)
  Add linked account information into billing metricset (elastic#26285)
  [Filebeat] Update HA Proxy log grok patterns (elastic#25835)
  disable metricbeat logstash test_node_stats (elastic#26436)
  chore: pass BEAT_VERSION when running E2E tests (elastic#26291)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.13.0 Automated backport with mergify backport-v7.14.0 Automated backport with mergify Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants