Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Registry file fsync improvements #6988

Merged
merged 2 commits into from
May 3, 2018
Merged

Conversation

urso
Copy link

@urso urso commented May 2, 2018

  • Return error if Sync fails
  • Execute fsync on new parent directory
  • improve metrics:
    • registrar.writes.total: total number of registry write attempts
    • registrar.writes.fail: total number of failed write attempts
    • registrar.writes.success: total number of successfull write attempts

Resolves: #6792 (followup for #6877)

@urso urso added the review label May 2, 2018
@urso urso force-pushed the improve-registry-sync branch 2 times, most recently from 6032970 to 6f5d96d Compare May 2, 2018 12:02
@@ -41,10 +41,10 @@ var (
registryWrites = monitoring.NewInt(nil, "registrar.writes")
)

func New(registryFile string, registryFilePermissions os.FileMode, flushTimeout time.Duration, out successLogger) (*Registrar, error) {
func New(registryFile string, fileMode os.FileMode, flushTimeout time.Duration, out successLogger) (*Registrar, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function New should have comment or be unexported

- Return error if Sync fails
- Execute fsync on new parent directory
- improve metrics:
  - registrar.writes.total: total number of registry write attempts
  - registrar.writes.fail: total number of failed write attempts
  - registrar.writes.success: total number of successfull write attempts
)

func New(registryFile string, registryFilePermissions os.FileMode, flushTimeout time.Duration, out successLogger) (*Registrar, error) {
func New(registryFile string, fileMode os.FileMode, flushTimeout time.Duration, out successLogger) (*Registrar, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function New should have comment or be unexported

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso I am OK with all the changes but a few things need to be added.

  • Fix the comment for the New method
  • This changes is worth a changelog, mentioning metrics + fsync on parent.

@ph ph added the Filebeat Filebeat label May 2, 2018
@urso
Copy link
Author

urso commented May 2, 2018

This changes is worth a changelog, mentioning metrics + fsync on parent.

These metrics are internal/private only. No need for changelog entry.

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes lgtm waiting on ci

@ruflin
Copy link
Member

ruflin commented May 3, 2018

There is a clean_inactive failure on travis. Not sure if it is related: https://travis-ci.org/elastic/beats/jobs/374064172

@urso
Copy link
Author

urso commented May 3, 2018

The tests did succeed on jenkins. I wonder if the extra fsyncs change some timings, making some tests more flaky.

@ph
Copy link
Contributor

ph commented May 3, 2018

@urso more flaky maybe, we will need a bit more iteration to find out. Did you restart the failings tests on travis?

@urso
Copy link
Author

urso commented May 3, 2018

@ph yes, I did restart the test. Now it's green :) . Related beats-ci tests have been green.

@ph
Copy link
Contributor

ph commented May 3, 2018

@urso I will merge this, lets keep an eye out if this test is still flaky, I've seen it fails before :)

@ph ph merged commit 5916636 into elastic:master May 3, 2018
@ruflin
Copy link
Member

ruflin commented May 4, 2018

Should we backport this one?

@urso urso added the needs_backport PR is waiting to be backported to other branches. label May 4, 2018
@rockybean
Copy link
Contributor

When will this pr release? in 6.3? We actually met this kind of problem in our production environment. Though we can solve by deleting the registrar file, but as a result, filebeat will read old files once again. This is terrible.

@urso urso added v6.3.1 and removed needs_backport PR is waiting to be backported to other branches. labels Jun 18, 2018
urso pushed a commit to urso/beats that referenced this pull request Jun 18, 2018
* Registry file fsync improvements

- Return error if Sync fails
- Execute fsync on new parent directory
- improve metrics:
  - registrar.writes.total: total number of registry write attempts
  - registrar.writes.fail: total number of failed write attempts
  - registrar.writes.success: total number of successfull write attempts

(cherry picked from commit 5916636)
ph pushed a commit that referenced this pull request Jun 18, 2018
* Registry file fsync improvements

- Return error if Sync fails
- Execute fsync on new parent directory
- improve metrics:
  - registrar.writes.total: total number of registry write attempts
  - registrar.writes.fail: total number of failed write attempts
  - registrar.writes.success: total number of successfull write attempts

(cherry picked from commit 5916636)
@urso urso deleted the improve-registry-sync branch February 19, 2019 18:53
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
* Registry file fsync improvements

- Return error if Sync fails
- Execute fsync on new parent directory
- improve metrics:
  - registrar.writes.total: total number of registry write attempts
  - registrar.writes.fail: total number of failed write attempts
  - registrar.writes.success: total number of successfull write attempts

(cherry picked from commit f5179f9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants