Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize postgresql ingest pipeline #7269

Merged
merged 3 commits into from
Jun 12, 2018

Conversation

ph
Copy link
Contributor

@ph ph commented Jun 5, 2018

The postgresql ingest pipeline was not performing so well.
This PR use the following rules to improve the situation.

  • Anchor the Regular expression at the begining of the string.
  • Merge the multiple statements into a single RE
  • Do not use back reference for user/host delimiter.

Fixes: #7201

Note to reviewers:

Master branch to ingest 18 documents 26ms, this PR 11 ms.
Not I did not use dissect, since there are variations in the tokens.

@ph ph added in progress Pull request is currently in progress. Filebeat Filebeat labels Jun 5, 2018
@ph ph force-pushed the fix/performance-check-postgresql branch from 4de6c89 to 869b326 Compare June 5, 2018 15:42
ph added 2 commits June 5, 2018 15:15
The postgresql ingest pipeline was not performing so well.
This PR use the following rules to improve the situation.

- Anchor the Regular expression at the begining of the string.
- Merge the multiple statements into a single RE
- Do not use back reference for user/host delimiter.

Fixes: elastic#7201
@ph ph changed the title [WIP] Optimize postgresql ingest pipeline Optimize postgresql ingest pipeline Jun 5, 2018
@ph ph force-pushed the fix/performance-check-postgresql branch from a03b504 to ccc274f Compare June 5, 2018 20:31
@ph
Copy link
Contributor Author

ph commented Jun 5, 2018

I am working on a bigger tests scenario, but still its better :)

@@ -6,16 +6,11 @@
"field": "message",
"ignore_missing": true,
"patterns": [
"%{LOCALDATETIME:postgresql.log.timestamp} %{WORD:postgresql.log.timezone} \\[%{NUMBER:postgresql.log.thread_id}\\] %{USERNAME:postgresql.log.user}@%{POSTGRESQL_DB_NAME:postgresql.log.database} %{WORD:postgresql.log.level}: duration: %{NUMBER:postgresql.log.duration} ms statement: %{MULTILINEQUERY:postgresql.log.query}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take this as a general rule that 1 pattern is better the multiple?

Should we perhaps introduce in some cases the processing in steps to first extract the common parts and then do the more complex things in a second processor. This would make things more readable and maintainable. Don't know what the affect on performance would be.

@ph
Copy link
Contributor Author

ph commented Jun 6, 2018 via email

@ph ph added review and removed in progress Pull request is currently in progress. labels Jun 11, 2018
@ph
Copy link
Contributor Author

ph commented Jun 11, 2018

Changed label from in progress to review

@ruflin ruflin merged commit 4a41587 into elastic:master Jun 12, 2018
@ruflin
Copy link
Contributor

ruflin commented Jun 12, 2018

@ph On your note about dissect: It seems that dissect could be used to extract parts of it?

@ruflin
Copy link
Contributor

ruflin commented Jun 12, 2018

I wonder if we should backport this to 6.3?

@ph
Copy link
Contributor Author

ph commented Jun 12, 2018

@ruflin we should, in a real world context the ingest pipeline is mostly unusable and could hurt your cluster.

@ph ph added the needs_backport PR is waiting to be backported to other branches. label Jun 12, 2018
ph added a commit to ph/beats that referenced this pull request Jun 14, 2018
@ph ph added v6.3.1 and removed needs_backport PR is waiting to be backported to other branches. labels Jun 15, 2018
ph added a commit to ph/beats that referenced this pull request Jun 15, 2018
The postgresql ingest pipeline was not performing so well.
This PR use the following rules to improve the situation.

- Anchor the Regular expression at the begining of the string.
- Merge the multiple statements into a single RE
- Do not use back reference for user/host delimiter.

Fixes: elastic#7201

Master branch to ingest 18 documents 26ms, this PR 11 ms.
Not I did not use dissect, since there are variations in the tokens.

(cherry picked from commit 4a41587)
ruflin pushed a commit that referenced this pull request Jun 18, 2018
The postgresql ingest pipeline was not performing so well.
This PR use the following rules to improve the situation.

- Anchor the Regular expression at the begining of the string.
- Merge the multiple statements into a single RE
- Do not use back reference for user/host delimiter.

Fixes: #7201

Master branch to ingest 18 documents 26ms, this PR 11 ms.
Not I did not use dissect, since there are variations in the tokens.

(cherry picked from commit 4a41587)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
The postgresql ingest pipeline was not performing so well.
This PR use the following rules to improve the situation.

- Anchor the Regular expression at the begining of the string.
- Merge the multiple statements into a single RE
- Do not use back reference for user/host delimiter.

Fixes: elastic#7201

Master branch to ingest 18 documents 26ms, this PR 11 ms.
Not I did not use dissect, since there are variations in the tokens.

(cherry picked from commit e801aaa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants