-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PostgreSQL output plugin #3428
PostgreSQL output plugin #3428
Conversation
@svenklemm Any update on the status of this PR? As per #3408 it seems like it might be ready to merge. Maybe add @danielnelson as a reviewer? I'd love to use this plugin. |
@cwadley I'm using it for a couple weeks now without problems but it could use more testing and proper integration tests. |
@svenklemm I'll get to work :) |
Hi folks, First things first, thanks @svenklemm I learned a lot with your code I think this PR #3912 is a better/faster solution for large number of metrics. |
I wrote an experimental batching mode on top of this branch, see the commits in https://github.com/saaros/telegraf/tree/postgres_output_batching This speeds up the processing of 10k metric batches 5-10x on my test environment when running locally and even more when telegraf & pg are running on separate hosts. |
Thanks @otherpirate -- I had a look at your PR, but since it didn't have any DDL support I figured it's faster to add batching to this version than DDL to the other one. Would be great to get something in good enough shape to be merged to master. |
I've merged the batch changes from @saaros into this PR. |
@svenklemm Thanks for the heads up, it might take awhile yet but we will at this plugin as well as #3408, #3912, and #4205 |
@danielnelson any update on this? |
@svenklemm Sorry, I haven't had time to work on this yet. |
oops, that review was for the most recent changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
plugins/outputs/postgresql/README.md
Outdated
address = "host=localhost user=postgres sslmode=verify-full" | ||
|
||
## A list of tags to exclude from storing. If not specified, all tags are stored. | ||
# ignored_tags = ["foo", "bar"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this, seems you opted to use the global tagexclude
} | ||
|
||
func (p *Postgresql) generateInsert(tablename string, columns []string) string { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: remove this line
quoted = append(quoted, quoteIdent(column)) | ||
} | ||
|
||
sql := fmt.Sprintf("INSERT INTO %s(%s) VALUES(%s)", quoteIdent(tablename), strings.Join(quoted, ","), strings.Join(placeholder, ",")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: just return fmt.Sprintf(...
p.Tables[tablename] = true | ||
} | ||
|
||
var columns []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initialize columns
with "time"
as a value to skip line 231's append
} | ||
|
||
var columns []string | ||
var values []interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initialize values
with metric.Time()
as a value to skip line 232's append
|
||
err := p.db.QueryRow(query, where_values...).Scan(&tag_id) | ||
if err != nil { | ||
// log.Printf("I! Foreign key reference not found %s: %v", tablename, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code
} | ||
|
||
for table_and_cols, values := range batches { | ||
// log.Printf("Writing %d metrics into %s", len(params[table_and_cols]), table_and_cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code
@danielnelson @glinton |
Nice work, would like to have this very much :) |
Just ran across another issue.
Will effectively be merged together as
When the postgresql output sends these, they're getting inserted as separate records.
Which outputs points like this:
And if you query the data in postgres, it looks like this:
Since it seems that TimescaleDB doesn't merge points like InfluxDB does (and vanilla postgres definitely doesn't), I think telegraf needs to be responsible for handling this and doing the merge. Now I'm not sure that this needs to be handled by the postgres plugin itself though. This seems like it might be a good candidate for a processor plugin. I'm also not sure if processors receive points as a continuous stream, or batches like output plugins. If it is a continuous stream, I think it would be good to add a "merge" processor which adds a short time buffer (e.g. 10ms), such that when a point comes through, it's held back for that configured amount of time looking for other points to merge. |
if we have
See: https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT |
Well right now the plugin doesn't use In addition, the creation of a unique constraint will impact performance, but I'm not sure by how much. |
🤔 AFAIK, we could use a |
Well it seems that such a plugin already exists: https://github.com/influxdata/telegraf/tree/master/plugins/aggregators/merge The way the plugin is implemented isn't the greatest solution, as it operates in batches rather than as a stream, meaning that there's still the risk of a single point getting written twice if it's in 2 different batches. But it's better than nothing. |
Just ran into another issue. I caused a situation (totally my fault) where a few records couldn't be written to the DB, and so errors were returned back to telegraf. I'm not sure how this plugin is handling transactioning, but other records were inserted fine. The problem is that the plugin kept trying to resend the bad records, along side the good records. This resulted in massive amounts of duplicate data in the database (approx 3.8 million rows). I think the error handling approach needs to be changed.
As it is, the only way I could stop the massive data duplication was to bounce telegraf so it'd discard the bad records and stop retrying them. |
Thanks for such great project, any ETA to merge this PR? |
@svenklemm or @blagojts, what are your thoughts around @phemmer's error handling concerns above? |
I like this RP very much. Is there a time point for merging. |
Hi team, |
Can we get some clarification on the state of this plugin & pull request. After about 10 months without any update or response, it would appear that it has been abandoned. This is unfortunate, but it happens as priorities change. But if it needs a new maintainer, can we state that so people know it's open for someone to pick up and carry? |
@svenklemm, please chime in if you would like to continue to own this. Otherwise anyone who's interested in working on building on @svenklemm's code feel free to address necessary fixes. @phemmer Seems like you've provided a lot of great feedback if you'd like to take it over. |
I don't know that I have the time to be able to take this on. But I'll keep it near the top of my list of potential work when not pressed for other work. In the mean time I'll keep adding my experiences with it. Currently the plugin's
|
As someone who used this plugin for months, I must warn everyone that this plugin does not scale well with many agents, many metrics and has many performance issues that quadruple when |
That's interesting. Can you elaborate any further? Given your statement that switching to an ORM will help performance, this implies that the performance issue is client side. I'm having a hard time imagining what that could be. I would expect the issue to be on the database side as the number of different series grows, in which it's not an ORM problem, but one of defining a proper segmentation key. Also I'm not familiar with GORM, but ORMs tend to revolve around mapping defined types (e.g. |
Performance issues exist both on DB side and client side. For example, I would like to reiterate my warning especially for deployments in commercial projects. |
Just FYI, I've started looking into addressing some of the issues on this plugin. @AtakanColak I think I found the performance issue you're experiencing. The issue is that when using I was able to solve this issue by changing the code to use a deterministic key. Meaning telegraf computes the tag ID and doesn't have to do a lookup. With this change I was able to go from about 15k points per minute to about 50k. Additionally with the need for using the With that, and some other hackery involving multiple connections, I'm able to get telegraf up to about 1m points per minute, a massive improvement over 15k. But it's still telegraf bottlenecking, and not the DB. I'm going to clean up the plugin a bit so I can remove some of the performance hacks I had to do, and will submit to a new branch. |
I've put up a draft PR of the changes over at #8651. While it is a draft, it should be fully functional, and I would welcome feedback. The plugin has been massively reworked. But it's a lot more flexible, fault tolerant, and performant. |
Closing this for #8651 |
I use freebsd13.0 Where can i download the sourcecode for this plugin to compile it ? |
This is an output plugin to store metrics in a postgresql database. It will create the tables required if they do not yet exist. Table names are derived from the metric name.
#3408