-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: postgresql output #8651
feat: postgresql output #8651
Conversation
@phemmer Does this PR cover the copy method outlined inhttps://github.com//pull/3912#issuecomment-731124375? We'd like to have this be the single Postgresql output PR moving forward. |
Yes, COPY is utilized for all data inserts. |
Perfect. Let's move forward with this PR then. Is this in a state yet to have someone on the Influx team to review or still in draft mode? |
Yes it's still a draft. None of the tests have been written. Was waiting to see if there were any major changes requested. Since it seems quiet on that front, I can assume there are none and proceed. |
Hey @phemmer! I really want to see this tremendous work merged especially as there seems to be huge interest from the community. However, ~2,500 LoC are impossible to review (even with a lot of beer ;-)), so we need a strategy to get this in using a staged approach. |
If you are like me and wanted a docker build embedding the latest available postgresql/timescaledb output with telegraf 1.17.2:
|
Are there any risks involved with two telegraf agents on different hosts attempting to create tables (for new metrics) or add columns at the same time? |
Also, you provide a very complex templating example with compression; is there any way in the PR's current form for different metrics to have different Postgres/Timescale configurations re: compression or time partition window? |
@phemmer any news here? |
Off the top of my head, yes. If both operations occur at the exact same moment (which will be pretty difficult, as the time window beetween check & create is very small), one of the clients will get an error and drop the metrics it can't insert (in the case of a table or tag column) or drop the fields (in the case of a field column). However this should be very easy to compensate for by configuring the error code for "table already exists" and "column already exists" into temporary errors, which will cause the plugin to retry. I'll add this to the tests.
Yes. You can add a conditional clause to the template that would use different SQL based on your chosen condition. But you can also have 2 postgres outputs to the same database.
I've been working on some other higher priority stuff, so haven't had much time to get back to this. I expect to be able to get back on it next week. |
plugins/outputs/postgresql/README.md
Outdated
# Send metrics to postgres | ||
[[outputs.postgresql]] | ||
## specify address via a url: | ||
## postgres://[pqgotest[:password]]@localhost[/dbname]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the "postgres://" style connection strings no longer work (they did in the prev PR). The "simple string" approach seems to work.
@phemmer just let me know if you can split out PRs for me to review... :-) |
I don't think a split is likely. I'll won't dismiss the idea outright, and I'll give it honest consideration, but I suspect ripping out chunks of functionality and substituting it with simpler versions is likely too much work. Plus at that point is it really a code review if the final thing isn't what's being reviewed? I'll make a final pass at organizing the code before taking PR out of draft, but right now the distribution of code seems fairly sane:
|
I've been trying this PR out in my local setup; I've had success sending it metrics from inputs.cpu and a custom socket_listener via the
Full telegraf log: https://gist.github.com/machty/fc3242a4a743917698b2c81d22c33e8e Is this an issue with my setup or with the PR code? SolvedThe issue was that the metrics were coming in uppercase, and uppercase table names in postgres tables requires special escaping/quoting. The easiest solution for me was to put in a telegraf string processor to downcase all measurement names before the postgres output was reached:
|
The PR code.
Spot on. I thought I had all uses of table names properly quoted. But it turns out there was an obscure one I didn't expect: the
I've fixed this locally. But there are a few other pending changes I don't want to push just yet. Thanks for reporting. This was an interesting one to track down. |
I wonder if downcasing metric names should be an (opt-out-able) default for this PR (better to introduce it now as a default than change it later). So many downstream use cases are made more annoying/difficult/footgunny by letting in case-sensitive table names. FYI I didn't keep track of the exact error, but I was also running into a situation where very long names exceeded the 63 char Postgres table name max, and it was causing some issues into I introduced a few string processors to shorten known long strings in the metric name (e.g. "Rack_Server_All_GC_minor_gc_count" to "rs_all_gc_minor_gc_count"). That said, this could be something in Here's my create_templates:
|
I'm of the opinion that the plugin shouldn't mess with the casing. It shouldn't be something that surprises people down the road, as it's immediately noticable. It's easy enough to change in the config like you did. Probably worth mentioning in the README for the plugin though as "things which you probably want to handle in your config".
Yes, that's another one. Right now the plugin should just spit back the error it gets from postgres. But again, I don't think I want to add some automatic behavior to truncate strings, as truncation might end up merging different measurements into the same table that shouldn't be merged. I think it better to present the error to the user so they can ensure their table names have meaningful values.
The latter 2 statements raise an interesting use case. I'm not familiar with pg_partman, but how would you safely pass table names with special characters if you can't use quoting? Looking at pg_partman's changelog, they clearly support it as it was mentioned in version 2.1.0 release. |
I'm not sure about pg_partman. From my limited experience they seem to track managed tables in the |
Have taken the PR out of draft status. At this point I'm just doing cleanup operations. Just now noticed I need to update the readme. There's also that commented line in |
TimeColumnName = "time" | ||
TimeColumnDataType = utils.PgTimestampWithTimeZone | ||
TagIDColumnName = "tag_id" | ||
TagIDColumnDataType = utils.PgBigInt | ||
TagsJSONColumnName = "tags" | ||
FieldsJSONColumnName = "fields" | ||
JSONColumnDataType = utils.PgJSONb | ||
) | ||
|
||
var TimeColumn = utils.Column{TimeColumnName, TimeColumnDataType, utils.TimeColType} | ||
var TagIDColumn = utils.Column{TagIDColumnName, TagIDColumnDataType, utils.TagsIDColType} | ||
var FieldsJSONColumn = utils.Column{FieldsJSONColumnName, JSONColumnDataType, utils.FieldColType} | ||
var TagsJSONColumn = utils.Column{TagsJSONColumnName, JSONColumnDataType, utils.TagColType} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please avoid exporting those as I think they are only used internally.
func ColumnFromTag(key string, value interface{}) utils.Column { | ||
return utils.Column{key, utils.DerivePgDatatype(value), utils.TagColType} | ||
} | ||
func ColumnFromField(key string, value interface{}) utils.Column { | ||
return utils.Column{key, utils.DerivePgDatatype(value), utils.FieldColType} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a matter of taste, but you might want to dissolve those functions and use the function body directly at the caller spot.
## Postgres schema to use. | ||
schema = "public" | ||
|
||
## Store tags as foreign keys in the metrics table. Default is false. | ||
tags_as_foreign_keys = false | ||
|
||
## Suffix to append to table name (measurement name) for the foreign tag table. | ||
tag_table_suffix = "_tag" | ||
|
||
## Deny inserting metrics if the foreign tag can't be inserted. | ||
foreign_tag_constraint = false | ||
|
||
## Store all tags as a JSONB object in a single 'tags' column. | ||
tags_as_jsonb = false | ||
|
||
## Store all fields as a JSONB object in a single 'fields' column. | ||
fields_as_jsonb = false | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand those correctly, they are containing the default value. If so, please comment them as
## Postgres schema to use. | |
schema = "public" | |
## Store tags as foreign keys in the metrics table. Default is false. | |
tags_as_foreign_keys = false | |
## Suffix to append to table name (measurement name) for the foreign tag table. | |
tag_table_suffix = "_tag" | |
## Deny inserting metrics if the foreign tag can't be inserted. | |
foreign_tag_constraint = false | |
## Store all tags as a JSONB object in a single 'tags' column. | |
tags_as_jsonb = false | |
## Store all fields as a JSONB object in a single 'fields' column. | |
fields_as_jsonb = false | |
## Postgres schema to use. | |
# schema = "public" | |
## Store tags as foreign keys in the metrics table. Default is false. | |
# tags_as_foreign_keys = false | |
## Suffix to append to table name (measurement name) for the foreign tag table. | |
# tag_table_suffix = "_tag" | |
## Deny inserting metrics if the foreign tag can't be inserted. | |
# foreign_tag_constraint = false | |
## Store all tags as a JSONB object in a single 'tags' column. | |
# tags_as_jsonb = false | |
## Store all fields as a JSONB object in a single 'fields' column. | |
# fields_as_jsonb = false | |
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
How have you been visualizing the data? I have been running the plugin for few weeks now and its working but I can't seem to do a group by host tags in Grafana. |
##### Multi-node | ||
```toml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
##### Multi-node | |
```toml | |
##### Multi-node | |
```toml |
statements. This allows for complete control of the schema by the user. | ||
|
||
Documentation on how to write templates can be found here: | ||
[https://pkg.go.dev/github.com/influxdb/telegraf/plugins/outputs/postgresql/sqltemplate](https://pkg.go.dev/github.com/influxdb/telegraf/plugins/outputs/postgresql/sqltemplate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reimda will the linter always complain about a line like this or will the linter be ok with URL label syntax like this:
[https://pkg.go.dev/github.com/influxdb/telegraf/plugins/outputs/postgresql/sqltemplate][0]
[0]: https://pkg.go.dev/github.com/influxdb/telegraf/plugins/outputs/postgresql/sqltemplate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's the way to do it. Long URLs are fine when you use reference style links: https://www.markdownguide.org/basic-syntax/#reference-style-links
Just FYI since it looks like this is getting reviewed: I pushed the other day, but see the test/integration failures. I'm guessing something changed with regards to how integration tests work. I haven't had a chance to go figure it out yet. |
Yeah, I was debating what to do here. We moved all the integration tests to use test containers and removed the docker-compose file that was previously there. That way we can actually run tests more easily, including in PRs. The command the CI uses is Here is an example for testing with mariadb. |
I guess this won't make today's minor release? ;)
|
Hi, is there any progress in this merger. |
No. I need to set aside some time to go back and figure out the integration thing. But honestly, I'm getting extremely frustrated with this whole process, and It's making me want to spend less time on this. It's like trying to hit a moving target. Every single time I do a rebase/merge, there's some conflict and/or change. It's a guarantee that the But yes, I'll get to it. Maybe by the end of the week. |
I hear your frustration. Please see how many people appreciate your effort and contribution, and look forward to seeing this landed. You have made many highly valuable contributions and I again look forward to landing this one.
Do let us know if there is something we should or should not be doing when updating dependencies or doing other reviews to reduce conflicts. It is unclear to me if we should be doing something different; however, I would love to know if there is.
This is a totally fair criticism. We have made some changes to how the sample configuration is stored to make it easier for our contributors and to keep data in sync. Of course, this churn made it more difficult for PRs in flight. Again, we hear the feedback and appreciate your patience. We made a conscious effort over the past few months to get back to more consistently reviewing PRs in a timely fashion, responding to issue reports as they come in, and continuing to drive down the total issue and PR count. While we have not been perfect, I do believe we are improving the project for the better. |
I would be happy to land this with:
@phemmer thoughts? |
Hello all involved, I am not sure if it can help somehow but I've found some example configuration for CircleCI with Postgres here, @phemmer (or anybody who can help to resolve the "test-integration" issue). I personaly know literaly nothing about all the "CI stuff" around etc. so I can't help with this myself. The rest is just some Markdown formatting stuff that can be fixed easily, IMHO. Many people are waiting for this output plugin to be finally merged and for quite a long time! If anybody knows how to solve the issues so that it get merged, please do. Thank you. 😞 |
I resolved linting issues and conflicts in go.mod and sent a pull request to Patrick. Moreover, according to |
@powersj is there anything the core telegraf team can do to help @phemmer get this over the line ? It's been going on for an age now and it seems like it's all there but for merge conflicts. Hopefully someone that knows the telegraf codebase and build would be able to get it into a mergeable state, no? |
Would someone kindly be able to trigger a new build as the existing build artifacts have disappeared from existence? |
Functionally, this is the same as influxdata#8651. The differences are two fold right now: 1) tests all use test-containers and right now do not have the ability to use a local postgresql database 2) The tests expecting pguint extension will skip untill the testcontainer startup installs that extension.
Hi Folks, I am looking to drive this PR to completion. I have opened up #11672, which includes this PR, with the remaining open changes: markdown change + using test containers for integration tests complete. From the user side, these changes are not interesting. Having these automated tests with test-containers keeps this plugin testable for future PRs and part of our nightly testing. I will work with the team to land that PR. I am happy to work with @phemmer to close some of the additional gaps on testing with a local instance + pguint. Thanks! |
Functionally, this is the same as influxdata#8651. The differences are two fold right now: 1) tests all use test-containers and right now do not have the ability to use a local postgresql database 2) The tests expecting pguint extension will skip untill the testcontainer startup installs that extension.
Hi, Once again, a huge thank you to @phemmer for driving this PR for so long. I have merged #11672, which contains this PR with some changes for tests. This means that tomorrow's nightly builds will include the PostgreSQL output plugin. Our next minor release, v1.24.0, in September will include this plugin. @phemmer - the entire team really would like your continued assistance if you are willing. As such, if you want to see about modifying the tests to allow for them to run in testcontainers and locally to aid in debugging, I would be happy to see a PR. Thanks for everyone's patience. I will be closing this PR now. |
hello, I upgraded to telegraf 1.25 recently and now my previous conf file throws error below: C:\Dashboard\Telegraf>telegraf.exe --config telegraf.conf --debug this is part of my conf file, which was working before: [[outputs.postgresql]] Store tags as foreign keys in the metrics table. Default is false.tags_as_foreignkeys = true Default templatetable_template = "CREATE TABLE IF NOT EXISTS {TABLE}({COLUMNS})"Example for timescaledbtable_template = "CREATE TABLE IF NOT EXISTS {TABLE}({COLUMNS}); SELECT create_hypertable({TABLELITERAL},'time',chunk_time_interval := '1 month'::interval,if_not_exists := true);" All my queries in grafana UI uses queries with tag_id .. any help to get the setup working ? |
@raorad as you were using an old unofficial version of telegraf, the official release is subject to changes, and different configuration syntax. You can find the documentation for the released plugin here: https://github.com/influxdata/telegraf/blob/v1.25.0/plugins/outputs/postgresql/README.md |
This PR provides a PostgreSQL output plugin.
This is a continuation of #3428 but massively reworked. There are a lot of breaking changes to address issues/limitations, as well as performance optimizations.
The performance optimizations are not minor either. The original code from #3428 benchmarked at around 250 points per second (using real production data). Its been a bit since I last ran a benchmark, but last time I did I was able to get around 60,000 points per second.
closes #3408 closes #3428
Major changes from #3428:
tags_as_foreign_keys
.Previous code utilized an auto-generated serial to obtain the tag ID for a set of tags. This resulted in a severe bottleneck as the tag IDs have to be looked up from the database for every single series. The new code hashes the tags & their values (within telegraf) to generate the tag ID.
The plugin allows multiple simultaneous inserts into the database. Each batch is split into a separate insert (COPY) per measurement/table, which can run in parallel. If the plugin receives batches faster than a single connection can write, each batch will also be inserted in parallel.
numeric
data type foruint64
.Previous code used
bigint
foruint64
. This would result in overflow errors when inserting values larger than anint64
asbigint
is a signed 64-bit integer.numeric
is an arbitrary precision exact value numeric data type. It is less performant thanbigint
, but it's the only datatype that can hold the fulluint64
range.For table creation & column addition, a more powerful mechanism was implemented. The new template statements use golang's text/template library with lots of provided variables, methods, & functions to allow for a virtually any use case. Some example configs are provided below to demonstrate. Comprehensive documentation on this functionality will be added.
The old code didn't use transactions, and would infinitely retry the entire batch on error. This resulted in things like duplicate inserts. As mentioned earlier, each measurement is a separate sub-batch so this mitigates some of the scope of errors. Each sub-batch is inserted in a transaction, so there's no risk of duplicates. In addition, the plugin is able to discern between temporary and permanent errors. A permanent error is something like bad SQL that will always fail now matter how many times you retry it. A temporary error is something like a failed connection, where a retry may succeed. Temporary errors will infinitely retry (with incremental backoff), while permanent errors will discard the sub-batch.
Note that there are breaking changes in the db schema:
tags_as_foreign_keys
, thetag_id
column type is nowbigint
. However if the column type is changed tobigint
preserving the records, then while the plugin will use newtag_id
values, and insert new records, joins should be functional.uint64
, the column type is nownumeric
instead ofbigint
. Leaving the column type asbigint
should still work unless values exceed the maximum ofbigint
.tags_as_foreign_keys
, must be commented with the stringtag
at the beginning. Failure to do so will result in some of the add column template functionality not working properly.Right now this PR is a draft as I want to run it in an environment with a TimescaleDB similar to how we plan on deploying it to production. However there are several blocking issues with TimescaleDB preventing me from being able to do this. All the tests are still for the old code, and have not been updated (and thus won't even compile). There's also still a few more minor changes to make, as well as general cleanup. So I'm taking this opportunity to put it out for preview so that feedback can be gathered and any architectural changes can be made.
I have not done exhaustive testing, so there may be bugs. I do not know of any right now, so if any are found, please raise them.
Example templating
Below are some example templates I've been using in my testing. The scenario is for use with TimescaleDB. The templates basically allow creating the tables in the
telegraf
schema, and then a view in thepublic
schema which joins the tag table and the data table making it easier to work with. In addition since you cannot add columns to a TimescaleDB hypertable with compression, it creates a new table when columns are added, and creates another view whichUNION
s the old and new tables.This is probably one of the most complex use cases possible, but it demonstrates the power and flexibility of the templating.
Required for all PRs: