Line protocol write API #2696

jwilder · 2015-05-29T18:33:42Z

This PR adds a new write HTTP endpoint (/write_points) that uses a text based line protocol instead of JSON. The protocol is a list of points separated by newlines \n.

Each point is composed of three blocks separated by whitespace. The first block is the measurement name and tags separated by commas. The second block is fields separated by commas. The last block is optional and is the timestamp for the point as a unix epoch in nanoseconds.

measurement[,tag=value,tag2=value2...] field=value[,field2=value2...] [unixnano]

Each point must have a measurement name. Tags are optional. Measurement, tag and values can not have any spaces. If the value contains a comma, it needs to be escaped with \,.

Each point must have at least one value. The format of a field is name=value. Fields can be one of four types: integer, float, boolean or string. Integers are all numeric and cannot have a decimal point .. Floats are all numeric and must have a decimal point. Booleans are the values true and false. Strings must be surrounded by double-quores ". If the value contains a quote, it must be escaped \". There can be no spaces between consecutive field values.

For example,

cpu,host=serverA,region=us-west value=1.0 10000000000
cpu,host=serverB,region=us-west value=3.3 10000000000
cpu,host=serverB,region=us-east user=123415235,event="overloaded" 20000000000
mem,host=serverB,regstion=us-east swapping=true 2000000000

Points written in this format should be sent to the /write_points endpoint. The request should be a POST with the points in the body of the request. The content can also be gzip encoded.

The following URL params may also be sent:

db: required The database to write points
rp: optional The retention policy to write points. If not specified, the default retention policy will be used.
precision: optional The precision of the time stamps (n, u, ms, s,m,h). If not specified, n is used.
consistency: optional The write consistency level required for the write to succeed. Can be one of one, any, all,quorum. Defaults to all.
u: optional The username for authentication
p: optional The password for authentication

A successful response to the request will return a 204. If a parameter or point is not valid, a 400 will be returned.

PR Notes:

The parser has been tuned to minimize allocations and extra work during parsing. For example, the raw byte slice read in is held onto as much as possible until there is a need to modify it. Similarly, values are not unmarshaled into Go types until necessary. It also tries to validate the input using a single pass over the data as much as possible. Tags need to be sorted so it is preferable to send them in already sorted to avoid sorting on the server. The sort has been tuned as well so that it performs consistently over a large range of inputs.

My local benchmarks have parsing performing around 750k-2m/points/sec depending on the shape of the point data.

This changes the implementation of point to minimize the extra processing needed to parse and marshal point data though the system.

pauldix · 2015-05-29T18:44:07Z

Measurement, tags, tag values, field names, and field values should all be able to have spaces, providing they are escaped. I assume the list of characters to escape are , " =. The empty one being a space.

The default consistency should be changed to one.

pauldix · 2015-05-29T18:48:55Z

Overall looks good except for the updates to support spaces. Can you also update the Go client to use this endpoint and format instead?

jwilder · 2015-05-29T18:57:53Z

Ok. I'll fix the escaping and default consistency. I'll do a separate pr for the client.

otoolep · 2015-05-29T19:30:12Z

Good idea. Simple, but we can still curl it up.

beckettsean · 2015-05-29T20:26:47Z

The last block is optional and is the timestamp for the point as a unix epoch in nanoseconds. and precision: optional The precision of the time stamps (n, u, ms, s,m,h). If not specified, n is used. seem to be in conflict with each other.

Perhaps that first sentence should just be The last block is optional and is the timestamp for the point as a unix epoch. The documentation for the optional precision query param does say the default is nanoseconds if none is supplied.

beckettsean · 2015-05-29T20:27:05Z

Will this line protocol support Unicode characters in strings or only ASCII?

pauldix · 2015-05-29T20:33:54Z

@beckettsean you should not prominently on the docs that the tags should be sorted by the tag keys.

corylanou · 2015-05-29T20:36:26Z

Sorted for optimal performance, right? If they aren't sorted, we will sort them, but I think currently drops performance by 50%?

jwilder · 2015-05-29T20:38:09Z

@beckettsean I added a test for unicode just now. Seems to work for string field values at least. I'll need to check the other spots though.

Line protocol write API

beckettsean · 2015-05-29T20:39:55Z

@pauldix For the docs, Tag Keys should be ASCII sorted? Alpha sorted?

basically, what's the order for the following characters:

1 a A å Å _ - , . <space> <tab>

Are any of those illegal characters for this protocol?

pauldix · 2015-05-29T20:40:05Z

@corylanou that's right. In the docs for the protocol we should push them in the right direction. Meaning they should sort the tags.

Also, even though the timestamp is optional, we should push them to include it. This becomes important if they're running a cluster and they get a response back on the client side that said it was only a partial write. In most cases they would want to repost the data. However, they'll potentially get duplicates. UNLESS they include the timestamp for each point in their request to write.

beckettsean · 2015-05-29T20:46:18Z

@pauldix we will make clear in the docs the potential gotchas with not supplying a timestamp. The particular issue you describe only affects clusters, correct? For single node setups the writes are atomic so no partial write is possible, as far as I understand it.

pauldix · 2015-05-29T20:48:32Z

@beckettsean technically it's still possible with a single server if they're writing data with different timestamps that end up dividing the points across multiple shards.

However, if they do a write of a bunch of points with no timestamp specified, and it's a single server, then a partial write isn't possible. It'll either succeed entirely or fail i.e. Atomic.

gunnaraasen · 2015-05-29T21:25:41Z

Similar to what @pauldix mentioned. It would be great if measurement, tag key, tag values, field keys, and field values all followed the query language's identifier definition as closely as possible.

unquoted identifiers must start with an upper or lowercase ASCII character or "_"
unquoted identifiers may contain only ASCII letters, decimal digits, and "_"
double quoted identifiers can contain any unicode character other than a new line
double quoted identifiers can contain escaped " characters (i.e., \")

And string literals are single quoted.

Not sure if those rules make sense for the line format, but users will probably expect it to be similar to the query language identifier definition.

neonstalwart · 2015-05-29T21:29:22Z

how about rather than a new endpoint, continue to use /write but switch behavior based on the Content-Type header? this would be text/plain and the JSON would be application/json. that's a fairly common way to interact via HTTP.

pauldix · 2015-05-29T21:31:12Z

@gunnaraasen Those rules don't quite make sense for the line format mainly because this is so strict we know where the tag keys vs tag values are. The query language is much more flexible so it has more limitations.

Requiring double quotes around the identifiers (meaurement names, tag keys, and field keys) would be unnecessary for the line protocol for writing and would bloat the message.

pauldix · 2015-05-29T21:32:53Z

@neonstalwart the problem is that previously we didn't require a Content-Type to be set, so that could break the existing write API for a bunch of people.

Going forward, the JSON write endpoint is going to be deprecated and this new endpoint is going to be the preferred way of writing data in.

Also, in my experience, many people have trouble interacting with HTTP APIs that require you to set things in the headers. I know it's part of the thing, but if we're optimizing for ease of use, a different endpoint is the best.

neonstalwart · 2015-05-29T21:34:04Z

the problem is that previously we didn't require a Content-Type to be set, so that could break the existing write API for a bunch of people.

that's kind of a weak argument given timestamp -> time, name -> measurement.

pauldix · 2015-05-29T21:48:08Z

@neonstalwart I guess that's true since we broke it before. Not buying the usability argument? :)

neonstalwart · 2015-05-29T21:49:43Z

Going forward, the JSON write endpoint is going to be deprecated and this new endpoint is going to be the preferred way of writing data in.

😞 would you reconsider? i think HTTP+JSON is more or less the de facto way i interact with things these days. it's of course not the only way but with the move towards putting each service in its own container and using HTTP to communicate between components it seems very common in my experience.

Also, in my experience, many people have trouble interacting with HTTP APIs that require you to set things in the headers. I know it's part of the thing, but if we're optimizing for ease of use, a different endpoint is the best.

i agree that many people find it difficult to use HTTP. i just think that a /write that accepts json and a /write_points that accepts plain text is kind of awkward. to be kind of brutal, as a consumer considering a product, the API feels cheap (not well crafted and thought out) which leaves me wondering about the quality of the code. the API is your public face of this product.

allgeek · 2015-06-09T23:47:28Z

Everywhere under the hood we use a key to identify series. We use this in the underlying storage and we use it to route requests and writes within a cluster. In the case of the line protocol, that's the bytes up to the first space. We can get at it without doing any additional work or allocations.

That sounds like a huge benefit, and overall I'm definitely a fan of the line protocol approach for this use case (and of course the performance impacts as a result). However I'm wondering if this series-key optimization may be problematic if clients aren't pre-sorting the tags as suggested for performance purposes. If the parser re-sorts if necessary, but this shortcut is being used, wouldn't these keys still be the unsorted version? In the clustering redesign blog post, an assumption was made that equal points are duplicates. Would there be issues with receiving these two ('equal') points, but not really considering them duplicates?

cpu,host=serverA,region=uswest value=23.2

cpu,region=uswest,host=serverA value=23.2

It seems highly unlikely that a client would send 'duplicates' in different orders like this, but not knowing the full impact of the assumptions being made around these keys and handling of duplicates, my developer's spidey sense was tingling with potential edge case issues when I read the above quote..

otoolep · 2015-06-10T00:13:28Z

The problem you speak of @allgeek is accounted for deep in the system. Rest-assured the two example points you show are considered the same point, and the tags are sorted by key on every point before performing the identity check:

https://github.com/influxdb/influxdb/blob/master/tsdb/meta.go#L1099

jwilder · 2015-06-10T00:17:37Z

@allgeek For best performance, you should send them in pre-sorted if you can. If they are not sorted, they will be sorted before being stored. If they are already sorted, we don't attempt a sort. The parsing throughput drops by approximately 50% for unsorted tags and is proportional to the number tags present. It is still ~16x faster than JSON though and moves the bottleneck closer to the disks.

The two points you show would would have the same key but different timestamp since a timestamp is not shown. They would be two different points in the same series. If they both had the same timestamps, then they would be duplicate points.

otoolep · 2015-06-10T00:34:43Z

@jwilder is correct to point out the requirement for an identical timestamp. Just to be clear, I assumed the same timestamp for each point, in your example.

randywallace · 2015-06-12T03:37:59Z

Directly related to this PR, I hacked out a quick rubygem that facilitates using the LineProtocol. I wrote it so that I can follow this up with easy integration into our sensu infrastructure. Comments/concerns/complaints/PR's are warmly welcomed: https://github.com/randywallace/influxdb-lineprotocol-writer-ruby

It does correctly from my testing sort tag keys automatically.

I spent about 4 hours on this, and in that time couldn't find a reasonable way to get nanosecond/microsecond precision in ruby (although for our use cases it isn't at all useful); I also didn't test SSL. If anyone wants to help with that, its appreciated.

xfmoulet · 2015-06-12T16:37:39Z

spaces within strings (ie within double quotes) seem to need escaping, is this normal ?
example :

    test,a="hello" value=1 --> 204 no content
    test,a="hello there" value=1 --> 400 bad request
    test,a="hello\ there" value=1 --> 204 no content

is this normal ? why is this escaping needed ?

jwilder · 2015-06-22T19:11:28Z

@xfmoulet Tag values should not be double-quoted. Just escape the spaces with \ and don't surround with double-quotes.

test,a=hello\ there value=1

edlane · 2016-08-17T21:28:47Z

@jwilder I would like to revisit the assumptions made above considering the recent improvements to JSON parsing claimed here: https://github.com/buger/jsonparser#benchmarks
Json parsing performance has been an acknowledged embarrassment to the Go community in the past --- losing out to Python, Ruby, Lua, .... and of course to C.

Rather than abandoning JSON support entirely would it be better to just use a faster library?

jwilder · 2016-08-17T21:39:13Z

@edlane The JSON endpoint was disabled back in 0.11 and removed in 0.12.

edlane · 2016-08-17T21:40:55Z

@jwilder Yes, That is why I asked the question.

jwilder · 2016-08-17T21:52:10Z

@edlane We've moved on from JSON on the write path and are very unlikely to add it back. We have a proposal for a v2 line protocol that we're considering as well. JSON also presents some problems with sending int64 values because it only has float numbers.

We have the same performance/memory issues on the query side related to JSON and have been adding support for other formats csv, msgpack. Switching the marshaller to something more performant for the query side might be worthwhile though.

edlane · 2016-08-17T22:26:37Z

The int64 issue might be a Go decoder "feature"
https://golang.org/pkg/encoding/json/#Decoder.UseNumber

which appears fixable here:
http://stackoverflow.com/questions/16946306/preserve-int64-values-when-parsing-json-in-go

Mckane10 · 2020-06-08T10:44:27Z

pyformance sends nan values to influxdb

I use influxdb to retrieve the Sawtooth parameters to display them on Grafana. But when I launch my Sawtooth validator component I get this error "Warning Influx: Bad requests".
Can you please help me?
If there is a correction method, which file is it on?

yonglizhong · 2021-06-11T09:39:34Z

I would just like to confirm again with InfluxDB 2.0. There is no longer the possibility of writing data into the server as JSON format. Yes? only via the line protocol as shown in the documentation with cURL. My attempt with Postman using JSON Format was also unsuccesful. This was done by sending an additional header "Content-Type: application/json"

danxmoran · 2021-06-11T13:39:41Z

@yonglizhong correct, the API expects line protocol as a text/plain request.

YYF-CHINA · 2022-05-06T06:56:42Z

@jwilder If the key of a tag has a newline character, the value of the tag is not displayed. Hope to solve this problem, thank you.

jwilder added 3 commits May 29, 2015 11:18

Add text protocol parsing and serialzation for points

9a9bb73

This changes the implementation of point to minimize the extra processing needed to parse and marshal point data though the system.

Add initial write_points http handler for text protocol

e1322bb

Add ParsePointsWithPrecision to handle precision write argument

4e7c8bd

jwilder added the 2 - Working label May 29, 2015

Change default consistency level to one

c0d7143

Ensure comma,space and equals are escaped

870a183

jwilder added a commit that referenced this pull request May 29, 2015

Merge pull request #2696 from influxdb/jw-write-path

99bc7d2

Line protocol write API

jwilder merged commit 99bc7d2 into alpha1 May 29, 2015

jwilder removed the 2 - Working label May 29, 2015

jwilder deleted the jw-write-path branch May 29, 2015 20:38

beckettsean mentioned this pull request Jun 9, 2015

Update telegraf to use new line protocol rather than JSON influxdata/telegraf#10

Closed

otoolep mentioned this pull request Jun 10, 2015

Ensure tags are always marshalled the same way #2860

Merged

tima mentioned this pull request Jun 10, 2015

Support new InfluxDB v0.9.0 line protocol write API bernd/statsd-influxdb-backend#16

Open

victorhooi mentioned this pull request Jun 11, 2015

Add support for InfluxDB line protocol write API influxdata/influxdb-python#191

Closed

bratajczyk mentioned this pull request Jun 13, 2015

Support for InfluxDB v0.9 bernd/statsd-influxdb-backend#10

Open

KiNgMaR mentioned this pull request Jun 19, 2015

Parser works incorrectly with high-precision points #3054

Closed

beckettsean mentioned this pull request Jun 23, 2015

command "select * from measurement" do not return all the columns #3045

Closed

pauldix mentioned this pull request Jul 1, 2015

Adds Kafka Plugin influxdata/telegraf#35

Merged

jipperinbham mentioned this pull request Jul 27, 2015

InfluxDB 0.9 support compose/transporter#108

Closed

jwilder mentioned this pull request Aug 5, 2015

VERSION 0.9.2 not support json format post? #3556

Closed

tgermain mentioned this pull request Aug 12, 2015

Make the plugin compatible with Influxdb 0.9 logstash-plugins/logstash-output-influxdb#24

Closed

kisienya mentioned this pull request Jul 9, 2016

Getting JSON data to influxDB chrisfauerbach/suricata_stats_influx#1

Closed

This was referenced Sep 14, 2016

Sort tags before writing, for performance node-influx/node-influx#179

Merged

select * order tags before fields #7094

Closed

Line protocol write API #2696

Line protocol write API #2696

Conversation

jwilder commented May 29, 2015

PR Notes:

pauldix commented May 29, 2015

pauldix commented May 29, 2015

jwilder commented May 29, 2015

otoolep commented May 29, 2015

beckettsean commented May 29, 2015

beckettsean commented May 29, 2015

pauldix commented May 29, 2015

corylanou commented May 29, 2015

jwilder commented May 29, 2015

beckettsean commented May 29, 2015

pauldix commented May 29, 2015

beckettsean commented May 29, 2015

pauldix commented May 29, 2015

gunnaraasen commented May 29, 2015

neonstalwart commented May 29, 2015

pauldix commented May 29, 2015

pauldix commented May 29, 2015

neonstalwart commented May 29, 2015

pauldix commented May 29, 2015

neonstalwart commented May 29, 2015

allgeek commented Jun 9, 2015

otoolep commented Jun 10, 2015

jwilder commented Jun 10, 2015

otoolep commented Jun 10, 2015

randywallace commented Jun 12, 2015

xfmoulet commented Jun 12, 2015

jwilder commented Jun 22, 2015

edlane commented Aug 17, 2016 • edited Loading

jwilder commented Aug 17, 2016

edlane commented Aug 17, 2016

jwilder commented Aug 17, 2016

edlane commented Aug 17, 2016 • edited Loading

Mckane10 commented Jun 8, 2020 • edited Loading

yonglizhong commented Jun 11, 2021 • edited Loading

danxmoran commented Jun 11, 2021

YYF-CHINA commented May 6, 2022

edlane commented Aug 17, 2016 •

edited

Loading

edlane commented Aug 17, 2016 •

edited

Loading

Mckane10 commented Jun 8, 2020 •

edited

Loading

yonglizhong commented Jun 11, 2021 •

edited

Loading