-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create InfluxQL 2.0 initial proposal #8591
Conversation
node_cpu{cpu="cpu0",mode="user"} 182168.46875 | ||
|
||
# Influx 2.0 | ||
_metric:node_cpu,cpu:cpu0,mode:user 182168.46875 1491675816 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_metric
or metric
(as on lines 133-136)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanCech given the _ having special meaning _metric
makes sense.
the special
_metric
tag key
doc/influxql-2.md
Outdated
|
||
There are a set of tag keys used by the system that start with `_` that are well known. Some of them are for conversions from Influx 1.x or Prometheus while others are for information useful in graphing. | ||
|
||
* **_units** - the units of the measurement - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this plural?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the units are always plural. You'd say bytes, not byte or seconds, not second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of this tag would be a single unit of measurement.
This looks really interesting @pauldix, chaining functions together definitely makes things easier to follow as the data flows through the pipeline. The flexibility of using variables to build up scripts is definitely going to offer a lot of power, but will certainly create some challenges on the UI side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarifications for few proposed changes.
doc/influxql-2.md
Outdated
|
||
* **tag** - a key/value pair of strings. (e.g. "region":"host" or "building":"2"). Tags MUST NOT start with `#`. Tag keys starting with `_` have special meaning and should not be used by the user unless they conform to the predefined set of keys that use `_`. | ||
* **system tag** - A tag with a key that starts with `_` followed by letters and/or numbers. These are indexed just like tags, but are used by the system for compatibility layers with other systems. | ||
* **meta** - A JSON style object associated with a series. This data is not indexed and can only be included/joined in a query. Its purpose is to add more context to series for user interfaces and additional calculations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta is the meta for a database or for a series? sorry to jump the gun if its explained below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like series to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's meta for the series, but after talking to some people I'm going to pull this out of the query language or line protocol. We'll offer other APIs for associating non-indexed metadata with tags and other stuff.
doc/influxql-2.md
Outdated
|
||
### Definitions | ||
|
||
The following definitions encompass the entire vocabulary of the data model. Where appropriate, JSON examples are given, but the protocol will also support a line or protobuf based structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO: Since we moved a long way from JSON format for writes, i think the example should show how current line protocol command will looks like in target state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll be working out the result format in the next four weeks or so. Will add more detail when we have it.
node_cpu{cpu="cpu0",mode="user"} 182168.46875 | ||
|
||
# Influx 2.0 | ||
_metric:node_cpu,cpu:cpu0,mode:user 182168.46875 1491675816 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanCech given the _ having special meaning _metric
makes sense.
the special
_metric
tag key
doc/influxql-2.md
Outdated
|
||
* **_units** - the units of the measurement - | ||
* **_type** - the type of the series - [int64, uint64, float64, bool, string] | ||
* **_samplingRate** - duration string for how frequently this series is expected to be sampled. `0` indicates that it is an event stream (irregular series) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we call it _samplingInterval? Sampling rate is assumed to be number of samples in one second for people with electronics background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, interval is definitely the correct term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call
doc/influxql-2.md
Outdated
*TODO:* give this more definition. | ||
|
||
Meta data can be written with a line starting with `##`. The meta data will be associated with the series on the next line. Meta data is a JSON object. Keys in the object starting with `_` are reserved for the system. The data in meta is not indexed. The only validation is that it is a parseable JSON object. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to add just meta for an already existing series. So the write protocol should support it. Thinking out loud, if i have a IOT stream from various devices, each giving out a device name as a tag, I should be able to add meta data to group them, and define that they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd want to use tags for that, since those would be attributes you'd be likely to query by, so I'd be looking at either making them smart enough to send those tags natively or using an "enrichment" layer in to add them as part of the ingest pipeline.
doc/influxql-2.md
Outdated
|
||
#### Keys | ||
`keys` will return an array of tag keys that occur in the matrix. The output of keys function can be used as the predecessor argument to `empty` or `not empty` in a `where` expression. For pagination through keys, `limit` can be chained off it. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the Keys be limited to a window? One major demand with current InfluxQL is to have Show Tag Keys
to support a where time
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that you'd have to have a range
, but it might not have the best performance. For instance: select(database:"foo").range(start:-4h).last().keys()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but its vital for proper grafana dashboard with template variable. e.g. If the access log is fed to Influxdb, visualizing the traffic pattern would be much easier if the URL is a tag, and that tag keys should be restricted to the time window. Current Influx shows all keys, which makes the grafana drop down a mess.
doc/influxql-2.md
Outdated
.sum() | ||
.interpolate(start:-30m,value:0) | ||
.join(keys:["host"],exp: | ||
{('metric'."errors".float()/'metric'."requests" * 100).float(fixed:1) AS 'metric'."error_rate"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if errors = 10 and requests = 0? 10/0? null
, 0.0
, none
, ""
, undefined behavior, a failed query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll probably want to give the user options to define what they want. Return error, set to zero and ignore, or drop the value entirely (null)
Personally, I really dislike the new design. Primarily because it makes it extremely difficult to work with a single "point" with multiple "field"s (to use the term definitions in InfluxQL 1). For graphing UIs (grafana & chronograf) this change makes sense as they are graphing individual fields (or a "series" in InfluxQL 2 terms). Even if you have multiple fields on a single graph, they're still treated as separate data sets. However for non-graphing use cases, where a client is querying the data to perform calculations and other analytics on it, it now becomes extremely difficult to work with. Now if I want to query these multiple fields over a range of time, I have to join them together manually. Even the proposed Not my main use case, but a simple one which exemplifies the issue, is a kapacitor alert on disk space. Lets say I have 2 fields, |
I really like the new language. It reminds me of ReQL from Rethinkdb. It makes it possible to create really nice client side query builders like this https://github.com/GoRethink/gorethink. |
I wonder if it would be possible to create custom functions that would be combination of functions provided by influxdb. For example, Graphite's join is some combination of new influxdb's interpolate+join and I can see users using them together a lot. I also imagine that there are usage patterns that are very common for certain metrics in certain organization that are uncommon otherwise. It would be nice to gather those and declare them as a pattern. |
tag is all, less is more. 👍 |
Would the Influx 1.x line-protocol still be supported whenever InfluxQL 2.0 comes around? I'm asking this because I see you define Influx 1.x compatible measurements using |
@phemmer thanks for the detailed use case. Maybe some sort of transpose functionality would fix the issue you're talking about. I see this more as a result format issue than anything else. The Kapacitor use case is definitely something we have to support. And the query engine will need to be able to stream partial results from many streams and combine them before getting the entire stream. |
@panovodv for sure, I think one of the advantages of this functional style is that we can always add new functions to the language and some of those could be shortcuts that combine commonly chained primitives. |
@jaapz the special identifiers are there explicitly to keep support for the 1.x line protocol. That line protocol would be translated into the |
Questions about proposed query language:
Would you support |
@VladRassokhin For your examples, |
@pauldix Thank you for clarification. Could you consider split string literals into two classes by 'multi-line' support? E.g. support them only in Another idea is to use some other syntax to define key identifiers, as it's easy to forget which quotation symbol should be used where. Like in example Anyway thank you for reading that and improving existing language :) |
I'm confused as to why you'd even want a different quotation mark for key identifiers and string literals, apart from it being an implementation detail of the query parser. The user is probably only going to get confused by this (I know I was) |
I am in full support for having a syntax which separates string literals from measurement names, tags and field names as suggested by @VladRassokhin E.g. in grafana its so easy to use template variable is |
I was thinking about this proposal and I really like it. One thing that we're contemplating in Skytap is having some kind of influxdb federation. We run public cloud and have separate metrics databases per each geographical region to keep them isolated in case of failure. At the same time we are interested in running queries across all regions. I feel like with this language one could specify not only database to query, but host as well. After that language processor could optimize which function to run remotely before aggregation between hosts, and which have to be run centrally on federating host. |
doc/influxql-2.md
Outdated
// Arguments: | ||
// start - a time. Could be relative e.g. `-4h` or an epoch, or a time string | ||
// end - a time | ||
// count - an integer. Get this number of values either from the passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This count parameter seems like simply a limit condition. Should we remove the parameter and then chain the limit
function? Or does a count in range do something more than a limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's just a limit. Limit plus the ability to order by asc or desc will cover it. Which reminds me, should order be an argument to range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is if is just a limit then remove the parameter and use limit. Composibility of simple units has gotten us far with Kapacitor and I want to keep up that general design.
So to range + limit + order by is:
foo.range(start:-4h).limit(limit:100).orderBy(order:"asc")
@panovodv Brings up an interesting point above, asking if we can create higher order functions within the language. This way you can simplify common long chains. Maybe something like this:
WARNING: This opens a lot more questions than it answers
func rangeLimitOffset(start duration, limit int , offset int, order string) {
$.range(start:start)
.limit(limit:limit, offset:offset)
.orderBy(order: order)
}
// Now that the function has been defined in the script we can use it as short hand like
foo.rangeLimitOrder(start:-4h, limit:100, order:"asc")
This has been requested before for TICKscript and in fact deadman
within TICKscript is a hardcoded version of this.
I see a few ways to go about this:
- Overload the behavior of the various functions to they perform common operations
- Require functions to do one thing only, and then you must chain functions to get higher order behavior. Then hardcode various common chains as functions of their own.
- Same as 2 but allow the higher order functions to be defined within the language itself.
- Same as 2 but always require common chains to be explicitly written out.
Thoughts? Option 3 obviously provides the most flexibility but possibly too much flexibility in that it becomes overly difficult to read/follow a query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm completely in favor of atomic primitives that can be chained for more complex functionality. We'll definitely want shortcuts for common things in the future. We could also make it something by convention. For example, function names and parameter names are always camel cased. We could use _
to separate and combine. For example:
// chained
select().range(start: -4h).limit(n:10).order(by:"asc")
// combined
select(range_start: -4h, limit_n: 10, order_by: "asc")
doc/influxql-2.md
Outdated
// period - windows are for this duration. If only every or period is specified, | ||
// the other is set to the same value by default. If the every duration | ||
// is less than the period, windows will overlap. | ||
// count - number of windows (points) desired. Will adjust bucket durations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this kind of dynamic bucket duration but fixed count window size been requested before?
Kapacitor never received a request for that kind of windowing behavior.
But Kapacitor does support everyCount, periodCount which indicate that the windows should be constructed on the counts of points instead of their times.
Should we add everyCount
and periodCount
parameters and remove the count
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is coming from Chronograf/Grafana where you the highest resolution possible of aggregated data, but are limited to the number of pixels (points) you have on a graph. For example, suppose I have 24px and I query the last 24 hours worth of data, if data is aggregated in with a group by
of less than 1h
I won't have enough pixels to display the graph. So instead of saying that I want to group things into 1h
buckets, I'd like to be able to say, I want a total of 24
points back, you pick the the size of the group by
interval so that I'll get at most that many points. This way the time range I query for has no impact on the group by
interval that I choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ok and why can't Chronograf/Grafana just compute interval = duration / num_pixels
i.e 24h / 24px = 1h/px
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can, in fact that's what grafana currently does (not sure about chronograf). I think the reason for putting it directly into the QL is that it is such a common query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@desa is right, this feature was here for people that wanted x number of windows so that aggregations on those windows would result in that number of summary points. Primarily for graphing libraries. Having it here vs. having it in the client library is advantageous for later when we have automatic downsampling (like Graphite). That way instead of having the client pick the window size, the database can do so based on what downsamples are available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does specifying a count allow the query engine to pick a useful window size? Isn't specifying the count just specifying a window size via an inverse relation instead of directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nathanielc ok, right, count isn't a good thing to call it. For the example I was giving it would be, return no more than count
number of windows. The later optimizations would be if there is a trailing aggregation function that can use some already pre-computed downsamples so that windows/aggregation don't need to happen. For instance if you're looking at 4 hours of data and ask for no more than 30 data points and called sum after, but you happened to have 10m sum summaries already available, just return those 24 10m sum summaries. I'm probably overthinking all this. Probably best to remove count
for now and revisit this when we add in the "intelligent rollups"/auto downsampling functionality.
doc/influxql-2.md
Outdated
``` | ||
|
||
#### Time Shift | ||
`timeShift` will shift all returned timestamps in the matrix by the given duration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be renamed to simply shift
since we are in a time domain? Or do we want the explicit name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I think of shift
I think of it in terms of array positions/indexes. timeShift
is a bit different. Although I'm not sure that we'd ever have the array style shift
in the language so we could shorten it. What you think?
doc/influxql-2.md
Outdated
```javascript | ||
// Accepts: a matrix | ||
// Arguments: | ||
// precision - a duration for what the rate should be. per second, per minute, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this parameter be named unit
instead of precision? Its talking about the denominator of the rate calculation not necessarily the precision of the calculation.
Assuming that rate
is essentially what InfluxQL calls derivative
currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unit
works.
doc/influxql-2.md
Outdated
|
||
We see that there are now columns that mark the start and end of each window. This is meant as a temporary stage before calling some sort of aggregate or selector function to be applied to each cell. For example if we had called `foo.window(every:20s).sum()` we would have received the following result: | ||
|
||
Series | 10 | 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time used to index the column after a windowing aggregation should be the upper bound of the window not the lower bound of the window.
In my opinion it makes it read better since you will get times that match closer to the current time. For example if you are windowing by 1m, you get a result back for the current minute instead of the previous minute. This gets more pronounced as the time scale increases. Like windowing by 30d you get a point for today which represents the last 30 days of data, instead of getting a point at now() - 30d which is the "current" value.
Secondly when you start to use overlapping windows it makes more sense to reference the window buckets by their max time, since their minimum time may be significantly far in the past. For example in Kapacitor I have seen and performed workloads myself where I was windowing with every=10d and period=Infinity. So you get rolling windows every 10d that include all known historical data. The use case where this comes up frequently is in forecasting, you want to use all historical data you have to forecast but are rolling forward in time. Clearly referencing the windows by their lower bound in that case will cause issues.
In short Kapacitor has used max time for windows from the beginning for these reasons and I think the new system should as well.
Finally see #8213 for more discussion on this topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. Maybe there should be an argument to all aggregation functions that lets the user pick which is used? Default of using max time is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the ability to change the behavior of all aggregate functions via an options like syntax?
Meaning it can be cumbersome and error prone to have to specify and optional argument to all aggregation functions because you need to use the min time bound.
Maybe something like this at the top of a script to set defaults for just that script:
option time_bound=min
This has uses beyond just the time bound. For example the user may want to specify how out of order data is handled. The options are to drop it, wait for it for X amount of time (which delays the result), or recompute results when old data arrives.
This could be set via an options syntax.
option out_of_order=drop
option out_of_order=wait 5m
option out_of_order=recompute
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the option
syntax. Although maybe that would be in addition to having those things as individual arguments in the aggregate functions? So they could mix both or if they're only calling a single aggregate they can do it with an argument and not the option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, options should be in addition to the individual arguments. That way that can be changed on a case by case basis.
Using the top level option allows you to change global behavior consistently since all arguments are optional in individual functions.
All, I have updated the proposed spec will many things that we have been learning as we experiment. |
when can this new query language be realized? |
@opsnull we're releasing a prototype of it on Tuesday. It won't have everything, but it'll be something to start playing with against the OSS InfluxDB 1.4 release. Blog post then, feedback welcome! |
I suggest to update the proposed specs with the current IFQL version (0.0.3). In the PR: In the IFQL 0.0.3 Readme : In the PR: In the IFQL 0.0.3 Readme : In the PR: In the IFQL 0.0.3 Readme Can you explain what skew does ? It's in the IFQL 0.0.3 Readme. I suggest to add the functions that are int the IFQL 0.0.3 Readme, such as spread, filter. Is there any difference between : Please explain the op argument in the .merge function. I’m guessing it means operation. I see it used with append and count. I understand count but what does append do ? There's an error in the result of the query baz.merge(keys:["metric", "dc"]). The last line with dc=east should be in the first block, unless there's something I don't understand. The where example is missing a closing parenthesis. Also, the text says to use single quotes for tag keys but the sample uses double quotes. There's a few missing aggregator functions. integral is especially useful. Please explain why there's a .group and a .merge. What's the difference? I like that it gives more flexibility than InfluxQL. An issue in the current InfluxQL is that in a single query you can't do math between two fields that have mutually exclusive WHERE clauses. Such as (SELECT "Value" FROM ... WHERE host='A') - (SELECT "Value" FROM ... WHERE host='B'). With the new specs, how would I subtract the values of host B from the values of host A ? |
Thanks for the notes @samaust. We'll try to get this updated soon. Actually, we'll probably close this out pretty soon and instead put up docs in the IFQL repo. There are still a few things in flight that we're debating internally, but I think we're closing in on what will be the initial spec. Once it's a bit more firm we'll have docs up. Probably first or second week of January. In the meantime, can you log specific issues in https://github.com/influxdata/ifql/issues |
@pauldix I'm going to close this since I think ifql is the official creation/work in progress of this proposal. If you want to reopen it and merge it to this repository, we can, but I want to get our open PR count down so it's easier to focus on what we need to review/merge. |
Here are some initial ideas for a new query language that will combine InfluxDB and Kapacitor's language into one. It isn't TICKscript and it isn't InfluxQL's SQL style language. We're looking for feedback from UI builders and users on the look/feel of the language and if the functionality listed covers their use cases.
If you have example queries from InfluxQL that you'd like to see translated into this new style, please post. Or if you have query feature requests that currently don't exist, please add those to the comments here as well.