
Create InfluxQL 2.0 initial proposal #8591

Closed

wants to merge 2 commits

Conversation

@pauldix (Member) commented Jul 12, 2017

Here are some initial ideas for a new query language that will combine InfluxDB and Kapacitor's language into one. It isn't TICKscript and it isn't InfluxQL's SQL style language. We're looking for feedback from UI builders and users on the look/feel of the language and if the functionality listed covers their use cases.

If you have example queries from InfluxQL that you'd like to see translated into this new style, please post. Or if you have query feature requests that currently don't exist, please add those to the comments here as well.

@pauldix pauldix added the review label Jul 12, 2017
```
node_cpu{cpu="cpu0",mode="user"} 182168.46875

# Influx 2.0
_metric:node_cpu,cpu:cpu0,mode:user 182168.46875 1491675816
```

_metric or metric (as on lines 133-136)?

Contributor

@DanCech given that `_` has special meaning, `_metric` makes sense.

the special `_metric` tag key


There are a set of tag keys used by the system that start with `_` that are well known. Some of them are for conversions from Influx 1.x or Prometheus while others are for information useful in graphing.

* **_units** - the units of the measurement -

Why is this plural?

Member Author

Because the units are always plural. You'd say bytes, not byte, or seconds, not second.


The value of this tag would be a single unit of measurement.

@DanCech commented Jul 12, 2017

This looks really interesting @pauldix, chaining functions together definitely makes things easier to follow as the data flows through the pipeline. The flexibility of using variables to build up scripts is definitely going to offer a lot of power, but will certainly create some challenges on the UI side.

@mvadu (Contributor) left a comment

Clarifications for a few proposed changes.


* **tag** - a key/value pair of strings. (e.g. "region":"host" or "building":"2"). Tags MUST NOT start with `#`. Tag keys starting with `_` have special meaning and should not be used by the user unless they conform to the predefined set of keys that use `_`.
* **system tag** - A tag with a key that starts with `_` followed by letters and/or numbers. These are indexed just like tags, but are used by the system for compatibility layers with other systems.
* **meta** - A JSON style object associated with a series. This data is not indexed and can only be included/joined in a query. Its purpose is to add more context to series for user interfaces and additional calculations.
Contributor

Is meta the meta for a database or for a series? Sorry to jump the gun if it's explained below.


Seems like series to me

Member Author

Yeah, it's meta for the series, but after talking to some people I'm going to pull this out of the query language or line protocol. We'll offer other APIs for associating non-indexed metadata with tags and other stuff.


### Definitions

The following definitions encompass the entire vocabulary of the data model. Where appropriate, JSON examples are given, but the protocol will also support a line or protobuf based structure.
Contributor

IMHO: since we moved a long way from the JSON format for writes, I think the example should show what the current line protocol command will look like in the target state.

Member Author

I think we'll be working out the result format in the next four weeks or so. Will add more detail when we have it.

* **_units** - the units of the measurement -
* **_type** - the type of the series - [int64, uint64, float64, bool, string]
* **_samplingRate** - duration string for how frequently this series is expected to be sampled. `0` indicates that it is an event stream (irregular series)
Contributor

Should we call it `_samplingInterval`? Sampling rate is assumed to be the number of samples per second by people with an electronics background.


Good point, interval is definitely the correct term.

Member Author

Good call

*TODO:* give this more definition.

Meta data can be written with a line starting with `##`. The meta data will be associated with the series on the next line. Meta data is a JSON object. Keys in the object starting with `_` are reserved for the system. The data in meta is not indexed. The only validation is that it is a parseable JSON object.
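For illustration, a meta line per the paragraph above might look like this (hypothetical meta keys and values, using the Influx 2.0 line format shown earlier in this thread):

```
## {"rack":"r12","description":"edge router"}
_metric:node_cpu,cpu:cpu0,mode:user 182168.46875 1491675816
```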

Contributor

We should be able to add meta for an already existing series, so the write protocol should support it. Thinking out loud: if I have an IoT stream from various devices, each giving out a device name as a tag, I should be able to add metadata to group them and define what they are.


You'd want to use tags for that, since those would be attributes you'd be likely to query by, so I'd be looking at either making the devices smart enough to send those tags natively or using an "enrichment" layer to add them as part of the ingest pipeline.


#### Keys
`keys` will return an array of tag keys that occur in the matrix. The output of keys function can be used as the predecessor argument to `empty` or `not empty` in a `where` expression. For pagination through keys, `limit` can be chained off it.
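The chaining described above might be sketched as follows (hypothetical; it assumes the `limit(n:)` signature used elsewhere in this proposal):

```javascript
select(database:"foo").keys().limit(n:10)
```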

Contributor

Will the keys be limited to a window? One major demand with current InfluxQL is to have SHOW TAG KEYS support a WHERE time clause.

Member Author

For that you'd have to have a range, but it might not have the best performance. For instance: `select(database:"foo").range(start:-4h).last().keys()`

Contributor

Correct, but it's vital for a proper Grafana dashboard with template variables. E.g. if an access log is fed to InfluxDB, visualizing the traffic pattern is much easier if the URL is a tag, and the tag keys should be restricted to the time window. Current Influx shows all keys, which makes the Grafana drop-down a mess.

```javascript
.sum()
.interpolate(start:-30m, value:0)
.join(keys:["host"], exp:
  {('metric'."errors".float()/'metric'."requests" * 100).float(fixed:1) AS 'metric'."error_rate"})
```
@ghost Jul 14, 2017

What happens if errors = 10 and requests = 0? 10/0? null, 0.0, none, "", undefined behavior, a failed query?

Member Author

I think we'll probably want to give the user options to define what they want: return an error, set to zero and ignore, or drop the value entirely (null).

@phemmer (Contributor) commented Jul 16, 2017

Personally, I really dislike the new design. Primarily because it makes it extremely difficult to work with a single "point" with multiple "field"s (to use the term definitions in InfluxQL 1).

For graphing UIs (grafana & chronograf) this change makes sense as they are graphing individual fields (or a "series" in InfluxQL 2 terms). Even if you have multiple fields on a single graph, they're still treated as separate data sets.

However for non-graphing use cases, where a client is querying the data to perform calculations and other analytics on it, it now becomes extremely difficult to work with. Now if I want to query these multiple fields over a range of time, I have to join them together manually. Even the proposed join expression doesn't help, as it still emits a single "series" (as defined in InfluxQL 2). So essentially I would have to query for the 'foo' field and buffer the results, query the 'bar' field and buffer the results, and then iterate over both of them, creating points where the timestamps match. I would also have to handle cases where either field could be absent (null) for a given timestamp, which makes efficient iteration difficult. And as indicated, each series has to be buffered, which might be a problem when working with a large time range and/or number of fields.

Not my main use case, but a simple one which exemplifies the issue, is a Kapacitor alert on disk space. Let's say I have 2 fields, used and total. I want to alert when usage is >= 80%. In the alert that gets sent out I want to include the raw used & total values (80% of a 100MB disk is a lot different from 80% of a 100TB disk). With InfluxQL 2 this is very difficult, as the join of the used, total, and percent values into a single point has to be performed client side.
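For reference, the percentage itself can be sketched with the proposal's join syntax (hypothetical measurement and field names), but per the comment above the join emits only the derived series, not a point carrying used, total, and percent together:

```javascript
select(database:"telegraf")
  .range(start:-5m)
  .join(keys:["host"], exp:
    {('disk'."used".float()/'disk'."total" * 100) AS 'disk'."used_percent"})
```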

@allen13 commented Jul 17, 2017

I really like the new language. It reminds me of ReQL from RethinkDB. It makes it possible to create really nice client-side query builders like this: https://github.com/GoRethink/gorethink

@panovodv
I wonder if it would be possible to create custom functions that would be combinations of the functions provided by InfluxDB. For example, Graphite's join is some combination of the new InfluxDB interpolate + join, and I can see users using them together a lot. I also imagine there are usage patterns that are very common for certain metrics in a certain organization but uncommon otherwise. It would be nice to gather those and declare them as a pattern.

@oiooj (Contributor) commented Jul 18, 2017

tag is all, less is more. 👍

@jaapz commented Jul 19, 2017

Would the Influx 1.x line-protocol still be supported whenever InfluxQL 2.0 comes around? I'm asking this because I see you define Influx 1.x compatible measurements using _measurement and _field. I think introducing these "special" identifiers instead of keeping support for 1.x line-protocol would mean a bit of a painful bump for users when eventually moving to InfluxQL 2.0.

@pauldix (Member Author) commented Jul 20, 2017

@phemmer thanks for the detailed use case. Maybe some sort of transpose functionality would fix the issue you're talking about. I see this more as a result format issue than anything else. The Kapacitor use case is definitely something we have to support. And the query engine will need to be able to stream partial results from many streams and combine them before getting the entire stream.

@pauldix (Member Author) commented Jul 20, 2017

@panovodv for sure, I think one of the advantages of this functional style is that we can always add new functions to the language and some of those could be shortcuts that combine commonly chained primitives.

@pauldix (Member Author) commented Jul 20, 2017

@jaapz the special identifiers are there explicitly to keep support for the 1.x line protocol. That line protocol would be translated into the _measurement and _field tag keys so that it can be queried in the QL 2.0 language. The idea is that there would be no "flip the switch" changeover; give users time to use both at the same time.
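As an illustration of that translation (hypothetical measurement, tag, and field names, using the 2.0 line format shown earlier in this thread):

```
# Influx 1.x line protocol
cpu,host=serverA usage_user=42.5 1491675816

# Translated into the 2.0 system tag keys
_measurement:cpu,host:serverA,_field:usage_user 42.5 1491675816
```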

@VladRassokhin
Questions about the proposed query language:
What is the difference between the following elements?

```javascript
'StringA'
"StringB"
`StringC`
/StringD/
```

Would you support '''StringE''' from TICKscript?
Asking because I'm trying to create an IDE plugin for TICKscript, and all those ways to define strings (with multiline support in all of them) make me sad.

@pauldix (Member Author) commented Jul 23, 2017

@VladRassokhin For your examples, StringA is a key identifier, B & C are both string literals, and D is a regular expression.

@VladRassokhin
@pauldix Thank you for clarification.

Could you consider splitting string literals into two classes by multi-line support? E.g. support it only in '''A''' string literals, because an unfinished quote can match almost everything until end of file, and it's sometimes hard to detect on which line the missing symbol should be added.
I've experienced a similar problem with the HCL language used in Terraform, and its developers agreed to disallow line separation in string literals.
As in many programming languages (Groovy, Kotlin, Python, Scala), it's a good decision to give string literals with multi-line support a separate syntax.

Another idea is to use some other syntax for key identifiers, as it's easy to forget which quotation symbol should be used where. In the example 'metric'."errors".float()/'metric'."requests" * 100, one could write "metric" instead and would only find out at runtime. If key identifiers cannot contain special symbols or spaces, syntax like #metric, :metric, ::metric, @metric, or $metric could be used. If spaces are allowed there, then OK, there's no better syntax than the existing one, though ` and ' could be swapped. My point is that in many other scripting languages single- and double-quoted strings have similar meaning, but I'm not an expert in language design.

Anyway, thank you for reading this and for improving the existing language :)

@jaapz commented Jul 26, 2017

I'm confused as to why you'd even want different quotation marks for key identifiers and string literals, apart from it being an implementation detail of the query parser. The user is probably only going to get confused by this (I know I was).

@mvadu (Contributor) commented Jul 27, 2017

I am in full support of having a syntax which separates string literals from measurement names, tags, and field names, as suggested by @VladRassokhin. E.g. in Grafana it's so easy: a template variable is $templateVariable$ and a tag returned by Influx is $tag_<tagName>, which is easy to read and leaves no room for confusion. Current Influx does not even raise alerts when you mix up single and double quotes.

@panovodv commented Aug 1, 2017

I was thinking about this proposal and I really like it. One thing we're contemplating at Skytap is having some kind of InfluxDB federation. We run a public cloud and have separate metrics databases per geographical region to keep them isolated in case of failure. At the same time, we are interested in running queries across all regions.

I feel like with this language one could specify not only the database to query, but the host as well. The language processor could then optimize which functions to run remotely before aggregation between hosts, and which have to run centrally on the federating host.

```javascript
// Arguments:
// start - a time. Could be relative e.g. `-4h` or an epoch, or a time string
// end - a time
// count - an integer. Get this number of values either from the passed
```
Contributor

This count parameter seems like simply a limit condition. Should we remove the parameter and then chain the limit function? Or does a count in range do something more than a limit?

Member Author

You're right, it's just a limit. Limit plus the ability to order by asc or desc will cover it. Which reminds me, should order be an argument to range?

Contributor

My thinking is that if it's just a limit, then remove the parameter and use limit. Composability of simple units has gotten us far with Kapacitor and I want to keep up that general design.

So range + limit + order by is:

```javascript
foo.range(start:-4h).limit(limit:100).orderBy(order:"asc")
```

@panovodv Brings up an interesting point above, asking if we can create higher order functions within the language. This way you can simplify common long chains. Maybe something like this:

WARNING: This opens a lot more questions than it answers

```javascript
func rangeLimitOffset(start duration, limit int, offset int, order string) {
  $.range(start:start)
    .limit(limit:limit, offset:offset)
    .orderBy(order:order)
}
```

Now that the function has been defined in the script we can use it as shorthand:

```javascript
foo.rangeLimitOffset(start:-4h, limit:100, offset:0, order:"asc")
```

This has been requested before for TICKscript and in fact deadman within TICKscript is a hardcoded version of this.

I see a few ways to go about this:

  1. Overload the behavior of the various functions so they perform common operations
  2. Require functions to do one thing only, and then you must chain functions to get higher order behavior. Then hardcode various common chains as functions of their own.
  3. Same as 2 but allow the higher order functions to be defined within the language itself.
  4. Same as 2 but always require common chains to be explicitly written out.

Thoughts? Option 3 obviously provides the most flexibility but possibly too much flexibility in that it becomes overly difficult to read/follow a query.

Member Author

Yeah, I'm completely in favor of atomic primitives that can be chained for more complex functionality. We'll definitely want shortcuts for common things in the future. We could also do something by convention. For example, function names and parameter names are always camel cased; we could use _ to separate and combine. For example:

```javascript
// chained
select().range(start: -4h).limit(n:10).order(by:"asc")
// combined
select(range_start: -4h, limit_n: 10, order_by: "asc")
```

```javascript
// period - windows are for this duration. If only every or period is specified,
// the other is set to the same value by default. If the every duration
// is less than the period, windows will overlap.
// count - number of windows (points) desired. Will adjust bucket durations
```
Contributor

Has this kind of dynamic bucket duration but fixed count window size been requested before?
Kapacitor never received a request for that kind of windowing behavior.

But Kapacitor does support everyCount, periodCount which indicate that the windows should be constructed on the counts of points instead of their times.

Should we add everyCount and periodCount parameters and remove the count parameter?

Contributor

I think this is coming from Chronograf/Grafana, where you want the highest resolution possible of aggregated data but are limited to the number of pixels (points) you have on a graph. For example, suppose I have 24px and I query the last 24 hours worth of data; if data is aggregated with a group by of less than 1h I won't have enough pixels to display the graph. So instead of saying that I want to group things into 1h buckets, I'd like to be able to say I want a total of 24 points back, and you pick the size of the group by interval so that I'll get at most that many points. This way the time range I query has no impact on the group by interval.

Contributor

Hmm, OK, and why can't Chronograf/Grafana just compute interval = duration / num_pixels, i.e. 24h / 24px = 1h/px?

Contributor

They can; in fact, that's what Grafana currently does (not sure about Chronograf). I think the reason for putting it directly into the QL is that it is such a common query.

Member Author

@desa is right, this feature was here for people that wanted x number of windows so that aggregations on those windows would result in that number of summary points. Primarily for graphing libraries. Having it here vs. having it in the client library is advantageous for later when we have automatic downsampling (like Graphite). That way instead of having the client pick the window size, the database can do so based on what downsamples are available.

Contributor

How does specifying a count allow the query engine to pick a useful window size? Isn't specifying the count just specifying a window size via an inverse relation instead of directly?

Member Author

@nathanielc ok, right, count isn't a good thing to call it. For the example I was giving it would be: return no more than count number of windows. The later optimization would be that if there is a trailing aggregation function that can use already pre-computed downsamples, the windowing/aggregation doesn't need to happen. For instance, if you're looking at 4 hours of data, ask for no more than 30 data points, and call sum after, but you happen to have 10m sum summaries already available, just return those 24 10m sum summaries. I'm probably overthinking all this. Probably best to remove count for now and revisit this when we add the "intelligent rollups"/auto-downsampling functionality.


#### Time Shift
`timeShift` will shift all returned timestamps in the matrix by the given duration.
Contributor

Should this be renamed to simply shift since we are in a time domain? Or do we want the explicit name?

Member Author

When I think of shift I think of it in terms of array positions/indexes; timeShift is a bit different. Although I'm not sure we'd ever have the array-style shift in the language, so we could shorten it. What do you think?

```javascript
// Accepts: a matrix
// Arguments:
// precision - a duration for what the rate should be. per second, per minute, etc.
```
@nathanielc (Contributor) Aug 3, 2017

Should this parameter be named unit instead of precision? It's talking about the denominator of the rate calculation, not necessarily the precision of the calculation.

Assuming that rate is essentially what InfluxQL currently calls derivative?

Member Author

Yeah, unit works.


We see that there are now columns that mark the start and end of each window. This is meant as a temporary stage before calling some sort of aggregate or selector function to be applied to each cell. For example if we had called `foo.window(every:20s).sum()` we would have received the following result:

Series | 10 | 30
Contributor

The time used to index the column after a windowing aggregation should be the upper bound of the window not the lower bound of the window.

In my opinion it makes it read better since you will get times that match closer to the current time. For example if you are windowing by 1m, you get a result back for the current minute instead of the previous minute. This gets more pronounced as the time scale increases. Like windowing by 30d you get a point for today which represents the last 30 days of data, instead of getting a point at now() - 30d which is the "current" value.

Secondly, when you start to use overlapping windows it makes more sense to reference the window buckets by their max time, since their minimum time may be significantly far in the past. For example, in Kapacitor I have seen, and performed myself, workloads where I was windowing with every=10d and period=Infinity. So you get rolling windows every 10d that include all known historical data. The use case where this comes up frequently is forecasting: you want to use all the historical data you have while rolling forward in time. Clearly referencing the windows by their lower bound in that case would cause issues.

In short Kapacitor has used max time for windows from the beginning for these reasons and I think the new system should as well.

Finally see #8213 for more discussion on this topic.

Member Author

That makes sense to me. Maybe there should be an argument to all aggregation functions that lets the user pick which is used? Defaulting to max time is good.

@nathanielc (Contributor) Aug 3, 2017

Do we want the ability to change the behavior of all aggregate functions via an options-like syntax? It can be cumbersome and error prone to have to specify an optional argument to every aggregation function just because you need to use the min time bound.

Maybe something like this at the top of a script to set defaults for just that script:

```
option time_bound=min
```

This has uses beyond just the time bound. For example the user may want to specify how out of order data is handled. The options are to drop it, wait for it for X amount of time (which delays the result), or recompute results when old data arrives.

This could be set via an options syntax.

```
option out_of_order=drop
option out_of_order=wait 5m
option out_of_order=recompute
```

Thoughts?

Member Author

I like the option syntax. Although maybe that would be in addition to having those things as individual arguments in the aggregate functions? So they could mix both or if they're only calling a single aggregate they can do it with an argument and not the option?

Contributor

Yes, options should be in addition to the individual arguments. That way they can be changed on a case-by-case basis.

Using the top-level option lets you change global behavior consistently, since all of the individual function arguments are optional.

@nathanielc (Contributor) commented Sep 20, 2017

All, I have updated the proposed spec with many things that we have been learning as we experiment.
Please feel free to leave your thoughts and comments.

@opsnull commented Nov 11, 2017

When can this new query language be realized?

@pauldix (Member Author) commented Nov 11, 2017

@opsnull we're releasing a prototype of it on Tuesday. It won't have everything, but it'll be something to start playing with against the OSS InfluxDB 1.4 release. Blog post then, feedback welcome!

@samaust commented Dec 15, 2017

I suggest updating the proposed specs to match the current IFQL version (0.0.3).

In the PR:

```javascript
.group(keys:["tag_a", "tag_b"])
.merge(keys:["tag_a", "tag_b"])
```

In the IFQL 0.0.3 Readme:

```javascript
.group(by: ["tag_a", "tag_b"])
```

In the PR:

```javascript
select(database:"foo")
```

In the IFQL 0.0.3 Readme:

```javascript
from(db:"foo")
```

In the PR:

```javascript
.join(keys:["host"], exp:{$ + cpu})
```

In the IFQL 0.0.3 Readme:

```javascript
.join(on:["host"], eval:{$ + cpu})
```

I'm not sure if eval or exp is best.

Can you explain what skew does? It's in the IFQL 0.0.3 Readme.

I suggest adding the functions that are in the IFQL 0.0.3 Readme, such as spread and filter.

Is there any difference between:

```javascript
.merge(keys:["host"], op:"count")
```

and

```javascript
.merge(keys:["host"])
.count()
```

Same result? Same performance?

Please explain the op argument in the .merge function. I'm guessing it means operation. I see it used with append and count. I understand count, but what does append do?

There's an error in the result of the query baz.merge(keys:["metric", "dc"]). The last line with dc=east should be in the first block, unless there's something I don't understand.

The where example is missing a closing parenthesis. Also, the text says to use single quotes for tag keys but the sample uses double quotes.

There are a few missing aggregator functions; integral is especially useful.
I see that percentile is an aggregator function instead of a selector as in InfluxQL. Is that correct? Does that mean it interpolates the percentile instead of selecting an existing value?

Please explain why there's a .group and a .merge. What's the difference?

I like that it gives more flexibility than InfluxQL. An issue with current InfluxQL is that in a single query you can't do math between two fields that have mutually exclusive WHERE clauses, such as (SELECT "Value" FROM ... WHERE host='A') - (SELECT "Value" FROM ... WHERE host='B'). With the new specs, how would I subtract the values of host B from the values of host A?

@pauldix (Member Author) commented Dec 15, 2017

Thanks for the notes @samaust. We'll try to get this updated soon. Actually, we'll probably close this out pretty soon and instead put up docs in the IFQL repo. There are still a few things in flight that we're debating internally, but I think we're closing in on what will be the initial spec. Once it's a bit more firm we'll have docs up. Probably first or second week of January.

In the meantime, can you log specific issues in https://github.com/influxdata/ifql/issues

@jsternberg (Contributor)
@pauldix I'm going to close this since I think ifql is the official creation/work in progress of this proposal. If you want to reopen it and merge it to this repository, we can, but I want to get our open PR count down so it's easier to focus on what we need to review/merge.

@jsternberg jsternberg closed this Apr 23, 2018
@ghost ghost removed the review label Apr 23, 2018
@pauldix pauldix deleted the pd-influxql2 branch January 3, 2019 21:07