Field stats: added index_constraint option #11259

Merged

martijnvg
Member

Field stats index constraints make it possible to omit the field stats of indices that don't match the constraint. An index constraint can exclude an index's field stats based on a field's min_value and max_value statistics.

For example, index constraints can be useful for finding the min and max value of a particular property of your data in a time-based scenario. The following request only returns field stats for the answer_count property for indices holding questions created in the year 2014:

curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
   "fields" : ["answer_count"],
   "index_constraints" : {
      "creation_date" : {
         "min_value" : {
            "gte" : "2014-01-01T00:00:00.000Z"
         },
         "max_value" : {
            "lt" : "2015-01-01T00:00:00.000Z"
         }
      }
   }
}'
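
The constraint in the request above can be read as two comparisons against each index's statistics for creation_date: the index's min_value must be at or after the start of 2014, and its max_value must be before the start of 2015. The following is only a minimal sketch of that reading, not the code in this PR; the class IndexConstraintSketch, the type DateFieldStats and the method matches are hypothetical names introduced for illustration.

```java
import java.time.Instant;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class IndexConstraintSketch {

    /** Per-index min/max statistics for one date field (illustrative type). */
    static final class DateFieldStats {
        final Instant minValue;
        final Instant maxValue;

        DateFieldStats(String minValue, String maxValue) {
            this.minValue = Instant.parse(minValue);
            this.maxValue = Instant.parse(maxValue);
        }
    }

    /** Mirrors the request: min_value gte the lower bound AND max_value lt the upper bound. */
    static boolean matches(DateFieldStats stats, Instant gteForMin, Instant ltForMax) {
        boolean minOk = !stats.minValue.isBefore(gteForMin); // min_value >= bound
        boolean maxOk = stats.maxValue.isBefore(ltForMax);   // max_value < bound
        return minOk && maxOk;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2014-01-01T00:00:00.000Z");
        Instant end = Instant.parse("2015-01-01T00:00:00.000Z");
        // Made-up statistics for two indices.
        DateFieldStats only2014 = new DateFieldStats("2014-03-01T00:00:00Z", "2014-11-30T00:00:00Z");
        DateFieldStats spansInto2015 = new DateFieldStats("2014-06-01T00:00:00Z", "2015-02-01T00:00:00Z");
        System.out.println(matches(only2014, start, end));
        System.out.println(matches(spansInto2015, start, end));
    }
}
```

Under this reading, an index whose questions all fall in 2014 is kept, while an index that also holds questions from 2015 is left out of the response.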

PR for #11187

@jpountz
Contributor

jpountz commented May 21, 2015

I'm personally a bit confused by the API. For instance, the documentation says "The following request only returns field stats for indices that have _timestamp date values between the defined range", but if I read the code correctly, an index that has only two values, one less than the minimum value and another greater than the max value, would match even though it has no values in the range?

Instead, maybe the API should apply to the min/max values instead of the fields themselves, to be clearer about what it does? E.g.

curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
   "fields" : {
      "_timestamp.max" : {
         "gte" : "2014-01-01T00:00:00.000Z"
      },
      "_timestamp.min" : {
         "lt" : "2015-01-01T00:00:00.000Z"
      }
   }
}'

@jpountz
Contributor

jpountz commented May 21, 2015

My above comment does not necessarily mean I think it's a bug, but I wanted to raise the corner case so that our API gurus like @clintongormley could help figure out if the API is right.

@martijnvg
Member Author

@jpountz @clintongormley The field stats filtering doesn't really behave like a filter on docs in an index. It is rather more of an overlapping mechanism / range matching.

Let's say an index contains logs from 02-01-2015 to 03-01-2015. A range from 01-01-2015 to 04-01-2015 should include the field stats for that index in the field stats response.

So maybe we should name and document it differently?

@jpountz
Contributor

jpountz commented May 21, 2015

+1 I would like it better if the API made it clearer this performs range overlapping - the current one feels more like it will return indices that have a value in the specified range, which is not what it does.
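
The range overlapping discussed in this thread can be sketched as a comparison of each index's min and max against the requested range: the index is included when its max is at or after the range start and its min is before the range end. This is a minimal illustration, not the PR's code; the class RangeOverlapSketch and the method overlaps are hypothetical, and the dates in the example are read as day-month-year, which is an assumption.

```java
import java.time.LocalDate;

// Hypothetical sketch of the overlap semantics; not taken from the PR.
final class RangeOverlapSketch {

    /** True when [indexMin, indexMax] overlaps the half-open range [from, to). */
    static boolean overlaps(LocalDate indexMin, LocalDate indexMax, LocalDate from, LocalDate to) {
        return !indexMax.isBefore(from) && indexMin.isBefore(to);
    }

    public static void main(String[] args) {
        // Index with logs from 02-01-2015 to 03-01-2015, range 01-01-2015 to 04-01-2015.
        System.out.println(overlaps(
                LocalDate.of(2015, 1, 2), LocalDate.of(2015, 1, 3),
                LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 4)));

        // Corner case: only two values, one before and one after the year 2014.
        // The min/max interval still overlaps the range although no value lies inside it.
        System.out.println(overlaps(
                LocalDate.of(2013, 6, 1), LocalDate.of(2015, 6, 1),
                LocalDate.of(2014, 1, 1), LocalDate.of(2015, 1, 1)));
    }
}
```

Both calls report an overlap. The second one is the corner case raised earlier in the thread: the check only sees the index's min and max, so it cannot tell whether any document actually falls inside the range.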

@clintongormley
Contributor

I like @jpountz's suggestion a lot - it makes things much clearer

@martijnvg
Member Author

I'd like to make a small change to the format @jpountz is suggesting:

curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
   "fields" : {
      "_timestamp" : {
          "max" : {
              "gte" : "2014-01-01T00:00:00.000Z"
          },
          "min" : {
             "lt" : "2015-01-01T00:00:00.000Z"
          }
      }
   }
}'

By having one top-level _timestamp element, it is clearer that we compute field stats for one field only.

@clintongormley
Contributor

See #11187 (comment)

@s1monw s1monw added v1.6.1 and removed v1.6.0 labels Jun 3, 2015
@clintongormley clintongormley changed the title field stats: added stats filtering option Field stats: added stats filtering option Jun 7, 2015
@martijnvg martijnvg removed the v1.6.1 label Jun 17, 2015
@martijnvg martijnvg force-pushed the field_stats/feature/stats_filtering branch from 78745f3 to 7db5d58 Compare June 18, 2015 20:38
@martijnvg martijnvg changed the title Field stats: added stats filtering option Field stats: added index_constraint option Jun 18, 2015
@martijnvg
Member Author

I've updated this PR based on the comments in #11187. Also the title and description have been updated to match index constraints instead of field stats filter.

case START_ARRAY:
    if ("fields".equals(fieldName)) {
        while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
            if (token.isValue()) {
Member

does this mean we will silently ignore other tokens?

Member Author

yes... instead let's fail if there is another token.

@martijnvg
Member Author

@rjernst Thanks for looking at this, I've updated the PR.

@martijnvg martijnvg force-pushed the field_stats/feature/stats_filtering branch from d1fc3e9 to 718290a Compare June 19, 2015 08:28
@martijnvg
Member Author

If there is no more feedback, I'd like to merge this PR within the next 48 hours.

@clintongormley
Contributor

@jpountz @rjernst @bleskes any more feedback?

            if (token.isValue()) {
                fields.add(parser.text());
            } else {
                throw new IllegalArgumentException("Unknown token [" + token + "]");
Contributor

Should it be "unexpected" rather than "unknown"?

Member Author

yes, it should! I'll change it
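
The two review points on the fields array (fail on non-value tokens, and call them "unexpected") can be put together in a small standalone sketch. It does not use the real XContentParser; the class FieldsArraySketch and its Token type are hypothetical stand-ins so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained stand-in for the parsing loop under review.
final class FieldsArraySketch {

    /** Minimal token stand-in; the real code works with XContentParser.Token. */
    static final class Token {
        final boolean isValue;
        final String text;

        Token(boolean isValue, String text) {
            this.isValue = isValue;
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Collects field names and rejects anything in the array that is not a plain value. */
    static List<String> parseFields(List<Token> arrayContents) {
        List<String> fields = new ArrayList<>();
        for (Token token : arrayContents) {
            if (token.isValue) {
                fields.add(token.text);
            } else {
                // Fail loudly instead of silently skipping the token.
                throw new IllegalArgumentException("Unexpected token [" + token + "]");
            }
        }
        return fields;
    }
}
```

With this shape, a body such as "fields" : ["answer_count", {}] would be rejected rather than parsed as a single-field request.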

@jpountz
Contributor

jpountz commented Jun 30, 2015

I left some comments, otherwise LGTM. The one I'm most concerned about is what to do if a request has both a body and a fields parameter?

@martijnvg
Member Author

On 30 June 2015 at 14:36, Adrien Grand notifications@github.com wrote:

I left some comments, otherwise LGTM. The one I'm most concerned about is
what to do if a request has both a body and a fields parameter?

good point, we should just fail the request.
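
One way to "just fail the request" is to check for the conflict before resolving the field list. The sketch below is an assumption about how such a check could look, not the merged code; the class RequestSourceSketch, the method resolveFields and the error message are all hypothetical.

```java
// Hypothetical sketch; names and message are illustrative, not from the PR.
final class RequestSourceSketch {

    /**
     * Resolves the fields to compute stats for.
     *
     * @param fieldsParam comma-separated value of the fields URL parameter, or null if absent
     * @param bodyFields  fields parsed from the request body, or null if there is no body
     */
    static String[] resolveFields(String fieldsParam, String[] bodyFields) {
        if (fieldsParam != null && bodyFields != null) {
            // Ambiguous request: refuse it rather than pick one source silently.
            throw new IllegalArgumentException(
                    "specify either a request body or the [fields] parameter, not both");
        }
        if (fieldsParam != null) {
            return fieldsParam.split(",");
        }
        return bodyFields != null ? bodyFields : new String[0];
    }
}
```

Rejecting the request keeps the behaviour predictable: the caller never has to guess whether the body or the URL parameter took precedence.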

@martijnvg martijnvg force-pushed the field_stats/feature/stats_filtering branch 3 times, most recently from a0843ff to dd91753 Compare July 1, 2015 06:46
Field stats index constraints make it possible to omit the field stats of indices that don't match the constraint. An index
constraint can exclude an index's field stats based on the `min_value` and `max_value` statistics. This option is only
useful if the `level` option is set to `indices`.

For example, index constraints can be useful for finding the min and max value of a particular property of your data in
a time-based scenario. The following request only returns field stats for the `answer_count` property for indices
holding questions created in the year 2014:

curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
   "fields" : ["answer_count"], <1>
   "index_constraints" : { <2>
      "creation_date" : { <3>
         "min_value" : { <4>
            "gte" : "2014-01-01T00:00:00.000Z"
         },
         "max_value" : {
            "lt" : "2015-01-01T00:00:00.000Z"
         }
      }
   }
}'

Closes elastic#11187
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement v2.0.0-beta1
6 participants