Efficiently search against wildcard indices regardless of underlying indexing strategy #4342

rashidkpc · 2015-06-26T20:49:14Z

Details about this change

This change will not break existing index patterns.

The original description

Elasticsearch 1.6 introduced the _field_stats API which will, for the first time, allow us to search for indices that contain fields within a given range. For example, we can search for indices that contain an @timestamp between X and Y.

It still needs one enhancement before we can utilize it: elastic/elasticsearch#11187

This means that users will no longer be required to roll their indices at UTC midnight, nor use date patterns at all. They can effectively name indices whatever they want, and Kibana can automatically optimize requests by firing a pre-flight request to find the matching indices. We might need to add some caching here, but it should greatly enhance usability.

Update: The implementation of the above enhancement is here: elastic/elasticsearch#11259
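
For readers unfamiliar with the API, here is a minimal sketch of what such a pre-flight request body could look like once index constraints are available. It assumes the `index_constraints` syntax of the `_field_stats` API (sent as `POST /logstash-*/_field_stats?level=indices`); the helper names `to_epoch_millis` and `build_field_stats_request` are hypothetical and are not taken from Kibana's source.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_field_stats_request(time_field: str, start: datetime, end: datetime) -> dict:
    """Build a request body for POST /<pattern>/_field_stats?level=indices.

    An index overlaps [start, end] when the largest value of the time field
    is >= start and the smallest value is <= end.
    """
    return {
        "fields": [time_field],
        "index_constraints": {
            time_field: {
                "max_value": {"gte": to_epoch_millis(start), "format": "epoch_millis"},
                "min_value": {"lte": to_epoch_millis(end), "format": "epoch_millis"},
            }
        },
    }


# Illustrative time range only; X and Y from the description above.
body = build_field_stats_request(
    "@timestamp",
    datetime(2015, 10, 17, tzinfo=timezone.utc),
    datetime(2015, 10, 24, tzinfo=timezone.utc),
)
```

Only indices that satisfy both constraints should come back in the response, so the follow-up search can be limited to them.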

w33ble · 2015-09-10T17:33:36Z

As noted in #4886, it would be useful to allow the user to specify the range for a given field and effectively tell Kibana how far back to look for matching indices.

rashidkpc · 2015-09-22T20:44:03Z

I wonder if it wouldn't make sense to just look all the way back, but do it in a stepped manner, with a progress bar?

pjcard · 2015-09-23T09:01:37Z

Hi, thanks for linking my issue. Regarding unlimited lookback, I wonder how far that would scale. Personally, I was thinking of adding a cron job to automatically update the mappings each day; that might be another avenue to explore. My situation only occurred because I was unaware that there were new mappings needing indexing, so it took me so long to update them that they fell out of scope of the default lookback. Ideally there would have been something to cause or prompt an update within the default time period, rather than the default time period being bigger.

simianhacker · 2015-10-14T16:36:32Z

Make sure we cover the use cases in #2017

epixa · 2015-10-20T15:25:10Z

Since now there will only be one field that is affected by the time-based index checkbox, does anyone object to me moving that checkbox next to said field?

So this:

ruckc · 2015-10-23T17:23:55Z

Will we still have the ability to use timestamped indexes? Having timestamped indexes provides a trivial method to remove old data.

epixa · 2015-10-23T17:25:32Z

@ruckc Any reason you wouldn't just adjust the time range in Kibana to not look at "old data"?

ruckc · 2015-10-23T17:27:44Z

@epixa We handle about 20-60 GB of indexed volume daily on a single node. The data loses relevancy extremely quickly (within days), so we only keep at most a few days or a week online, depending on the storage space available.

ruckc · 2015-10-23T17:31:00Z

@epixa Even if the timestamped indexes were more of a workaround for the lack of the field_stats API, at this point they are probably a feature for more organizations than just mine, which have built workflows that take advantage of them.

epixa · 2015-10-23T18:54:53Z

@ruckc It's possible that I'm misunderstanding, but it seems to me that using the field stats api will work for your workflow. For your scenario, there would probably even be a very minor performance gain with the new setup.

You could still maintain separate indexes following some sort of time-based convention. In fact, I'd say it's probably a good idea to continue doing so.

Consider this hypothetical scenario: you maintain some logstash data in daily indices, and you want to retrieve any data that happens to be stored in the last 7 days.

How it currently works with pattern-based naming convention

Index pattern: [logstash-]YYYY.MM.DD
Time field: @timestamp

A list of possible index names is generated:

logstash-2015.10.17
logstash-2015.10.18
logstash-2015.10.19
logstash-2015.10.20
logstash-2015.10.21
logstash-2015.10.22
logstash-2015.10.23

Kibana does a search against all 7 of those indexes regardless of whether they actually exist. Any non-existent index is just treated as empty.
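
As a rough illustration of this name-based expansion, the sketch below lists one candidate index per day for the pattern `[logstash-]YYYY.MM.DD`. The function name `expand_daily_pattern` is hypothetical, and the `strftime` format is an assumed Python equivalent of the `YYYY.MM.DD` date pattern rather than Kibana's actual code.

```python
from datetime import date, timedelta
from typing import List


def expand_daily_pattern(prefix: str, first_day: date, last_day: date) -> List[str]:
    """List one candidate index name per day, inclusive of both ends."""
    day_count = (last_day - first_day).days + 1
    return [
        # "%Y.%m.%d" is assumed to mirror the YYYY.MM.DD part of the pattern.
        prefix + (first_day + timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(day_count)
    ]


# Seven days ending on 2015-10-23, matching the scenario above.
candidates = expand_daily_pattern("logstash-", date(2015, 10, 17), date(2015, 10, 23))
```

The names are derived purely from the calendar, which is why the list can contain indices that do not exist.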

How it will work

Index pattern: logstash-*
Time field: @timestamp

An index list is generated. You've deleted all but the last 4 indexes because the data is no longer useful to you, so only the 4 indexes that actually exist are included:

logstash-2015.10.20
logstash-2015.10.21
logstash-2015.10.22
logstash-2015.10.23

Kibana does a search on those indexes that are known to exist.

Under the new setup, the strategy you use to generate and name indexes is not necessarily directly coupled to how Kibana queries them. Let's say you start pulling in twice as much data, around 100GB a day. You could start storing data in half-day indices (logstash-2015-10-24-am, logstash-2015-10-24-pm) and you wouldn't need to change anything within Kibana itself. Kibana would be able to search against that new indexing strategy without any intervention.
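
To make the decoupling concrete, here is a small sketch of the selection rule: an index is chosen when the span of its time field overlaps the requested range, whatever the index is called. The data is hypothetical, the helper names `span` and `indices_in_range` are invented for this example, and in practice Elasticsearch would apply the constraint on the server side.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# (earliest, latest) value of the time field within one index.
Span = Tuple[datetime, datetime]


def span(begin: datetime, hours: int) -> Span:
    """Build a span that starts at `begin` and covers `hours` hours."""
    return begin, begin + timedelta(hours=hours) - timedelta(milliseconds=1)


def indices_in_range(spans: Dict[str, Span], start: datetime, end: datetime) -> List[str]:
    """Keep the indices whose span overlaps [start, end]; names are never parsed."""
    return [
        name
        for name, (earliest, latest) in spans.items()
        if latest >= start and earliest <= end
    ]


utc = timezone.utc
# Hypothetical mix of one daily index and two half-day indices.
existing = {
    "logstash-2015.10.23": span(datetime(2015, 10, 23, tzinfo=utc), 24),
    "logstash-2015-10-24-am": span(datetime(2015, 10, 24, 0, tzinfo=utc), 12),
    "logstash-2015-10-24-pm": span(datetime(2015, 10, 24, 12, tzinfo=utc), 12),
}

afternoon = indices_in_range(
    existing,
    datetime(2015, 10, 24, 13, tzinfo=utc),
    datetime(2015, 10, 24, 15, tzinfo=utc),
)
overnight = indices_in_range(
    existing,
    datetime(2015, 10, 23, 18, tzinfo=utc),
    datetime(2015, 10, 24, 6, tzinfo=utc),
)
```

Because only the stored timestamps are compared, switching from daily to half-day indices requires no change to the index pattern.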

Does this make sense? Am I understanding your workflow correctly?

ruckc · 2015-10-23T18:59:27Z

Yes that makes sense, and will work. I just wanted to ensure that Kibana would continue to support querying timestamped indexes as a whole.

epixa · 2015-10-23T19:00:34Z

@ruckc Definitely! The only requirement for querying them will be that they have some similarity in their naming convention that you can represent with a wildcard index pattern (e.g. logstash-*).

epixa · 2015-10-27T17:43:30Z

Many folks have expressed concern about the changes that will result from this ticket, so I wanted to spell out the implementation plan for this and provide a bit more detail about what these changes mean for time-based index patterns in Kibana.

Why?

Kibana is now smart enough to automatically determine which indices to search against based on your current specified time range for any wildcard index pattern. This means that any wildcard index pattern (e.g. logstash-*) that has a specific time field configured will automatically get the search optimizations that you used to only be able to get when you specified a time-based naming convention (e.g. [logstash-]YYYY.MM.DD).

This makes it easier to get up and running quickly with Kibana. A wildcard index pattern will now work for both small amounts of data and large amounts of data.

This also means that users can change their indexing strategies behind the scenes without having to create entirely new index patterns. For example, a user could change from having daily indexes to having hourly or even size-based indexes and their existing index pattern in Kibana will continue to work even when looking at a time range that spans the old and new indexes.

What is changing for 4.3?

All new and existing wildcard index patterns (e.g. logstash-*) that have a time field configured will have their searches optimized.

All new and existing index patterns created using a time-based naming convention (e.g. [logstash-]YYYY.MM.DD) will continue to work.

When creating a new index pattern, users will be discouraged from using time-based naming conventions via a deprecation warning on the form. Included along with the message will be a short description about how users can now use wildcard patterns to efficiently search against time-based indexes.

What is changing for 5.0?

The ability to use time-based naming conventions when creating new index patterns will be removed.

epixa · 2015-10-27T19:19:24Z

The last PR for this ticket just went into master.

mac3384 · 2016-03-01T10:23:06Z

This might not be the appropriate place to ask this question, but once the ability to use a time-based naming convention on indices is removed, what will be the best approach to delete old data? With the current approach, you can simply drop old indices based on the date in their name. But if all my data resides in a single index, how will I be able to delete data older than X days/months/etc.? Using Curator will no longer work in this case. Am I correct?

epixa · 2016-03-01T16:29:51Z

@mac3384 You can (and probably should) still use time based indexing schemes for your data. Kibana is just now smart enough to intelligently query those indexes based on your currently selected time range for any wildcard index patterns you've created.
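
Since time-based index names remain available, name-based retention keeps working. The following is a minimal sketch, not Curator's implementation: it assumes index names end in a `YYYY.MM.DD` date, and the function name `indices_older_than` and the sample names are hypothetical. It only selects names; deleting the indices is left to whatever tool you already use.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

# Assumed convention: the index name ends in a YYYY.MM.DD date.
DATE_SUFFIX = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})$")


def indices_older_than(names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return the names whose trailing date is older than today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # not a dated index, so never select it for deletion
        year, month, day = (int(part) for part in match.groups())
        if date(year, month, day) < cutoff:
            expired.append(name)
    return sorted(expired)


names = [".kibana", "logstash-2016.02.22", "logstash-2016.02.23", "logstash-2016.02.29"]
expired = indices_older_than(names, today=date(2016, 3, 1), keep_days=7)
```

The wildcard index pattern in Kibana is unaffected when the selected indices are dropped, because only existing indices are ever queried.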

xande · 2016-03-24T19:38:25Z

@epixa, do you know what kind of algorithm Kibana uses to query only the indices relevant to the selected time range?

Would it also work automatically with a naming scheme like log_somethinghere_20160130 (i.e. the date is expressed as YYYYMMDD)?

UPDATE:
I think I got it. Kibana narrows down the search using _field_stats for each of the indices?

epixa · 2016-03-24T20:18:27Z

@xande Your update is correct. Unless you specifically opt into the behavior when you create your index pattern, Kibana does not make any decisions about time ranges based on your index names. It uses the field_stats api to ask elasticsearch which indices have data in a given time range, and then it queries those indices specifically.
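
A minimal sketch of the second half of that flow, assuming the response to a `level=indices` field stats request is keyed by index name under `"indices"`. The sample response is hypothetical and trimmed, and `matching_indices` is an invented helper name, not a Kibana function.

```python
from typing import Any, Dict, List


def matching_indices(field_stats_response: Dict[str, Any]) -> List[str]:
    """Return the index names listed in a level=indices field stats response."""
    return sorted(field_stats_response.get("indices", {}))


# Hypothetical, trimmed response for a constrained request on @timestamp.
sample_response = {
    "_shards": {"total": 4, "successful": 4, "failed": 0},
    "indices": {
        "log_somethinghere_20160131": {
            "fields": {"@timestamp": {"min_value": 1454198400000, "max_value": 1454284799999}}
        },
        "log_somethinghere_20160130": {
            "fields": {"@timestamp": {"min_value": 1454112000000, "max_value": 1454198399999}}
        },
    },
}

# The follow-up search targets only the indices that were returned.
search_path = "/" + ",".join(matching_indices(sample_response)) + "/_search"
```

The index names are treated as opaque strings here, which is why a scheme such as log_somethinghere_20160130 needs no special handling.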

JeremyColton · 2016-07-26T09:17:13Z

Hi @epixa, I am using ES 2.3.2 and Kibana 4.5. I am the only person using this ELK stack. My index pattern is 'logstash-*'. I didn't tick the 'Use event times to create index names [DEPRECATED]' checkbox. The form also says: 'By default, searches against any time-based index pattern that contains a wildcard will automatically be expanded to query only the indices that contain data within the currently selected time range.'

When I query for 'today' in my dashboard, Kibana sends one request per visualization to ES using epoch times for 'today', so this seems to fit the text above. But ES queries every single index it has (~3 months). I have entries like the following in my elasticsearch.log for each daily index, e.g.:

[2016-07-26 08:40:27,877][DEBUG][action.search ] [Invisible Woman] [logstash-2016.06.11]

So this is an ES bug?

epixa · 2016-07-26T13:41:18Z

@JeremyColton I'm not completely sure of the underlying ES implementation, to be honest, but it might be. Are you able to get me information about the network requests that Kibana makes using the network tab in your dev tools?

JeremyColton · 2016-07-27T15:04:21Z

Kibana sent requests with epoch times for the last 24 hours.

However, I changed my index's shard number from 5 to 2 with no replicas.
I re-indexed my existing indices.
This problem then went away!

rashidkpc added release_note:enhancement elasticsearch 2.0 v4.3.0 labels Jun 26, 2015

rashidkpc mentioned this issue Jul 13, 2015

Tiered Index Patterns #4423

Closed

This was referenced Jul 22, 2015

Support wildcards in index patterns that use event times for index names #4465

Closed

Default to Use event times to create index names for time based index patterns #4472

Closed

rashidkpc removed the elasticsearch 2.0 label Jul 23, 2015

rashidkpc assigned spalger Jul 23, 2015

ppf2 mentioned this issue Jul 25, 2015

Discover issues numerous msearch queries against indices that do not exist for time patterned indices #4495

Closed

rashidkpc mentioned this issue Aug 3, 2015

Multiple indices in pattern #2017

Closed

rashidkpc mentioned this issue Aug 11, 2015

Unable to create Time-based index pattern with wildcard #4633

Closed

ppf2 mentioned this issue Aug 27, 2015

Ambiguous AuthorizationException #4777

Closed

rashidkpc unassigned spalger Aug 31, 2015

This was referenced Sep 10, 2015

Erroneous value for Date type in Discover #4891

Closed

Expose seting for "lookback" range for _field_stats API #4886

Closed

rashidkpc assigned epixa Sep 22, 2015

rashidkpc added v4.4.0 and removed v4.3.0 labels Oct 23, 2015

epixa added v4.3.0 PR sent and removed v4.4.0 labels Oct 23, 2015

martijnvg mentioned this issue Oct 27, 2015

Add field stats api elastic/elasticsearch#10523

Merged

epixa removed the PR sent label Oct 27, 2015

epixa changed the title ~~Deprecate timestamped indices, use _field_stats API~~ Efficiently search against wildcard indices regardless of what indexing strategy is used Oct 27, 2015

epixa changed the title ~~Efficiently search against wildcard indices regardless of what indexing strategy is used~~ Efficiently search against wildcard indices regardless of underlying indexing strategy Oct 27, 2015

epixa mentioned this issue Oct 27, 2015

Deprecation warning when creating interval patterns #5209

Merged

epixa closed this as completed Oct 27, 2015

tbragin mentioned this issue Nov 3, 2015

Kibana 4.3 documentation updates #5275

Closed

9 tasks

epixa mentioned this issue Nov 19, 2015

Default Logstash index pattern should be "[logstash-]YYYY.MM.DD", not "logstash-*" #5447

Closed

reedloden mentioned this issue Nov 26, 2015

Update index patterns to use wildcard elastic/beats-dashboards#36

Closed

scampi mentioned this issue May 27, 2016

Timepicker - All option #1723

Closed

lukas-vlcek mentioned this issue Aug 30, 2016

[DO_NOT_MERGE] Logging upgrade openshift/origin-aggregated-logging#208

Closed

diranged mentioned this issue Dec 6, 2016

Super slow Kibana UI on first load .. related to /_field_stats API call. #9386

Closed

pjcard mentioned this issue Feb 21, 2017

Timelion search ignores time range when choosing indices #10475

Closed

gavenkoa mentioned this issue Jun 19, 2017

Remove misleading information about deprecated/disabled time based index pattern. #12406

Closed
