Efficiently search against wildcard indices regardless of underlying indexing strategy #4342
As noted in #4886, it would be useful to let the user specify the range for a given field, effectively telling Kibana how far back to look for matching indices.
I wonder if it wouldn't make sense to just look all the way back, but do it in a stepped manner, with a progress bar?
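A minimal sketch of the stepped-lookback idea. It walks backwards from "now" in fixed-size windows and reports a progress fraction after each step, which a UI could feed into a progress bar. Even "all the way back" needs some lower bound, so `earliest_ms` is an assumed input. `stepped_windows` and its parameters are hypothetical names, not Kibana APIs.

```python
from typing import Iterator, Tuple


def stepped_windows(now_ms: int, earliest_ms: int, step_ms: int) -> Iterator[Tuple[int, int, float]]:
    """Yield (start_ms, end_ms, progress) windows, newest first.

    earliest_ms is a hypothetical lower bound for the lookback. progress is
    the fraction of windows completed once this window has been searched.
    """
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    span = max(0, now_ms - earliest_ms)
    total = max(1, -(-span // step_ms))  # ceiling division, at least one window
    end = now_ms
    for i in range(total):
        start = max(earliest_ms, end - step_ms)  # last window is clipped to the bound
        yield start, end, (i + 1) / total
        end = start


for start, end, progress in stepped_windows(now_ms=100, earliest_ms=0, step_ms=30):
    # A real UI would run one search per window here and update a progress bar.
    print(f"search {start}..{end}: {progress:.0%} done")
```

Searching newest-first means the most relevant results can appear before the whole history has been scanned.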
Hi, thanks for linking my issue. Regarding unlimited lookback, I wonder how far that would scale. Personally, I was thinking of adding a cron job to automatically update the mappings each day; that might be another avenue to explore. My situation only occurred because I was unaware that there were new mappings needing indexing, so it took me so long to update them that they fell outside the default lookback. Ideally, something would have caused or prompted an update within the default time period, rather than the default time period being bigger.
Make sure we cover the use cases in #2017.
Will we still have the ability to use timestamped indexes? Timestamped indexes provide a trivial method for removing old data.
@ruckc Any reason you wouldn't just adjust the time range in Kibana to not look at "old data"?
@epixa We handle about 20-60 GB of indexed volume daily on a single node. The data loses relevance extremely quickly (within days), so we only keep a few days to a week online at most, depending on available storage space.
@epixa Even if the timestamped indexes were more of a workaround for the lack of the field_stats API, at this point they are probably a feature for many people besides my organization who have built workflows around them.
@ruckc It's possible that I'm misunderstanding, but it seems to me that using the field stats API will work for your workflow. For your scenario, there would probably even be a very minor performance gain with the new setup. You could still maintain separate indexes following some sort of time-based convention. In fact, I'd say it's probably a good idea to continue doing so. Consider this hypothetical scenario: you maintain some Logstash data in daily indices, and you want to retrieve any data stored in the last 7 days.
How it currently works with a pattern-based naming convention
Index pattern:
A list of possible index names is generated:
Kibana does a search against all 7 of those indexes regardless of whether they actually exist. Any non-existent index is just treated as empty.
How it will work
Index pattern:
An index list is generated. You've deleted all but the last 4 indexes because the data is no longer useful to you, so only the 4 indexes that actually exist are included:
Kibana does a search on those indexes that are known to exist. Under the new setup, the strategy you use to generate and name indexes is not necessarily directly coupled to how Kibana queries them. Let's say you start pulling in twice as much data, around 100GB a day. You could start storing data in half-day indices ( Does this make sense? Am I understanding your workflow correctly?
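A rough sketch of the two scenarios above. The `logstash-` prefix, the `YYYY.MM.DD` suffix and both helper names are assumptions for illustration.

- **Old behaviour:** expand every candidate daily name in the range.
- **New behaviour, simplified:** keep only the indices that actually exist.

In the new behaviour, Elasticsearch is asked which indices exist and hold data, and nothing is generated from names. This sketch only shows the difference in which indices end up being searched.

```python
from datetime import date, timedelta
from typing import List, Set


def expand_daily_pattern(prefix: str, end: date, days: int) -> List[str]:
    """Old behaviour (sketch): build every candidate daily index name in the
    range, oldest first, whether or not the index exists."""
    return [
        prefix + (end - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days - 1, -1, -1)
    ]


def existing_only(candidates: List[str], existing: Set[str]) -> List[str]:
    """New behaviour (simplified): keep only indices that actually exist."""
    return [name for name in candidates if name in existing]


candidates = expand_daily_pattern("logstash-", date(2015, 11, 7), days=7)
existing = set(candidates[3:])  # pretend the 3 oldest indices were deleted
print(candidates)                            # 7 names searched the old way
print(existing_only(candidates, existing))   # 4 names searched the new way
```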
Yes, that makes sense and will work. I just wanted to ensure that Kibana would continue to support querying timestamped indexes as a whole.
@ruckc Definitely! The only requirement for querying them will be that they have some similarity in their naming convention that you can represent with a wildcard index pattern (eg
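A simplified illustration of the "similar naming convention" requirement. A single wildcard pattern can cover daily and half-day indices side by side. Python's `fnmatch.fnmatchcase` is used here only as a stand-in for Elasticsearch's own index-name resolution. It is an approximation: `fnmatch` also interprets `?` and `[...]` specially. The half-day names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_indices(pattern: str, index_names: Iterable[str]) -> List[str]:
    """Return the index names matched by a wildcard pattern such as 'logstash-*'.

    fnmatchcase only approximates Elasticsearch's index wildcard handling:
    it also treats '?' and '[...]' as special characters.
    """
    return sorted(name for name in index_names if fnmatchcase(name, pattern))


indices = [
    "logstash-2015.11.07",     # old daily index
    "logstash-2015.11.08-am",  # hypothetical half-day index
    "logstash-2015.11.08-pm",
    "metrics-2015.11.08",      # different data set, not matched
]
print(match_indices("logstash-*", indices))
```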
Many folks have expressed concern about the changes that will result from this ticket, so I wanted to spell out the implementation plan and provide a bit more detail about what these changes mean for time-based index patterns in Kibana.
Why?
Kibana is now smart enough to automatically determine which indices to search against based on your currently specified time range for any wildcard index pattern. This means that any wildcard index pattern (e.g.
This makes it easier to get up and running quickly with Kibana. A wildcard index pattern will now work for both small and large amounts of data. It also means that users can change their indexing strategies behind the scenes without having to create entirely new index patterns. For example, a user could switch from daily indexes to hourly or even size-based indexes, and their existing index pattern in Kibana will continue to work even when looking at a time range that spans the old and new indexes.
What is changing for 4.3?
All new and existing wildcard index patterns (e.g.
All new and existing index patterns created using a time-based naming convention (e.g.
When creating a new index pattern, users will be discouraged from using time-based naming conventions via a deprecation warning on the form. Included with the message will be a short description of how users can now use wildcard patterns to efficiently search against time-based indexes.
What is changing for 5.0?
The ability to use time-based naming conventions when creating new index patterns will be removed.
The last PR for this ticket just went into master.
This might not be the appropriate place to ask, but once the ability to use a time-based naming convention for indices is removed, what will be the best approach to deleting old data? With the current approach, you can simply drop old indices based on the date in their name. But if all my data resides in a single index, how will I be able to delete data older than X days/months/etc.? Using Curator would no longer work in this case. Am I correct?
@mac3384 You can (and probably should) still use time-based indexing schemes for your data. Kibana is just now smart enough to intelligently query those indexes based on your currently selected time range for any wildcard index patterns you've created.
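Because the indices keep their date-based names, retention can still be driven by the name. Below is a small sketch of the selection step. It assumes daily indices named `logstash-YYYY.MM.DD` and uses a hypothetical helper `indices_older_than`.

Tools such as Curator can filter indices by a date parsed from their names in a similar way. Actually removing an index would be a separate delete-index call per returned name, which this sketch does not make.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def indices_older_than(index_names: Iterable[str], prefix: str, today: date, keep_days: int) -> List[str]:
    """Return daily indices (prefix + YYYY.MM.DD) dated before the retention
    cutoff. Names that do not follow the convention are skipped."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a YYYY.MM.DD date
        if day < cutoff:
            stale.append(name)
    return sorted(stale)


names = [
    "logstash-2016.01.01",
    "logstash-2016.01.02",
    "logstash-2016.01.03",
    "logstash-2016.01.09",
    "logstash-latest",
]
print(indices_older_than(names, "logstash-", today=date(2016, 1, 10), keep_days=7))
```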
@epixa, do you know what kind of algorithm Kibana uses to query only the indices for a specific time range? Would it also work automatically with naming like log_somethinghere_20160130 (i.e. the date is expressed as YYYYMMDD)? UPDATE:
@xande Your update is correct. Unless you specifically opt into the behavior when you create your index pattern, Kibana does not make any decisions about time ranges based on your index names. It uses the field_stats API to ask Elasticsearch which indices have data in a given time range, and then it queries those indices specifically.
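The selection rule described here reduces to an interval-overlap test. An index is relevant if its earliest timestamp is not after the end of the range and its latest timestamp is not before the start. Index names play no part, so `log_somethinghere_20160130` would be treated like any other name.

This is a sketch with made-up index names and stats. In Kibana the min/max values would come from Elasticsearch rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IndexTimeStats:
    """Per-index min/max of the time field, as a field-stats style lookup might report it."""
    index: str
    min_ts: int  # earliest timestamp in the index (epoch millis)
    max_ts: int  # latest timestamp in the index (epoch millis)


def indices_for_range(stats: Iterable[IndexTimeStats], start_ms: int, end_ms: int) -> List[str]:
    """Keep indices whose [min_ts, max_ts] interval overlaps [start_ms, end_ms]."""
    return [s.index for s in stats if s.min_ts <= end_ms and s.max_ts >= start_ms]


stats = [
    IndexTimeStats("logs-a", 0, 99),
    IndexTimeStats("logs-b", 100, 199),
    IndexTimeStats("logs-c", 200, 299),
]
print(indices_for_range(stats, 150, 250))
```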
Hi @epixa, I am using ES 2.3.2 and Kibana 4.5, and I am the only person using this ELK stack. My index pattern is 'logstash-*'. I didn't tick the 'Use event times to create index names [DEPRECATED]' checkbox. Further text on the form says: 'By default, searches against any time-based index pattern that contains a wildcard will automatically be expanded to query only the indices that contain data within the currently selected time range.' When I query for 'today' in my dashboard, Kibana sends one request per visualization to ES using epoch times for 'today', which seems to fit the text above. But ES queries every single index it has (~3 months). I have an entry like the following in my elasticsearch.log for each daily index:
[2016-07-26 08:40:27,877][DEBUG][action.search ] [Invisible Woman] [logstash-2016.06.11]
Is this an ES bug?
@JeremyColton I'm not completely sure of the underlying ES implementation, to be honest, but it might be. Can you get me information about the network requests that Kibana makes, using the network tab in your browser's dev tools?
Kibana sent requests with epoch times for the last 24 hours. However, I changed my index's shard count from 5 to 2, with no replicas.
Details about this change
This change will not break existing index patterns.
See #4342 (comment) for more details.
The original description
Elasticsearch 1.6 introduced the _field_stats API, which will, for the first time, allow us to search for indices that contain fields within a given range. For example, we can search for indices that contain an @timestamp between X and Y.
It still needs one enhancement before we can utilize it: elastic/elasticsearch#11187
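For illustration, here is roughly what an "indices with @timestamp between X and Y" request body could look like. It uses the index constraints option described in the Elasticsearch 2.x field stats documentation, as I understand it. The body would be sent to `<pattern>/_field_stats?level=indices`.

The helper name is hypothetical. The field stats API was deprecated in later Elasticsearch releases, so check the reference for your version before relying on this exact shape.

```python
import json
from typing import Any, Dict


def field_stats_body(time_field: str, start_ms: int, end_ms: int) -> Dict[str, Any]:
    """Build a field stats request body that keeps only indices overlapping
    [start_ms, end_ms] on time_field: the index's max value must be >= start
    and its min value must be <= end."""
    return {
        "fields": [time_field],
        "index_constraints": {
            time_field: {
                "max_value": {"gte": start_ms, "format": "epoch_millis"},
                "min_value": {"lte": end_ms, "format": "epoch_millis"},
            }
        },
    }


print(json.dumps(field_stats_body("@timestamp", 1446854400000, 1446940799999), indent=2))
```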
This means that users will no longer be required to roll their indices at UTC midnight, nor to use date patterns at all. They can effectively name indices whatever they want, and Kibana can automatically optimize requests by firing a pre-flight request for indices. We might need to add some caching here, but it should greatly enhance usability.
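One possible shape for the caching mentioned here: a small time-to-live cache in front of the pre-flight lookup. `PreflightCache` and the `fetch` callable are hypothetical; `fetch` stands in for whatever actually asks Elasticsearch which indices cover a range. The injectable clock exists only to make the behaviour easy to demonstrate.

```python
import time
from typing import Callable, Dict, List, Tuple

FetchFn = Callable[[str, int, int], List[str]]


class PreflightCache:
    """Memoize the pre-flight "which indices cover this range?" lookup for a
    short TTL. fetch is a hypothetical function that asks Elasticsearch."""

    def __init__(self, fetch: FetchFn, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, int, int], Tuple[float, List[str]]] = {}

    def indices(self, pattern: str, start_ms: int, end_ms: int) -> List[str]:
        key = (pattern, start_ms, end_ms)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return list(cached[1])  # copy so callers cannot mutate the cache
        result = self._fetch(pattern, start_ms, end_ms)
        self._entries[key] = (now, list(result))
        return list(result)


calls = []


def fake_fetch(pattern: str, start_ms: int, end_ms: int) -> List[str]:
    calls.append((pattern, start_ms, end_ms))
    return ["logstash-2015.11.07"]


fake_now = [0.0]
cache = PreflightCache(fake_fetch, ttl_seconds=10.0, clock=lambda: fake_now[0])
cache.indices("logstash-*", 0, 100)   # fetched
cache.indices("logstash-*", 0, 100)   # served from cache
fake_now[0] = 11.0
cache.indices("logstash-*", 0, 100)   # TTL expired, fetched again
print(len(calls))
```

Keying on the exact range is the simplest choice, but a relative range such as "last 15 minutes" changes on every refresh and would rarely hit. Rounding range boundaries, or caching per-index min/max stats and filtering locally, would raise the hit rate at the cost of freshness. Newly created indices, for example, might be missed until the entry expires. This sketch also never evicts old entries.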
Update: The implementation of the above enhancement is here: elastic/elasticsearch#11259