Terms API: Allow to get terms for one or more field #21
Terms API: Allow to get terms for one or more field. Closed by 5d78196.
Could you please provide the docs for the usage of terms, so that I can add it to ElasticSearch.pm? Thanks, clint
The terms API accepts the following URIs:

The HTTP parameters are (`fields` or `field` must be set):
The field names support indexName-based lookup and full path lookup (with or without a type prefix). The results include a docs header, and then an object named after the field name, with the term and document frequency for each term. The only thing I am not sure about is that currently the term value is the JSON object name, and I wonder if it makes sense to create a generic JSON object with a term field holding the value. What do you think?
Regarding my previous question, I simply added another HTTP boolean parameter called termsAsArray. It defaults to true, which means you will get an array of JSON objects with term and docFreq as fields. This also maintains the order for parsers that are not order aware (since you can sort). If set to false, it will return JSON object names with the term itself.
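A minimal sketch of the two response shapes described above. The field name, terms, doc frequencies, and the exact `docs` header keys here are made up for illustration, not taken from the actual API:

```python
import json

# Hypothetical response with termsAsArray=true (the default):
# each field maps to an ordered array of {term, docFreq} objects.
response_as_array = json.loads("""
{
  "docs": {"numDocs": 100},
  "fields": {
    "name.firstName": {
      "terms": [
        {"term": "alice", "docFreq": 12},
        {"term": "bob", "docFreq": 7}
      ]
    }
  }
}
""")

# Hypothetical response with termsAsArray=false: the term itself
# becomes the JSON object name, so ordering depends on the parser.
response_as_object = json.loads("""
{
  "docs": {"numDocs": 100},
  "fields": {
    "name.firstName": {
      "terms": {
        "alice": {"docFreq": 12},
        "bob": {"docFreq": 7}
      }
    }
  }
}
""")

# The array form preserves order even for order-unaware parsers.
terms = response_as_array["fields"]["name.firstName"]["terms"]
print([t["term"] for t in terms])  # ['alice', 'bob']
print(response_as_object["fields"]["name.firstName"]["terms"]["alice"]["docFreq"])  # 12
```

The array form trades a slightly more verbose payload for a stable, sortable order.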
You mean fromInclusive defaults to TRUE. I've renamed these
What do you mean by this:
Can you give me an example of the format?
Actually, these are both incorrect. Currently Why do you have these as different values? From the naming of
The idea of fromInclusive and toInclusive is to follow the usual convention of writing a for loop, something like `for (i = 0; i < 10; i++)`: in this case, the from (0) is inclusive, and the to (10) is not. In any case, I suggest that you follow the same wording and parameters elasticsearch uses, so you won't confuse users. We can talk about whether it makes sense to change this, but for now I suggest keeping it the same. Regarding the field name, it is explained a bit here (http://www.elasticsearch.com/docs/elasticsearch/mapping/object_type/#pathType), though I should add a page that explains it explicitly. For example, if you have (person is the type of the mapping):

then the field name (that will match) will be either person.name.firstName or name.firstName. If you add explicit mapping for the name object (or person), you can control the pathType.
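The matching rule described above can be sketched as follows. This helper is ours, purely illustrative of the two accepted spellings of a field name; it is not code from elasticsearch:

```python
def field_matches(requested: str, full_path: str, type_name: str) -> bool:
    """Illustrative only: a requested field name matches either the
    full path without the type prefix (e.g. 'name.firstName') or the
    same path with the type prefix (e.g. 'person.name.firstName')."""
    typed = type_name + "." + full_path
    return requested in (full_path, typed)

# With a mapping of type 'person' containing the field name.firstName:
print(field_matches("name.firstName", "name.firstName", "person"))         # True
print(field_matches("person.name.firstName", "name.firstName", "person"))  # True
print(field_matches("firstName", "name.firstName", "person"))              # False
```

Note that, as discussed later in the thread, the type prefix here only affects name matching; it does not filter results by type.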
OK - I didn't get that. I would say then they should be called In Perl (and some other dynamic languages), loops can be written more succinctly, like:
... both of which are inclusive. To my mind, basing the default values of
OK, I have two mappings: type_1 and type_2. Both have a field 'text', but whether I ask for terms on field 'text' or 'type_1.text', I get the same results, which doesn't seem to be what I'm asking for. Is this what it is supposed to do?
No problem, makes sense, I will change toInclusive to true. Regarding the field name, yeah, it's not filtered by type when you prefix it with the type (which is different from using the typed field in search queries, for example). It can be implemented, but it's more difficult and would be much more expensive to perform, so for now I did not implement it.
Due to fix [3790](#3790) in core, upgrading an analyzer provided as a plugin now fails. See #5030 for details. The issue is in elasticsearch core code, but it can be fixed in plugins by overloading `PreBuiltAnalyzerProviderFactory`, `PreBuiltTokenFilterFactoryFactory`, `PreBuiltTokenizerFactoryFactory` or `PreBuiltCharFilterFactoryFactory` when used. Closes #21 (cherry picked from commit 3401c21)
Latest changes break tests. Closes #21. (cherry picked from commit 04c77e8)
Closes #21. (cherry picked from commit a1b37f6)
According to the [Containers naming guide](http://msdn.microsoft.com/en-us/library/dd135715.aspx):

> A container name must be a valid DNS name, conforming to the following naming rules:
>
> * Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
> * Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
> * All letters in a container name must be lowercase.
> * Container names must be from 3 through 63 characters long.

We need to fix the documentation and validate this before calling the Azure API. The validation will come with issue #27. Closes #21. (cherry picked from commit 6531165)
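The naming rules quoted above can be checked client-side before calling the Azure API. A minimal sketch (the function name and regex are ours, not from the plugin):

```python
import re

# Lowercase letters and digits, with single interior dashes; a dash can
# never start the name, end it, or follow another dash.
CONTAINER_NAME_RE = re.compile(r"^[a-z0-9](?:-?[a-z0-9])*$")

def is_valid_container_name(name: str) -> bool:
    """Validate a container name against the rules in the naming guide:
    3-63 chars, lowercase letters/digits/dashes, no leading, trailing,
    or consecutive dashes."""
    return 3 <= len(name) <= 63 and CONTAINER_NAME_RE.fullmatch(name) is not None

print(is_valid_container_name("my-container"))  # True
print(is_valid_container_name("My-Container"))  # False (uppercase)
print(is_valid_container_name("a--b"))          # False (consecutive dashes)
print(is_valid_container_name("ab"))            # False (too short)
```

Rejecting bad names locally gives users a clear error instead of an opaque failure from the Azure API.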
Sometimes Tika may crash while parsing some files. In this case it may generate raw runtime errors (`Throwable`), not `TikaException`. But there is no catch clause for `Throwable` in AttachmentMapper.java:

```java
String parsedContent;
try {
    // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
    parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
} catch (TikaException e) {
    throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
}
```

As a result, tika() may "hang up" the whole application. (We have some PDF files that "hang up" the Elastic client if you try to parse them using the mapper-attachment plugin.) We propose the following fix:

```java
String parsedContent;
try {
    // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
    parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
} catch (Throwable e) {
    throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
}
```

(Just replace `TikaException` with `Throwable`; it works for our cases.) Thank you! Closes elastic#21.
Prior to this change, the `publish()` method comprises a deeply nested collection of lambdas and anonymous classes which represent the notion of a single publication attempt. In future we want to treat it as a first-class concept so we can detect when it fails etc. This change gives names to the anonymous lambdas and classes as a step towards this.
Add updated repo to Ubuntu to use a newer version of Ansible
Some revisions to the idea of using a function
With this commit we add a new command line parameter `--elasticsearch-plugins` to Night Rally. Night Rally will pass this parameter on to Rally, but will also check whether we've specified "x-pack:security" as a plugin and make the necessary adjustments (change the expected cluster health and adjust the client options). Relates elastic#21
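The check described in this commit message can be sketched as follows. The function name and the comma-separated plugin-list format are assumptions for illustration, not Night Rally's actual code:

```python
def needs_security_adjustments(plugins_param: str) -> bool:
    """Return True if 'x-pack:security' appears in a comma-separated
    plugin list, meaning cluster health expectations and client
    options need to be adjusted before handing off to Rally."""
    plugins = [p.strip() for p in plugins_param.split(",") if p.strip()]
    return "x-pack:security" in plugins

print(needs_security_adjustments("analysis-icu,x-pack:security"))  # True
print(needs_security_adjustments("analysis-icu"))                  # False
```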
Getting terms (from one or more indices) and their document frequency (the number of times those terms appear in a document) is very handy: for example, for implementing tag clouds, or providing a basic auto-suggest search box.
There should be several options for this API, including sorting by term (lexicographically) or by doc freq, bounding the size, from/to ranges (inclusive or not), min/max freq, and prefix and regexp filtering.

The REST API should be: `/{index}/_terms`
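A sketch of what a request to this endpoint might look like. Only the `/{index}/_terms` path comes from the proposal above; the base URL and the parameter names (`field`, `size`) are assumptions drawn from the discussion in this thread, not confirmed documentation:

```python
from urllib.parse import urlencode

def terms_url(base: str, index: str, **params) -> str:
    """Build a terms API URL of the form {base}/{index}/_terms?...
    Illustrative only; parameter names are not authoritative."""
    query = urlencode(params)
    return f"{base}/{index}/_terms" + (f"?{query}" if query else "")

print(terms_url("http://localhost:9200", "twitter", field="user", size=10))
# http://localhost:9200/twitter/_terms?field=user&size=10
```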