Add `scaled_float`. #19264
To give more detailed information about why half floats are not enough, here is a table that gives disk usage for storing 10M random floats between 0 and 1 depending on the mapping:
I chose … Of course, this is not a good benchmark since this is fake data, but given how points and doc values work, this simulates the worst case, and real data could expect even better disk utilization.
/** A {@link FieldMapper} for scaled floats. Values are internally multiplied
 *  by a scaling factor and rounded to the closest long. */
public class ScaledFloatFieldMapper extends FieldMapper implements AllFieldMapper.IncludeInAll {
Just a question: would it be possible to extend from `LongFieldMapper`? It would be nice to have some code reuse.
I thought about it when working on this PR but in the end it made things more complicated since this mapper partially needs to behave as a long field and as a double field.
Cool, I can see how this can complicate things, was just hoping that this code reuse would be a low hanging fruit.
Updated numbers with https://issues.apache.org/jira/browse/LUCENE-7371:
This is an attempt to revive elastic#15939, motivated by elastic/beats#1941. Half-floats are a pretty bad option for storing percentages. They would likely require 2 bytes all the time, while percentages don't need more than one byte. So this PR exposes a new `scaled_float` type that requires a `scaling_factor` and internally indexes `value*scaling_factor` in a long field. Compared to the original PR, it exposes a lower-level API so that the trade-offs are clearer, and it avoids any reference to fixed precision that might imply that this type is more accurate (actually it is *less* accurate). In addition to being more space-efficient for some use-cases that Beats is interested in, this is also faster than `half_float` unless we can improve the efficiency of decoding half-float bits (which is currently done in software) or until Java gets first-class support for half-floats.
4be06bf to 398d70b
Elasticsearch added a couple of new numeric datatypes, which means we need to update our type casting list to include them. Kibana should see them as "numbers" so they work properly in searches and aggs. Fixes elastic#7782 Related elastic/elasticsearch#18887 Related elastic/elasticsearch#19264
Elasticsearch has recently added scaled_float as an option for storing floating point numbers. The scaled floats are stored internally as longs, which means they can take advantage of the integer compression in Lucene. See elastic/elasticsearch#19264 for details. The PR moves all percentages to scaled floats. In our `fields.yml` we assume a default scaling factor of 1000, which should work well for our percentages (values between 0 and 1). This scaling factor can also be set to a different value in `fields.yml`.
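To make the space argument concrete, here is a rough sketch of how many bits the stored longs need for percentages between 0 and 1. The class and method names (`ScaledFloatBits`, `maxEncoded`, `bitsRequired`) are hypothetical, and counting the bits of the largest value is only a simplification of what Lucene's integer compression actually does.

```java
/** Hypothetical illustration; not code from Elasticsearch, Lucene, or Beats. */
final class ScaledFloatBits {
    private ScaledFloatBits() {}

    /** Largest long produced for values in [0, 1] with the given scaling factor. */
    static long maxEncoded(double scalingFactor) {
        return Math.round(1.0 * scalingFactor);
    }

    /** Number of bits needed to represent every long in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return maxValue == 0 ? 1 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    public static void main(String[] args) {
        // Assumed scaling factors: 1000 is the default mentioned above, 100 is for comparison.
        for (double factor : new double[] {100, 1000}) {
            long max = maxEncoded(factor);
            System.out.println("scaling_factor=" + factor
                + " max stored value=" + max
                + " bits per value<=" + bitsRequired(max));
        }
    }
}
```

Under these assumptions, a scaling factor of 1000 keeps every stored value within 10 bits, and a factor of 100 keeps it within 7 bits, which is why a smaller factor trades precision for space.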
Elasticsearch added a couple of new numeric datatypes, which means we need to update our type casting list to include them. Kibana should see them as "numbers" so they work properly in searches and aggs. Fixes elastic#7782 Related elastic/elasticsearch#18887 Related elastic/elasticsearch#19264 Former-commit-id: 298ee35
This is an attempt to revive #15939, motivated by elastic/beats#1941.
Half-floats are a pretty bad option for storing percentages. They would likely require 2 bytes all the time, while percentages really don't need more than one byte.
So this PR exposes a new `scaled_float` type that requires a `scaling_factor` and internally indexes `value*scaling_factor` in a long field. Compared to the original PR, it exposes a lower-level API so that the trade-offs are clearer, and it avoids any reference to fixed precision that might imply that this type is more accurate (actually it is less accurate).
In addition to being more space-efficient for some use-cases that Beats is interested in, this is also faster than `half_float` unless we can improve the efficiency of decoding half-float bits (which is currently done in software) or until Java gets first-class support for half-floats.
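The encoding described above can be sketched in a few lines. This is a minimal illustration and not the actual `ScaledFloatFieldMapper` code: the class `ScaledFloatSketch` and its `encode`/`decode` methods are hypothetical names, and the scaling factor of 100 is an assumed value chosen for the example.

```java
/** Hypothetical helper; not the actual ScaledFloatFieldMapper implementation. */
final class ScaledFloatSketch {
    private ScaledFloatSketch() {}

    /** Multiply by the scaling factor and round to the closest long. */
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    /** Divide the stored long by the scaling factor to get an approximate value back. */
    static double decode(long encoded, double scalingFactor) {
        return encoded / scalingFactor;
    }

    public static void main(String[] args) {
        double scalingFactor = 100; // assumed for illustration
        double original = 0.1234;
        long stored = encode(original, scalingFactor);
        double restored = decode(stored, scalingFactor);
        // The restored value differs from the original by at most 0.5 / scalingFactor.
        System.out.println(original + " -> " + stored + " -> " + restored);
    }
}
```

With a scaling factor of 100, the value 0.1234 is stored as the long 12 and read back as roughly 0.12, which shows why the type is less accurate than a plain float rather than more.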