Use half floats for most floating point numbers #1941

Merged: 1 commit, merged Jul 4, 2016


tsg (Contributor) commented Jun 30, 2016

Starting with 5.0.0-alpha4, Elasticsearch supports half floats.

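[Half-precision floating-point](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) numbers have good precision for small numbers, but the precision degrades quickly for larger ones: the gap between adjacent representable values doubles at every power of two, reaching 1 at 1024, and the largest finite value is 65504.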
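To make the precision trade-off concrete, here is a small sketch that rounds values to IEEE 754 binary16 using Python's `struct` `'e'` format (Python 3.6+) and measures the gap to the next representable half float. It illustrates the number format in general; it assumes, without checking, that Elasticsearch/Lucene round in the same way.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value ('e' format, Python 3.6+)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_gap(x: float) -> float:
    """Distance from half(x) to the next larger binary16 value (x > 0, finite)."""
    # Reinterpret the 16-bit pattern as an unsigned int; +1 is the next value up.
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    next_up = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return next_up - to_half(x)

for value in (0.5, 10.0, 100.0, 1000.0, 10000.0):
    print(f"{value:>8} -> gap to next half float: {half_gap(value)}")
```

The gap grows from about 0.0005 near 0.5 to 8 near 10000, so a value such as 1025.3 is stored as 1025.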

This PR switches to half floats all the fields that are percentages
(values between 0 and 1), as well as the float fields that naturally hold
only small values. After going through the list, most of our floating-point
numbers fit one of these categories. Only 3 float fields were left untouched.
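Percentages are a good fit because all of them lie in [0, 1], where the half-float spacing is at most 2^-11, so the rounding error is at most 2^-12 (about 0.02 percentage points). A minimal sketch checking this on a synthetic grid of values (not real metrics), using Python's `struct` binary16 rounding as a stand-in for the Elasticsearch encoding:

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value ('e' format, Python 3.6+)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# A fine grid of percentages in [0, 1]; synthetic values, not real field data.
percentages = [i / 100_000 for i in range(100_001)]
worst_error = max(abs(p - to_half(p)) for p in percentages)
print(f"worst-case absolute error on [0, 1]: {worst_error:.6f}")
```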

Because ES 2.x has no half_float type, the template generator automatically switches to "float" when generating the
ES 2.x template files.
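The fallback for ES 2.x can be pictured as a pass over the generated mapping that replaces every `half_float` type with `float`. This is a hypothetical sketch: the helper name `downgrade_for_es2` and the field names are illustrative and are not taken from the actual Beats template generator.

```python
import copy
import json

def downgrade_for_es2(mapping: dict) -> dict:
    """Hypothetical helper: return a copy of a mapping with half_float -> float."""
    result = copy.deepcopy(mapping)  # leave the 5.x mapping untouched

    def walk(node):
        if isinstance(node, dict):
            # Only a string "type" is a field type; a field *named* "type" has a dict value.
            if node.get("type") == "half_float":
                node["type"] = "float"
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result

# Illustrative 5.x mapping fragment (field names are made up for the example).
es5_mapping = {
    "properties": {
        "cpu": {"properties": {"user_pct": {"type": "half_float"}}},
        "load": {"type": "float"},
    }
}
es2_mapping = downgrade_for_es2(es5_mapping)
print(json.dumps(es2_mapping, indent=2))
```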

This closes #1936.

tsg added the review label Jun 30, 2016
tsg (Contributor, Author) commented Jun 30, 2016

jenkins, retest it

ruflin merged commit 3498d2c into elastic:master Jul 4, 2016
ruflin (Contributor) commented Jul 4, 2016

Thanks a lot for taking this one.

jpountz (Contributor) commented Jul 4, 2016

I'm curious whether this significantly reduces the size of the indices?

jpountz (Contributor) commented Jul 7, 2016

See also elastic/elasticsearch#19264.

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Jul 18, 2016
This is an attempt to revive elastic#15939, motivated by elastic/beats#1941.
Half floats are a pretty bad option for storing percentages: they would likely
require 2 bytes per value all the time, while percentages don't need more than one byte.

So this PR exposes a new `scaled_float` type that requires a `scaling_factor`
and internally indexes `value*scaling_factor`, rounded to the closest long, in a long field.
Compared to the original PR, it exposes a lower-level API so that the trade-offs are
clearer, and it avoids any reference to fixed precision that might imply that this type
is more accurate (it is actually *less* accurate).

In addition to being more space-efficient for some of the use cases Beats is
interested in, this is also faster than `half_float`, unless we can improve the
efficiency of decoding half-float bits (which is currently done in software)
or until Java gets first-class support for half floats.
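To show what "decoding half-float bits in software" involves, here is an illustrative decoder that turns a 16-bit pattern into a float using only shifts and masks, checked against Python's `struct` `'e'` format. It is not the Lucene or Elasticsearch code; by contrast, decoding a scaled long (with the hypothetical helper `decode_scaled`) is a single division.

```python
import math

def half_bits_to_float(bits: int) -> float:
    """Decode an IEEE 754 binary16 bit pattern without hardware support."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, bias 15
    fraction = bits & 0x3FF          # 10 explicit fraction bits
    if exponent == 0:                # zero or subnormal: no implicit leading 1
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:             # all-ones exponent: infinity or NaN
        return sign * math.inf if fraction == 0 else math.nan
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)

def decode_scaled(stored: int, factor: float) -> float:
    """Hypothetical scaled-long decode: one division."""
    return stored / factor
```

Each half-float read pays for the branches and bit manipulation above, whereas the scaled representation only needs the division.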
tsg deleted the half_floats branch August 25, 2016 10:34

Closes issue: Switch percentages and other small floats to use half floats
3 participants