
Core: Add new nanosecond supporting field mapper #32601

Closed

spinscale opened this issue Aug 3, 2018 · 4 comments
Labels
blocker :Core/Infra/Core Core issues without another label v7.0.0-beta1

Comments

@spinscale
Contributor

spinscale commented Aug 3, 2018

This new field mapper should support nanosecond timestamps. There are two ideas for adding support. The first is to come up with a new data structure that supports any date at nanosecond resolution - which means you need a different data structure than the long value we currently use for dates. This also implies that indexing and querying will be more expensive.

The other alternative would be to use a long and store the nanoseconds since the epoch. This limits our dates to the range 1677 to 2262, meaning we cannot store the birthdays of many people in Wikipedia. However, when you need nanosecond resolution it is usually for log files, not birth dates, and those log files usually fit into the above-mentioned date range.
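As a sanity check on the claimed range, the endpoints of a signed 64-bit count of nanoseconds since the epoch can be computed directly with `java.time` (a standalone sketch, not Elasticsearch code):

```java
import java.time.Instant;

public class EpochNanosRange {
    // The representable range when epoch nanoseconds are stored in a signed long.
    public static Instant minDate() {
        return Instant.EPOCH.plusNanos(Long.MIN_VALUE); // 1677-09-21T00:12:43.145224192Z
    }

    public static Instant maxDate() {
        return Instant.EPOCH.plusNanos(Long.MAX_VALUE); // 2262-04-11T23:47:16.854775807Z
    }

    public static void main(String[] args) {
        System.out.println(minDate() + " .. " + maxDate());
    }
}
```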

This issue suggests implementing a timestamp field mapper (names are just suggestions here) that stores dates at nanosecond resolution as a long.

This mapper needs to reject, at index time, any date that is out of the above range (which also means queries on such dates can short-circuit).
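A hypothetical index-time conversion (illustrative names, not the actual mapper API) could rely on exact arithmetic so that any date outside the representable range is rejected rather than silently overflowing:

```java
import java.time.Instant;

public class EpochNanosCheck {
    /**
     * Converts an Instant to nanoseconds since the epoch, rejecting dates
     * outside the signed-long range (roughly 1677 to 2262).
     */
    public static long toEpochNanos(Instant date) {
        try {
            long seconds = date.getEpochSecond();
            int nanos = date.getNano();
            if (seconds < 0 && nanos > 0) {
                // Shift by one second to avoid intermediate overflow near Long.MIN_VALUE.
                return Math.addExact(
                    Math.multiplyExact(seconds + 1, 1_000_000_000L), nanos - 1_000_000_000);
            }
            return Math.addExact(Math.multiplyExact(seconds, 1_000_000_000L), nanos);
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException("date out of nanosecond range: " + date, e);
        }
    }
}
```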

Backwards compatibility

The most important part is to be able to search across shards where one field is a long in milliseconds and another field is a long in nanoseconds. Adrien came up with the idea of extending org.elasticsearch.common.lucene.Lucene.readSortValue(StreamInput in) and adding a special type that marks a sort value as a nanosecond timestamp; this way merging of results becomes possible by adapting the values before the merge.

Something to keep in mind here: when mixing indices that have dates in nanos and dates in millis, and we convert to nanos, we cannot handle dates outside the nanosecond range. So we have to return an error when such a query comes in, before doing a flawed conversion.
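The millis-to-nanos widening at merge time could look roughly like this (a sketch under the assumption that sort values arrive as epoch-millis longs; not the actual readSortValue code):

```java
public class SortValueWidening {
    // Millisecond bounds that still fit into an epoch-nanos long (~1677 to ~2262).
    static final long MAX_MILLIS = Long.MAX_VALUE / 1_000_000L;
    static final long MIN_MILLIS = Long.MIN_VALUE / 1_000_000L;

    /** Widens an epoch-millis sort value to epoch-nanos, erroring out instead of overflowing. */
    public static long millisToNanos(long epochMillis) {
        if (epochMillis > MAX_MILLIS || epochMillis < MIN_MILLIS) {
            throw new IllegalArgumentException(
                "date [" + epochMillis + "ms] lies outside the nanosecond range");
        }
        return epochMillis * 1_000_000L;
    }
}
```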

Note: if the long were treated as unsigned we could shift the range (at the cost of yet more conversions when mixing with millis).

Aggregations

Nanosecond-resolution buckets would result in a huge number of buckets, so I consider this a second step; it should not block adding the field mapper as first, preliminary support.

Relates #27330

@spinscale spinscale added :Core/Infra/Core Core issues without another label v7.0.0 labels Aug 3, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@jasontedor
Member

Relates #10005

@uschindler
Contributor

uschindler commented Aug 3, 2018

Another alternative is to use a double with "days since epoch" or similar. Dates around the current time/epoch have the highest precision, but datetimes far away from today lose precision. We use this approach for scientific data, as it is obvious that exact date times only make sense around now, not for times far away.
This approach allows easy sorting, and the conversion to java.time is easy:

  import java.time.Instant;
  import java.time.temporal.ChronoField;
  import java.time.temporal.ChronoUnit;
  import java.time.temporal.TemporalAccessor;

  /** Nanoseconds per day, as a double so the division below is not integer division. */
  private static final double NANOS_PER_DAY = 86_400L * 1_000_000_000L;

  /** Converts a TemporalAccessor to a double (days since epoch). Includes time, if available. */
  public static double temporalToDouble(TemporalAccessor accessor) {
    double r = accessor.getLong(ChronoField.EPOCH_DAY);
    if (accessor.isSupported(ChronoField.NANO_OF_DAY)) {
      r += accessor.getLong(ChronoField.NANO_OF_DAY) / NANOS_PER_DAY;
    }
    return r;
  }

  /** Converts a double with the days since epoch back to an Instant. */
  public static Instant doubleToInstant(double epochDouble) {
    final long epochDays = (long) epochDouble;
    return Instant.EPOCH.plus(epochDays, ChronoUnit.DAYS)
        .plusNanos(Math.round((epochDouble - epochDays) * NANOS_PER_DAY));
  }

Just ideas! (This code is untested; I just converted it from millis to nanos. Maybe there are some sign problems, but I think it also works for dates before the epoch.)

@uschindler
Contributor

As far as I remember, PostgreSQL internally uses the same data type for SQL timestamps.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Jan 31, 2019
This change adds an option to the `FieldSortBuilder` that allows transforming the type
of a numeric field into another. Possible values for this option are `long`, which transforms
the source field into an integer, and `double`, which transforms the source field into a floating point.
This new option is useful for cross-index search when the sort field is mapped differently on some
indices. For instance, if a field is mapped as a floating point in one index and as an integer in another,
it is possible to align the type for both indices using the `numeric_type` option:

```
{
   "sort": {
    "field": "my_field",
    "numeric_type": "double" <1>
   }
}
```

<1> Ensure that values for this field are transformed to a floating point if needed.

Only `long` and `double` are supported at the moment but the goal is to also handle `date` and `date_nanos`
when elastic#32601 is merged.