Implement java.time.{Duration, Instant, Period} type encoders #581
Thanks for the PR 🎉 , I'll look into it a bit later, I have a quick question though, is not |
I guess so, but I think YearMonthIntervalType is not available in 3.0 and 3.1, or maybe I'm misunderstanding the problem?
@jgoday oh, how the TypedEncoder may look like for |
I think that with
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, ObjectType, YearMonthIntervalType}

// java.time.Duration <-> DayTimeIntervalType (microseconds)
implicit val timeDuration: TypedEncoder[java.time.Duration] = new TypedEncoder[java.time.Duration] {
  def nullable: Boolean = false
  def jvmRepr: DataType = ScalaReflection.dataTypeFor[java.time.Duration]
  def catalystRepr: DataType = DayTimeIntervalType()

  def toCatalyst(path: Expression): Expression =
    StaticInvoke(
      IntervalUtils.getClass,
      DayTimeIntervalType(),
      "durationToMicros",
      path :: Nil,
      returnNullable = false)

  def fromCatalyst(path: Expression): Expression =
    StaticInvoke(
      IntervalUtils.getClass,
      ObjectType(classOf[java.time.Duration]),
      "microsToDuration",
      path :: Nil,
      returnNullable = false)
}

// java.time.Period <-> YearMonthIntervalType (months)
implicit val timePeriod: TypedEncoder[java.time.Period] = new TypedEncoder[java.time.Period] {
  def nullable: Boolean = false
  def jvmRepr: DataType = ScalaReflection.dataTypeFor[java.time.Period]
  def catalystRepr: DataType = YearMonthIntervalType()

  def toCatalyst(path: Expression): Expression =
    StaticInvoke(
      IntervalUtils.getClass,
      YearMonthIntervalType(),
      "periodToMonths",
      path :: Nil,
      returnNullable = false)

  def fromCatalyst(path: Expression): Expression =
    StaticInvoke(
      IntervalUtils.getClass,
      ObjectType(classOf[java.time.Period]),
      "monthsToPeriod",
      path :: Nil,
      returnNullable = false)
}
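For readers without Spark at hand, here is a rough sketch of what the four helper methods named in the snippet above appear to do. It uses only the JDK, and the object name `IntervalSketch` and its constants are made up for illustration. It approximates the behaviour from memory and is not Spark's source, so check the details against the Spark version you target.

```scala
import java.time.{Duration, Period}
import java.time.temporal.ChronoUnit

// Hypothetical stand-ins for the Spark helpers; JDK-only approximation.
object IntervalSketch {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicro = 1000L
  private val MonthsPerYear = 12

  // Duration -> microseconds. Sub-microsecond nanos are truncated and
  // overflow raises ArithmeticException instead of wrapping around.
  def durationToMicros(d: Duration): Long = {
    val micros = Math.multiplyExact(d.getSeconds, MicrosPerSecond)
    Math.addExact(micros, d.getNano / NanosPerMicro)
  }

  def microsToDuration(micros: Long): Duration =
    Duration.of(micros, ChronoUnit.MICROS)

  // Period -> months. The day component is not part of the result.
  def periodToMonths(p: Period): Int =
    Math.addExact(Math.multiplyExact(p.getYears, MonthsPerYear), p.getMonths)

  def monthsToPeriod(months: Int): Period =
    Period.ofMonths(months).normalized()
}
```

Under these assumptions a round trip is not always the identity: `Period.of(1, 2, 3)` comes back as `Period.of(1, 2, 0)`, and a `Duration` carrying nanoseconds below one microsecond comes back truncated. Property tests for such encoders would need to generate values that already fit the target precision.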
Hm, my immediate thoughts are:
I'm okay with 2 if it is useful for users, and it definitely should not be an issue to redefine in a priority scope.
@pomadchin
It seems to be failing on the Instant conversion (DateTimeUtils.instantToMicros). Should we use a more realistic range, for example from Instant.EPOCH to Instant.now()? Arbitrary(Gen.choose[Instant](Instant.EPOCH, Instant.now()))
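A rough illustration of why unrestricted `Instant` values can break such a conversion. It assumes the conversion multiplies epoch seconds by one million with exact (overflow-checked) arithmetic and truncates nanoseconds to microseconds. The `InstantSketch` object is hypothetical and uses only the JDK; it is not Spark's implementation.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit
import scala.util.Try

// Hypothetical JDK-only approximation of an Instant <-> epoch-micros codec.
object InstantSketch {
  private val MicrosPerSecond = 1000000L

  // Throws ArithmeticException when the microsecond count does not fit a Long.
  def instantToMicros(i: Instant): Long = {
    val micros = Math.multiplyExact(i.getEpochSecond, MicrosPerSecond)
    Math.addExact(micros, i.getNano / 1000L)
  }

  def microsToInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  // True only if encoding and decoding returns the original value.
  def roundTrips(i: Instant): Boolean =
    Try(microsToInstant(instantToMicros(i)) == i).getOrElse(false)
}
```

With this sketch, `Instant.MAX` overflows and an instant carrying a single nanosecond does not round-trip. Values between `Instant.EPOCH` and `Instant.now()` that are truncated to microseconds (`truncatedTo(ChronoUnit.MICROS)`) do, which is the kind of range the comment above suggests for the generator.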
@jgoday 👍
Codecov Report
@@ Coverage Diff @@
## master #581 +/- ##
=======================================
Coverage 95.14% 95.15%
=======================================
Files 65 65
Lines 1134 1157 +23
Branches 8 7 -1
=======================================
+ Hits 1079 1101 +22
- Misses 55 56 +1
The Instant codec 100% makes sense: it has a separate type and is supported across all 3.x versions. I also left a couple of comments.
Duration and Period look more like Injections (since that is a conversion to the underlying type). I'm good with having those as default injections in the TypedEncoder companion object, but I'm not strongly opinionated about it: it may cause a confusing fallback for those who forgot to implement it.
Thanks for looking into all corner cases! 👍
@jgoday hey 👋, sorry that I didn't have a chance to review it earlier.
I added some changes to clean it up a bit. However, it looks like Duration and Period need a few more tests; I tried to add them to frameless.CollectTests, but didn't have enough time to investigate the failures.
Could you take a look at that? I think there is some inconsistency in the encode / decode behavior. If you don't have time, I'll have another look at it later this week.
Hi, what's the status here?
Hey @cchantep, it requires a little more work and extra test coverage to ensure that everything is consistent; sadly I haven't had much time lately.
Sorry that it took so long to get it merged!
Once it's green I'm merging it, thanks for your contribution 🚀
StaticInvoke(
  DateTimeUtils.getClass,
  TimestampType,
  "instantToMicros",
@jgoday that's the main beast, has some limitations 🤷 but we have to use it to be compat with Spark.
See #576 (provide TypedEncoder for java.time.{Instant, Duration, Period} in Spark 3.2 #576)
This PR implements the following implicit typeEncoders in frameless.TypedEncoder
As DayTimeIntervalType and YearMonthIntervalType were introduced in Spark 3.2,
to maintain compatibility across Spark 3.0, 3.1 and 3.2,
java.time.Period is represented as an Int (days) Catalyst type and java.time.Duration as a LongType (millis).
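The fallback representations described above (a day count held in an Int and a millisecond count held in a Long) can be pictured as plain two-way mappings. The sketch below assumes the Int day count is meant for `java.time.Period` and the Long millisecond count for `java.time.Duration`. `InjectionSketch` is a hypothetical stand-in written for this example, not the frameless `Injection` type, and the conversions it uses are guesses at the obvious JDK calls rather than the code in this PR.

```scala
import java.time.{Duration, Period}

// Hypothetical stand-in for a two-way mapping between a JVM type A
// and the underlying type B that actually gets stored.
final case class InjectionSketch[A, B](to: A => B, from: B => A)

object CompatSketch {
  // Assumption: Duration is stored as whole milliseconds.
  val durationAsMillis: InjectionSketch[Duration, Long] =
    InjectionSketch[Duration, Long](_.toMillis, Duration.ofMillis(_))

  // Assumption: Period is stored as its day component only.
  val periodAsDays: InjectionSketch[Period, Int] =
    InjectionSketch[Period, Int](_.getDays, Period.ofDays(_))
}
```

If the mapping really looks like this, both directions are lossy for general inputs: anything below a millisecond disappears from a `Duration`, and the year and month components disappear from a `Period`. That would be one possible source of the encode / decode inconsistency mentioned in the review.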