-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define datum reader/writer interfaces with variant #1635
Conversation
CROSS-ENGINE-REPORT ❌
Testing Details
Result Details
Now FAILING Tests ❌The following 3 test(s) were previously PASSING in BASE but are now FAILING in TARGET: Click here to see
Now IGNORED Tests ❌The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. Now Passing Tests180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-48F3A00). Before merging, confirm they are intended to pass. The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. CROSS-COMMIT-REPORT ✅
Testing DetailsResult Details
|
*/ | ||
public abstract class PEnum { | ||
public abstract class Enum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally named PEnum
due to the ubiquity of java.lang.Enum
. I don't mind it either way since most people don't use Enum
directly, but bringing it up for awareness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I figured since not used directly, parameterized, and requires explicit import then there's little to no ambiguity.
public interface Variant<T> extends Datum { | ||
|
||
/** | ||
* Unpack the inner variant value. | ||
* | ||
* @return T | ||
*/ | ||
T unpack(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my assumption that PType.Kind would have another variant: VARIANT
with some method to expose the parameterization. And, a Datum
holding type VARIANT
would allow for the invocation of two methods: a new Datum unpack()
and the existing byte[] getBytes()
.
Doing it like this, we could still unpack the variant to a PartiQL value (Datum), and if we expose IonVariant
as a public class, then Ion's API of IonElement
wouldn't be exposed either. I think it would provide enough APIs for variant consumers (ourselves included) to create a DatumReader/Writer for the variant type. Also, it would allow for us to internalize Variant
, IonDatumReader
, IonDatumWriter
, and IonVariant
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this clarifies the model.
- A variant value is a datum (partiql value) but holds an inner type (hence unpack)
- You do not unpack the variant to a value because it already is a value – variant extends datum
- The VARIANT PType is parameterized by a string aka
variant("ion"), variant("json"), variant("cbor")
- Unpack is to provide direct access (like old .ionValue)
- Pack is to encode as bytes for the writer.
Hopefully this model is a bit more clear? To address the concerns
- no need to unpack, a variant is a datum implementation
- DatumReader/DatumWriter are the interfaces to read some encoding of PartiQL values into the Datum interface
- Variant, IonDatumReader, IonDatumWriter, and IonVariant are explicitly public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you are describing adding variant to the datum fat interface, I thought about that but variant is parameterized on some type. I'm not sure of a better solution without parameterizing Datum or having some method Object unpack()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to unpack, a variant is a datum implementation
Yes, exactly. We've established that variant is a runtime type and should be reflected in our runtime values.
In the IonVariant
impl, there is a TODO to return PType.variant("ion")
on getType()
. That's where it's a bit odd to me, because now, we've broken the contract between getType
and any of the other getters. It seems like the contract would be better between PType.Kind.VARIANT
and Datum.unpack(): Datum
.
We're back to a similar .check<Variant>()
as before, where we can't fully trust the relationship between Datum#getType()
and the getters. For example:
// This is all pseudo-code, for brevity
class Udf : Function {
override fun signature() = Signature("udf", params = listOf(PType.variantIon()))
override fun eval(args: Array<Datum>) {
val datum = args[0]
val variant: IonVariant = datum.check<IonVariant>() // which will throw a TypeCheckException and perform the cast
val underlyingValue: IonElement = datum.unpack()
return // whatever we want
}
}
Rather than:
// This is all pseudo-code, for brevity
class Udf : Function {
override fun signature() = Signature("udf", params = listOf(PType.variantIon()))
override fun eval(args: Array<Datum>) {
val underlyingValue: Datum = datum.unpack() // Or, we can call datum.getBytes()
return // whatever we want
}
}
This is especially apparent when we want to be in charge of casting. How would we know which Datum getter to invoke when getType()
returns PType.variant("ion")
?
* @return next Datum or null. | ||
*/ | ||
@Nullable | ||
public Datum next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't this look a bit more like the IonReader
with stepIn()
and stepOut()
?
Kind of similar to IonReader
, the constructors for IonTextReader
takes in an IonValue or stream of data , and it doesn't return IonValues
-- it returns primitives as you walk the data. I assumed this would be similar. They do expose an IonElementLoader
which seemingly wraps a reader and creates an IonElement
, but it's different than the reader itself.
As a side, this came into my mind because I saw the JSON directory placeholder. JSON doesn't allow for multiple top-level values, so the DatumReader
would always just return a single Datum
. As a consumer of this, you'd still need to know the in's and out's of the Datum API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reader/write are really the two functions "encode" and "decode" you could conceive of backing them by a cursor-like API, but that is not the intention here. Notice how the methods return datums and not primitives. These encode a Datum into some bytes, and can decode back into the Datum interface.
The consumer of a reader is not responsible for how the value is traversed, that's what the Datum interface allows for them. This is simply about getting bytes into and outof the Datum interface.
I don't believe you would need to know the in's and out's of Datum to use this. It just wraps a stream of one-or-more values. Yes, json is one top-level value, but there's others like
- jsonlines
- yaml
- csv
- etc.
which allow for multiple values in a stream
* | ||
* @param datum to write. | ||
*/ | ||
public DatumWriter write(Datum datum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got similar thoughts as the DatumReader
comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is intentionally not a cursor. The functionality is really just
write(value) -> bytes
read(bytes) -> value
Description
This PR introduces the DatumReader and DatumWriter interfaces along with the IonDatumReader implementation. In the interest of time, I've opened the PR now to get the interfaces in – and will continue to flesh out the implementations when I have time.
Once completed, and RFC will be raised with the encoding scheme.
Other Information
and Code Style Guidelines? YES
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.is pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.