-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Arrow migration #87
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good (didn't test). So the TDLR for this PR is that we use Apache arrow type system instead of ours (extending types they don't support)?
A walkthrough will be great, and also seeing them accept some of our extended types as contributions
case schema.TypeIntArray: | ||
typ = arrow.ListOf(arrow.PrimitiveTypes.Int64) | ||
case schema.TypeTimestamp: | ||
typ = arrow.FixedWidthTypes.Timestamp_us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that us
means micros seconds (and not US timezone)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow type is named badly, should've been arrow.FixedWidthTypes.TimestampMicros
internal/cqarrow/mac.go
Outdated
return b.Unmarshal(dec) | ||
} | ||
|
||
// UUIDArray is a simple array which is a FixedSizeBinary(16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: It looks like all the comments in this file that refer to FixedSizeBinary or UUID need to be updated (they were copied from UUID)
This should unblock cloudquery/filetypes#87 Add supports for timestamp to decode arrow string and for byte to decode base64 (backward compat is saved)
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Co-authored-by: Kemal <223029+disq@users.noreply.github.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
cded2e3
to
2ce1b28
Compare
"github.com/google/go-cmp/cmp" | ||
) | ||
|
||
func DiffSchema(sc, o *arrow.Schema) (string, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is some dead code that can be removed?
if diff := cmp.Diff(arrowSchema.String(), expecetdSchema.String()); diff != "" { | ||
t.Errorf(diff) | ||
} | ||
// if diff, _ := DiffSchema(arrowSchema, expecetdSchema); diff != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The only use of DiffSchema is here and it's commented out)
b.AppendNull() | ||
return | ||
} | ||
bytes, err := json.Marshal(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we need to use json.Encoder here as well, like we had to in plugin-sdk (https://github.com/cloudquery/plugin-sdk/pull/714/files)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I tested it against the old json filetypes plugin for a couple of sources and the output was the byte-for-byte the same 👍 We may want to consider using json.Encoder
but I think it's not a blocker
🤖 I have created a release *beep* *boop* --- ## [1.5.0](v1.4.2...v1.5.0) (2023-03-06) ### Features * Arrow migration ([#87](#87)) ([135f022](135f022)) ### Bug Fixes * **deps:** Update golang.org/x/xerrors digest to 04be3eb ([#78](#78)) ([5d47135](5d47135)) * **deps:** Update module github.com/apache/thrift to v0.18.0 ([#83](#83)) ([bce51fa](bce51fa)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.38.1 ([#72](#72)) ([0eadcfd](0eadcfd)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.38.2 ([#74](#74)) ([90bdbba](90bdbba)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.39.0 ([#75](#75)) ([b65db2e](b65db2e)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.39.1 ([#76](#76)) ([c03025e](c03025e)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.40.0 ([#77](#77)) ([7ec1df0](7ec1df0)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.41.0 ([#86](#86)) ([5b89f33](5b89f33)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.42.0 ([#88](#88)) ([99da3b2](99da3b2)) * **deps:** Update module github.com/klauspost/compress to v1.16.0 ([#84](#84)) ([a9c9da3](a9c9da3)) * **deps:** Update module github.com/pierrec/lz4/v4 to v4.1.17 ([#79](#79)) ([fbc42d4](fbc42d4)) * **deps:** Update module github.com/stretchr/testify to v1.8.2 ([#80](#80)) ([b080141](b080141)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Blocked by cloudquery/plugin-sdk#724
but ready for initial review and discussion. I can do a short walkthrough for anyone up for review.
Apache arrow fork is here (We use it until this is merged): https://github.com/cloudquery/arrow/tree/feat_extension_builder.
Some more notes: