fix(deps): Upgrade SDK to 3.6.3, encode unsupported types as strings #161
Looks like arrow/csv only supports encoding of certain types: https://github.com/cloudquery/arrow/blob/52e5d8283320da444d653d3e858b00b148cedba4/go/arrow/csv/common.go#L229
@disq yes, parquet as well IIRC. I think we'll have to skip the tests on those for now, do a conversion ourselves, or implement it upstream.
I'm attempting a
parquet/write.go
```go
switch dt := t.(type) {
case *arrow.DayTimeIntervalType, *arrow.DurationType, *arrow.MonthDayNanoIntervalType, *arrow.MonthIntervalType: // unsupported in pqarrow
	return true
case *arrow.LargeBinaryType, *arrow.LargeListType, *arrow.LargeStringType: // not yet implemented in arrow
```
Ideally we should fall back to the non-large types (and zero copy contents?) in convertschema and other places, but this was easier and less messy for now.
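The `switch` in `parquet/write.go` flags interval, duration, and large types as unsupported. The alternative this comment calls ideal, falling back from large types to their non-large counterparts, can be sketched as a pure type-mapping step. This is a minimal stdlib-only sketch: `TypeID` and `writableType` are hypothetical stand-ins, not the Arrow Go API or the code in this PR, and mapping interval and duration types to strings follows the PR title rather than any confirmed upstream logic.

```go
package main

// TypeID is a hypothetical stand-in for an Arrow type identifier; it models
// only enough of the type system to show the decision logic.
type TypeID int

const (
	Int64Type TypeID = iota
	StringType
	BinaryType
	ListType
	LargeStringType
	LargeBinaryType
	LargeListType
	DurationType
	MonthIntervalType
	DayTimeIntervalType
	MonthDayNanoIntervalType
)

// writableType (hypothetical) returns the type a column would be written as.
// Large variants fall back to their non-large counterparts; types the writer
// cannot encode at all fall back to strings.
func writableType(t TypeID) TypeID {
	switch t {
	case LargeStringType:
		return StringType
	case LargeBinaryType:
		return BinaryType
	case LargeListType:
		return ListType
	case DurationType, MonthIntervalType, DayTimeIntervalType, MonthDayNanoIntervalType:
		return StringType
	default:
		return t // already supported as-is
	}
}
```

Arrow's large types use 64-bit offsets where the regular ones use 32-bit offsets, so a real downcast would also have to check that the data fits and rewrite the offsets buffer; only the value bytes could be shared without copying.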
parquet/write_read_test.go
@@ -12,9 +12,16 @@ import (
```go
	"github.com/cloudquery/plugin-sdk/v3/schema"
)

var pqTestOpts = schema.TestSourceOptions{
	// persisted as timestamp[ms]:
	SkipTimestamps: true,
```
These types are all persisted as timestamp[ms], so we chose to skip the test/compare logic for them.
Hmm, can't we handle that? It's pretty important to check that we handle all timestamps correctly 🤔 Also, are they persisted as timestamp[ms] or timestamp[us]?
All timestamps are "persisted" as `timestamp[ms, tz=UTC]` in pqarrow for some reason. Maybe that's Parquet's native timestamp format; I need to check.
@hermanschaaf There's a timestamp coercion feature in pqarrow which defaults to false, but timestamps still seem to be persisted only as ms.
There's also a special case which always coerces second precision to ms:
```go
// the user implicitly wants timestamp data to retain its original time units,
// however the arrow seconds time unit cannot be represented in parquet, so must
// be coerced to milliseconds
if typ.Unit == arrow.Second {
	logicalType = arrowTimestampToLogical(typ, arrow.Millisecond)
}
```
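To see why a write/read comparison fails for finer-grained units, here is a minimal sketch of the unit arithmetic, assuming (as observed in this thread) that every timestamp is read back as milliseconds. `TimeUnit`, `toMillis`, `fromMillis`, and `equalAtMillis` are hypothetical; this does not reproduce pqarrow's conversion code, and whether pqarrow truncates or rounds sub-millisecond values is not established here.

```go
package main

// TimeUnit is a hypothetical enum mirroring Arrow's four timestamp units.
type TimeUnit int

const (
	Second TimeUnit = iota
	Millisecond
	Microsecond
	Nanosecond
)

// ticksPerSecond returns how many ticks of unit u make up one second.
func ticksPerSecond(u TimeUnit) int64 {
	switch u {
	case Second:
		return 1
	case Millisecond:
		return 1_000
	case Microsecond:
		return 1_000_000
	default:
		return 1_000_000_000
	}
}

// toMillis converts a value in unit u to milliseconds. Seconds scale up
// exactly; micro- and nanoseconds are divided down, and Go's integer
// division truncates toward zero, discarding the sub-millisecond part.
func toMillis(v int64, u TimeUnit) int64 {
	tps := ticksPerSecond(u)
	if tps <= 1_000 {
		return v * (1_000 / tps)
	}
	return v / (tps / 1_000)
}

// fromMillis converts a millisecond value back to unit u. Converting back
// to seconds truncates; finer units just gain trailing zeros.
func fromMillis(ms int64, u TimeUnit) int64 {
	tps := ticksPerSecond(u)
	if tps <= 1_000 {
		return ms / (1_000 / tps)
	}
	return ms * (tps / 1_000)
}

// equalAtMillis is one hypothetical way a test could compare a written value
// (in unit u) with a value read back as milliseconds, instead of skipping
// timestamp columns entirely.
func equalAtMillis(written int64, u TimeUnit, readBackMs int64) bool {
	return toMillis(written, u) == readBackMs
}
```

For example, a microsecond value of 1,234,567 comes back as 1,234 ms, which is 1,234,000 µs after converting back, so an exact comparison fails while `equalAtMillis` passes. The cost is that such a check cannot itself detect the precision loss.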
Great job!
🤖 I have created a release *beep* *boop*

---

## [3.0.1](v3.0.0...v3.0.1) (2023-05-25)

### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.6 ([#157](#157)) ([1dccb3a](1dccb3a))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.8 ([#160](#160)) ([ba7c364](ba7c364))
* **deps:** Upgrade SDK to 3.6.3, encode unsupported types as strings ([#161](#161)) ([6b6e305](6b6e305))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
No description provided.