
fix(deps): Upgrade SDK to 3.6.3, encode unsupported types as strings #161

Merged: disq merged 26 commits into main from deps/upgrade-sdk on May 25, 2023

Conversation

disq (Member) commented May 18, 2023:

No description provided.

disq (Member Author) commented May 18, 2023:

Looks like arrow/csv only supports encoding of certain types: https://github.com/cloudquery/arrow/blob/52e5d8283320da444d653d3e858b00b148cedba4/go/arrow/csv/common.go#L229
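
For context, the check in question boils down to a per-type allow-list in the CSV writer. A minimal sketch of that idea (the helper name and the exact type list are illustrative, not the fork's actual code, and the import path assumes the upstream apache module rather than the cloudquery fork):

	package csvsketch // hypothetical; illustration only

	import "github.com/apache/arrow/go/v13/arrow"

	// isCSVEncodable is a hypothetical helper mirroring the allow-list style
	// check that arrow/csv applies before writing a column.
	func isCSVEncodable(dt arrow.DataType) bool {
		switch dt.(type) {
		case *arrow.BooleanType,
			*arrow.Int8Type, *arrow.Int16Type, *arrow.Int32Type, *arrow.Int64Type,
			*arrow.Uint8Type, *arrow.Uint16Type, *arrow.Uint32Type, *arrow.Uint64Type,
			*arrow.Float32Type, *arrow.Float64Type,
			*arrow.StringType, *arrow.TimestampType, *arrow.Date32Type, *arrow.Date64Type:
			return true
		default:
			// intervals, durations, and other exotic types are rejected by the writer
			return false
		}
	}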

disq requested a review from hermanschaaf May 18, 2023 12:55
hermanschaaf (Member) commented:

@disq yes, parquet as well IIRC. I think we'll have to skip the tests on those for now, do a conversion ourselves, or implement it upstream

disq (Member Author) commented May 18, 2023:

> @disq yes, parquet as well IIRC. I think we'll have to skip the tests on those for now, do a conversion ourselves, or implement it upstream

I'm attempting a convertSchema() approach to see where it goes
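
Roughly, the idea is to re-declare unsupported columns as string columns and stringify their values on write. A hedged sketch of what such a convertSchema() could look like (package, names, and the isCSVEncodable check from the earlier sketch are all illustrative, not the code that ultimately landed in this PR):

	package csvsketch // hypothetical; illustration only

	import "github.com/apache/arrow/go/v13/arrow"

	// convertSchema returns a copy of sc in which every column the CSV writer
	// cannot encode natively is re-declared as a string column; the values
	// themselves would be rendered to strings while writing rows.
	func convertSchema(sc *arrow.Schema) *arrow.Schema {
		fields := make([]arrow.Field, len(sc.Fields()))
		for i, f := range sc.Fields() {
			if !isCSVEncodable(f.Type) { // hypothetical check, see the earlier sketch
				f.Type = arrow.BinaryTypes.String
			}
			fields[i] = f
		}
		md := sc.Metadata()
		return arrow.NewSchema(fields, &md)
	}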

disq changed the title from "fix(deps): Upgrade SDK to 3.5.1" to "fix(deps): Upgrade SDK to 3.5.2" May 18, 2023
disq changed the title from "fix(deps): Upgrade SDK to 3.5.2" to "fix(deps): Upgrade SDK to 3.5.2, encode unsupported types as strings" May 18, 2023
parquet/write.go (outdated):
	switch dt := t.(type) {
	case *arrow.DayTimeIntervalType, *arrow.DurationType, *arrow.MonthDayNanoIntervalType, *arrow.MonthIntervalType: // unsupported in pqarrow
		return true
	case *arrow.LargeBinaryType, *arrow.LargeListType, *arrow.LargeStringType: // not yet implemented in arrow
disq (Member Author) commented on this diff:

Ideally we should fall back to the non-large types (and zero-copy the contents?) in convertSchema() and other places, but this was easier and less messy for now.
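
For illustration, such a fallback could look roughly like this (hypothetical helper, not part of this PR; the array contents would still need to be re-wrapped or copied separately, and the import path assumes the upstream apache module):

	package pqsketch // hypothetical; illustration only

	import "github.com/apache/arrow/go/v13/arrow"

	// downcastLargeType is a hypothetical mapping from the Large* Arrow types
	// (64-bit offsets) to their 32-bit-offset counterparts that pqarrow handles.
	func downcastLargeType(dt arrow.DataType) arrow.DataType {
		switch t := dt.(type) {
		case *arrow.LargeStringType:
			return arrow.BinaryTypes.String
		case *arrow.LargeBinaryType:
			return arrow.BinaryTypes.Binary
		case *arrow.LargeListType:
			// element metadata/nullability handling elided for brevity
			return arrow.ListOf(t.Elem())
		default:
			return dt
		}
	}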

disq changed the title from "fix(deps): Upgrade SDK to 3.5.2, encode unsupported types as strings" to "fix(deps): Upgrade SDK to 3.6.1, encode unsupported types as strings" May 21, 2023
@@ -12,9 +12,16 @@ import (
	"github.com/cloudquery/plugin-sdk/v3/schema"
)

var pqTestOpts = schema.TestSourceOptions{
	// persisted as timestamp[ms]:
	SkipTimestamps: true
disq (Member Author) commented on this diff:

These types are all persisted as timestamp[ms], so we choose to skip the test/compare logic for them.

hermanschaaf (Member) replied:

Hmm, can't we handle that? It's pretty important to check that we handle all timestamps correctly 🤔 Also, are they persisted as timestamp[ms] or timestamp[us]?

disq (Member Author) replied:

All timestamps are "persisted" as timestamp[ms, tz=UTC] in pqarrow for some reason. Maybe it's parquet's native timestamp format; I need to check.

disq (Member Author) commented May 22, 2023:

@hermanschaaf There's a timestamp coercion feature in pqarrow which defaults to false, but timestamps still seem to get persisted only as ms.

There's also a special case which always coerces second-precision to ms:

	// the user implicitly wants timestamp data to retain it's original time units,
	// however the arrow seconds time unit cannot be represented in parquet, so must
	// be coerced to milliseconds
	if typ.Unit == arrow.Second {
		logicalType = arrowTimestampToLogical(typ, arrow.Millisecond)
	}
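
For reference, that coercion knob is exposed through the Arrow writer properties. A hedged sketch against the upstream pqarrow API (import paths assume the apache module, while this repo pulls the cloudquery fork; not code from this PR):

	package pqsketch // hypothetical; illustration only

	import (
		"os"

		"github.com/apache/arrow/go/v13/arrow"
		"github.com/apache/arrow/go/v13/parquet"
		"github.com/apache/arrow/go/v13/parquet/pqarrow"
	)

	// writeWithCoercedTimestamps writes tbl to a Parquet file, asking pqarrow to
	// coerce all timestamp columns to microseconds rather than its default unit.
	func writeWithCoercedTimestamps(tbl arrow.Table) error {
		f, err := os.Create("out.parquet")
		if err != nil {
			return err
		}
		defer f.Close()

		arrProps := pqarrow.NewArrowWriterProperties(
			pqarrow.WithCoerceTimestamps(arrow.Microsecond),
		)
		return pqarrow.WriteTable(tbl, f, 1024, parquet.NewWriterProperties(), arrProps)
	}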

disq changed the title from "fix(deps): Upgrade SDK to 3.6.1, encode unsupported types as strings" to "fix(deps): Upgrade SDK to 3.6.2, encode unsupported types as strings" May 23, 2023
parquet/read.go (outdated review thread, resolved)
disq changed the title from "fix(deps): Upgrade SDK to 3.6.2, encode unsupported types as strings" to "fix(deps): Upgrade SDK to 3.6.3, encode unsupported types as strings" May 24, 2023
yevgenypats (Member) left a comment:

Great job!

yevgenypats added the automerge label ("Add to automerge PRs once requirements are met") May 25, 2023
kodiakhq bot merged commit 6b6e305 into main May 25, 2023
kodiakhq bot deleted the deps/upgrade-sdk branch May 25, 2023 07:09
kodiakhq bot pushed a commit that referenced this pull request May 25, 2023
🤖 I have created a release *beep* *boop*
---


## [3.0.1](v3.0.0...v3.0.1) (2023-05-25)


### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.6 ([#157](#157)) ([1dccb3a](1dccb3a))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.8 ([#160](#160)) ([ba7c364](ba7c364))
* **deps:** Upgrade SDK to 3.6.3, encode unsupported types as strings ([#161](#161)) ([6b6e305](6b6e305))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).