-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Parquet format arrays #71
Conversation
@@ -177,6 +177,7 @@ func (ReverseTransformer) ReverseTransformValues(table *schema.Table, values []a | |||
if err := t.Set(v); err != nil { | |||
return nil, fmt.Errorf("failed to convert value %v to type %s: %w", v, table.Columns[i].Type, err) | |||
} | |||
res[i] = t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, the read-write test panics if there's a type error, instead of failing with a proper (and helpful) error message.
s := pschema.JSONSchemaItemType{ | ||
Tag: `name=parquet_go_root, repetitiontype=REQUIRED`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The item name here doesn't seem to have any significance, parquet_go_root
is what the library uses but it's not a const in the parquet world, so I opted to add the table name here so that files are identifiable (with parquet-tools etc.) even if they lost their filenames.
switch col.Type { | ||
case schema.TypeStringArray, schema.TypeIntArray, schema.TypeUUIDArray, schema.TypeCIDRArray, schema.TypeInetArray, schema.TypeMacAddrArray: | ||
opts = append(opts, "repetitiontype=REPEATED") | ||
func arrayElement(t schema.ValueType) schema.ValueType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use this method in the plugin-sdk in the future.
s.Fields = append(s.Fields, &pschema.JSONSchemaItemType{Tag: tag}) | ||
|
||
if !isArray(col.Type) { // array types are handled differently, see above | ||
if col.CreationOptions.PrimaryKey || col.CreationOptions.IncrementalKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we enforce constraints on incremental keys in other destinations right now, but it seems like a good idea, so I'm fine with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I tested it out too, looks like it fixes the issue, nice one! 👏
🤖 I have created a release *beep* *boop* --- ## [1.4.2](v1.4.1...v1.4.2) (2023-02-19) ### Bug Fixes * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.38.0 ([#69](#69)) ([b6fa8c8](b6fa8c8)) * Parquet format arrays ([#71](#71)) ([4a72a70](4a72a70)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This PR fixes the array/slice handling in the Parquet filetype by slightly updating the Parquet schema definition.
It's possible to represent UUID types with the proper logical type but it just didn't work with Athena. Also had to do some weird things like defining the uuid as 22-byte FIXED_LEN_BYTE_ARRAY (UUIDs are 16 bytes) but giving it a 16-byte Go slice (not a Go array) and Base64 decoders in reverse transformers etc... And all this didn't even work when it came to
UUIDArray
s... So I gave up on that for now.JSON logicaltype seems to somewhat work with DuckDB but doesn't with Athena so I removed that as well.