
Export Exasol Table into AWS S3 as Parquet format #16

Merged
merged 21 commits into from
Feb 12, 2019

Conversation

@morazow (Contributor) commented Feb 11, 2019

Adds functionality to export Exasol tables in Parquet format into an AWS S3 bucket.

Should fix #14 and #15.

This commit adds the ExportPath class, together with:
- Adds ParquetWriter
- Adds RowWriteSupport

However, there are still some more changes needed:

  - Decide on decimal to int32 or int64 based on precision
  - Improve the import functionality with date and timestamps
This does not make sense in this case, because both when reading and when writing Exasol provides
the correct Java type, e.g. a BigDecimal for a decimal with precision and scale, and a regular
Integer or Long for int32 or int64.
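For reference, the precision-based mapping mentioned above follows the Parquet logical-type rules: a DECIMAL with precision ≤ 9 fits in an INT32, precision ≤ 18 in an INT64, and anything larger needs a fixed-length byte array. A minimal sketch (the names here are illustrative, not this PR's actual API):

```scala
// Sketch of choosing a Parquet primitive type for a decimal, based only
// on its precision, per the Parquet logical-type specification.
object DecimalMapping {
  sealed trait PrimitiveType
  case object Int32 extends PrimitiveType
  case object Int64 extends PrimitiveType
  case object FixedLenByteArray extends PrimitiveType

  def forPrecision(precision: Int): PrimitiveType =
    if (precision <= 9) Int32            // fits in a 32-bit integer
    else if (precision <= 18) Int64      // fits in a 64-bit integer
    else FixedLenByteArray               // needs an arbitrary-width encoding
}
```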
S3AFileSystem requires a local temp directory for buffering before uploading to S3. This is
currently only tested for AWS S3. I will update this for GCP or Azure if they also require a local
buffer when performing tests for those platforms.
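For reference, the local buffer location S3A uses before upload is controlled by the standard hadoop-aws property `fs.s3a.buffer.dir`; a sketch of pointing it at an explicit directory in `core-site.xml` (the path value is illustrative):

```xml
<!-- Directory S3A buffers data in before uploading to S3 -->
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>/tmp/s3a-buffer</value>
</property>
```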
An example date was `0001-01-01`, or in general any day before 1970.

After writing `0001-01-01` as days since the epoch and reading it back, I was getting
`0001-12-31`.

The solution was to incorporate the timezone offset in millis and take the *floor* of the millis
per day.
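The fix described above can be sketched as follows; `daysSinceEpoch` is a hypothetical helper for illustration, not the PR's actual code:

```scala
import java.sql.Date
import java.util.TimeZone

object DateConversion {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Date.getTime returns UTC millis for local midnight, so first add the
  // JVM default zone offset to get local-time millis, then use floorDiv
  // so negative values (dates before 1970) round toward negative infinity
  // instead of toward zero -- plain division would be off by one day.
  def daysSinceEpoch(date: Date): Long = {
    val localMillis = date.getTime + TimeZone.getDefault.getOffset(date.getTime)
    Math.floorDiv(localMillis, MillisPerDay)
  }
}
```

With truncating division, `-1` millisecond would map to day `0` instead of day `-1`, which is exactly the kind of off-by-one that turns `0001-01-01` into the previous day on read-back.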
@codecov codecov bot commented Feb 11, 2019

Codecov Report

Merging #16 into master will decrease coverage by 4.35%.
The diff coverage is 87.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #16      +/-   ##
==========================================
- Coverage   95.17%   90.81%   -4.36%     
==========================================
  Files           7       14       +7     
  Lines         145      403     +258     
  Branches        8       22      +14     
==========================================
+ Hits          138      366     +228     
- Misses          7       37      +30
Impacted Files Coverage Δ
...om/exasol/cloudetl/scriptclasses/ImportFiles.scala 100% <ø> (ø) ⬆️
...la/com/exasol/cloudetl/parquet/ParquetSource.scala 88.88% <ø> (ø)
...om/exasol/cloudetl/scriptclasses/ExportTable.scala 100% <100%> (ø)
...com/exasol/cloudetl/parquet/ParquetRowWriter.scala 100% <100%> (ø)
...com/exasol/cloudetl/scriptclasses/ImportPath.scala 100% <100%> (ø) ⬆️
...com/exasol/cloudetl/scriptclasses/ExportPath.scala 100% <100%> (ø)
...main/scala/com/exasol/cloudetl/bucket/Bucket.scala 100% <100%> (ø) ⬆️
.../exasol/cloudetl/parquet/ParquetWriteOptions.scala 70% <70%> (ø)
...a/com/exasol/cloudetl/parquet/RowReadSupport.scala 70.76% <70.76%> (ø)
.../com/exasol/cloudetl/parquet/RowWriteSupport.scala 88.4% <88.4%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79bc525...5023000. Read the comment docs.

1 similar comment

@morazow morazow merged commit b4d0799 into exasol:master Feb 12, 2019
Development

Successfully merging this pull request may close these issues.

Support writing parquet files to cloud storage
1 participant