Polars Generic Dataset #153

Closed
wants to merge 12 commits into from

wmoreiraa
Contributor

Signed-off-by: wmoreiraa walber3@gmail.com

Description

Add polars.GenericDataSet as discussed in #95.
I probably closed the old PR (#116) by accident while doing something wrong with git.

Development notes

Similar to pandas.GenericDataSet, with the following changes:

  1. Added a "write_mode" entry so that formats for which polars doesn't provide a write method (e.g. delta, excel) can still be used.
  2. Changed the file mode used for writing from "w" to "wb".
  3. Added more test cases for different file formats.
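To illustrate change 2, here is a minimal sketch of why the binary mode matters. It does not use polars itself: `FakeFrame` and `save` are hypothetical stand-ins, and the `write_<file_format>` lookup is an assumption about how a generic dataset might dispatch to the writer, not the code in this PR.

```python
import tempfile
from pathlib import Path


class FakeFrame:
    """Hypothetical stand-in for a dataframe whose writers emit bytes."""

    def write_parquet(self, file) -> None:
        # Binary formats hand raw bytes to the file handle.
        file.write(b"PAR1 fake binary payload")


def save(frame, path: Path, file_format: str, mode: str = "wb") -> None:
    # Assumed dispatch: resolve write_<file_format> on the frame at runtime.
    writer = getattr(frame, f"write_{file_format}")
    with open(path, mode) as fh:
        writer(fh)


target = Path(tempfile.mkdtemp()) / "data.parquet"

# Binary mode: the bytes are written as-is.
save(FakeFrame(), target, "parquet")
saved = target.read_bytes()

# Text mode: the handle only accepts str, so writing bytes fails.
try:
    save(FakeFrame(), target, "parquet", mode="w")
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc
```

With "w" the handle is a text stream, so any writer that produces bytes raises `TypeError`; "wb" avoids that for binary formats.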

Checklist

  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

wmoreiraa and others added 12 commits February 13, 2023 17:03
Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
…test-reqs dep hell

Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
…t to parqquet from csv

Signed-off-by: wmoreiraa <walber3@gmail.com>
…anch

Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>
@wmoreiraa
Contributor Author

Note: the review comments from @merelcht on #116 are on point and have not been addressed yet (that is, the ones unrelated to the Spark changes, which are no longer part of this PR).

"Maybe I'm misunderstanding this, but if you're using a read-only data format, you simply shouldn't call save right? It kind of sounds that if you change the write mode to ignore you wouldn't get an error, but you would get raise DataSetError(f"Write mode '{self._write_mode}' is read-only.") as set in line 219." I think this one could go two different ways:

  1. Removing the read_only formats and then removing the write_mode option. This is the easiest way. It would lose the functionality, but I think that loss is very minor, and it would allow more general usage and further improvement by the community.
  2. Refactoring and fixing this implementation mistake that I've made.
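As a rough sketch of what option 1 could look like: with no write_mode at all, saving simply fails with a clear error when the frame has no writer for the requested format. Everything here is hypothetical and stands in for the real classes: `DataSetError` is a local placeholder for the framework's exception, and `FakeFrame` and `resolve_writer` are illustrative names, not code from this PR.

```python
import io


class DataSetError(Exception):
    """Local placeholder for the framework's dataset error."""


class FakeFrame:
    """Hypothetical frame that can write csv but has no excel writer."""

    def write_csv(self, file) -> None:
        file.write(b"a,b\n1,2\n")


def resolve_writer(frame, file_format: str):
    # No write_mode option: a missing writer is the only failure case.
    writer = getattr(frame, f"write_{file_format}", None)
    if writer is None:
        raise DataSetError(
            f"No 'write_{file_format}' method available; "
            f"format '{file_format}' cannot be saved."
        )
    return writer


# Supported format: the writer is found and used.
buffer = io.BytesIO()
resolve_writer(FakeFrame(), "csv")(buffer)

# Unsupported format: saving raises instead of consulting a write mode.
try:
    resolve_writer(FakeFrame(), "excel")
    unsupported_error = None
except DataSetError as exc:
    unsupported_error = exc
```

This keeps a single failure path, so a user who loads a read-only format never has to configure anything and only sees an error if they actually call save.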

@datajoely @astrojuanlu, I might not have time to do this any time soon. Do you think you could finish this fix?

@merelcht, all the other review comments are addressed if write_mode is removed, which is why I didn't copy them here specifically. Also, thank you for the review.

@astrojuanlu
Member

Thanks a lot for getting this far @wmoreiraa. We'll take care of the final push 💪🏽

@merelcht linked an issue on Apr 11, 2023 that may be closed by this pull request
@astrojuanlu
Member

Closing in favor of gh-170.

Development

Successfully merging this pull request may close these issues.

Adding Polars' dataframe as a dataset
2 participants