Add a Polars backend #22

Merged: 72 commits merged into main from polars-backend on Aug 29, 2024

finn-rudolph
Contributor

No description provided.

The row order after a Polars group_by().agg() is not consistent across
multiple executions, so it does not make sense to test it.
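
Because of this, comparisons of grouped results have to ignore row order. A minimal sketch in plain Python, assuming results are materialised as lists of row tuples; the helper name `rows_equal_ignoring_order` is hypothetical and not part of the project. Counting rows instead of sorting them also copes with `None` values, which Python cannot sort against other types.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def rows_equal_ignoring_order(
    left: Iterable[Sequence[Hashable]],
    right: Iterable[Sequence[Hashable]],
) -> bool:
    """Compare two query results as multisets of rows (hypothetical helper).

    Row order is ignored, but missing or duplicated rows are still detected
    because each distinct row is counted.
    """
    return Counter(map(tuple, left)) == Counter(map(tuple, right))


# Two runs of the same group_by().agg() may return the groups in any order.
run_1 = [("x", 3), ("y", 5)]
run_2 = [("y", 5), ("x", 3)]
print(rows_equal_ignoring_order(run_1, run_2))  # True
```

If a deterministic order is actually needed, Polars' `group_by(..., maintain_order=True)` keeps the groups in the order in which they first appear, at some performance cost.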
Has the same functionality as pandas; arrange / filter still need to be supported
for window functions.
- make dense_rank a nullary operator

- do not allow an operator to mutate its args via mutate_args (this was only needed
  for rank and dense_rank and can be handled in a simpler way)

- rank and dense_rank are an explicit special case in the Polars backend's _translate_function,
  since the Polars rank function does not support ranking on multiple expressions;
  the expressions therefore have to be converted to a struct column first
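
The struct column is needed because several sort keys have to be ranked as one combined key. A rough plain-Python analogue, assuming rows are given as tuples of key values (tuples compare field by field, like the fields of a struct), no nulls, and ascending order; `rank` and `dense_rank` here are hypothetical helpers mirroring SQL RANK() and DENSE_RANK(), not the project's operators.

```python
from typing import Sequence


def dense_rank(keys: Sequence[tuple]) -> list[int]:
    """DENSE_RANK(): equal keys share a rank and ranks have no gaps."""
    # Tuples compare field by field, like a struct built from several columns.
    rank_of = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [rank_of[key] for key in keys]


def rank(keys: Sequence[tuple]) -> list[int]:
    """RANK(): equal keys share a rank; the next distinct key skips ahead."""
    first_position: dict[tuple, int] = {}
    for position, key in enumerate(sorted(keys), start=1):
        # Only the first occurrence of each key determines its rank.
        first_position.setdefault(key, position)
    return [first_position[key] for key in keys]


# Rank rows by (a, b) jointly.
keys = [(1, "b"), (1, "a"), (0, "z"), (1, "a")]
print(dense_rank(keys))  # [3, 2, 1, 2]
print(rank(keys))        # [4, 2, 1, 2]
```

Ties share a rank in both cases; `rank` leaves a gap after a tie, while `dense_rank` does not.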
Values should not be sorted or all positive up front, as this can hide bugs.
One cannot specify a tolerance for a decimal-float comparison.
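
For illustration: Python compares a `Decimal` and a `float` exactly, so `Decimal("0.3")` is not equal to `0.1 + 0.2`. One possible workaround, sketched under the assumption that the comparison operates on plain Python values, is to cast both sides to `float` and then apply a tolerance; `approx_equal` is a hypothetical helper, not part of the project.

```python
import math
from decimal import Decimal
from typing import Union

Number = Union[Decimal, float, int]


def approx_equal(a: Number, b: Number, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Compare two numbers with a tolerance, even if one of them is a Decimal."""
    # Cast both sides to float first; the tolerance absorbs the rounding error.
    return math.isclose(float(a), float(b), rel_tol=rel_tol, abs_tol=abs_tol)


print(Decimal("0.3") == 0.1 + 0.2)              # False: exact comparison
print(approx_equal(Decimal("0.3"), 0.1 + 0.2))  # True
```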
It does not seem to be very common, and every backend does something
different and calls it string join, string agg or similar. The SQL backends do
a cumulative join, while Polars does not. Maybe we can add it back as a function that is only
allowed in summarise and not in mutate, since there the behaviour is consistent
(apart from MSSQL, of course).
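
The difference between the two behaviours can be sketched in plain Python. Both helpers are hypothetical; the cumulative variant models what the SQL backends were observed to do in mutate (each row receives the join of all rows up to and including itself), presumably because a SQL window with an ORDER BY defaults to a frame that ends at the current row.

```python
from itertools import accumulate


def str_join_summarise(values: list[str], sep: str) -> str:
    """Aggregate behaviour: one joined string for the whole group."""
    return sep.join(values)


def str_join_cumulative(values: list[str], sep: str) -> list[str]:
    """Window behaviour observed for the SQL backends: row i gets rows 0..i joined."""
    return list(accumulate(values, lambda acc, value: acc + sep + value))


group = ["a", "b", "c"]
print(str_join_summarise(group, ","))   # a,b,c
print(str_join_cumulative(group, ","))  # ['a', 'a,b', 'a,b,c']
```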
We also implicitly convert dates to datetimes when comparing or subtracting them.
The exact semantics still need to be defined (we don't know yet what SQL does).
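
Since the exact semantics are still open, the following is only a sketch of one plausible rule: promote a `date` to a naive `datetime` at midnight before comparing or subtracting. Plain Python refuses to order a `date` against a `datetime`, so an explicit promotion like this is needed there as well. The helper names are hypothetical.

```python
from datetime import date, datetime, time, timedelta
from typing import Union

Temporal = Union[date, datetime]


def to_datetime(value: Temporal) -> datetime:
    """Promote a date to a naive datetime at midnight (assumed rule)."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, time.min)


def temporal_lt(a: Temporal, b: Temporal) -> bool:
    return to_datetime(a) < to_datetime(b)


def temporal_sub(a: Temporal, b: Temporal) -> timedelta:
    return to_datetime(a) - to_datetime(b)


print(temporal_lt(date(2024, 8, 29), datetime(2024, 8, 29, 18, 2)))   # True
print(temporal_sub(datetime(2024, 8, 29, 18, 2), date(2024, 8, 29)))  # 18:02:00
```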
Strings are 0-indexed.
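
In most SQL dialects SUBSTR counts characters from 1, while Polars' `str.slice(offset, length)` takes a 0-based offset, so a 0-indexed position has to be shifted by one when it is rendered as SQL. A sketch assuming non-negative offsets; `substr0` and `to_sql_substring` are hypothetical helpers, and the SQL is built as a plain string for illustration only (MSSQL, for example, spells the function SUBSTRING).

```python
def substr0(s: str, start: int, length: int) -> str:
    """Reference semantics: `start` is 0-indexed (assumes start >= 0)."""
    return s[start:start + length]


def to_sql_substring(column: str, start: int, length: int) -> str:
    """Render a 0-indexed substring as SQL, where SUBSTR counts from 1."""
    return f"SUBSTR({column}, {start + 1}, {length})"


print(substr0("polars", 2, 3))         # lar
print(to_sql_substring("name", 2, 3))  # SUBSTR(name, 3, 3)
```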
After a join, Postgres and SQLite give columns that contain only null values
the Null type. Polars and MSSQL don't do that; transform simply does what the
backend does.
With DuckDB, pl.read_database only accepts a string or a special DuckDB object,
so we compile the query to a string for DuckDB. The changed test failed because DuckDB
apparently produces floats on integer division. This will be investigated later.
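
For reference, the integer division behaviours involved: as far as we know, Postgres, SQLite and MSSQL truncate INTEGER / INTEGER toward zero, whereas the failing test suggests DuckDB returns a float, and Python's own `//` differs from both because it floors. A sketch with hypothetical helper names:

```python
def div_truncate(a: int, b: int) -> int:
    """INTEGER / INTEGER rounded toward zero (assumed Postgres/SQLite/MSSQL behaviour)."""
    quotient = abs(a) // abs(b)
    # The result is negative exactly when the operands have opposite signs.
    return quotient if (a >= 0) == (b > 0) else -quotient


def div_float(a: int, b: int) -> float:
    """True division, matching the float result observed from DuckDB."""
    return a / b


print(div_truncate(-7, 2))  # -3
print(-7 // 2)              # -4, Python floors instead
print(div_float(-7, 2))     # -3.5
```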
finn-rudolph requested a review from a team as a code owner on August 29, 2024 18:02
finn-rudolph merged commit e22e52c into main on Aug 29, 2024
6 checks passed
finn-rudolph deleted the polars-backend branch on August 29, 2024 18:03
1 participant