Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Upgrade arrow/datafusion/deltalake/lance #2429

Merged
merged 19 commits into from
Jan 25, 2024

Conversation

vrongmeal
Copy link
Contributor

@vrongmeal vrongmeal commented Jan 16, 2024

🤞

@vrongmeal vrongmeal changed the title wip: upgrade df/arrow/... chore: Upgrade arrow/datafusion/deltalake/lance Jan 19, 2024
crates/datasources/src/mongodb/mod.rs Outdated Show resolved Hide resolved
Comment on lines +2069 to +2056
ast::DataType::Int64
| ast::DataType::Float64
| ast::DataType::Nvarchar(_)
| ast::DataType::JSON
| ast::DataType::Uuid
| ast::DataType::Binary(_)
| ast::DataType::Bytes(_)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add these datatypes? I excluded them since Binary was already excluded…

Comment on lines +1989 to +1971
ast::DataType::Array(ast::ArrayElemTypeDef::AngleBracket(inner_sql_type))
| ast::DataType::Array(ast::ArrayElemTypeDef::SquareBracket(inner_sql_type)) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting both is OK?

@vrongmeal
Copy link
Contributor Author

Resolving bugs. There are changes in deltalake that remove DeltaObjectStore in favour of LogStore. I have made this change but it breaks some stuff. Fixing it.

@vrongmeal
Copy link
Contributor Author

Resolving bugs. There are changes in deltalake that remove DeltaObjectStore in favour of LogStore. I have made this change but it breaks some stuff. Fixing it.

Somehow there’s an extra log that resets the table schema to nothing:

{"metaData":{"id":"06d0a73d-b29a-468e-a914-e018559ce305","name":"t1","description":null,"format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[]}","partitionColumns":[],"createdTime":1705658425682,"configuration":{}}}
{"commitInfo":{"timestamp":1705658425687,"operation":"WRITE","operationParameters":{"mode":"Overwrite"},"clientVersion":"delta-rs.0.17.0"}}

@vrongmeal
Copy link
Contributor Author

Thinking of resetting lance to the latest release (instead of the latest commit) so we can work with existing DeltaObjectStore and wait for any bugs to resolve here.

@vrongmeal
Copy link
Contributor Author

So, the LogStore impl happened before they updated DF. We will need to fix this log thing before we can get this in.

vrongmeal and others added 3 commits January 23, 2024 18:16
Signed-off-by: Vaibhav <vrongmeal@gmail.com>
crates/datasources/src/mongodb/mod.rs Outdated Show resolved Hide resolved
crates/datasources/src/mongodb/mod.rs Outdated Show resolved Hide resolved
@universalmind303 universalmind303 marked this pull request as ready for review January 24, 2024 23:09

# Runs pytest in the tests directory.
pytest *args:
poetry -C tests run pytest --rootdir={{invocation_directory()}}/tests {{ if args == "" {'tests'} else {args} }}
pytest *args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously this required poetry to be directly installed. Now it properly uses a virtual env.

@universalmind303
Copy link
Contributor

AFAICT, there are 2 main followups for this upgrade.

  • migrate to the new datafusion udf paradigm
  • delta-rs has a registry for objectstores now. we need to figure out how to better integrate with that.

@tychoish
Copy link
Contributor

AFAICT, there are 2 main followups for this upgrade. [...]

can you open tickets for these just so we don't lose track of them?

@universalmind303
Copy link
Contributor

AFAICT, there are 2 main followups for this upgrade. [...]

can you open tickets for these just so we don't lose track of them?

done

@@ -296,7 +296,7 @@ jobs:
tests/.venv/
key: ${{ runner.os }}-poetry-${{ hashFiles('**/Cargo.lock') }}
- name: setup pytest
run: just pytest-setup
run: just venv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I wanted to keep the virtualenvs separate for the python tests and the bindings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I wanted to keep the virtualenvs separate for the python tests and the bindings.

I have an idea for this one. Ill open up a followup pr for this shortly.

@universalmind303 universalmind303 merged commit 7d17144 into main Jan 25, 2024
20 checks passed
@universalmind303 universalmind303 deleted the vrongmeal/upgrade branch January 25, 2024 18:35
tychoish pushed a commit that referenced this pull request Feb 1, 2024
🤞

---------

Signed-off-by: Vaibhav <vrongmeal@gmail.com>
Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants