Add support for Delta Lake table format both source and sink #710

Open
andrei-ionescu opened this issue Aug 7, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


andrei-ionescu commented Aug 7, 2024

Both the source and the sink could be implemented using the delta-rs library.

I've also seen that the documentation describes a Delta Lake Sink connector (https://doc.arroyo.dev/connectors/delta), but I couldn't find it in this repository. Where can I find the Delta Lake Sink connector? If it's implemented under another connector, should we make it a first-class citizen?

Member

mwylde commented Aug 7, 2024

Adding support for delta as a source would be great! The delta connector is implemented on top of the filesystem connector, since most of the complexity is in consistently writing the data to S3 (see https://www.arroyo.dev/blog/streaming-to-s3-is-hard), not handling the delta metadata.

Most of the delta code is here: https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-connectors/src/filesystem/sink/delta.rs. It's integrated into the filesystem connector's two-phase commit handler in https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-connectors/src/filesystem/sink/mod.rs.
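
To illustrate why the Delta metadata is the small part, here is a minimal, stdlib-only sketch of the two-phase pattern described above. It is not the code from the linked files: `StagedFile`, `pre_commit`, `commit`, and the `_staging` directory are hypothetical names, a local directory stands in for S3, and the `protocol` and `metaData` actions that a real table needs in its first commit are omitted. The only part taken from the Delta Lake format itself is the log layout: one zero-padded `_delta_log/<version>.json` file per commit, containing one JSON `add` action per data file.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// A data file written during pre-commit that is not yet visible to readers.
/// (Hypothetical type, for illustration only.)
pub struct StagedFile {
    staged: PathBuf,
    final_name: String,
    size: u64,
}

/// Phase 1 (pre-commit): write the bytes under a staging directory.
/// Nothing references the file yet, so a crash here leaves the table unchanged.
pub fn pre_commit(table: &Path, final_name: &str, data: &[u8]) -> io::Result<StagedFile> {
    let staging = table.join("_staging");
    fs::create_dir_all(&staging)?;
    let staged = staging.join(final_name);
    fs::write(&staged, data)?;
    Ok(StagedFile {
        staged,
        final_name: final_name.to_string(),
        size: data.len() as u64,
    })
}

/// Phase 2 (commit): move the files into place, then publish a single Delta
/// log entry that lists them. Readers only see files named in the log.
pub fn commit(
    table: &Path,
    version: u64,
    files: Vec<StagedFile>,
    now_ms: u64,
) -> io::Result<PathBuf> {
    let mut actions = String::new();
    for f in &files {
        fs::rename(&f.staged, table.join(&f.final_name))?;
        // Assumption: file names are simple and need no JSON escaping.
        actions.push_str(&format!(
            "{{\"add\":{{\"path\":\"{}\",\"partitionValues\":{{}},\"size\":{},\"modificationTime\":{},\"dataChange\":true}}}}\n",
            f.final_name, f.size, now_ms
        ));
    }
    let log_dir = table.join("_delta_log");
    fs::create_dir_all(&log_dir)?;
    // Delta log files are named with a 20-digit, zero-padded version number.
    let log_path = log_dir.join(format!("{:020}.json", version));
    // create_new fails if this version already exists, so two writers
    // cannot both claim the same commit.
    let mut log = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&log_path)?;
    log.write_all(actions.as_bytes())?;
    log.sync_all()?;
    Ok(log_path)
}
```

Calling `pre_commit` for each file at checkpoint time and `commit` once the checkpoint is confirmed gives the basic shape. On S3 the hard parts are the ones this sketch skips: multipart uploads, recovering in-flight files after a failure, and the lack of an atomic rename.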

mwylde added the enhancement label Aug 7, 2024