Add support of HDFS as remote object store #1060

Closed
yahoNanJing opened this issue Sep 28, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@yahoNanJing
Contributor

yahoNanJing commented Sep 28, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, we can only read Parquet files from the local file system. It would be nice to add support for reading Parquet files that reside on HDFS.


@yahoNanJing
Contributor Author

yahoNanJing commented Sep 28, 2021

@yjshen could you help review the related PR and give some comments?

@yjshen
Member

yjshen commented Sep 28, 2021

@yahoNanJing, great to see this! We had some earlier discussion in #907. Please check the discussion there for more background.

@yjshen
Member

yjshen commented Sep 28, 2021

Besides, I have a partially implemented repo for a native HDFS Rust binding, based on libhdfs3, which originates from HAWQ. It would be great if we could cooperate to make this happen.

https://github.com/yjshen/hdfs-native
The repo provides a wrapper around libhdfs3 and, like your fs-hdfs, originates from hdfs-rs. I also have an HDFS object-store implementation in src/hdfs_store.rs for fast development iteration.

@yjshen
Member

yjshen commented Sep 28, 2021

Per the discussion in #907, the community may prefer to put connectors such as S3 and HDFS in their own repositories, to allow fast development iteration, reduce unnecessary dependencies, and keep compilation time reasonable.

@yahoNanJing
Contributor Author

Agreed on having a separate crate for each remote storage. I will refine this PR later and turn it into the HDFS-related crate.

@zuston
Member

zuston commented Jun 15, 2023

> Besides, I have a partially implemented repo for a native HDFS Rust binding, based on libhdfs3, which originates from HAWQ. It would be great if we could cooperate to make this happen.
>
> https://github.com/yjshen/hdfs-native The repo provides a wrapper around libhdfs3 and, like your fs-hdfs, originates from hdfs-rs. I also have an HDFS object-store implementation in src/hdfs_store.rs for fast development iteration.

Great work on https://github.com/yjshen/hdfs-native. Is it ready for production use?

@drauschenbach
Contributor

Obsoleted by #1062, where consensus led to this feature landing in https://github.com/datafusion-contrib/hdfs-native-object-store.

@andygrove
Member

Closing this. Thanks @drauschenbach.


5 participants