Support parquet or other binary format for external memory #6719

Closed
PeterPanonGit opened this issue Feb 21, 2021 · 2 comments

Comments

@PeterPanonGit

Currently, XGBoost's external memory mode only supports LIBSVM and CSV files as input. However, these text formats are very slow to write and read when the data is large. Is there a roadmap for XGBoost to support more efficient input formats for external memory, such as Parquet or HDF5?

@trivialfis
Member

trivialfis commented Feb 21, 2021

I'm trying to move external memory support to third-party libraries by making DMatrix support iterators. An implementation for GPU is available at https://github.com/dmlc/xgboost/blob/master/demo/guide-python/data_iterator.py. The others still need some more work.

@trivialfis
Member

Please check out the new iterator interface, with which you can load the data with any library you want.
