Feature : Support for Hugging Face streaming dataset #3501
Thanks for the PR! I believe @eyurtsev is working on an iterable data loading interface; we'll try to make sure this fits in with what we're building towards before landing.
Makes sense, thank you!
A lazy iteration method was added to the interface: #3659. A notable change is that this method does not have a text splitter. I'm thinking about adding text splitting and batching via composition, so they won't be included by default when doing lazy iteration. For context, it's part of the following sequence of changes: #2833 (comment)
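To illustrate the idea of keeping lazy iteration free of splitting and adding splitting and batching by composition, here is a minimal sketch in plain Python. The `Document` class and the `lazy_load`, `split_documents`, and `batch` functions are hypothetical stand-ins, not the interface from #3659; the splitter is a naive fixed-size character splitter used only for illustration.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


@dataclass
class Document:
    # Hypothetical document type: text plus arbitrary metadata.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def lazy_load(rows: Iterable[Dict[str, Any]], text_key: str = "text") -> Iterator[Document]:
    """Yield one Document per row; no splitting or buffering happens here."""
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != text_key}
        yield Document(page_content=row[text_key], metadata=metadata)


def split_documents(docs: Iterable[Document], chunk_size: int) -> Iterator[Document]:
    """Composable step: split each document into fixed-size character chunks."""
    for doc in docs:
        text = doc.page_content
        for start in range(0, len(text), chunk_size):
            yield Document(page_content=text[start:start + chunk_size], metadata=dict(doc.metadata))


def batch(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Composable step: group an iterator into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Usage: each stage is a generator, so rows are only pulled as batches are consumed.
rows = [{"text": "abcdefgh", "id": 1}, {"text": "xyz", "id": 2}]
batches = list(batch(split_documents(lazy_load(rows), chunk_size=3), size=2))
```

Because every stage is a generator, callers who only want raw documents call `lazy_load` alone, and splitting or batching is opt-in by wrapping it.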
Actually, it makes a lot of sense to have a new method for handling lazy loading and to implement lazy loading for all iterable loaders, not just Hugging Face. Thanks for that!
Lazy load now exists on the interface. If anyone wants to, they can also pass through the streaming flag to allow loading without downloading the dataset first.
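A minimal sketch of passing a streaming flag through to the dataset loader follows. It assumes the Hugging Face `datasets` library, whose `load_dataset(path, split=..., streaming=True)` returns an iterable dataset that yields rows as dicts without downloading the full dataset. The `StreamingDatasetLoader` class, its parameters, and the injectable `load_fn` hook are hypothetical and not LangChain's actual `HuggingFaceDatasetLoader` API; `load_fn` exists only so the sketch can be exercised without network access.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, Optional


@dataclass
class Document:
    # Hypothetical document type: text plus arbitrary metadata.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class StreamingDatasetLoader:
    """Hypothetical loader that forwards a `streaming` flag to `load_dataset`."""

    def __init__(
        self,
        path: str,
        page_content_column: str = "text",
        split: str = "train",
        streaming: bool = False,
        load_fn: Optional[Callable[..., Any]] = None,  # hypothetical test hook
    ) -> None:
        self.path = path
        self.page_content_column = page_content_column
        self.split = split
        self.streaming = streaming
        self.load_fn = load_fn

    def _resolve_load_fn(self) -> Callable[..., Any]:
        if self.load_fn is not None:
            return self.load_fn
        # Imported lazily so the sketch runs without `datasets` installed.
        from datasets import load_dataset
        return load_dataset

    def lazy_load(self) -> Iterator[Document]:
        # With streaming=True, rows are fetched as they are iterated.
        rows = self._resolve_load_fn()(self.path, split=self.split, streaming=self.streaming)
        for row in rows:
            content = row[self.page_content_column]
            metadata = {k: v for k, v in row.items() if k != self.page_content_column}
            yield Document(page_content=content, metadata=metadata)
```

Pairing `streaming=True` with `lazy_load` means a caller can read just the first few records of a large dataset, for example with `itertools.islice(loader.lazy_load(), 10)`, instead of materializing everything.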
#2864
Enhanced the Hugging Face loader to support streaming-enabled datasets, expanding its capabilities.