Export Python Interface for external memory. #7070
Codecov Report
@@ Coverage Diff @@
## master #7070 +/- ##
==========================================
+ Coverage 81.60% 82.58% +0.98%
==========================================
Files 13 13
Lines 3903 3962 +59
==========================================
+ Hits 3185 3272 +87
+ Misses 718 690 -28
Running it multiple times to see whether removing the dmlc parser fixes the … Ran it 5 times so far and it seems fine. Will continue monitoring.
""" | ||
_T = TypeVar("_T") | ||
|
||
def __init__(self, cache_prefix: Optional[str] = None) -> None: |
cache_prefix can be a parameter for DMatrix instead of DataIter. I don't have a strong preference either way, but note that the parameter is useful: users might have a URI that's not a local file path, so we can't drop it.
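The point above is that the prefix has to be carried through untouched, because it may be a URI rather than a local path. A minimal sketch of that idea follows. `DataIterSketch` and `cache_is_local_path` are hypothetical names that only mirror the constructor signature in the diff context above, and the scheme check via `urllib.parse.urlparse` is an illustrative assumption, not how XGBoost validates the prefix.

```python
from typing import Optional
from urllib.parse import urlparse


class DataIterSketch:
    """Hypothetical stand-in for a data-iterator base class.

    Mirrors the constructor under review:
    __init__(self, cache_prefix: Optional[str] = None).
    The prefix is stored as an opaque string, so a non-local URI is
    passed through unchanged instead of being normalised as a file path.
    """

    def __init__(self, cache_prefix: Optional[str] = None) -> None:
        self.cache_prefix = cache_prefix

    def cache_is_local_path(self) -> bool:
        """Return True when the prefix looks like a plain local path."""
        if self.cache_prefix is None:
            return False
        scheme = urlparse(self.cache_prefix).scheme
        # An empty scheme means a plain path; a one-letter scheme is most
        # likely a Windows drive letter such as "C:".
        return scheme == "" or len(scheme) == 1
```

Keeping the prefix as a raw string is what lets a remote location such as an `s3://` URI survive; any path manipulation (for example `os.path.abspath`) would mangle it.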
This is the final PR of the original Sparse DMatrix rewrite. The original description is kept in the section below. This PR exposes the Python API for the data iterator. Besides the interface, it also finalizes the documentation, examples, and tests.
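In XGBoost's Python package, a `xgboost.DataIter` subclass implements `next` and `reset`: the consumer passes a callback to `next`, the iterator feeds one batch through it and reports whether more data remains, and `reset` rewinds for another pass. The sketch below imitates that protocol with the standard library only. `BatchIter` and `consume` are hypothetical, and returning 1/0 from `next` is modelled loosely on the XGBoost interface rather than copied from it.

```python
from typing import Any, Callable, List, Sequence


class BatchIter:
    """Hypothetical user-side iterator over pre-split batches.

    next() hands one batch to the consumer-supplied callback and returns 1
    while batches remain, or 0 once the data is exhausted; reset() rewinds.
    """

    def __init__(self, batches: Sequence[List[List[float]]]) -> None:
        self._batches = batches
        self._it = 0

    def next(self, input_data: Callable[..., Any]) -> int:
        if self._it == len(self._batches):
            return 0  # end of data
        input_data(data=self._batches[self._it])
        self._it += 1
        return 1

    def reset(self) -> None:
        self._it = 0


def consume(it: BatchIter) -> int:
    """Hypothetical consumer: drains the iterator once and counts rows."""
    n_rows = 0

    def input_data(data: List[List[float]]) -> None:
        nonlocal n_rows
        n_rows += len(data)

    it.reset()  # always start a pass from the first batch
    while it.next(input_data):
        pass
    return n_rows
```

Calling `consume` twice on the same iterator gives the same row count because it calls `reset` before each pass. That is why the protocol needs `reset` at all: a consumer may walk the data more than once.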
Old description
This is a proof of concept for using an iterative-DMatrix-style callback to handle external memory. Also, the data iterator in Python can now handle CPU data and be used by CPU-based algorithms. For details:
TODOs
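To illustrate how a callback can back external memory, the sketch below drains an iterator once, spills each batch to its own file under a cache prefix, and then streams the pages back one at a time so that only a single page is held in memory. Everything here is hypothetical: `ListIter`, `write_pages`, `read_pages`, the `.pageN.json` naming, and the JSON format are stand-ins chosen for readability and are not XGBoost's on-disk cache format.

```python
import json
import os
import tempfile
from typing import Any, Callable, Iterator, List


class ListIter:
    """Hypothetical iterator following a next(callback)/reset() protocol."""

    def __init__(self, batches: List[List[List[float]]]) -> None:
        self._batches = batches
        self._it = 0

    def next(self, input_data: Callable[..., Any]) -> int:
        if self._it == len(self._batches):
            return 0  # no more batches
        input_data(data=self._batches[self._it])
        self._it += 1
        return 1

    def reset(self) -> None:
        self._it = 0


def write_pages(it: ListIter, cache_prefix: str) -> List[str]:
    """Drain the iterator once, spilling each batch to its own cache file."""
    paths: List[str] = []

    def input_data(data: List[List[float]]) -> None:
        path = f"{cache_prefix}.page{len(paths)}.json"
        with open(path, "w") as fh:
            json.dump(data, fh)
        paths.append(path)

    it.reset()
    while it.next(input_data):
        pass
    return paths


def read_pages(paths: List[str]) -> Iterator[List[List[float]]]:
    """Stream pages back lazily, so only one page is in memory at a time."""
    for path in paths:
        with open(path) as fh:
            yield json.load(fh)


if __name__ == "__main__":
    demo = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
    with tempfile.TemporaryDirectory() as tmp:
        pages = write_pages(ListIter(demo), os.path.join(tmp, "cache"))
        print(sum(len(page) for page in read_pages(pages)))  # 3 rows
```

The design choice being illustrated is that the iterator never has to know about the cache: it only hands batches to a callback, and the consumer decides whether to keep them in memory or spill them under the prefix.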