You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The Dask interface allows you to train a LightGBM model on data stored in a Dask Array or Dask DataFrame. As part of this, it zips together corresponding chunks into something called "parts". So for example, if the training data are like this:
X: a Dask DataFrame with data for features
y: a Dask Array with labels
Then that interface might zip together the chunk of X with the first 1000 rows and the chunk of y with labels for those 1000 rows.
If this was just features + labels, using tuples would be ok. But it can also include sample weights and, for learning-to-rank tasks, groups. In the future, it might include init_score as well.
The Dask interface should stitch things together into dictionaries instead, keyed with understandable keys like "data", "labels", "group", etc.
Motivation
This change would reduce the risks of mistakes in the Dask interface and would make the code easier to read and change.
References
@ffineis originally implemented this in #3708, but I asked him to pull it out into a separate PR. See #3708 (comment).
The text was updated successfully, but these errors were encountered:
Summary
The Dask interface allows you to train a LightGBM model on data stored in a Dask Array or Dask DataFrame. As part of this, it zips together corresponding chunks into something called "parts". So for example, if the training data are like this:
X
: a Dask DataFrame with data for featuresy
: a Dask Array with labelsThen that interface might zip together the chunk of
X
with the first 1000 rows and the chunk ofy
with labels for those 1000 rows.If this was just features + labels, using tuples would be ok. But it can also include sample weights and, for learning-to-rank tasks, groups. In the future, it might include
init_score
as well.The Dask interface should stitch things together into dictionaries instead, keyed with understandable keys like
"data"
,"labels"
,"group"
, etc.Motivation
This change would reduce the risks of mistakes in the Dask interface and would make the code easier to read and change.
References
@ffineis originally implemented this in #3708, but I asked him to pull it out into a separate PR. See #3708 (comment).
The text was updated successfully, but these errors were encountered: