[R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes #6288

Closed
Chuang1128 opened this issue Jan 27, 2024 · 9 comments

Comments

@Chuang1128

Hello!

I built a LightGBM model and then called lgb.importance(model).
R then showed this error: "Error: R character strings are limited to 2^31-1 bytes".

How do I solve this error? Thank you!

@jameslamb
Collaborator

Thanks for using LightGBM.

An error message alone is not enough information for us to help you. Please provide the following:

  • version of R
  • version of {lightgbm}
  • how you installed LightGBM
  • operating system
  • output of running sessionInfo() in your R session (if possible)
  • a minimal, reproducible example that generates this error (docs with some guidance on that)

Here's an example of how to create a reproducible example for the R package: #4721 (comment)

@jameslamb jameslamb changed the title lgb.importance(): Error: R character strings are limited to 2^31-1 bytes [R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes Jan 27, 2024
@Chuang1128
Author

version of R: 4.3.2

version of {lightgbm}: 3.3.5

how you installed LightGBM: Tool>Install package

operating system: windows

output of running sessionInfo() in your R session (if possible)
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default

@jameslamb
Collaborator

jameslamb commented Jan 27, 2024

Thank you for that!

Tool>Install package

What does this mean?

version of {lightgbm}: 3.3.5

Can you please update to the latest version (v4.3.0) from CRAN and try again?

install.packages("lightgbm", repos = "https://cran.r-project.org")

And after that...we still won't be able to help much without a minimal, reproducible example.

@Chuang1128
Author

my lightgbm package version is 3.3.5
It still gives an error. I will try to provide a reproducible example.

@Chuang1128
Author

train model

model <- lgb.train(
  params = list(
    objective = "regression",
    num_iterations = 100,
    metric = "l2",
    min_data = 1L,
    min_data_in_bin = 100,
    min_gain_to_split = 10
  ),
  data = train,
  nrounds = 100
)

lgb_imp <- lgb.importance(model)

However, when I decrease num_iterations, the error does not appear.

@jameslamb
Collaborator

my lightgbm package version is 3.3.5

Sorry if my placement was confusing. I'm asking what "Tool>Install package" means. Are those buttons you're clicking in an application? If so, what application?

train model

Thanks for this! But it is not a reproducible example.

Crucially... what does train contain? Much of LightGBM's behavior (like any machine learning framework) is dependent on the size, shape, and distribution of the input data.

For example, based only on the error message you've provided, I can think of a few possibilities:

  • your data has features with huge feature names
  • your data has a very large number of rows
  • you have a very large number of features

If you can't provide a reproducible example, can you please at least show the code you used to construct train? Including any code for reading in data from files, databases, etc.

And report the size of the dataset (number of rows, number of columns, exact feature names if there are any).
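As a rough sketch of how those details could be collected in base R: the data frame below is a tiny made-up stand-in for the real training data, the first column is assumed to be the label, and `describe_dataset()` is a hypothetical helper name, not part of {lightgbm}.

```r
# Hypothetical stand-in for the real training data (first column = label).
data <- data.frame(
  y = c(1.5, 2.0, 3.5),
  feature_a = c(0.1, 0.2, 0.3),
  feature_b = c(10, 20, 30)
)

# Hypothetical helper: summarise rows, columns, and feature names.
describe_dataset <- function(df, label_col = 1L) {
  features <- colnames(df)[-label_col]
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    feature_names = features,
    # Length in bytes of the longest feature name, since the error is about bytes.
    longest_feature_name_bytes = max(nchar(features, type = "bytes"))
  )
}

info <- describe_dataset(data)
str(info)
```

Pasting the output of `str(info)` into the issue covers the row count, column count, and feature names in one go.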

@Chuang1128
Author

Yes, I clicked the buttons in RStudio to install the lightgbm package.

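library(lightgbm)

# Read the data; the first column is the label, the rest are features.
data <- readRDS("D:/data.rds")
dtrain <- lgb.Dataset(data = as.matrix(data[, -1]), label = data[, 1])

# Train the regression model on the Dataset constructed above.
model <- lgb.train(
  params = list(
    objective = "regression",
    num_iterations = 100,
    metric = "l2",
    min_data = 1L,
    min_data_in_bin = 100,
    min_gain_to_split = 10
  ),
  data = dtrain,
  nrounds = 100
)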

lgb_imp <- lgb.importance(model)

my dataset:
number of rows: 4610000
number of columns: 21

@p-schaefer

p-schaefer commented Mar 7, 2024

Hi @jameslamb, I'm having the same issue when running lightgbm::lgb.model.dt.tree(lgb_model), and I believe the culprit is this call: lgb.dump(booster = model, num_iteration = num_iteration). My dataset is also very large, with a large number of rows and columns, and is fit with a complex model (i.e., not easy to share).

It looks like lgb.dump tries to return a single long character string when num_iteration = NULL. I wonder if it could instead be iterated into a large list or something, i.e., lapply(1:booster$current_iter(), booster$dump_model)?

Edit: Never mind, the above won't work because dump_model will return everything up to the selected iteration, not just the selected iteration. For context, I'm trying to run this through treeshap::unify()
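For a sense of scale, here is a small base R sketch of the limit named in the error message. `fits_in_one_string()` is a hypothetical helper and the per-tree sizes are made-up illustrative numbers, not measurements from any real model.

```r
# R's limit on a single character string, as quoted in the error message.
max_string_bytes <- .Machine$integer.max  # 2^31 - 1 = 2147483647

# Hypothetical helper: would a text dump of `n_trees` trees, each averaging
# `bytes_per_tree` bytes, fit in one R character string?
fits_in_one_string <- function(bytes_per_tree, n_trees) {
  # Use doubles so the product cannot overflow R's 32-bit integers.
  as.numeric(bytes_per_tree) * as.numeric(n_trees) <= max_string_bytes
}

small_dump <- fits_in_one_string(bytes_per_tree = 1e6, n_trees = 100)  # 1e8 bytes in total
large_dump <- fits_in_one_string(bytes_per_tree = 5e7, n_trees = 100)  # 5e9 bytes in total
print(c(small_dump = small_dump, large_dump = large_dump))
```

Under these assumed sizes, only the first dump stays within the limit. Because dump_model returns everything up to the selected iteration, the size of the returned string grows with the number of iterations included.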

@jameslamb
Collaborator

I'm closing this in favor of #6380, which describes the same problem thoroughly with a reproducible example. Let's please focus there.
