[R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes #6288

Chuang1128 · 2024-01-27T04:38:01Z

Hello!

I built a LightGBM model and then called lgb.importance(model).
R then showed this error: R character strings are limited to 2^31-1 bytes.

How do I solve this error? Thank you!

jameslamb · 2024-01-27T04:45:12Z

Thanks for using LightGBM.

An error message alone is not enough information for us to help you. Please provide the following:

version of R
version of {lightgbm}
how you installed LightGBM
operating system
output of running sessionInfo() in your R session (if possible)
a minimal, reproducible example that generates this error (docs with some guidance on that)

Here's an example of how to create a reproducible example for the R package: #4721 (comment)

Chuang1128 · 2024-01-27T05:02:04Z

version of R: 4.3.2

version of {lightgbm}: 3.3.5

how you installed LightGBM: Tool>Install package

operating system: windows

output of running sessionInfo() in your R session (if possible)
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default

jameslamb · 2024-01-27T05:08:13Z

Thank you for that!

"Tool>Install package"

What does this mean?

"version of {lightgbm}: 3.3.5"

Can you please update to the latest version (v4.3.0) from CRAN and try again?

install.packages("lightgbm", repos = "https://cran.r-project.org")

And after that...we still won't be able to help much without a minimal, reproducible example.

Chuang1128 · 2024-01-27T05:21:01Z

My lightgbm package version is 3.3.5.
It still gives the error. I will try to put together a reproducible example.

Chuang1128 · 2024-01-27T05:32:13Z

train model

model <- lgb.train(
  params = list(
    objective = "regression",
    num_iterations = 100,
    metric = "l2",
    min_data = 1L,
    min_data_in_bin = 100,
    min_gain_to_split = 10
  ),
  data = train,
  nrounds = 100
)

lgb_imp <- lgb.importance(model)

Also, when I decrease num_iterations, the error does not appear.

jameslamb · 2024-01-27T05:56:47Z

"my lightgbm package version is 3.3.5"

Sorry if my placement was confusing. I'm asking what "Tool>Install package" means. Are those buttons you're clicking in an application? If so, what application?

"train model"

Thanks for this! But it is not a reproducible example.

Crucially... what does train contain? Much of LightGBM's behavior (like any machine learning framework) is dependent on the size, shape, and distribution of the input data.

For example, based only on the error message you've provided, I can think of a few possibilities:

your data has features with huge feature names
your data has a very large number of rows
you have a very large number of features

If you can't provide a reproducible example, can you please at least show the code you used to construct train? Including any code for reading in data from files, databases, etc.

And report the size of the dataset (number of rows, number of columns, exact feature names if there are any).
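A minimal sketch of how those numbers could be collected, assuming the data is already loaded as a data frame or matrix. describe_training_data() is a hypothetical helper written for illustration, not part of {lightgbm}; it uses only base R, and the small example_data object is a made-up stand-in for the real data.

```r
# Hypothetical helper (not part of {lightgbm}): summarize the shape of a
# data frame or matrix before it is passed to lgb.Dataset().
describe_training_data <- function(x) {
  feature_names <- colnames(x)
  list(
    n_rows = nrow(x),
    n_cols = ncol(x),
    # Length in characters of the longest column name (0 if unnamed)
    longest_feature_name = if (is.null(feature_names)) 0L else max(nchar(feature_names))
  )
}

# Made-up stand-in for the real data: 1 label column + 2 feature columns
example_data <- data.frame(
  label = c(1.5, 2.0, 3.25),
  feat_a = 1:3,
  feat_b = c(0.1, 0.2, 0.3)
)

info <- describe_training_data(example_data)
str(info)
```

Running the same helper on the real object and pasting the output of str(info), together with colnames(), would cover all three possibilities listed above.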

Chuang1128 · 2024-01-27T06:01:49Z

Yes, I clicked the buttons in RStudio to install the lightgbm package.

# First column is the label, the remaining columns are features
data <- readRDS("D:/data.rds")
dtrain <- lgb.Dataset(data = as.matrix(data[, -1]), label = data[, 1])
model <- lgb.train(
  params = list(
    objective = "regression",
    num_iterations = 100,
    metric = "l2",
    min_data = 1L,
    min_data_in_bin = 100,
    min_gain_to_split = 10
  ),
  data = dtrain,
  nrounds = 100
)

lgb_imp <- lgb.importance(model)

my dataset:
number of rows: 4610000
number of columns: 21

p-schaefer · 2024-03-07T14:14:12Z

Hi @jameslamb, I'm having the same issue when running lightgbm::lgb.model.dt.tree(lgb_model), and I believe the culprit is this call: lgb.dump(booster = model, num_iteration = num_iteration). My dataset is also very large, with a large number of rows and columns, and is fit with a complex model (i.e., not easy to share).

It looks like lgb.dump is trying to return a single long character string when num_iteration = NULL. I wonder if it could instead be split into a large list or something, i.e., lapply(1:booster$current_iter(), booster$dump_model)?

Edit: Never mind, the above won't work because dump_model will return everything up to the selected iteration, not just the selected iteration. For context, I'm trying to run this through treeshap::unify()
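For a sense of scale, here is a back-of-the-envelope sketch of the limit named in the error message. fits_in_one_string() is a hypothetical helper, and the per-tree byte counts are made-up illustrative numbers, not measurements from any real model.

```r
# Maximum number of bytes in a single R character string, as stated in the
# error message (2^31 - 1).
r_string_limit_bytes <- 2^31 - 1

# Hypothetical helper: would the text for `n_trees` trees, each averaging
# `bytes_per_tree` bytes, fit in one R character string?
# Doubles are used throughout, so the product cannot overflow an integer.
fits_in_one_string <- function(n_trees, bytes_per_tree) {
  (n_trees * bytes_per_tree) <= r_string_limit_bytes
}

small_dump <- fits_in_one_string(n_trees = 100, bytes_per_tree = 1e6)  # about 0.1 GB in total
large_dump <- fits_in_one_string(n_trees = 100, bytes_per_tree = 3e7)  # about 3 GB in total

print(c(small = small_dump, large = large_dump))
```

Under these assumed sizes the first case fits and the second does not, which is consistent with the earlier observation that lowering num_iterations makes the error go away.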

jameslamb · 2024-03-22T14:12:00Z

I'm closing this in favor of #6380, which describes the same problem thoroughly with a reproducible example. Let's please focus there.

jameslamb added the r-package label Jan 27, 2024

jameslamb changed the title ~~lgb.importance(): Error: R character strings are limited to 2^31-1 bytes~~ [R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes Jan 27, 2024

jameslamb added question awaiting response labels Jan 27, 2024

github-actions bot removed the awaiting response label Jan 27, 2024

jameslamb added the awaiting response label Jan 27, 2024

github-actions bot removed the awaiting response label Jan 27, 2024

This was referenced Mar 8, 2024

LightGBM categoricals and shap_interactions shap/shap#1644

Open

lightgbm.unify() fails when model or data are too big ModelOriented/treeshap#37

Open

jameslamb mentioned this issue Mar 22, 2024

[R-package] lightgbm::lgb.model.dt.tree() error caused by lgb.dump() error with large models #6380

Open

jameslamb closed this as completed Mar 22, 2024
