Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] handle arbitrary length feature names in Python-package #4293

Merged
merged 2 commits into from
May 21, 2021

Conversation

StrikerRUS
Copy link
Collaborator

Refer to #4256 (comment).

@@ -37,6 +39,8 @@ def test_basic(tmp_path):
if i % 10 == 0:
print(bst.eval_train(), bst.eval_valid())

assert train_data.get_feature_name() == feature_names
Copy link
Collaborator Author

@StrikerRUS StrikerRUS May 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current master fails here with

BufferError: Allocated feature name buffer size (255) wasinferior to the needed size (1001).

BTW, I overlooked bad string construction in #4143 🙁

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked bad string construction in #4143

ah!! I see now

required_string_vuffer_size.value

Sorry about that. I must have looked at that diff 10 times and never noticed that mistake (probably my mistake). I think we should consider adding pylint in CI. I see this error running pylint python-package/ | grep vuffer.

python-package/lightgbm/basic.py:3462:124: E0602: Undefined variable 'required_string_vuffer_size' (undefined-variable)
python-package/build/lib/lightgbm/basic.py:3460:124: E0602: Undefined variable 'required_string_vuffer_size' (undefined-variable)

I mentioned that last year and then forgot about it 😬 #3377 (comment). I should have written up an issue...I'll do that right now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just created #4308

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Very nice catch! I haven't noticed this too 🙁 .

I was speaking about "no whitespace between words" issue.

BufferError: Allocated feature name buffer size (255) wasinferior to the needed size (1001).
                                                   ------^------

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha oh! Then two mistakes 😆

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! The tests make sense to me, and I think these calls are correct. I got a bit confused about the difference between actual_*, required_*, and reserved_*, but I can't think of better names.

I compared these changes to the equivalent code in c_api and in R-package/src/lightgbm_R.cpp, and they all seem to match (except for LGBM_BoosterGetFeatureNames(), which is not used in the R package today).

@StrikerRUS StrikerRUS merged commit 237ac29 into master May 21, 2021
@StrikerRUS StrikerRUS deleted the buff_size branch May 21, 2021 12:19
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants