feature numbering starting from 0 #3638

Closed
alex3s opened this issue Aug 27, 2018 · 3 comments

alex3s commented Aug 27, 2018

featmap.txt has features numbered 0 to 125:

0 cap-shape=bell i
125 habitat=woods i

but the train and test data have features numbered up to 126.

For example, row 3207 in agaricus.txt.train:
0 4:1 7:1 14:1 21:1 29:1 34:1 36:1 39:1 49:1 54:1 55:1 65:1 69:1 75:1 82:1 88:1 92:1 95:1 102:1 106:1 119:1 126:1

As a result, the features appear to be named incorrectly when running task=dump on an otherwise correctly performing model.
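The mismatch described above can be checked with a short script. The following is a minimal sketch, not part of XGBoost: the helper names `libsvm_index_range` and `featmap_index_range` are hypothetical, and it assumes each data line is `label index:value ...` and each feature map line starts with an integer id.

```python
def libsvm_index_range(lines):
    """Return (min, max) feature index seen across LIBSVM-formatted lines."""
    indices = []
    for line in lines:
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are index:value pairs.
        for token in tokens[1:]:
            index, _, _ = token.partition(":")
            indices.append(int(index))
    return min(indices), max(indices)


def featmap_index_range(lines):
    """Return (min, max) feature id listed in feature map lines."""
    ids = [int(line.split()[0]) for line in lines if line.strip()]
    return min(ids), max(ids)


# Row 3207 of agaricus.txt.train, as quoted in this issue.
row = ("0 4:1 7:1 14:1 21:1 29:1 34:1 36:1 39:1 49:1 54:1 55:1 65:1 "
       "69:1 75:1 82:1 88:1 92:1 95:1 102:1 106:1 119:1 126:1")
# First and last featmap.txt entries, as quoted in this issue.
featmap = ["0 cap-shape=bell i", "125 habitat=woods i"]

data_range = libsvm_index_range([row])
map_range = featmap_index_range(featmap)
# An index in the data that exceeds the largest featmap id suggests an off-by-one.
mismatch = data_range[1] > map_range[1]
```

On the quoted row, the largest data index (126) exceeds the largest feature map id (125), which is consistent with the data being 1-based while the map is 0-based.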

Collaborator

hcho3 commented Aug 28, 2018

It looks like mapfeat.py produces a LIBSVM file with 1-based indexing: a894ab6

Author

alex3s commented Aug 28, 2018

I think the solution could be to include

0 target int

in the agaricus demo featmap.txt

So the whole file will look like this:

0	target	int
1	cap-shape=bell i
:
126	habitat=woods i

I have tested it on other datasets and it works this way, for me at least! 👍

Collaborator

hcho3 commented Aug 28, 2018

Target is not a feature, so I don't think we should include it in the feature map. Let me submit a pull request to produce the LIBSVM file with 0-based indexing, so that feature indices start at 0.
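For readers who already have a 1-based LIBSVM file, the re-indexing idea can be sketched as below. This is only an illustration under stated assumptions, not the change made in the referenced commits: `shift_libsvm_line` is a hypothetical helper, and it assumes plain `label index:value ...` lines with no `qid:` tokens.

```python
def shift_libsvm_line(line, offset=-1):
    """Shift every feature index on one LIBSVM line by `offset`.

    The default offset of -1 converts 1-based indices to 0-based ones.
    The label (first token) and the feature values are left unchanged.
    """
    tokens = line.split()
    if not tokens:
        return line  # leave blank lines untouched
    label, pairs = tokens[0], tokens[1:]
    shifted = []
    for token in pairs:
        index, _, value = token.partition(":")
        new_index = int(index) + offset
        if new_index < 0:
            # A negative index means the input was not 1-based after all.
            raise ValueError(f"index {index} would become negative")
        shifted.append(f"{new_index}:{value}")
    return " ".join([label] + shifted)


# Shortened version of the row quoted earlier in this issue.
converted = shift_libsvm_line("0 4:1 7:1 126:1")
```

After the shift, the largest index in the agaricus data would be 125, matching the last entry in featmap.txt.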

hcho3 added a commit to hcho3/xgboost that referenced this issue Aug 30, 2018
CodingCat pushed a commit to CodingCat/xgboost that referenced this issue Sep 18, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Nov 28, 2018
alois-bissuel pushed a commit to criteo-forks/xgboost that referenced this issue Dec 4, 2018