Stopwords inconsistency #213

Closed
arnicas opened this issue Feb 3, 2015 · 2 comments

@arnicas

arnicas commented Feb 3, 2015

I'm a little flummoxed by the stopwords used in Tf-Idf: you've got "he" but not "she"? It was immediately noticeable in my data set trial. I'd love for it to be optional to pass in a custom stoplist, or to not use one here at all.

@snellingio

Maybe we should update the English list that Chris made in 2011. At the very least, it's not complicated to swap out the list in your own code.

https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js

@kkoch986
Member

kkoch986 commented Feb 4, 2015

That is the stopword list used by tfidf, and I think it needs to be fixed up a bit. It wouldn't be hard; I'm just tied up in a few other things, so it's not at the top of my list currently.

I also agree that it would be nice to make that part of tf idf optional and separate it a bit.
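
To make the "optional and separate" idea concrete, here is a minimal sketch of a term counter that takes the stoplist as an option. It is not natural's API: `termFrequencies`, the `stopwords` option, and the short `DEFAULT_STOPWORDS` list are all hypothetical names made up for illustration, and the tokenizer is a deliberately naive split.

```javascript
// Hypothetical default list for illustration only; not natural's actual stopword list.
const DEFAULT_STOPWORDS = ['a', 'the', 'he', 'she', 'it', 'of', 'and'];

// Count term frequencies for one document.
// options.stopwords: an array uses that custom list, `false` disables filtering,
// and leaving it out falls back to DEFAULT_STOPWORDS.
function termFrequencies(text, options = {}) {
  const { stopwords = DEFAULT_STOPWORDS } = options;
  const stopSet = new Set(stopwords === false ? [] : stopwords);
  const counts = {};
  // Naive tokenizer: lowercase, then split on anything that is not a letter, digit, or apostrophe.
  for (const token of text.toLowerCase().split(/[^a-z0-9']+/)) {
    if (token === '' || stopSet.has(token)) continue;
    counts[token] = (counts[token] || 0) + 1;
  }
  return counts;
}

// Default list, no filtering, and a custom list, respectively.
console.log(termFrequencies('He said she said'));
console.log(termFrequencies('He said she said', { stopwords: false }));
console.log(termFrequencies('He said she said', { stopwords: ['said'] }));
```

Keeping the list a plain option means the weighting code never needs to know where the stopwords came from, which is roughly the separation being asked for in this issue.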
