Corpora_and_Vector_Spaces tutorial text fix #1116

Merged
merged 1 commit into from
Jan 29, 2017

@lgmoneda lgmoneda commented Jan 29, 2017

This is just a text fix to make the tutorial friendlier. It concerns the word ids assigned by the built dictionary.

Some places assume the ids will always be the same and explain the output using fixed ids, which might be confusing, as in:

The sparse vector [(0, 1), (1, 1)] therefore reads: in the document “Human computer interaction”, the words computer (id 0) and human (id 1) appear once; the other ten dictionary words appear (implicitly) zero times.

I'm suggesting a change to:

The sparse vector [(word_id, 1), (word_id, 1)] therefore reads: in the document “Human computer interaction”, the words "computer" and "human", identified by an integer id given by the built dictionary, appear once; the other ten dictionary words appear (implicitly) zero times. Check their id at the dictionary displayed in the previous cell and see that they match.

This works because the user can check which id was given to each word in the previous cell's output, which prints dictionary.token2id.

The change also adds a short warning that the ids may differ from those in previously run notebooks or in the link provided for comparison (Quick Example), so beginners won't find it strange.
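
To illustrate reading the sparse vector without assuming fixed ids, here is a minimal standard-library sketch. The `token2id` mappings are hypothetical stand-ins for the printed dictionary.token2id, and `to_bow` and `describe` are illustrative helpers, not part of gensim.

```python
from collections import Counter

# Hypothetical mappings standing in for the printed dictionary.token2id.
# The ids assigned on a given run may differ, which is the point here.
token2id_run_a = {"computer": 0, "human": 1, "interface": 2}
token2id_run_b = {"human": 0, "interface": 1, "computer": 2}

def to_bow(tokens, token2id):
    """Stand-in for a bag-of-words conversion: count known tokens, sorted by id."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

def describe(bow, token2id):
    """Translate (word_id, count) pairs back into words using the mapping."""
    id2token = {word_id: token for token, word_id in token2id.items()}
    return {id2token[word_id]: count for word_id, count in bow}

tokens = "human computer interaction".split()

vec_a = to_bow(tokens, token2id_run_a)
vec_b = to_bow(tokens, token2id_run_b)

# The raw vectors differ between runs, but they describe the same words.
readable_a = describe(vec_a, token2id_run_a)
readable_b = describe(vec_b, token2id_run_b)
```

Both runs decode to the same word counts even though the ids in the vectors are different, which is why checking the ids against the printed mapping is safer than relying on the numbers in the text.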


tmylk commented Jan 29, 2017

Maybe add a link to HashDictionary for persistent ids?
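
For context, the general idea behind hash-based ids can be sketched as below. This is only an illustration of the technique using the standard library, not HashDictionary's actual implementation; the choice of `zlib.adler32`, the `ID_RANGE` value, and the `hashed_id` helper are assumptions made for the example.

```python
import zlib

# Hypothetical size of the id space, chosen for illustration only.
ID_RANGE = 32000

def hashed_id(token, id_range=ID_RANGE):
    """Derive an id from the token itself, so it does not depend on build order."""
    return zlib.adler32(token.encode("utf-8")) % id_range

# Two "runs" that see the words in a different order and with different vocabularies.
ids_first_run = {t: hashed_id(t) for t in ["human", "computer"]}
ids_second_run = {t: hashed_id(t) for t in ["computer", "human", "interface"]}

# The same word maps to the same id in both runs.
same_ids = all(ids_first_run[t] == ids_second_run[t] for t in ids_first_run)
```

The trade-off of this approach is that two different words can collide on the same id, since the id space is fixed.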

lgmoneda commented

I've read the documentation for HashDictionary, but I think the point is not to tell readers that ids can be made persistent, but to warn about the differences a beginner doing the tutorial may experience.

Reading text that says "your output is [(0, 1), (1, 1)] and that means..." while seeing a different output is confusing.

Do you mean adding its link in addition to the suggested modifications? My only worry is too much information, since I believe the main point is to say "results may change, but that's ok".

@tmylk tmylk merged commit ba37ff3 into piskvorky:develop Jan 29, 2017

tmylk commented Jan 29, 2017

Agree, thx for the fix.
