Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up variable names #2888

Closed
mpenkov opened this issue Jul 19, 2020 · 7 comments
Closed

Clean up variable names #2888

mpenkov opened this issue Jul 19, 2020 · 7 comments
Labels
housekeeping internal tasks and processes

Comments

@mpenkov
Copy link
Collaborator

mpenkov commented Jul 19, 2020

I think go with (a) now and open up a ticket to clean up names globally later. This PR is already large enough.

Originally posted by @mpenkov in https://github.com/_render_node/MDExOlB1bGxSZXF1ZXN0MzQ5Mjk1NTk1/timeline/more_items

@mpenkov mpenkov added the housekeeping internal tasks and processes label Jul 19, 2020
@piskvorky
Copy link
Owner

piskvorky commented Jul 26, 2020

The link in the description doesn't work, but I believe this is related to the discussion in #2698 around *2vec attributes: .w2v vs .word_vectors, .docvecs vs .tag_vectors, etc.

@piskvorky piskvorky added this to the *vec aftermath milestone Jul 26, 2020
@mpenkov
Copy link
Collaborator Author

mpenkov commented Aug 16, 2020

Yeah, looks like the github UI messed up the link. Your reference is correct.

@piskvorky
Copy link
Owner

piskvorky commented Sep 7, 2020

@mpenkov @gojomo what's the conclusion? I'll have some time this week (I'm on holidays) so I'd like to finish as much as I can, to unblock 4.0.0.

Should I rename the variables?

Pros:

  • clearer, more explicit naming
  • if we do it, 4.0.0 is the place to do it

Cons:

  • breaks existing code
  • lots of tedious replacements in code, docs, tutorials
  • possible divergence from other implementations (facebook, google)… although that may be a plus too.

@mpenkov
Copy link
Collaborator Author

mpenkov commented Sep 8, 2020

We can do this at any time, so if your priority is to unblock 4.0.0, then it's probably more productive and rewarding to do something else.

This sort of renaming can happen at any time in the future, right?

@piskvorky
Copy link
Owner

piskvorky commented Sep 8, 2020

  • breaks existing code
  • if we do it, 4.0.0 is the place to do it

One of the things most suitable for a major release bump.

@gojomo
Copy link
Collaborator

gojomo commented Sep 8, 2020

Agree that it's better to do core naming cleanup in fewer releases and correlated with 'major' increments – such that when people notice old code doesn't work (or generates new warnings), they check some release-notes, notice all the changes required, make them in one batch-of-effort, then are OK for as-long-as-possible through future releases.

My proposal remains to leave the attributes on the *2Vec models as .wv for the subcomponent word-vectors object on all classes, and .dv for the parallel subcomponent doc-vectors object on a Doc2Vec model. Yes, it's a small deviation from the usual Pythonic 'be explicit & avoid abbreviations'. But the abbreviations are pretty clear & compact & save a lot of typing/horizontal space in very-frequent problem-domain operations.

That is, these are central enough to warrant bootstrapping a convention/library-specific lingo. I don't see it as too different than having embraced 'doc' instead of 'document', and 'vec' instead of 'vector', and 'hs' instead of 'hierarchical_softmax', and so forth. And, I don't look forward to all the lines of library or demo/tutorial code that would start to need wrapping if all .wv uses change to .word_vectors.

But, renaming them .word_vectors and .doc_vectors (or perhaps to be pedantically-consistent with other internal naming, .tag_vectors instead of .doc_vectors) would also be defensible, if someone wants to do the big search-and-replace.

@piskvorky
Copy link
Owner

piskvorky commented Sep 9, 2020

Let me close this. Why? I'm thinking:

  1. I doubt we'll get to it in time for 4.0.0 (and doing it later would be bad);
  2. (at least) @gojomo is not in favour;
  3. it's a lot of work that breaks existing code for aesthetic reasons = not critical.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping internal tasks and processes
Projects
None yet
Development

No branches or pull requests

3 participants