Baseline model notebook and embeddings trainer notebook #47
Check out this pull request on ReviewNB: https://app.reviewnb.com/microsoft/nlp/pull/47 Visit www.reviewnb.com to know how we simplify your Jupyter Notebook workflows.
@AbhiramE @caseyhong @irshaffe @catherine667 I opened this PR against the staging branch. You have all already reviewed this work, but I'm tagging you in case you want to make any more edits!
We might want to keep the utils in preprocess.py general-purpose
@caseyhong @AbhiramE @janhavi13 Just keeping you updated on the comments that correspond to changes to the utils.
We can also have two versions, such as to_lowercase_all(df) and to_lowercase(df, col_names).
@saidbleik makes sense for the lowercase. for your second point, do you mean we should explicitly enforce the 2 column limit since that is what is actually happening under the hood? so
No, I meant that the current implementation only applies to token_cols[0] and token_cols[1]. It should allow an arbitrary number of columns.
@saidbleik So we loop through the token_cols list that is passed in and apply the function to each column, is that correct? That way it's not restricted to [0] and [1]?
Yes, the argument is a list (not restricted to 2).
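To make the suggestion above concrete, here is a rough sketch of what the two lowercase variants and a column-count-agnostic token utility could look like. This is only an illustration under assumptions: the bodies are not taken from preprocess.py, and drop_tokens is a hypothetical name standing in for whichever token utility is being generalized.

```python
import pandas as pd


def to_lowercase(df, col_names):
    """Return a copy of df with the listed columns lowercased."""
    out = df.copy()
    for col in col_names:
        # Leave non-string values (numbers, NaN) untouched.
        out[col] = out[col].map(lambda v: v.lower() if isinstance(v, str) else v)
    return out


def to_lowercase_all(df):
    """Return a copy of df with every column lowercased where possible."""
    return to_lowercase(df, list(df.columns))


def drop_tokens(df, token_cols, unwanted):
    """Hypothetical helper: filter token lists in any number of columns."""
    unwanted = set(unwanted)
    out = df.copy()
    for col in token_cols:  # no assumption that len(token_cols) == 2
        out[col] = out[col].map(lambda toks: [t for t in toks if t not in unwanted])
    return out
```

Because both helpers iterate over the list they are given, the same call works for two sentence columns or for ten, and to_lowercase_all stays a thin wrapper rather than a second implementation.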
examples/sentence_similarity/02-model-deep-dive/baseline_deep_dive.ipynb
You guys are doing an amazing job. Sorry I broke your folder structure. If you have any problems, please let me know and I will help :-) I did a pass over the notebooks and will do another pass later.
1. Refactored word2vec loader to check for existing files before downloading or extracting. 2. Added unit tests for the load, download and extract functions.
1. Added methods to download, extract and load GloVe vectors. 2. Added unit tests for the public methods. Other changes: 1. Made download and extract methods private. 2. Refactored word2vec unit tests to exclude private methods.
1. Added methods to download, extract and load GloVe vectors. 2. Added a unit test for the public method. Other changes: 1. Refactored files to add return types to docstrings. 2. Minor changes to path variables.
958922a to b1b5ec1
…ercase as per said's comments
…_words to more than 2 sentences
Preprocess utils
@saidbleik Take a look at the fixed nltk utils. I also added the to_lowercase_all and to_lowercase variations in preprocess.py. Let me know if it's good enough.
Thanks.