alexholdenmiller
commented
Jun 11, 2018
- still testing the OOM code to see if it helps reduce memory spikes; it's based on fairseq's code and was suggested by myle
- changed the vector caches to reference the parlai data path instead of parlai_home
looks like the OOM code is working! whenever training hits an OOM during the forward or backward pass (not while the network weights are being updated, and not during validation), it clears out pytorch's GPU memory cache and just moves on to the next batch, logging the OOM to the metrics and printing a warning.
trained with batchsize 350 for a bit and was able to catch some memory spikes during training and continue
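For reviewers who want the shape of the OOM handling described above, here is a minimal sketch, not the code in this PR. It assumes the fairseq-style pattern of catching a `RuntimeError` whose message contains "out of memory". The names `train_step`, `forward_backward`, `update`, and the `total_skipped_batches` metric key are hypothetical and only for illustration.

```python
import warnings


def _empty_gpu_cache():
    """Release PyTorch's cached GPU memory; a no-op if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def train_step(batch, forward_backward, update, metrics,
               empty_cache=_empty_gpu_cache):
    """Run one training step, skipping the batch if forward/backward hits an OOM.

    forward_backward and update are hypothetical callables standing in for the
    agent's own forward+backward pass and optimizer step.
    Returns True if the weights were updated, False if the batch was skipped.
    """
    try:
        # Only the forward and backward pass is guarded.
        forward_backward(batch)
    except RuntimeError as e:
        if 'out of memory' not in str(e):
            raise  # unrelated errors still propagate
        # Log the skip to the metrics, warn, free the cache, move on.
        metrics['total_skipped_batches'] = metrics.get('total_skipped_batches', 0) + 1
        warnings.warn('ran out of memory, skipping batch')
        empty_cache()
        return False
    # The weight update is deliberately outside the try block.
    update()
    return True
```

In a real agent the skipped batch's partial gradients would presumably also need to be zeroed before the next step; that detail is left out of the sketch.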
parlai/agents/seq2seq/seq2seq.py
@@ -248,23 +249,15 @@ def __init__(self, opt, shared=None):
embs = vocab.GloVe(
name='840B',
dim=300,
cache=os.path.join(
just checking: is this the same as
https://github.com/facebookresearch/ParlAI/blob/master/parlai/zoo/glove_vectors/build.py ?
opt = { 'datapath': datapath }
fnames = ['glove.840B.300d.zip']
download_models(opt, fnames, 'glove_vectors', use_model_type=False,
path = "http://nlp.stanford.edu/data")
not clear it is..
we should probably just remove parlai/zoo/glove_vectors right? torchtext has its own code for downloading its vectors.
it is being used by drqa; if you can make drqa work with the other one then yes, fine! it would be good to get drqa to work with fasttext as well, anyway.
but yes it is the same, download_models uses os.path.join(opt['datapath'], 'models', model_folder), where model_folder here is the 'glove_vectors' string in that call
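To make the path comparison concrete, here is a small sketch of the directory that the join described above resolves to. The helper `models_dir` and the `/tmp/ParlAI/data` datapath are placeholders for illustration only. The `cache=` argument in the diff is cut off in this view, so whether it builds the same path still has to be checked against the actual change.

```python
import os


def models_dir(opt, model_folder):
    """Hypothetical helper mirroring the join that download_models is said to use."""
    return os.path.join(opt['datapath'], 'models', model_folder)


# Placeholder datapath; in ParlAI this would come from the parsed options.
opt = {'datapath': '/tmp/ParlAI/data'}

# Directory where the zoo build would place glove.840B.300d.zip.
glove_dir = models_dir(opt, 'glove_vectors')
print(glove_dir)
```

If the torchtext cache argument resolves to the same directory, both code paths would look for the vectors in one place; whether the files stored there are in the same format is a separate question.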
i thought one might be a binary file and one a text file or something? i guess just check drqa still works please
see comment