This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Numpy] Refactor Roberta #1269

Merged
merged 11 commits into from
Jul 21, 2020

Conversation

zheyuye
Member

@zheyuye zheyuye commented Jul 17, 2020

Description

Refactor the roberta model following apache/mxnet#18717

Changes

  • Refactor roberta, keeping the model structure and external APIs consistent with the other pretrained models
  • Keep the tokenizer interface consistent
  • Store the converted weights separately for the backbone and the masked language model
  • Revise run_squad.py to support roberta fine-tuning

Comments

@sxjscience @hymzoque

@codecov

codecov bot commented Jul 17, 2020

Codecov Report

Merging #1269 into numpy will decrease coverage by 0.11%.
The diff coverage is 86.08%.


@@            Coverage Diff             @@
##            numpy    #1269      +/-   ##
==========================================
- Coverage   82.53%   82.42%   -0.12%     
==========================================
  Files          38       38              
  Lines        5446     5491      +45     
==========================================
+ Hits         4495     4526      +31     
- Misses        951      965      +14     
Impacted Files                       Coverage Δ
src/gluonnlp/models/mobilebert.py    81.35% <ø> (ø)
src/gluonnlp/models/roberta.py       88.78% <84.09%> (-4.87%) ⬇️
src/gluonnlp/models/xlmr.py          86.88% <88.23%> (-1.12%) ⬇️
src/gluonnlp/data/tokenizers.py      77.71% <100.00%> (ø)
src/gluonnlp/models/albert.py        96.68% <100.00%> (+0.01%) ⬆️
src/gluonnlp/models/bert.py          84.42% <100.00%> (ø)
src/gluonnlp/models/electra.py       78.94% <100.00%> (+0.07%) ⬆️

@leezu
Contributor

leezu commented Jul 17, 2020

It's unclear which parts of the PR are related to apache/mxnet#18717 and which are unrelated changes. I suggest focusing on making the apache/mxnet#18717 feature available in MXNet instead of making the GluonNLP code more complex to work around the missing feature. What do you think?

Is there any overlap between the weights mentioned in "Store the converted weights separately for backbone and masked language model"?

@szha szha requested a review from leezu July 17, 2020 19:25
Contributor

@leezu leezu left a comment


As you cite apache/mxnet#18717 could you please check if apache/mxnet#18749 addresses the feature request and if this PR should be adapted? Thanks

@zheyuye
Member Author

zheyuye commented Jul 18, 2020

@leezu There is a huge overlap between 'model.params' and 'model-mlm.params', which apache/mxnet#18749 would eliminate: we could store only 'model-mlm.params' and load it with ignore_extra=True for the backbone model. This could be done in a separate PR for roberta, as well as for the other pretrained models stored as separate parameter files, if this solution sounds reasonable to you.
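The storage overlap described above can be sketched with plain Python dicts standing in for MXNet parameter files (the key names are hypothetical; the real mechanism would be Gluon's load_parameters(..., ignore_extra=True)):

```python
# Sketch of the storage-overlap idea: the MLM checkpoint is a superset of the
# backbone checkpoint, so storing only 'model-mlm.params' and loading the
# backbone with ignore_extra=True-style filtering avoids duplicating weights.
backbone_keys = {"embedding.weight", "encoder.layer0.weight"}

mlm_params = {
    "embedding.weight": [0.3, 0.4],
    "encoder.layer0.weight": [0.1, 0.2],
    "mlm_decoder.weight": [0.5, 0.6],  # MLM-only head, absent from backbone
}

# Equivalent of ignore_extra=True: keep only the keys the backbone declares,
# silently dropping the MLM-only head weights.
backbone_params = {k: v for k, v in mlm_params.items() if k in backbone_keys}
print(sorted(backbone_params))  # ['embedding.weight', 'encoder.layer0.weight']
```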

In addition, this PR fixes the problem that the current roberta model does not handle the MLM task properly. That is, the MLM model takes the same input (input_ids, valid_length) as the backbone, without marking which positions are masked, as is done in

mlm_features = select_vectors_by_position(F, contextual_embeddings, masked_positions)

and in the official implementation.
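The masked-position gathering that select_vectors_by_position performs can be sketched in plain NumPy (the shapes and toy values here are assumed for illustration, not the actual GluonNLP signature):

```python
import numpy as np

# Gather the contextual embeddings at the masked positions so the MLM head
# only scores the tokens that were actually masked.
batch_size, seq_len, units = 2, 5, 4
contextual_embeddings = np.arange(
    batch_size * seq_len * units, dtype=np.float32
).reshape(batch_size, seq_len, units)
masked_positions = np.array([[1, 3], [0, 2]])  # (batch_size, num_masked)

batch_idx = np.arange(batch_size)[:, None]     # (batch_size, 1), broadcasts
mlm_features = contextual_embeddings[batch_idx, masked_positions]
print(mlm_features.shape)  # (2, 2, 4)
```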

Contributor

@leezu leezu left a comment


Blocked by apache/mxnet#18749

@leezu
Contributor

leezu commented Jul 20, 2020

Thanks @zheyuye. Yes, I recommend adding the feature on the MXNet side instead of adopting a problematic workaround in GluonNLP

@zheyuye
Member Author

zheyuye commented Jul 20, 2020

> Thanks @zheyuye. Yes, I recommend adding the feature on the MXNet side instead of adopting a problematic workaround in GluonNLP

Alright, I am going to combine the backbone and its MLM model together in this PR once apache/mxnet#18749 is merged

@sxjscience
Member

@leezu Why is it blocked by apache/mxnet#18749?

Contributor

@leezu leezu left a comment


@sxjscience it adds a workaround for a missing feature in MXNet. We should improve the MXNet serialization format instead of adding workarounds. In any case, let's add the workaround now and remove it again later.

@sxjscience sxjscience merged commit 3a0ed9f into dmlc:numpy Jul 21, 2020
@zheyuye zheyuye deleted the roberta branch July 21, 2020 07:14
Labels: bug, numpyrefactor
3 participants