
Add DeBERTa model #8227

Merged
merged 17 commits into from
Apr 11, 2024

Conversation

Contributor

@w5688414 w5688414 commented Apr 3, 2024

PR types

New features

  • deepset/deberta-v3-large-squad2
  • microsoft/deberta-v2-xlarge
  • microsoft/deberta-v3-base
  • microsoft/deberta-v3-large
  • microsoft/deberta-base
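
A minimal usage sketch for the checkpoints listed above. The classes and the deepset/deberta-v3-large-squad2 checkpoint come from the precision-check script later in this description; the question/context strings and the shape printout are illustrative only:

import paddle
from paddlenlp.transformers import DebertaV2ForQuestionAnswering, DebertaV2Tokenizer

tokenizer = DebertaV2Tokenizer.from_pretrained("deepset/deberta-v3-large-squad2")
model = DebertaV2ForQuestionAnswering.from_pretrained("deepset/deberta-v3-large-squad2")
model.eval()

# Encode a (question, context) pair and run extractive QA.
inputs = tokenizer(
    "Who develops PaddleNLP?",
    "PaddleNLP is an NLP library developed by the PaddlePaddle community.",
    return_tensors="pd",
)
with paddle.no_grad():
    outputs = model(**inputs, return_dict=True)
# start_logits / end_logits score each token as the answer span start / end.
print(outputs["start_logits"].shape, outputs["end_logits"].shape)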

PR changes

Description

Borrowed from the previous PR: #5414. The following script checks numerical precision against the Hugging Face implementation:

import numpy as np
import paddle
import torch

from paddlenlp.transformers import DebertaV2ForQuestionAnswering as PaddleDebertaModel
from paddlenlp.transformers import DebertaV2Tokenizer  # tokenizer added in this PR
from transformers import AutoModelForQuestionAnswering as HuggingfaceModel


def test_precision(model_name):
    # Load the same checkpoint in PaddleNLP and in Hugging Face Transformers.
    pp_model = PaddleDebertaModel.from_pretrained(model_name)
    hf_model = HuggingfaceModel.from_pretrained(model_name)
    pp_model.eval()
    hf_model.eval()

    # Feed identical random token ids to both models.
    input_ids = np.random.randint(1, 1000, size=(2, 10))
    pp_inputs = paddle.to_tensor(input_ids)
    hf_inputs = torch.tensor(input_ids)
    with paddle.no_grad():
        pp_output = pp_model(pp_inputs, output_hidden_states=True, return_dict=True)
    with torch.no_grad():
        hf_output = hf_model(hf_inputs, output_hidden_states=True)

    # Compare the QA head outputs (if present) and every hidden state.
    if "start_logits" in hf_output.keys():
        for key in ["start_logits", "end_logits"]:
            diff = abs(hf_output[key].detach().numpy() - pp_output[key].numpy())
            print(f"{key} max diff: {np.max(diff)}, min diff: {np.min(diff)}")

    for i in range(pp_model.config.num_hidden_layers + 1):
        diff = abs(hf_output["hidden_states"][i].detach().numpy() - pp_output["hidden_states"][i].numpy())
        print(f"layer {i} max diff: {np.max(diff)}, min diff: {np.min(diff)}")


model_name = "deepset/deberta-v3-large-squad2"
test_precision(model_name)

The output is:

start_logits max diff: 5.0067901611328125e-06, min diff: 1.862645149230957e-08
end_logits max diff: 3.3080577850341797e-06, min diff: 8.940696716308594e-08
layer 0 max diff: 9.5367431640625e-07, min diff: 0.0
layer 1 max diff: 2.86102294921875e-06, min diff: 0.0
layer 2 max diff: 4.291534423828125e-06, min diff: 0.0
layer 3 max diff: 7.152557373046875e-06, min diff: 0.0
layer 4 max diff: 5.7220458984375e-06, min diff: 0.0
layer 5 max diff: 6.198883056640625e-06, min diff: 0.0
layer 6 max diff: 8.106231689453125e-06, min diff: 0.0
layer 7 max diff: 6.67572021484375e-06, min diff: 0.0
layer 8 max diff: 6.198883056640625e-06, min diff: 0.0
layer 9 max diff: 8.106231689453125e-06, min diff: 0.0
layer 10 max diff: 1.0728836059570312e-05, min diff: 0.0
layer 11 max diff: 9.775161743164062e-06, min diff: 0.0
layer 12 max diff: 1.1086463928222656e-05, min diff: 0.0
layer 13 max diff: 9.298324584960938e-06, min diff: 0.0
layer 14 max diff: 8.106231689453125e-06, min diff: 0.0
layer 15 max diff: 1.3113021850585938e-05, min diff: 0.0
layer 16 max diff: 1.2874603271484375e-05, min diff: 0.0
layer 17 max diff: 3.4332275390625e-05, min diff: 0.0
layer 18 max diff: 1.9073486328125e-05, min diff: 0.0
layer 19 max diff: 1.1682510375976562e-05, min diff: 0.0
layer 20 max diff: 1.52587890625e-05, min diff: 0.0
layer 21 max diff: 2.384185791015625e-05, min diff: 0.0
layer 22 max diff: 2.5510787963867188e-05, min diff: 0.0
layer 23 max diff: 3.337860107421875e-05, min diff: 0.0
layer 24 max diff: 1.71661376953125e-05, min diff: 0.0

The model parameters are stored in fp16, which leads to some tiny differences: torch loads the weights as fp32 (loading them as fp16 raises errors because some operators are not supported), while paddle loads them as fp16, so the computed results differ slightly.
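
For reference, a tolerance-based check can turn the printed diffs into a pass/fail assertion; the atol/rtol values below are illustrative choices for fp16-scale noise, not thresholds required by this PR:

import numpy as np

def assert_fp16_close(hf_array, pp_array, atol=1e-4, rtol=1e-3):
    # Tolerances sized for fp16 weights executed in fp32 (torch) vs fp16 (paddle);
    # the max diffs reported above (around 1e-5 to 3e-5) fall well inside them.
    np.testing.assert_allclose(hf_array, pp_array, atol=atol, rtol=rtol)

# e.g., inside test_precision:
# assert_fp16_close(hf_output["hidden_states"][i].detach().numpy(),
#                   pp_output["hidden_states"][i].numpy())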

Documentation added (screenshot omitted).

Compared with the Hugging Face source code, there are the following two differences:

These two operators do not affect inference, but they may affect training alignment.


paddle-bot bot commented Apr 3, 2024

Thanks for your contribution!

@w5688414 w5688414 requested a review from sijunhe April 3, 2024 09:23
@w5688414 w5688414 self-assigned this Apr 3, 2024

codecov bot commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 72.99509% with 495 lines in your changes missing coverage. Please review.

Project coverage is 55.23%. Comparing base (7b493a8) to head (78af468).
Report is 9 commits behind head on develop.

Files                                                Patch %   Lines
paddlenlp/transformers/deberta_v2/modeling.py         65.83%   234 Missing ⚠️
paddlenlp/transformers/deberta/modeling.py            76.26%   155 Missing ⚠️
paddlenlp/transformers/deberta_v2/tokenizer.py        62.17%   101 Missing ⚠️
paddlenlp/transformers/deberta/tokenizer.py           97.36%     4 Missing ⚠️
paddlenlp/transformers/deberta_v2/configuration.py    97.36%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8227      +/-   ##
===========================================
+ Coverage    55.15%   55.23%   +0.08%     
===========================================
  Files          601      609       +8     
  Lines        91764    94218    +2454     
===========================================
+ Hits         50611    52040    +1429     
- Misses       41153    42178    +1025     


@w5688414 w5688414 requested a review from JunnYu April 11, 2024 03:22
@sijunhe sijunhe merged commit 814e9c4 into PaddlePaddle:develop Apr 11, 2024
7 of 10 checks passed
@seetimee

Could you add a Chinese version of DeBERTa?

@w5688414
Contributor Author

Which Chinese version?

@seetimee

It seems there is only the Erlangshen (二郎神) v2.

@w5688414
Contributor Author

Could you share the link to the corresponding DeBERTa model on Hugging Face?

@seetimee

@w5688414
Copy link
Contributor Author

Contributions from developers are welcome:

def _get_name_mappings(cls, config):
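
For anyone taking this up: _get_name_mappings is the hook PaddleNLP pretrained models use to map Hugging Face parameter names onto Paddle parameter names when converting a checkpoint. The outline below only sketches its general shape; the StateDictNameMapping import path, the parameter names, and the "transpose" action are assumptions to verify against the merged DeBERTa modeling code:

from paddlenlp.transformers.conversion_utils import StateDictNameMapping  # assumed import path


# Declared as a @classmethod on the DeBERTa pretrained-model class.
def _get_name_mappings(cls, config):
    # Each entry pairs a Hugging Face parameter name with its Paddle counterpart
    # (a single-element entry means "same name"); Linear weights are transposed
    # because torch stores them as [out, in] while paddle uses [in, out].
    model_mappings = [
        ["embeddings.word_embeddings.weight"],
        ["embeddings.LayerNorm.weight"],
        ["embeddings.LayerNorm.bias"],
    ]
    for layer_index in range(config.num_hidden_layers):
        model_mappings.extend(
            [
                [f"encoder.layer.{layer_index}.attention.self.query_proj.weight", None, "transpose"],
                [f"encoder.layer.{layer_index}.attention.self.query_proj.bias"],
                # ... remaining attention, intermediate, and output parameters of the layer
            ]
        )
    return [StateDictNameMapping(*mapping, index=index) for index, mapping in enumerate(model_mappings)]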
