
add blenderbot_small, blenderbot #868

Merged
merged 18 commits into PaddlePaddle:develop on Oct 25, 2021

Conversation

kevinng77
Contributor

PR types

New Features

PR changes

Models

Description

Add the Blenderbot and BlenderbotSmall models to paddlenlp/transformers/.

@CLAassistant

CLAassistant commented Aug 10, 2021

CLA assistant check
All committers have signed the CLA.

@yingyibiao yingyibiao self-assigned this Aug 10, 2021
@ZeyuChen ZeyuChen requested a review from yingyibiao August 10, 2021 04:44
@yingyibiao
Contributor

Please install pre-commit for code style formatting as follows:

  1. pip install pre-commit
  2. pre-commit install
  3. make a small formatting change to each of your files so they can be committed again (e.g., add a space or a blank line)
  4. commit your changes

@yingyibiao
Contributor

Please add docstrings for all of your classes and methods that users might call. You can refer to paddlenlp.transformers.bert.

paddlenlp/transformers/blenderbot/tokenizer.py (outdated)
Comment on lines 89 to 94
"""
Format of Blenderbot sequence: ``X </s>``
:param token_ids_0: List[int]
:param token_ids_1: List[int], optional
:return: List[int]
"""

We use Google-style docstrings.
Refer to the Bert model as a reference.
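For illustration, the docstring above could be rewritten in Google style roughly as follows (a sketch only; the method name build_inputs_with_special_tokens and the returned concatenation are assumed from the snippet rather than confirmed by this diff):

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        """
        Build model inputs from a sequence by appending the end-of-sequence token.

        Format of Blenderbot sequence: ``X </s>``

        Args:
            token_ids_0 (List[int]):
                List of IDs for the first sequence.
            token_ids_1 (List[int], optional):
                List of IDs for the second sequence. Defaults to None.

        Returns:
            List[int]: Input IDs with the special tokens appended.
        """
        # The body below simply mirrors the documented ``X </s>`` format (assumed).
        return token_ids_0 + [self.eos_token_id]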

            bpe_tokens.extend(
                bpe_token for bpe_token in self.bpe(token).split(' '))
        return bpe_tokens


Please also include the public tokenize method.
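For reference, a public tokenize method could look roughly like the sketch below, assuming the tokenizer follows the GPT-style byte-level BPE layout in paddlenlp; re is assumed to be imported at module level, and self.pat and self.byte_encoder are assumed attributes that may not match the actual implementation:

    def tokenize(self, text):
        """Tokenize a raw string into a list of BPE sub-tokens."""
        bpe_tokens = []
        # self.pat is assumed to be the GPT-2 style pre-tokenization regex.
        for token in re.findall(self.pat, text):
            # Map raw bytes to unicode symbols before applying BPE (byte-level BPE).
            token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
            bpe_tokens.extend(
                bpe_token for bpe_token in self.bpe(token).split(' '))
        return bpe_tokens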

]


# Copied from .paddlenlp.transformers.bart.modeling.shift_tokens_right

Delete the leading dot in .paddlenlp.transformers.bart.modeling.shift_tokens_right.

paddlenlp/transformers/blenderbot/modeling.py


class BlenderbotLearnedPositionalEmbedding(Embedding):
    def __init__(self, num_embeddings, embedding_dim, padding_idx):

padding_idx is not used.
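If padding_idx is dropped, the class could be reduced to something like the sketch below; the forward signature taking an input shape plus a past_key_values_length offset is an assumption borrowed from the BART-style positional embedding and may differ from what this PR needs:

import paddle
import paddle.nn as nn


class BlenderbotLearnedPositionalEmbedding(nn.Embedding):
    """Learned positional embeddings, indexed by position rather than by token."""

    def __init__(self, num_embeddings, embedding_dim):
        # No padding_idx: positions are generated from the sequence length,
        # so the padding entry of the table is never special-cased.
        super().__init__(num_embeddings, embedding_dim)

    def forward(self, input_ids_shape, past_key_values_length=0):
        # input_ids_shape is expected to be (batch_size, sequence_length).
        seq_len = input_ids_shape[1]
        positions = paddle.arange(
            past_key_values_length, past_key_values_length + seq_len, dtype="int64")
        return super().forward(positions)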

        self.embed_tokens = nn.Embedding(vocab_size, d_model, pad_token_id)
        self.embed_scale = math.sqrt(d_model) if scale_embedding else 1.0
        self.encoder_embed_positions = BlenderbotLearnedPositionalEmbedding(
            max_position_embeddings, d_model, pad_token_id)

Remove pad_token_id here if you change the BlenderbotLearnedPositionalEmbedding class definition.

Comment on lines 334 to 344
        self.encoder = BlenderbotEncoder(
            self.shared, vocab_size, pad_token_id, d_model, num_encoder_layers,
            encoder_attention_heads, encoder_ffn_dim, dropout,
            activation_function, attention_dropout, activation_dropout,
            max_position_embeddings, init_std, scale_embedding, normalize_before)

        self.decoder = BlenderbotDecoder(
            self.shared, vocab_size, pad_token_id, d_model, num_decoder_layers,
            decoder_attention_heads, decoder_ffn_dim, dropout,
            activation_function, attention_dropout, activation_dropout,
            max_position_embeddings, init_std, scale_embedding, normalize_before)

We need to specify the keyword for each named argument.
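For example, the encoder construction above could be written with explicit keywords along these lines; this is only a sketch, and the parameter names (in particular embed_tokens for self.shared) are assumed from the positional order and should be checked against the actual BlenderbotEncoder signature:

        self.encoder = BlenderbotEncoder(
            embed_tokens=self.shared,
            vocab_size=vocab_size,
            pad_token_id=pad_token_id,
            d_model=d_model,
            num_encoder_layers=num_encoder_layers,
            encoder_attention_heads=encoder_attention_heads,
            encoder_ffn_dim=encoder_ffn_dim,
            dropout=dropout,
            activation_function=activation_function,
            attention_dropout=attention_dropout,
            activation_dropout=activation_dropout,
            max_position_embeddings=max_position_embeddings,
            init_std=init_std,
            scale_embedding=scale_embedding,
            normalize_before=normalize_before)

The BlenderbotDecoder call directly below it can be updated in the same way.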

Comment on lines 366 to 367
        decoder_output = self.decoder(decoder_input_ids, decoder_attention_mask,
                                      encoder_output, memory_mask, use_cache, cache)

We need to specify the keyword for each named argument.
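Likewise for the decoder call above, assuming the keyword names match the variable names shown:

        decoder_output = self.decoder(
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            encoder_output=encoder_output,
            memory_mask=memory_mask,
            use_cache=use_cache,
            cache=cache)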

"attention_mask": attention_mask,
"use_cache": use_cache,
"cache": cache
}

Please add a class named BlenderbotForCausalLM.
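One possible shape for that class, modeled on the BlenderbotForConditionalGeneration head elsewhere in this PR; this is only a sketch, and the base class name, the decoder-only call signature, and the config keys are assumptions to verify against the actual modeling code:

import paddle


class BlenderbotForCausalLM(BlenderbotPretrainedModel):
    """Language-model head on top of the Blenderbot decoder, without the encoder."""

    def __init__(self, blenderbot):
        super().__init__()
        self.blenderbot = blenderbot
        self.lm_head_weight = self.create_parameter(
            shape=[blenderbot.config['vocab_size'], blenderbot.config['d_model']],
            dtype=blenderbot.shared.weight.dtype,
            is_bias=False)
        self.apply(self.init_weights)

    def forward(self, input_ids=None, attention_mask=None, use_cache=False,
                cache=None):
        # Run the decoder only; no encoder memory is passed for causal LM usage.
        decoder_output = self.blenderbot.decoder(
            decoder_input_ids=input_ids,
            decoder_attention_mask=attention_mask,
            use_cache=use_cache,
            cache=cache)
        lm_logits = paddle.matmul(
            decoder_output[0] if use_cache else decoder_output,
            self.lm_head_weight,
            transpose_y=True)
        if use_cache:
            return lm_logits, decoder_output[1]
        return lm_logits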

Comment on lines 63 to 65
        super(BlenderbotTokenizer, self).__init__(vocab_file, merges_file, errors,
                                                  max_len, special_tokens, pad_token,
                                                  eos_token, eol_token)

We need to specify the keyword for each named argument.

Comment on lines 1 to 385
        self.init_std = init_std
        self.pad_token_id = pad_token_id
        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.decoder_start_token_id = decoder_start_token_id
        self.shared = nn.Embedding(vocab_size, d_model, pad_token_id)
        self.encoder = BlenderbotSmallEncoder(
            self.shared, vocab_size, pad_token_id, d_model, num_encoder_layers,
            encoder_attention_heads, encoder_ffn_dim, dropout,
            activation_function, attention_dropout, activation_dropout,
            max_position_embeddings, init_std, scale_embedding, normalize_before)

        self.decoder = BlenderbotSmallDecoder(
            self.shared, vocab_size, pad_token_id, d_model, num_decoder_layers,
            decoder_attention_heads, decoder_ffn_dim, dropout,
            activation_function, attention_dropout, activation_dropout,
            max_position_embeddings, init_std, scale_embedding, normalize_before)
        self.apply(self.init_weights)

    def forward(self,
                input_ids=None,
                attention_mask=None,
                decoder_input_ids=None,
                decoder_attention_mask=None,
                encoder_output=None,
                use_cache=False,
                cache=None):
        if decoder_input_ids is None:
            decoder_input_ids = shift_tokens_right(input_ids,
                                                   self.decoder_start_token_id)
        if encoder_output is None:
            encoder_output = self.encoder(input_ids, attention_mask)
        memory_mask = paddle.cast(
            input_ids == self.pad_token_id,
            dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
        memory_mask.stop_gradient = True

        decoder_output = self.decoder(decoder_input_ids, decoder_attention_mask,
                                      encoder_output, memory_mask, use_cache, cache)
        # Return the encoder output as well, for the decoder to generate the sequence.
        return decoder_output, encoder_output


class BlenderbotSmallForConditionalGeneration(BlenderbotSmallPretrainedModel):
    def __init__(self, blenderbot_small):
        super().__init__()
        self.eos_token_id = blenderbot_small.eos_token_id
        self.bos_token_id = blenderbot_small.bos_token_id
        self.pad_token_id = blenderbot_small.pad_token_id
        self.blenderbot_small = blenderbot_small
        self.lm_head_weight = self.create_parameter(
            shape=[
                self.blenderbot_small.config['vocab_size'],
                self.blenderbot_small.config['d_model']
            ],
            dtype=self.blenderbot_small.shared.weight.dtype,
            is_bias=False)
        self.register_buffer("final_logits_bias",
                             paddle.zeros(
                                 (1, self.blenderbot_small.config['vocab_size']),
                                 dtype=paddle.get_default_dtype()))
        self.apply(self.init_weights)

    def forward(self,
                input_ids=None,
                attention_mask=None,
                decoder_input_ids=None,
                decoder_attention_mask=None,
                encoder_output=None,
                use_cache=False,
                cache=None):
        decoder_outputs, encoder_output = self.blenderbot_small(
            input_ids, attention_mask, decoder_input_ids,
            decoder_attention_mask, encoder_output, use_cache, cache)

        lm_logits = paddle.tensor.matmul(
            decoder_outputs[0] if use_cache else decoder_outputs,
            self.lm_head_weight,
            transpose_y=True) + self.final_logits_bias
        if use_cache:
            cache = decoder_outputs[1]
            return lm_logits, cache
        return lm_logits

    def prepare_inputs_for_generation(self,
                                      decoder_input_ids,
                                      attention_mask=None,
                                      encoder_output=None,
                                      use_cache=True,
                                      cache=None,
                                      **kwargs):
        if cache is not None:
            decoder_input_ids = decoder_input_ids[:, -1:].unsqueeze(-1)

        return {
            "input_ids": None,  # During prediction, encoder_output is provided, so input_ids is not needed.
            "decoder_input_ids": decoder_input_ids,
            "encoder_output": encoder_output,
            "attention_mask": attention_mask,
            "use_cache": use_cache,
            "cache": cache
        }

Refer to the review comments on the Blenderbot modeling.py.

@yingyibiao
Contributor

Thanks again for your contributions! We have recently merged a PR that provides generate-API support for encoder-decoder models. Please add a generation example for the Blenderbot models. :)
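A possible usage example along those lines is sketched below; the pretrained weight name, the (token_ids, scores) return value of generate(), and convert_ids_to_string are assumptions to verify against the merged generate-api PR:

import paddle
from paddlenlp.transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

# "blenderbot-400M-distill" is an assumed pretrained weight name.
tokenizer = BlenderbotTokenizer.from_pretrained("blenderbot-400M-distill")
model = BlenderbotForConditionalGeneration.from_pretrained("blenderbot-400M-distill")

text = "My friends are cool but they eat too many carbs."
input_ids = paddle.to_tensor([tokenizer(text)["input_ids"]])

# generate() is assumed to return a (token_ids, scores) tuple.
output_ids, _ = model.generate(
    input_ids=input_ids, max_length=60, decode_strategy="greedy_search")
print(tokenizer.convert_ids_to_string(output_ids[0].numpy().tolist()))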


@yingyibiao yingyibiao left a comment


LGTM

@yingyibiao yingyibiao merged commit 6427591 into PaddlePaddle:develop Oct 25, 2021