update
Patrick-Star125 committed Dec 5, 2023
1 parent 6ce15e1 commit 6e29bdd
13 changes: 6 additions & 7 deletions rfcs/APIs/20231202_api_design_for_AdaptiveLogSoftmaxWithLoss.md
Paddle needs to extend its APIs by adding a new AdaptiveLogSoftmaxWithLoss API,

The computation of adaptive_log_softmax_with_loss proceeds in the following steps:

$$\text{head_output} = \text{linear}(\text{input}, \text{head_weight}, \text{head_bias})$$

$$\text{head_logprob} = \text{log_softmax}(\text{head_output}, \text{axis}=1)$$

$$\text{output} += \text{take_along_axis}(\text{head_logprob}, \text{gather_inds.unsqueeze(1)}, \text{axis}=1).\text{squeeze()}$$

$$\text{loss} = -\text{output.mean()}$$
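The head-computation steps above can be sketched in NumPy. This is a minimal illustration, not the Paddle implementation: the function name `head_step` and all shapes are assumptions chosen for this example, and it covers only the head path (the per-cluster tail projections of the full adaptive softmax are omitted).

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    x_max = x.max(axis=axis, keepdims=True)
    shifted = x - x_max
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def head_step(inp, head_weight, head_bias, gather_inds):
    """Illustrative sketch of the head portion of adaptive_log_softmax_with_loss.

    Assumed shapes (for this example only):
      inp:          (batch, in_features)
      head_weight:  (in_features, head_size)
      head_bias:    (head_size,)
      gather_inds:  (batch,) integer indices into the head output
    """
    # Step 1: linear projection onto the head.
    head_output = inp @ head_weight + head_bias
    # Step 2: log-probabilities over the head.
    head_logprob = log_softmax(head_output, axis=1)
    # Step 3: gather each sample's target log-probability along axis 1.
    output = np.take_along_axis(head_logprob, gather_inds[:, None], axis=1).squeeze(1)
    # Step 4: the loss is the negative mean log-probability.
    loss = -output.mean()
    return output, loss
```

In the full algorithm, targets falling into a tail cluster add the cluster's own log-softmax term to `output` before the mean is taken; the sketch keeps only the shared head to mirror the four equations above.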

## 3. Significance

In natural language processing, when the vocabulary is very large, the embedding table can account for a large share of the model's parameters.
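A back-of-the-envelope count makes the scale concrete. The numbers are illustrative assumptions (a machine-translation-sized vocabulary of $2^{17}$ tokens and a 1024-dimensional embedding), not measurements:

```python
# Illustrative parameter count for the embedding table alone.
vocab_size = 2 ** 17   # ~131k tokens (assumed)
embed_dim = 1024       # embedding dimension (assumed)
embedding_params = vocab_size * embed_dim
print(embedding_params)  # 134217728, i.e. ~1.3e8 parameters
# Without weight tying between the embedding and the softmax projection,
# the output layer adds roughly the same amount again.
```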

This gives rise to two common problems:

