【Hackathon 5th No.12】Add AdaptiveLogSoftmaxWithLoss API to Paddle #770
The computation of adaptive_log_softmax_with_loss proceeds in the following steps:

$\text{head\_output} = \text{linear}(\text{input}, \text{head\_weight}, \text{head\_bias})$
The formula formatting seems to be off.
Replaced it with an image.
$\text{output} \mathrel{+}= \text{take\_along\_axis}(\text{head\_logprob}, \text{gather\_inds.unsqueeze(1)}, \text{axis}=1).\text{squeeze()}$

$\text{loss} = -\text{output.mean()}$
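
The steps visible in this hunk (head projection, gathering the target log-probabilities, and averaging) can be sketched in plain Python. This is only an illustrative sketch, not the proposed Paddle implementation: the helper names `linear`, `log_softmax` and `take_along_axis1` are hypothetical stand-ins, the weight layout `[in_features, out_features]` is an assumption, the toy values are made up, and the tail-cluster contribution to `output` is left out.

```python
import math

def linear(x, weight, bias=None):
    """Row-wise x @ weight (+ bias); weight is assumed to be [in_features, out_features]."""
    out = []
    for row in x:
        vals = [sum(r * w for r, w in zip(row, col)) for col in zip(*weight)]
        if bias is not None:
            vals = [v + b for v, b in zip(vals, bias)]
        out.append(vals)
    return out

def log_softmax(rows):
    """Numerically stable log-softmax along axis 1."""
    result = []
    for row in rows:
        m = max(row)
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        result.append([v - lse for v in row])
    return result

def take_along_axis1(rows, inds):
    """Pick rows[i][inds[i]] for each i (gather along axis 1, then squeeze)."""
    return [row[i] for row, i in zip(rows, inds)]

# Toy data (made up): batch of 2, in_features = 2, head size = 3.
inputs = [[1.0, 0.0], [0.0, 1.0]]
head_weight = [[0.5, -0.5, 0.0], [0.0, 1.0, -1.0]]
gather_inds = [0, 1]

# head_output = linear(input, head_weight, head_bias), here without a bias
head_output = linear(inputs, head_weight)
head_logprob = log_softmax(head_output)

# output += take_along_axis(head_logprob, gather_inds.unsqueeze(1), axis=1).squeeze()
output = [0.0] * len(inputs)
gathered = take_along_axis1(head_logprob, gather_inds)
output = [o + g for o, g in zip(output, gathered)]

# loss = -output.mean()
loss = -sum(output) / len(output)
```

In the full algorithm `output` would already hold the tail-cluster log-probabilities for labels outside the shortlist before the head term is added, which is why the formula uses `+=`.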
## 3. Significance
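In natural language processing, when the vocabulary is very large, the embedding accounts for most of the model's parameters.
For example, in a machine translation task with a vocabulary size of about 2^17 and an embedding dimension of 1024, this yields well over 100 million parameters (2^17 × 1024 = 2^27 ≈ 1.34 × 10^8),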
Is this statement about sharing accurate?
Removed.
e5483e0 to ec7a0af
The computation of adaptive_log_softmax_with_loss proceeds in the following steps:
![image](https://github.com/PaddlePaddle/community/assets/69072522/3d43f3e9-deb0-4d52-96be-2cd85a104b90) |
This image still seems to have a problem: that `=1` should be `axis=1`, right? Also, please explain what each layer is doing.
Layer-class API: `paddle.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False, name=None)`, which has two main methods:
- `forward(self, input, label)`: used for training; returns `output` and `loss`
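
To make the constructor arguments more concrete, here is a rough sketch of how `cutoffs` and `div_value` could partition the classes into a head (shortlist) and tail clusters. It assumes the layer mirrors the layout of `torch.nn.AdaptiveLogSoftmaxWithLoss`, which this hunk does not confirm; `describe_clusters` is a hypothetical helper written for illustration and is not part of Paddle.

```python
def describe_clusters(in_features, n_classes, cutoffs, div_value=4.0):
    """Hypothetical helper: summarise the head/tail sizes implied by the arguments.

    Assumes the torch-style layout: classes below cutoffs[0] form the shortlist,
    each later range is one tail cluster, and the head gets one extra output
    per tail cluster.
    """
    bounds = list(cutoffs) + [n_classes]
    shortlist_size = bounds[0]
    n_clusters = len(bounds) - 1
    head_size = shortlist_size + n_clusters
    tails = []
    for i in range(n_clusters):
        # Projection width of tail cluster i shrinks by div_value per cluster.
        hidden = int(in_features // (div_value ** (i + 1)))
        n_out = bounds[i + 1] - bounds[i]
        tails.append({"classes": (bounds[i], bounds[i + 1]),
                      "hidden": hidden,
                      "n_out": n_out})
    return {"shortlist_size": shortlist_size, "head_size": head_size, "tails": tails}

# Example values chosen only for illustration.
layout = describe_clusters(in_features=64, n_classes=20, cutoffs=[5, 10])
for tail in layout["tails"]:
    print(tail)
```

Under these assumptions, rarer classes (those in later clusters) get narrower projections, which is where the parameter savings come from.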
The formatting here seems to be off as well.
# 6. Testing and Acceptance Considerations
The test cases considered are as follows:
- Numerical correctness
- Numerical correctness (CPU, GPU, dynamic graph, static graph)
How do you plan to verify this correctness?
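The same way torch does: by checking against a computationally equivalent formulation. NumPy lacks some of the required APIs, and this API involves a fair amount of logic, so a full NumPy reimplementation would be rather tedious.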
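
For illustration, one possible shape of such an equivalence check, in plain Python with a single tail cluster: the per-label "fast path" is compared against a reference that builds the full log-probability table, and the table is checked to normalise to 1. All names and logits here are made up for the sketch; the actual unit tests may be structured quite differently.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def full_log_prob(head_logits, tail_logits, shortlist_size):
    """Reference path: log-probabilities for every class of one sample.

    head_logits has shortlist_size + 1 entries; the last one scores the single
    tail cluster, whose classes are scored by tail_logits.
    """
    head_lp = log_softmax(head_logits)
    tail_lp = log_softmax(tail_logits)
    cluster_lp = head_lp[shortlist_size]
    return head_lp[:shortlist_size] + [cluster_lp + t for t in tail_lp]

def target_log_prob(head_logits, tail_logits, shortlist_size, label):
    """Fast path: only look at the entries needed for the given label."""
    head_lp = log_softmax(head_logits)
    if label < shortlist_size:
        return head_lp[label]
    tail_lp = log_softmax(tail_logits)
    return head_lp[shortlist_size] + tail_lp[label - shortlist_size]

# Made-up logits: 3 shortlist classes + 1 cluster slot, 2 tail classes.
head_logits = [0.2, -1.0, 0.5, 0.1]
tail_logits = [1.5, -0.3]
reference = full_log_prob(head_logits, tail_logits, shortlist_size=3)

# Both paths should agree for every possible label.
for label in range(len(reference)):
    fast = target_log_prob(head_logits, tail_logits, 3, label)
    assert math.isclose(fast, reference[label], rel_tol=1e-9, abs_tol=1e-12)

# The class probabilities from the two-level factorisation should sum to 1.
total_prob = sum(math.exp(lp) for lp in reference)
```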
Done
LGTM
Revise the AdaptiveLogSoftmaxWithLoss API design document