Using pre-trained word vectors in embedding layer #490
The following answer comes from Jie. I paste it here to help more users. Thanks for your interest in Paddle. The situation you describe below is very common in NLP tasks, and PaddlePaddle definitely supports this type of requirement.
**Layer Config**

**How to save model as PaddlePaddle format**

If you need to convert your model into PaddlePaddle format, you can use the following Python function.

```python
import struct

import numpy as np


def write_parameter(outfile, weights):
    """
    :param outfile: output file name. **Note**: it must be the same as the
        parameter name used in the layer config above.
    :type outfile: string
    :param weights: the parameter values.
    :type weights: 1-dimensional numpy array of float32.
    """
    version = 0
    value_size = 4  # size in bytes of one value; 4 means float32
    ret = b""
    for w in weights:
        ret += w.tobytes()
    size = len(ret) // 4  # number of float values
    with open(outfile, 'wb') as fo:
        # header: int32 version, uint32 value size, uint64 value count
        fo.write(struct.pack('iIQ', version, value_size, size))
        fo.write(ret)


# The weights form a 2-dimensional array; each row is one word vector.
weights = np.array([[w_11, w_12, w_13, w_14],
                    [w_21, w_22, w_23, w_24],
                    ...], dtype=np.float32)
write_parameter("embeddings", weights.flatten())
```

**The command line arguments**

The first point described above can refer to the document.
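To sanity-check a file produced by `write_parameter`, you can read the header and values back and compare. `read_parameter` below is my own helper, not a PaddlePaddle API; it simply assumes the layout written above.

```python
import struct

import numpy as np


def read_parameter(infile):
    """Read back a parameter file in the layout used by write_parameter:
    a 16-byte header (int32 version, uint32 value size, uint64 value count)
    followed by the raw float32 values."""
    with open(infile, 'rb') as f:
        version, value_size, size = struct.unpack('iIQ', f.read(16))
        values = np.frombuffer(f.read(), dtype=np.float32)
    assert value_size == 4, "expected float32 values"
    assert len(values) == size, "header count disagrees with payload"
    return version, values
```

Reading the file back and comparing the array element-wise catches both endianness and dtype mistakes before the file ever reaches PaddlePaddle.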
If we need more explanations about def c(outfile, feats): I guess the In addition, what's the
@backyes Thanks. You are right. I fixed the code above and modified
@qingqing01 That's clear now.
My code is as above, but the log shows "missing parameters". Which step went wrong?
You can check
Under my test/sentiment/thirdparty/emb/ there is an embeddings file.

missing parameters [test/sentiment/thirdparty/emb/embeddings/embeddings]. There are two "embeddings" in this path; check whether the path is correct.
Found the problem: with paddle cluster_train, --init_model_path looks for the model using the path on the cluster, not the local path.

Corrected version:
word = data_layer(name='word_data', size=word_dict_len)
@keain It converts L integer_value entries (integer indices) into an L * word_dim matrix.
That sentence is not entirely accurate; the output as you understand it is correct.
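The mapping described above, from L integer indices to an L * word_dim matrix, can be sketched as a plain NumPy row lookup (all names here are illustrative, not PaddlePaddle API):

```python
import numpy as np

# Pretrained embedding table: one row per vocabulary word. Random values
# here stand in for real pretrained vectors.
word_dim = 4
vocab_size = 10
emb = np.random.rand(vocab_size, word_dim).astype(np.float32)

# A sequence of L integer word indices selects L rows of the table.
indices = np.array([3, 1, 7])   # L = 3
vectors = emb[indices]          # shape (3, 4), i.e. L x word_dim
```

Each index simply selects one row of the table, so the layer's output always has one `word_dim`-sized vector per input token.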
The following issue comes from email.
Thank you for your work on Paddle. I think the design is very interesting.
I would like to use pretrained word vectors in an embedding layer. I want the weights to be static, because my training data is small. For clarity, here's how I would implement the desired behaviour with Keras:
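The Keras snippet did not survive in this copy; a typical frozen-embedding setup looks like the sketch below (illustrative only; `vocab_size`, `word_dim`, and `emb_matrix` are assumed names, and random values stand in for the real pretrained vectors):

```python
# Hypothetical Keras model config: initialize the Embedding layer with
# pretrained vectors and freeze it with trainable=False so training
# never updates the weights.
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

vocab_size, word_dim = 10000, 300
emb_matrix = np.random.rand(vocab_size, word_dim)  # stand-in for real vectors

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=word_dim,
                    weights=[emb_matrix],   # pretrained table
                    trainable=False))       # keep the weights static
```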
Is there a way to implement this with the Paddle Python bindings? Unfortunately I haven't been able to find this in the documentation or source yet.