Using pre-trained word vectors in embedding layer #490

Closed
qingqing01 opened this issue Nov 16, 2016 · 12 comments

@qingqing01
Contributor

The following question comes from an email.

Thank you for your work on Paddle. I think the design is very interesting.

I would like to use pretrained word vectors in an embedding layer. I want the weights to be static, because my training data is small. For clarity, here's how I would implement the desired behaviour with Keras:

    model.add(
        Embedding(
            embeddings.shape[0],
            embeddings.shape[1],
            input_length=shape['max_length'],
            trainable=False,
            weights=[embeddings],
            mask_zero=True
        )
    )

Is there a way to implement this with the Paddle Python bindings? Unfortunately I haven't been able to find this in the documentation or source yet.

@qingqing01
Contributor Author

qingqing01 commented Nov 16, 2016

The following answer comes from Jie. I'm pasting it here to help other users.

Thanks for your interest in Paddle. The situation you describe is very common in NLP tasks, and PaddlePaddle definitely supports this kind of requirement.
If the parameter name of the word vectors is “embeddings”, you need to do the following:

  • In the run script, add two command-line options:
    • --init_model_path='path_of_the_existing_word_vectors' # you can also point this at other files for initialization; keep the parameter names the same as those used in the network config.
    • --load_missing_parameter_strategy='rand' # so that the remaining parameters are initialized randomly when only the pre-trained part is loaded.
  • In the network config file:
    • In the word-vector (embedding) layer, set parameter_name='embeddings' and add is_static=True, or more straightforwardly set learning_rate=0.

@qingqing01
Contributor Author

qingqing01 commented Nov 16, 2016

Layer Config

The embedding_layer is as follows:

emb = embedding_layer(
    input=data_layer,
    size=word_vector_dim,
    param_attr=ParamAttr(name='embeddings', is_static=True))

or

emb = embedding_layer(
    input=data_layer,
    size=word_vector_dim,
    param_attr=ParamAttr(name='embeddings', learning_rate=0.0))

In the Chinese word embedding model tutorial, the embedding weight file is named language_embedding: http://www.paddlepaddle.org/doc/demo/embedding_model/index.html
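
Putting the pieces together, a minimal config fragment might look like the sketch below (the vocabulary size and vector dimension are illustrative, and the import follows the old v1-style config convention):

from paddle.trainer_config_helpers import *

word_dict_len = 10000      # illustrative vocabulary size
word_vector_dim = 256      # must match the width of the pre-trained vectors

# The input is a sequence of integer word indices.
word = data_layer(name='word', size=word_dict_len)

# The parameter name 'embeddings' must match the weight file loaded via
# --init_model_path; is_static=True keeps the weights fixed during training.
emb = embedding_layer(
    input=word,
    size=word_vector_dim,
    param_attr=ParamAttr(name='embeddings', is_static=True))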

How to save the model in PaddlePaddle format

If you need to convert your model into PaddlePaddle's parameter format, you can use the following Python function.
Note that the output file name must be the same as the parameter name used in the layer config above ('embeddings' here).

import struct
import numpy as np

def write_parameter(outfile, weights):
    """
    :param outfile: Output file name. **Note**: it must be the same as the
                    parameter name used in the layer config above.
    :type outfile: string.
    :param weights: the parameter values.
    :type weights: 1-dimensional numpy array or list of floats.
    """
    version = 0
    value_size = 4  # 4 bytes per value, i.e. float32
    # Serialize all values as float32 bytes.
    data = np.asarray(weights, dtype=np.float32).tobytes()
    size = len(data) // value_size  # number of float values
    with open(outfile, 'wb') as fo:
        # Header: version (int32), value size (uint32), value count (uint64).
        fo.write(struct.pack('iIQ', version, value_size, size))
        fo.write(data)

# weights is a 2-dimensional array; each row is one word vector.
weights = np.array([[w_11, w_12, w_13, w_14],
                    [w_21, w_22, w_23, w_24],
                    ...])
write_parameter("embeddings", weights.flatten())
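
As a quick sanity check, assuming the header layout written by write_parameter above (version, value size, value count), you can read the file back and verify its shape with a helper like read_parameter below (a hypothetical utility, not a Paddle API):

import struct
import numpy as np

def read_parameter(infile, width):
    """Read back a parameter file written by write_parameter, for verification.
    `width` is the word vector dimension used before flattening."""
    header_fmt = 'iIQ'  # version (int32), value size (uint32), value count (uint64)
    with open(infile, 'rb') as f:
        version, value_size, size = struct.unpack(
            header_fmt, f.read(struct.calcsize(header_fmt)))
        assert value_size == 4          # float32 values
        data = np.frombuffer(f.read(), dtype=np.float32)
    assert data.size == size
    return data.reshape(-1, width)

# e.g. print(read_parameter("embeddings", word_vector_dim).shape)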

The command line arguments

For the first point described above, refer to the document
http://www.paddlepaddle.org/doc/ui/cmd_argument/use_case.html#use-model-to-initialize-network

@backyes
Contributor

backyes commented Nov 16, 2016

@qingqing01

We need more explanation about the write_parameter function, especially its feats argument:

def write_parameter(outfile, feats):

I guess feats is an array type in Python, but the code fragment you pasted explains nothing about it.

In addition, what is the format of feats? Is the second float value of feats the second value of the first row, or the second value of the first column? Different answers mean different implementations of write_parameter, right?

@qingqing01
Contributor Author

@backyes Thanks, you are right. I fixed the code above and renamed feats -> weights. Is it clear now?
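
For reference, NumPy's flatten() uses row-major (C) order by default, so consecutive floats in the file belong to the same word vector, i.e. the second float is the second value of the first row. A small sketch:

import numpy as np

weights = np.array([[1.0, 2.0, 3.0, 4.0],
                    [5.0, 6.0, 7.0, 8.0]], dtype=np.float32)
# Row-major flattening: the second float is w_12, the second value
# of the first word vector.
print(weights.flatten())   # [1. 2. 3. 4. 5. 6. 7. 8.]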

@backyes
Contributor

backyes commented Nov 17, 2016

@qingqing01 That's clear now.

@OleNet
Contributor

OleNet commented Nov 30, 2016

I1130 10:24:10.250222 18388 Util.cpp:113] Calling runInitFunctions 
I1130 10:24:10.250636 18388 Util.cpp:126] Call runInitFunctions done. 
I1130 10:24:10.493953 18388 Trainer.cpp:169] trainer mode: Normal 
I1130 10:24:10.494329 18388 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4 
I1130 10:24:10.555085 18388 PyDataProvider2.cpp:219] loading dataprovider dataprovider::process 
I1130 10:24:10.565850 18388 PyDataProvider2.cpp:219] loading dataprovider dataprovider::process 
I1130 10:24:10.566042 18388 GradientMachine.cpp:123] Loading parameters from test/sentiment/thirdparty/emb/embeddings 
I1130 10:24:10.566069 18388 Parameter.cpp:344] **missing parameters** [test/sentiment/thirdparty/emb/embeddings/embeddings] while loading model. 
I1130 10:24:10.566082 18388 Parameter.cpp:354] embeddings missing, set to random. 
I1130 10:24:10.721670 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/_lstm_transform___bidirectional_lstm_0___fw.w0] while loading model. 
I1130 10:24:10.721714 18388 Parameter.cpp:354] _lstm_transform___bidirectional_lstm_0___fw.w0 missing, set to random. 
I1130 10:24:10.743702 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_0___fw.w0] while loading model. 
I1130 10:24:10.743721 18388 Parameter.cpp:354] ___bidirectional_lstm_0___fw.w0 missing, set to random. 
I1130 10:24:10.829339 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_0___fw.wbias] while loading model. 
I1130 10:24:10.829360 18388 Parameter.cpp:354] ___bidirectional_lstm_0___fw.wbias missing, set to random. 
I1130 10:24:10.831689 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/_lstm_transform___bidirectional_lstm_0___bw.w0] while loading model. 
I1130 10:24:10.831707 18388 Parameter.cpp:354] _lstm_transform___bidirectional_lstm_0___bw.w0 missing, set to random. 
I1130 10:24:10.851577 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_0___bw.w0] while loading model. 
I1130 10:24:10.851593 18388 Parameter.cpp:354] ___bidirectional_lstm_0___bw.w0 missing, set to random. 
I1130 10:24:10.930817 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_0___bw.wbias] while loading model.
I1130 10:24:10.930835 18388 Parameter.cpp:354] ___bidirectional_lstm_0___bw.wbias missing, set to random. 
I1130 10:24:10.931156 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___fc_layer_0__.w0] while loading model. 
I1130 10:24:10.931169 18388 Parameter.cpp:354] ___fc_layer_0__.w0 missing, set to random. 
I1130 10:24:10.974350 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___fc_layer_0__.wbias] while loading model. 
I1130 10:24:10.974370 18388 Parameter.cpp:354] ___fc_layer_0__.wbias missing, set to random. 
I1130 10:24:10.976465 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___embedding_1__.w0] while loading model. 
I1130 10:24:10.976483 18388 Parameter.cpp:354] ___embedding_1__.w0 missing, set to random. 
I1130 10:24:11.120676 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/_lstm_transform___bidirectional_lstm_1___fw.w0] while loading model. 
I1130 10:24:11.120702 18388 Parameter.cpp:354] _lstm_transform___bidirectional_lstm_1___fw.w0 missing, set to random. 
I1130 10:24:11.140681 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_1___fw.w0] while loading model. 
I1130 10:24:11.140712 18388 Parameter.cpp:354] ___bidirectional_lstm_1___fw.w0 missing, set to random. 
I1130 10:24:11.220038 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/___bidirectional_lstm_1___fw.wbias] while loading model. 
I1130 10:24:11.220065 18388 Parameter.cpp:354] ___bidirectional_lstm_1___fw.wbias missing, set to random. 
I1130 10:24:11.220399 18388 Parameter.cpp:344] missing parameters [test/sentiment/thirdparty/emb/embeddings/_lstm_transform___bidirectional_lstm_1___bw.w0] while loading model. 
paddle cluster_train \
  --config=test/sentiment/cluster_job_config/job_config.py \
 ...
  --init_model_path=test/sentiment/thirdparty/emb/ \
  --load_missing_parameter_strategy=rand

word_data1 = data_layer("word1", input_dim)
emb = embedding_layer(input=word_data1, size=emb_dim,
                      param_attr=ParamAttr(name='embeddings'))

My commands and config are shown above, but the log reports "missing parameters". Which step went wrong?

@luotao1
Contributor

luotao1 commented Nov 30, 2016

You can check whether there is actually a model file under test/sentiment/thirdparty/emb/.

@OleNet
Contributor

OleNet commented Nov 30, 2016

Under my test/sentiment/thirdparty/emb/ there is one embeddings file,
generated with the code mentioned above:

import struct
import numpy as np

def write_parameter(outfile, weights):
    """
    :param outfile: Output file name. **Note**: it must be the same as the parameter name used in the layer config above.
   ...

@luotao1
Contributor

luotao1 commented Nov 30, 2016

missing parameters [test/sentiment/thirdparty/emb/embeddings/embeddings] contains embeddings twice; check whether the path is correct.

@OleNet
Contributor

OleNet commented Nov 30, 2016

I found the problem: the path given to paddle cluster_train --init_model_path is resolved against the paths on the cluster, not the local machine.
Wrong version:

paddle cluster_train \
  --config=test/sentiment/cluster_job_config/job_config.py \
  --num_nodes=1 \
  --num_passes=20 \
  --log_period=100 \
  --dot_period=10 \
  --trainer_count=16 \
  --saving_period=1 \
  --thirdparty=./test/sentiment/thirdparty \
  --config_args=is_local=0 \
  --use_gpu gpu \
...
  --init_model_path=test/sentiment/thirdparty/emb/ \
  --load_missing_parameter_strategy=rand

Corrected version:

paddle cluster_train \
  --config=test/sentiment/cluster_job_config/job_config.py \
  --num_nodes=1 \
  --num_passes=20 \
  --log_period=100 \
  --dot_period=10 \
  --trainer_count=16 \
  --saving_period=1 \
  --thirdparty=./test/sentiment/thirdparty \
  --config_args=is_local=0 \
  --use_gpu gpu \
...
  --init_model_path=./thirdparty/thirdparty/emb/ \
  --load_missing_parameter_strategy=rand

@backyes backyes closed this as completed Nov 30, 2016
@keain

keain commented Dec 4, 2016

word = data_layer(name='word_data', size=word_dict_len)
word_embedding = embedding_layer(size=word_dim, input=word, param_attr=ptt)

Suppose the input data is of integer_value_sequence type with length L. After the two statements above, is it equivalent to turning an input matrix of size L * word_dict_len into a matrix of size L * word_dim?

@qingqing01
Contributor Author

@keain It turns the L integer_value entries (integer indices) into an L * word_dim matrix.

"turning an input matrix of size L * word_dict_len"

This phrasing is not quite accurate, but your understanding of the output is correct.
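
Conceptually (a plain NumPy sketch for illustration, not Paddle code), the embedding lookup behaves like this:

import numpy as np

word_dict_len, word_dim = 5, 3
embeddings = np.random.rand(word_dict_len, word_dim).astype(np.float32)

# An integer_value_sequence of length L = 4: word indices, not one-hot vectors.
sequence = [2, 0, 4, 2]

# Row lookup: the output is an L x word_dim matrix.
output = embeddings[sequence]
print(output.shape)   # (4, 3)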
