
All gradients are zero (v2 API) #4381

Closed
liuyuuan opened this issue Sep 26, 2017 · 7 comments

Comments

@liuyuuan
Contributor

liuyuuan commented Sep 26, 2017

I used the following training procedure:

    def train(args):
        paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count)

        optimizer = paddle.optimizer.Momentum(
            momentum=0.9,
            learning_rate=1e-3,
            regularization=paddle.optimizer.L2Regularization(rate=1e-3))
        feeding = {...}
        reader = some_dataset.create_reader()
        train_batch_reader = paddle.batch(reader=reader, batch_size=args.batch_size)
        cost = my_network(args)
        parameters = paddle.parameters.create(cost)
        # trainer construction, presumably elided from the original snippet
        trainer = paddle.trainer.SGD(cost=cost,
                                     parameters=parameters,
                                     update_equation=optimizer)

        def event_handler(event):
            """print logs"""
            ...

        trainer.train(reader=train_batch_reader,
                      event_handler=event_handler,
                      num_passes=10,
                      feeding=feeding)

The cost uses sum_cost, simplified as follows:

    label_probs = paddle.layer.scaling(input=neg_log_probs, weight=labels)
    cost = paddle.layer.sum_cost(input=label_probs)

where neg_log_probs and labels are both sequences of dense_vector(1).

However, all the gradients produced by training are zero, while the parameter values look normal and the cost value is also normal. It looks as if only the forward pass were executed. What could be causing this?
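
For reference, a minimal sketch of how the gradients can be inspected from the event handler, assuming the v2 Parameters.get_grad API is available during training; the parameter name "___fc_layer_0__.w0" is illustrative only:

    import numpy as np

    def event_handler(event):
        """Print the cost and the gradient magnitude of one parameter
        after every batch ('___fc_layer_0__.w0' is a hypothetical name)."""
        if isinstance(event, paddle.event.EndIteration):
            grad = parameters.get_grad("___fc_layer_0__.w0")
            print("pass %d, batch %d, cost %f, |grad| %f" %
                  (event.pass_id, event.batch_id, event.cost,
                   np.abs(grad).sum()))

With this handler, the |grad| column stays at 0.0 for every batch while the cost column looks reasonable.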

@guoshengCS
Contributor

I suspect paddle.layer.scaling here is not what you expect. The computation done by ScalingLayer is

    y.row[i] = w[i] * x.row[i]

where x is the input of size dataDim and w is a weight of size 1. If you need an element-wise multiplication, dotmul_operator, which computes

    out.row[i] += scale * (a.row[i] .* b.row[i])

is probably the operation you want.
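
For example, a minimal sketch of that replacement, assuming the v2 API convention that operators such as dotmul_operator must be wrapped in a paddle.layer.mixed layer (neg_log_probs and labels are the sequences from the snippet above):

    # Element-wise product of the two dense_vector(1) sequences,
    # followed by the same sum_cost as before.
    elementwise = paddle.layer.mixed(
        input=paddle.layer.dotmul_operator(
            a=neg_log_probs, b=labels, scale=1.0))
    cost = paddle.layer.sum_cost(input=elementwise)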

@liuyuuan
Contributor Author

@guoshengCS Thanks, but I tried dotmul_operator and the gradients are still all zero.

@lcy-seso
Contributor

Try switching to CPU.

@liuyuuan
Contributor Author

@lcy-seso Thanks, it works on CPU. Is the backward pass of sum_cost not implemented on GPU? Ultimately I'd still like to train on GPU; CPU is too slow.

@lcy-seso
Contributor

A related issue: #3714

@lcy-seso
Contributor

sum_cost can be used on GPU; for this problem please refer to #3714. I suggest tuning the hyperparameters until training is stable, then switching to GPU training.

@liuyuuan
Contributor Author

OK, thanks. I'll close this issue.
