Hi,
I wonder: is it appropriate to add the CE regularizer (grad_xent) directly to grad in chain model training, as implemented here:
grad.add_mat(chain_opts.xent_regularize, grad_xent)
In Kaldi's chain model recipes, e.g. aishell/s5, the network architecture has two branches after layer tdnn6: one for the chain model (the output layer) and one for CE (the output-xent layer). The derivative matrix grad is backpropagated into "output", while grad_xent is backpropagated into "output-xent". If grad_xent is merged into grad, the prefinal-xent --> output-xent branch receives no gradient and serves no purpose at all.
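For concreteness, here is a minimal PyTorch sketch of the two designs being contrasted (the module, dimensions, and variable names are hypothetical stand-ins for illustration, not the actual implementation): in the Kaldi recipe, grad and grad_xent each enter their own output layer, whereas the quoted grad.add_mat call folds both into the single chain output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
trunk_dim, num_targets, T = 8, 5, 3  # toy sizes

class ChainHead(nn.Module):
    """Sketch of the two-branch head in Kaldi chain recipes: the shared
    trunk output (tdnn6) feeds separate chain and xent output layers."""
    def __init__(self):
        super().__init__()
        self.prefinal_chain = nn.Linear(trunk_dim, trunk_dim)
        self.output = nn.Linear(trunk_dim, num_targets)       # chain branch
        self.prefinal_xent = nn.Linear(trunk_dim, trunk_dim)
        self.output_xent = nn.Linear(trunk_dim, num_targets)  # CE branch

    def forward(self, tdnn6_out):
        out = self.output(torch.relu(self.prefinal_chain(tdnn6_out)))
        out_xent = self.output_xent(torch.relu(self.prefinal_xent(tdnn6_out)))
        return out, out_xent

head = ChainHead()
tdnn6_out = torch.randn(T, trunk_dim, requires_grad=True)
out, out_xent = head(tdnn6_out)

# Stand-ins for the derivatives computed by the chain/CE objectives.
grad = torch.randn_like(out)       # d(chain objective)/d(output)
grad_xent = torch.randn_like(out)  # d(CE objective)/d(output-xent)
xent_scale = 0.1                   # plays the role of chain_opts.xent_regularize

# (a) Two-branch backward, as in the Kaldi recipe: each derivative enters
#     its own output layer; the branches interact only in the shared trunk.
torch.autograd.backward([out, out_xent], [grad, xent_scale * grad_xent])

# (b) Merged backward, the analogue of grad.add_mat(xent_regularize, grad_xent):
#     both derivatives enter the single chain output, so the
#     prefinal-xent -> output-xent branch never receives any gradient.
#     out.backward(grad + xent_scale * grad_xent)
```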
That is a good question. I missed that. Indeed, with the current implementation, the CE regularization does not work in my experiments. It is a bit unclear to me why a separate branch is needed for CE regularization: for lattice-based sequence training I did not use a second branch, and CE regularization worked well. I'll run some comparisons and update the code. Thanks.
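To spell out the difference (a sketch; λ stands for chain_opts.xent_regularize): merging the derivatives corresponds to interpolating both objectives at the single chain output, which is the single-branch setup described above for lattice-based sequence training:

```latex
% Single-branch (merged) gradient applied to "output":
%   grad_total = grad + lambda * grad_xent
\frac{\partial\mathcal{L}}{\partial\,\mathrm{output}}
  = \frac{\partial\mathcal{L}_{\mathrm{MMI}}}{\partial\,\mathrm{output}}
  + \lambda\,\frac{\partial\mathcal{L}_{\mathrm{CE}}}{\partial\,\mathrm{output}}
```

The chain recipe instead optimizes L_MMI(output) + λ·L_CE(output-xent), so the CE term regularizes the shared trunk through its own branch, and the xent layers are simply discarded at decode time.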