Exercise 11.9.2: Show how to implement the algorithm without the use of $\mathbf{g}_t'$. Why might this be a good idea?

Solution: Without using $\mathbf{g}_t'$, the Adadelta update step could be modified as follows:
```python
def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for param, (s, delta) in zip(params, states):
        with torch.no_grad():
            # Moving average of the squared gradient
            s[:] = rho * s + (1 - rho) * param.grad ** 2
            # Compute the rescaled parameter update
            update = (torch.sqrt(delta + eps) / torch.sqrt(s + eps)) * param.grad
            # Apply the update
            param[:] -= update
            # Moving average of the squared update
            delta[:] = rho * delta + (1 - rho) * update ** 2
            # Reset the gradient
            param.grad.data.zero_()
```
I compared this code line by line with the book's `adadelta` implementation:
```python
def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for p, (s, delta) in zip(params, states):
        with torch.no_grad():
            # In-place updates via [:]
            s[:] = rho * s + (1 - rho) * torch.square(p.grad)
            g = (torch.sqrt(delta + eps) / torch.sqrt(s + eps)) * p.grad
            p[:] -= g
            delta[:] = rho * delta + (1 - rho) * g * g
            p.grad.data.zero_()
```
The only difference between the two is that the variable `g` has been renamed to `update`; the proposed solution does not actually implement the algorithm without using $\mathbf{g}_t'$, as the exercise asks.
My own understanding here is limited; the best I can offer is the guess below, and I would appreciate pointers from more experienced community members.
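If the point of the exercise is that the rescaled gradient $\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t$ never needs to be materialized as a tensor of its own, then one could use the identity $\mathbf{g}_t'^2 = \frac{\Delta\mathbf{x}_{t-1} + \epsilon}{\mathbf{s}_t + \epsilon} \odot \mathbf{g}_t^2$ together with PyTorch's fused in-place operations. The sketch below keeps the book's `adadelta` signature; the name `adadelta_no_gprime` and the `scale` variable are my own, and this is only a guess at what the exercise intends:

```python
import torch

def adadelta_no_gprime(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for p, (s, delta) in zip(params, states):
        with torch.no_grad():
            # s_t = rho * s_{t-1} + (1 - rho) * g_t^2, updated in place
            s.mul_(rho).add_(torch.square(p.grad), alpha=1 - rho)
            # Per-element rescaling factor; g_t' itself would be scale * g_t,
            # but that product is never stored as a separate tensor
            scale = torch.sqrt(delta + eps) / torch.sqrt(s + eps)
            # Delta_x_t = rho * Delta_x_{t-1} + (1 - rho) * g_t'^2,
            # using g_t'^2 = scale^2 * g_t^2
            delta.mul_(rho).addcmul_(torch.square(scale),
                                     torch.square(p.grad), value=1 - rho)
            # x_t = x_{t-1} - scale * g_t, as one fused in-place op
            p.addcmul_(scale, p.grad, value=-1)
            p.grad.data.zero_()
```

As for why this might be a good idea: with `addcmul_` the update is applied without allocating an extra buffer of the same size as the parameters for $\mathbf{g}_t'$ on every step, which should save memory and allocator traffic on large models. Again, this is only my reading of the question, so corrections are welcome.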