are you training some new size nets #99
Comments
Yes, we are training a new network group with size 12x256.
Actually, the first attempts to train 12x256 over the training data produced by 6x128 and 9x192 failed spectacularly, so we are changing parameters and trying again. There will be occasional matches with only a few games for weak networks until we get the parameters right.
Why change 2 variables at once? Why not, for example, try techniques like net2net (https://arxiv.org/abs/1511.05641) to resize an already existing network? There are some readily available Python implementations, including one in training/tf (https://github.com/sai-dev/sai/blob/sai-0.17/training/tf/net2net.py)
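For readers unfamiliar with the paper, here is a minimal sketch of the Net2WiderNet idea referenced above, written with plain NumPy over assumed dense weight matrices. It is illustrative only and is not the repository's `net2net.py`.

```python
import numpy as np

def net2wider(w_in, w_out, new_width, noise=1e-3):
    """Widen a hidden layer from w_in.shape[1] units to new_width units
    while (approximately) preserving the network's function (Chen et al. 2015).

    w_in  : (d_prev, d) incoming weight matrix of the layer being widened
    w_out : (d, d_next) outgoing weight matrix of the next layer
    """
    d = w_in.shape[1]
    assert new_width >= d
    # Map each new unit to an existing one; the first d units map to themselves,
    # the extra units are random copies of existing units.
    mapping = np.concatenate([np.arange(d),
                              np.random.randint(0, d, new_width - d)])
    counts = np.bincount(mapping, minlength=d)

    # Copy incoming weights; a little noise breaks symmetry between the copies.
    new_w_in = w_in[:, mapping] + noise * np.random.randn(w_in.shape[0], new_width)
    # Divide outgoing weights by the replication count so the layer's output
    # stays (almost) unchanged after widening.
    new_w_out = w_out[mapping, :] / counts[mapping][:, None]
    return new_w_in, new_w_out
```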
The 12x256 net seems as strong as the first 9x192 net now.
Have you experimented with a much larger window size for 12 blocks? leela-zero#1197 (comment)
The training window has not been 'fixed' for a long time now: normally, for each generation we do 3 separate training runs with different window lengths. Going back to the topic of the issue: we are essentially replicating (more or less) the window size used for the original training, in order to keep a tight match between the different trainings.
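A minimal sketch of the scheme described above, with assumed function names and purely illustrative window lengths (this is not the actual SAI pipeline):

```python
# For each generation: train one candidate per window length, then keep
# whichever candidate scores best in evaluation matches.
WINDOW_LENGTHS = [100_000, 250_000, 500_000]  # games; illustrative values only

def generation_step(games, train, evaluate, current_best):
    candidates = []
    for window in WINDOW_LENGTHS:
        net = train(games[-window:])         # separate training run per window
        score = evaluate(net, current_best)  # match against the current best net
        candidates.append((score, net))
    return max(candidates, key=lambda c: c[0])[1]
```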
Hello!
Thank you for pointing me to that comment, which I hadn't stumbled on previously. As @trinetra75 explained, when training/promoting we check different window sizes and try to follow the best one. When training new architectures on previous games, we generally follow the same numbers (maybe smoothing changes a little) and fine-tune the steps. I did some maths on AZ training, which is the best example of working no-gating. You can find it here. TLDR: we use much smaller windows than anyone else, but as long as we perform proportionally many training steps on those windows, we can avoid overfitting.
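A back-of-the-envelope sketch of the "proportionally many training steps" point. The batch size and positions-per-game figures are assumptions for illustration, not SAI's actual settings:

```python
def steps_for_window(window_games, batch_size=4096,
                     positions_per_game=200, samples_per_position=1.0):
    """Number of SGD steps so that, on average, each position in the window
    is sampled `samples_per_position` times during a training run."""
    total_positions = window_games * positions_per_game
    return int(samples_per_position * total_positions / batch_size)

# Example: keeping the per-position sampling rate constant means a 10x
# smaller window simply gets 10x fewer steps, not more passes over the data.
print(steps_for_window(250_000))  # ~12200 steps
print(steps_for_window(25_000))   # ~1220 steps
```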
2020-01-14 18:25 — 97b2fceb vs 0f9a7f88: 9 : 0 : 6 (60.00%), 15 / 5 comparison