are you training some new size nets #99

Open
l1t1 opened this issue Jan 14, 2020 · 7 comments

@l1t1 commented Jan 14, 2020

2020-01-14 18:25: 97b2fceb VS 0f9a7f88, 9 : 0 : 6 (60.00%), 15 / 5 comparison

@trinetra75 (Member)

Yes, we are training a new network group with size 12x256.
It is still in progress, and it is not yet clear when the switch will happen.

@Vandertic (Member)

Actually, the first attempts to train 12x256 on the training data produced by 6x128 and 9x192 failed spectacularly, so we are changing parameters and trying again. Until we get the parameters right, there will be occasional matches with only a few games for weak networks.

@Cabu commented Jan 30, 2020

Why change two variables at once? Why not, for example:
9x192 -> 12x192 -> 12x256 -> 15x256

Did you try techniques like net2net (https://arxiv.org/abs/1511.05641) to resize an already existing network? It has some readily available Python implementations; there is one in training/tf (https://github.com/sai-dev/sai/blob/sai-0.17/training/tf/net2net.py).
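
For reference, a minimal numpy sketch of the Net2WiderNet idea from that paper might look like the following. This is not the repository's training/tf/net2net.py; the function name and shapes are made up for illustration.

```python
# Illustrative sketch of the Net2WiderNet operator (Chen et al., 2016,
# https://arxiv.org/abs/1511.05641), shown on plain fully connected layers.
# Not the actual training/tf/net2net.py implementation.
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=np.random.default_rng()):
    """Widen the hidden layer between w1 and w2 while preserving the function.

    w1: (n_in, n_h)  weights into the hidden layer
    b1: (n_h,)       hidden-layer biases
    w2: (n_h, n_out) weights out of the hidden layer
    """
    n_h = w1.shape[1]
    assert new_width >= n_h
    # Replicate randomly chosen existing units to fill the extra columns.
    mapping = np.concatenate([np.arange(n_h),
                              rng.integers(0, n_h, new_width - n_h)])
    counts = np.bincount(mapping, minlength=n_h)

    w1_new = w1[:, mapping]
    b1_new = b1[mapping]
    # Divide outgoing weights by the replication count so the output is unchanged.
    w2_new = w2[mapping, :] / counts[mapping][:, None]
    return w1_new, b1_new, w2_new
```

The same replicate-and-rescale trick is what lets the widened network start from exactly the function the old one computed, so training continues instead of restarting from scratch.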

@l1t1 (Author) commented Jan 31, 2020

The 12x256 net now seems as strong as the first 9x192 net.

@barrtgt commented Feb 1, 2020

Have you experimented with a much larger window size for 12 blocks? leela-zero#1197 (comment)

@trinetra75 (Member) commented Feb 4, 2020

The training window has not been 'fixed' for a long time now: normally, for each generation we do three separate training runs with different window lengths.
Recently those window lengths have been around 12 generations (i.e. from 16/20 down to 8/6), depending on how frequently a given window was producing the best networks.
We did this because, quite near the beginning of this training, we found that the window length was the most important meta-parameter of the training: where a window of 16 failed to produce stronger networks, a window of 8 produced only good candidates.

Going back to the topic of the issue: we are essentially replicating (more or less) the window sizes used for the original training, in order to keep a tight match between the different trainings.
However, other parameters, such as the learning rate and the number of steps, need to be fine-tuned to avoid overfitting, and this is why we needed to restart the training of the 12x256 nets more than once...
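
As a rough sketch of that per-generation procedure (purely illustrative, not SAI's actual pipeline; train_network and match_strength are hypothetical helpers):

```python
# Illustrative sketch (not SAI's actual pipeline) of the procedure described
# above: run several independent trainings with different window lengths and
# promote whichever candidate turns out strongest in matches.

WINDOW_LENGTHS = [8, 12, 16]  # generations of self-play games per window (example values)

def next_generation(generations, train_network, match_strength):
    candidates = []
    for window in WINDOW_LENGTHS:
        data = generations[-window:]        # most recent `window` generations of games
        net = train_network(data)           # one independent training run
        candidates.append((match_strength(net), window, net))
    # Promote the strongest candidate; which window won informs the next choice.
    strength, window, best = max(candidates, key=lambda c: c[0])
    return best, window
```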

@Vandertic (Member)

Hello!
Sorry for being away so long, but RL.

Have you experimented with a much larger window size for 12 blocks? leela-zero#1197 (comment)

Thank you for pointing me to that comment, which I hadn't stumbled on previously.

As @trinetra75 explained, when training/promoting we check different window sizes and try to follow the best one. When training new architectures on previous games, we generally follow the same numbers (maybe smoothing the changes a little) and fine-tune the number of steps.

I did some maths on the AZ training, which is the best example of no-gating working. You can find it here.

TL;DR: we use much smaller windows than anyone else, but as long as we keep the number of training steps proportional to the window size, we can avoid overfitting.
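
As a back-of-the-envelope illustration of keeping the step count proportional to the window size (all numbers are made up, not SAI's actual settings):

```python
# Sketch of scaling training steps with window size, as described above.
# Every constant here is illustrative, not a real SAI setting.
def training_steps(window_games, positions_per_game=100,
                   epochs=1.0, batch_size=512):
    """Return an optimizer-step count for a window of `window_games` games,
    so that each position is sampled about `epochs` times on average."""
    positions = window_games * positions_per_game
    return int(positions * epochs / batch_size)

# Halving the window halves the steps, so a small window is not over-trained:
#   training_steps(250_000) -> 48828
#   training_steps(125_000) -> 24414
```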
