are you training some new size nets #99

Open
l1t1 opened this issue Jan 14, 2020 · 7 comments

@l1t1 commented Jan 14, 2020

2020-01-14 18:25: 97b2fceb VS 0f9a7f88, 9 : 0 : 6 (60.00%), 15 / 5 comparison

@trinetra75 (Member)

Yes, we are training a new network group with size 12x256.
It is still in progress, and it is not yet clear when the switch will happen.

@Vandertic (Member)

Actually, the first attempts to train 12x256 on the training data produced by 6x128 and 9x192 failed spectacularly, so we are changing parameters and trying again. Until we get the parameters right, there will be occasional matches with only a few games for weak networks.

@Cabu commented Jan 30, 2020

Why change two variables at once? Why not, for example:
9x192 -> 12x192 -> 12x256 -> 15x256

Did you try techniques like net2net (https://arxiv.org/abs/1511.05641) to resize an already existing network? It has some readily available Python implementations; there is one in training/tf (https://github.com/sai-dev/sai/blob/sai-0.17/training/tf/net2net.py).
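
For reference, a minimal numpy sketch of the Net2WiderNet idea from that paper might look like the following. This is not the repository's training/tf/net2net.py; the function name and shapes are made up for illustration.

```python
# Illustrative sketch of the Net2WiderNet operator (Chen et al., 2016,
# https://arxiv.org/abs/1511.05641), shown on plain fully connected layers.
# Not the actual training/tf/net2net.py implementation.
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=np.random.default_rng()):
    """Widen the hidden layer between w1 and w2 while preserving the function.

    w1: (n_in, n_h)  weights into the hidden layer
    b1: (n_h,)       hidden-layer biases
    w2: (n_h, n_out) weights out of the hidden layer
    """
    n_h = w1.shape[1]
    assert new_width >= n_h
    # Replicate randomly chosen existing units to fill the extra columns.
    mapping = np.concatenate([np.arange(n_h),
                              rng.integers(0, n_h, new_width - n_h)])
    counts = np.bincount(mapping, minlength=n_h)

    w1_new = w1[:, mapping]
    b1_new = b1[mapping]
    # Divide outgoing weights by the replication count so the output is unchanged.
    w2_new = w2[mapping, :] / counts[mapping][:, None]
    return w1_new, b1_new, w2_new
```

The same replicate-and-rescale trick is what lets the widened network start from exactly the function the old one computed, so training continues instead of restarting from scratch.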

@l1t1 (Author) commented Jan 31, 2020

The 12x256 net now seems as strong as the first 9x192 net.

@barrtgt commented Feb 1, 2020

Have you experimented with a much larger window size for 12 blocks? leela-zero#1197 (comment)

@trinetra75 (Member) commented Feb 4, 2020

The training window has not been 'fixed' for a long time now: normally, for each generation we do three separate training runs with different window lengths.
Recently those window lengths have been around 12 generations (i.e. from 16/20 down to 8/6), depending on how frequently a given window was producing the best networks.
We did this because, quite near the beginning of this training, we found that the window length was the most important meta-parameter of the training: where a window of 16 failed to produce stronger networks, a window of 8 produced only good candidates.

Going back to the topic of the issue: we are essentially replicating (more or less) the window sizes used for the original training, in order to keep a tight match between the different trainings.
However, other parameters, such as the learning rate and the number of steps, need to be fine-tuned to avoid overfitting, and this is why we needed to restart the training of the 12x256 nets more than once...
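
As a rough sketch of that per-generation procedure (purely illustrative, not SAI's actual pipeline; train_network and match_strength are hypothetical helpers):

```python
# Illustrative sketch (not SAI's actual pipeline) of the procedure described
# above: run several independent trainings with different window lengths and
# promote whichever candidate turns out strongest in matches.

WINDOW_LENGTHS = [8, 12, 16]  # generations of self-play games per window (example values)

def next_generation(generations, train_network, match_strength):
    candidates = []
    for window in WINDOW_LENGTHS:
        data = generations[-window:]        # most recent `window` generations of games
        net = train_network(data)           # one independent training run
        candidates.append((match_strength(net), window, net))
    # Promote the strongest candidate; which window won informs the next choice.
    strength, window, best = max(candidates, key=lambda c: c[0])
    return best, window
```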

@Vandertic (Member)

Hello!
Sorry for being away so long, but RL.

Have you experimented with a much larger window size for 12 blocks? leela-zero#1197 (comment)

Thank you for pointing me to that comment, which I hadn't stumbled on previously.

As @trinetra75 explained, when training/promoting we check different window sizes and try to follow the best one. When training new architectures on previous games, we generally follow the same numbers (maybe smoothing the changes a little) and fine-tune the number of steps.

I did some maths on the AZ training, which is the best example of no-gating working. You can find it here.

TL;DR: we use much smaller windows than anyone else, but as long as we keep the number of training steps proportional to the window size, we can avoid overfitting.
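
As a back-of-the-envelope illustration of keeping the step count proportional to the window size (all numbers are made up, not SAI's actual settings):

```python
# Sketch of scaling training steps with window size, as described above.
# Every constant here is illustrative, not a real SAI setting.
def training_steps(window_games, positions_per_game=100,
                   epochs=1.0, batch_size=512):
    """Return an optimizer-step count for a window of `window_games` games,
    so that each position is sampled about `epochs` times on average."""
    positions = window_games * positions_per_game
    return int(positions * epochs / batch_size)

# Halving the window halves the steps, so a small window is not over-trained:
#   training_steps(250_000) -> 48828
#   training_steps(125_000) -> 24414
```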
