CH7: Small Conv Network Training Error - Conv2DCustomBackpropInputOp only supports NHWC. #80
Hi Nate, I actually just ran into this same problem myself recently. This is an issue with the data_format of the convolutional layers: the CPU kernel for the Conv2D gradient only supports the NHWC (channels_last) layout. The two options are:

1. Switch the layers (and the feature encoding) from channels_first to channels_last.
2. Train on a GPU, which does support the NCHW (channels_first) layout.
Either one ought to fix it -- let us know if that works for you!
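For reference, here's a minimal sketch of the first option, assuming a Keras Sequential model with explicit data_format arguments; the layer sizes and one-plane input shape are illustrative, not the book's exact code:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

# NHWC ("channels_last") puts the feature planes last: (rows, cols, planes).
# The CPU kernel for the Conv2D input gradient only supports this layout.
input_shape = (19, 19, 1)  # illustrative one-plane encoder output

model = Sequential([
    Conv2D(48, (7, 7), padding='same', activation='relu',
           data_format='channels_last', input_shape=input_shape),
    Conv2D(32, (5, 5), padding='same', activation='relu',
           data_format='channels_last'),
    Flatten(),
    Dense(19 * 19, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
```

If the encoder still emits channels-first arrays, the features also need a matching transpose before training, e.g. X.transpose(0, 2, 3, 1) for a NumPy batch.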
Hi Kevin, That did it! Super fast response. I appreciate it. Great content btw.
Hi Kevin, in attempting to replicate the model training, I ran the 7.3 code (with the channels_last change) with the small network layers, and I keep getting a result similar to the following:
I noticed that the step counts are very different, i.e. each epoch shows 2672/2672 instead of 12288/12288. Is that a random factor relative to the 100 games (num_games) it selects? How would I go about getting the accuracy seen in the book?
Hi Kevin,
Hello @Nkonovalenko, please see this writeup: https://kferg.dev/posts/2021/deep-learning-and-the-game-of-go-training-results-from-chapter-7/ Hopefully that gets you unblocked!
I am trying to get this to run on Colab with a TPU; unfortunately, the generator in the code base is not compatible with distribution across the TPU cluster. I solved this by just loading the dataset using generator=False (as in the sketch below). My problem is that the network is quickly overfitting. I guess increasing the number of games should help with this?
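A rough sketch of that change, assuming the book's GoDataProcessor and a use_generator flag (the import path and parameter name here are my best guess, not verified):

```python
from dlgo.data.parallel_processor import GoDataProcessor

processor = GoDataProcessor()
num_games = 100  # illustrative; see the note below about using more games

# Load the data into memory as NumPy arrays instead of returning a
# generator, so it can be handed to a tf.distribute TPU strategy directly.
X, y = processor.load_go_data('train', num_games, use_generator=False)
```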
Thank you so much, it did! |
@constant5 The generator version creates large temporary files on disk, so I suspect that's why it won't work with Colab (just guessing, though). As for the overfitting, more games is a good idea. I'd say around 10,000 games is the minimum to train a network that is useful for actual game play, and more is better. Not sure what the memory constraints are in Colab, but you may have to modify the code to chunk it up yourself.
This may not have been the most efficient way to do it, but after I wrote the consolidated NumPy files to disk, I rewrote the data to TFRecords:
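A minimal sketch of that conversion step, assuming the consolidated data sits in plain .npy feature/label files; the paths, feature keys, and float32 dtype are assumptions:

```python
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_tfrecords(features_path, labels_path, out_path):
    X = np.load(features_path)  # consolidated board features (assumed .npy)
    y = np.load(labels_path)    # consolidated move labels (assumed .npy)
    with tf.io.TFRecordWriter(out_path) as writer:
        for features, label in zip(X, y):
            # Serialize each (board, move) pair as raw float32 bytes.
            example = tf.train.Example(features=tf.train.Features(feature={
                'features': _bytes_feature(features.astype(np.float32).tobytes()),
                'label': _bytes_feature(label.astype(np.float32).tobytes()),
            }))
            writer.write(example.SerializeToString())
```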
Then I created a TFRecords data generator:
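And a matching sketch of the reading side as a tf.data pipeline; the shapes and feature keys mirror the writer above and are equally illustrative:

```python
import tensorflow as tf

BOARD_SHAPE = (19, 19, 1)  # assumed channels_last encoder output
NUM_CLASSES = 19 * 19

def parse_example(serialized):
    # The feature keys must match the ones used when writing the records.
    parsed = tf.io.parse_single_example(serialized, {
        'features': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.string),
    })
    features = tf.reshape(
        tf.io.decode_raw(parsed['features'], tf.float32), BOARD_SHAPE)
    label = tf.reshape(
        tf.io.decode_raw(parsed['label'], tf.float32), (NUM_CLASSES,))
    return features, label

def make_dataset(file_pattern, batch_size=128):
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.map(parse_example,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return (dataset.shuffle(10000)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE))
```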
This works well for Colab and GPU training, but not for TPU, because the TPU does not support local file sharding.
Hi, I read the writeup and set
Though reproducing the result is not a blocking point for reading further in the book, I still want to get a similar result as a checkpoint. Any hints on what to check? @Nkonovalenko you mentioned that you did it. So you just reproduced the result after changing only the
Attempting to run the small convolutional network on macOS Big Sur.
Not sure what my issue is exactly; it could be the versions used. Any ideas what I can do to make it work?
tensorflow==2.4.0
Python 3.8.2
```
...
Epoch 1/5
Traceback (most recent call last):
File "training_small.py", line 37, in
model.fit_generator(generator=generator.generate(batch_size, num_classes), epochs=epochs, steps_per_epoch=generator.get_num_samples() / batch_size, validation_data=test_generator.generate(batch_size, num_classes), validation_steps=test_generator.get_num_samples() / batch_size, callbacks=[ ModelCheckpoint('../checkpoints/small_model_epoch_{epoch}.h5')])
File "/Library/Python/3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1847, in fit_generator
return self.fit(
File "/Library/Python/3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in call
result = self._call(*args, **kwds)
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/function.py", line 2942, in call
return graph_function._call_flat(
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
outputs = execute.execute(
File "/Library/Python/3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[node gradient_tape/sequential/conv2d_3/Conv2D/Conv2DBackpropInput (defined at training_small.py:37) ]] [Op:__inference_train_function_781]
Function call stack:
train_function
```