Hi,
I was wondering if anyone has tried adapting the code for multi-GPU training instead of TPUs.
The current code works on a single GPU without many modifications (set use_tpu = False), but I am having trouble running it on multiple GPUs.
I changed the configuration as follows (tensorflow-gpu 1.13.1):
distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=FLAGS.num_gpus)
run_config = tf.estimator.RunConfig(
    log_step_count_steps=10,
    save_summary_steps=10,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.iterations_per_loop,
    keep_checkpoint_max=5,
    train_distribute=distribution)
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    model_dir=FLAGS.output_dir,
    params={'batch_size': FLAGS.batch_size})
estimator.train(input_fn=train_input_fn, steps=num_train_steps)
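For reference, here is the same wiring as a self-contained configuration sketch, assuming TF 1.13 (where MirroredStrategy still lives under tf.contrib.distribute). The model_fn here is a hypothetical stand-in for the repo's actual model_fn, shown only to make the fragment complete; note that it routes the variable updates through optimizer.minimize() with the global step, so the distribution strategy can aggregate updates across replicas:

```python
import tensorflow as tf  # tensorflow-gpu 1.13.1

# Hypothetical minimal model_fn, standing in for the repo's model_fn.
def model_fn(features, labels, mode, params):
    logits = tf.layers.dense(features['x'], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.AdamOptimizer(1e-3)
    # minimize() calls apply_gradients() under the hood; passing global_step
    # lets MirroredStrategy handle the cross-replica aggregation of updates.
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
run_config = tf.estimator.RunConfig(
    model_dir='/tmp/model',  # placeholder path
    train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
```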
However, I get the following error:

raise ValueError("You must specify an aggregation method to update a "
ValueError: You must specify an aggregation method to update a MirroredVariable in Replica Context.
Has anyone maybe found a solution to this?
Thanks.