Multi GPU training of the discriminator #29

Open
S-Abdelnabi opened this issue Dec 18, 2019 · 1 comment

@S-Abdelnabi

Hi,

I was wondering if anyone has tried customizing the code for multi-GPU training instead of TPUs.
The current code works on a single GPU without many modifications (set use_tpu = False). However, I am having trouble running it on multiple GPUs.

I changed the configuration as follows (tensorflow-gpu 1.13.1):

distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=FLAGS.num_gpus)

run_config = tf.estimator.RunConfig(
    log_step_count_steps=10,
    save_summary_steps=10,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.iterations_per_loop,
    keep_checkpoint_max=5,
    train_distribute=distribution)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    model_dir=FLAGS.output_dir,
    params={'batch_size': FLAGS.batch_size})

estimator.train(input_fn=train_input_fn, steps=num_train_steps)
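
(For completeness: the input pipeline isn't shown above. A train_input_fn compatible with this setup would follow the usual tf.data pattern and read the batch size from the params dict that the Estimator forwards; the lines below are only a simplified sketch with a placeholder file pattern and the record parsing omitted, not the repo's actual pipeline.)

import tensorflow as tf

def train_input_fn(params):
    # The Estimator forwards params={'batch_size': FLAGS.batch_size} from above.
    batch_size = params['batch_size']
    # Placeholder file pattern; the real parsing / feature spec lives in the repo.
    files = tf.data.Dataset.list_files('train-*.tfrecord')
    dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=4)
    dataset = dataset.shuffle(10000).repeat()
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset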

However, I get the following error:

raise ValueError("You must specify an aggregation method to update a "
ValueError: You must specify an aggregation method to update a MirroredVariable in Replica Context.
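
From the traceback, the failing update seems to be a plain variable assignment that runs inside the per-replica part of the graph (for example a global_step increment, or a BERT-style custom optimizer that calls param.assign(...) directly in apply_gradients); I haven't pinned down which assignment it is in this code. The error message itself points at one possible workaround: create the offending variable with an explicit aggregation so MirroredStrategy knows how to combine the per-GPU updates. A minimal sketch of the pattern (the variable name here is made up, not from the repo):

import tensorflow as tf

# Inside model_fn, an update like the commented one fails in replica context
# under MirroredStrategy, because the strategy doesn't know how to merge the
# per-GPU assignments:
#
#   step = tf.get_variable('step', shape=[], dtype=tf.int64,
#                          initializer=tf.zeros_initializer(), trainable=False)
#   update = step.assign(step + 1)  # -> "must specify an aggregation method ..."
#
# Declaring the aggregation when the variable is created makes the same assign
# legal (the enum value was named ONLY_FIRST_TOWER in older 1.x releases):
step = tf.get_variable(
    'step', shape=[], dtype=tf.int64,
    initializer=tf.zeros_initializer(), trainable=False,
    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)
update = step.assign(step + 1)

I haven't confirmed yet whether this is enough for the full training run.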

Has anyone found a solution to this?
Thanks.

@wind91725

I have the same problem. Did you manage to solve it? Any help would be appreciated. Thank you!
