python train1.py timit -gpu 0, how do I do this without a GPU? #75

Open
jackylee1 opened this issue Nov 15, 2018 · 2 comments

@jackylee1

```
/usr/local/lib/python2.7/dist-packages/pydub/utils.py:165: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
case: timit, logdir: /data/private/vc/logdir/timit/train1
[1115 03:42:59 @logger.py:108] WRN Log directory /data/private/vc/logdir/timit/train1 exists! Use 'd' to delete it.
[1115 03:42:59 @logger.py:111] WRN If you're resuming from a previous run, you can choose to keep it.
Press any other key to exit.
Select Action: k (keep) / d (delete) / q (quit):k
[1115 03:43:24 @logger.py:66] Existing log file '/data/private/vc/logdir/timit/train1/log.log' backuped to '/data/private/vc/logdir/timit/train1/log.log.1115-034324'
[1115 03:43:24 @logger.py:73] Argv: train1.py timit -gpu 0
[1115 03:43:24 @parallel.py:186] [MultiProcessPrefetchData] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[1115 03:43:24 @argtools.py:146] WRN Install python-prctl so that processes can be cleaned with guarantee.
Process _Worker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
[1115 03:43:24 @config.py:165] WRN TrainConfig.nr_tower was deprecated! Set the number of GPUs on the trainer instead!
IndexError: list index out of range
[1115 03:43:24 @config.py:166] WRN See tensorpack/tensorpack#458 for more information.
[1115 03:43:24 @training.py:52] [DataParallel] Training a model of 2 towers.
[1115 03:43:24 @training.py:54] ERR [DataParallel] TensorFlow was not built with CUDA support!
[1115 03:43:24 @interface.py:46] Automatically applying StagingInput on the DataFlow.
[1115 03:43:24 @develop.py:96] WRN [Deprecated] ModelDescBase._get_inputs() interface will be deprecated after 30 Mar. Use inputs() instead!
Process _Worker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
[1115 03:43:24 @input_source.py:220] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
Process _Worker-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
Process _Worker-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
[1115 03:43:24 @training.py:112] Building graph for training tower 0 on device /gpu:0 ...
[1115 03:43:24 @develop.py:96] WRN [Deprecated] ModelDescBase._build_graph() interface will be deprecated after 30 Mar. Use build_graph() instead!
[1115 03:43:25 @develop.py:96] WRN [Deprecated] get_cost() and self.cost will be deprecated after 30 Mar. Return the cost tensor directly in build_graph() instead!
[1115 03:43:25 @develop.py:96] WRN [Deprecated] ModelDescBase._get_optimizer() interface will be deprecated after 30 Mar. Use optimizer() instead!
[1115 03:43:26 @training.py:112] Building graph for training tower 1 on device /gpu:1 ...
[1115 03:43:26 @develop.py:96] WRN [Deprecated] ModelDescBase._build_graph() interface will be deprecated after 30 Mar. Use build_graph() instead!
[1115 03:43:27 @develop.py:96] WRN [Deprecated] get_cost() and self.cost will be deprecated after 30 Mar. Return the cost tensor directly in build_graph() instead!
[1115 03:43:29 @collection.py:164] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 3->5)
[1115 03:43:30 @training.py:322] 'sync_variables_from_main_tower' includes 174 operations.
[1115 03:43:30 @model_utils.py:64] Trainable Variables:
name                                                              shape       dim
----------------------------------------------------------------------------------
net1/prenet/dense1/kernel:0 [40, 128] 5120
net1/prenet/dense1/bias:0 [128] 128
net1/prenet/dense2/kernel:0 [128, 64] 8192
net1/prenet/dense2/bias:0 [64] 64
net1/cbhg/conv1d_banks/num_1/conv1d/conv1d/kernel:0 [1, 64, 64] 4096
net1/cbhg/conv1d_banks/num_1/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_1/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_2/conv1d/conv1d/kernel:0 [2, 64, 64] 8192
net1/cbhg/conv1d_banks/num_2/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_2/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_3/conv1d/conv1d/kernel:0 [3, 64, 64] 12288
net1/cbhg/conv1d_banks/num_3/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_3/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_4/conv1d/conv1d/kernel:0 [4, 64, 64] 16384
net1/cbhg/conv1d_banks/num_4/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_4/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_5/conv1d/conv1d/kernel:0 [5, 64, 64] 20480
net1/cbhg/conv1d_banks/num_5/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_5/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_6/conv1d/conv1d/kernel:0 [6, 64, 64] 24576
net1/cbhg/conv1d_banks/num_6/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_6/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_7/conv1d/conv1d/kernel:0 [7, 64, 64] 28672
net1/cbhg/conv1d_banks/num_7/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_7/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_8/conv1d/conv1d/kernel:0 [8, 64, 64] 32768
net1/cbhg/conv1d_banks/num_8/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_8/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_1/conv1d/kernel:0 [3, 512, 64] 98304
net1/cbhg/normalize/beta:0 [64] 64
net1/cbhg/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_2/conv1d/kernel:0 [3, 64, 64] 12288
net1/cbhg/highwaynet_0/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_0/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_0/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_0/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_1/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_1/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_1/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_1/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense2/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/bias:0 [64] 64
net1/dense/kernel:0 [128, 61] 7808
net1/dense/bias:0 [61] 61
Total #vars=58, #params=363389, size=1.39MB
[1115 03:43:30 @base.py:209] Setup callbacks graph ...
[1115 03:43:31 @summary.py:38] Maintain moving average summary of 0 tensors in collection MOVING_SUMMARY_OPS.
[1115 03:43:31 @summary.py:75] Summarizing collection 'summaries' of size 3.
[1115 03:43:32 @base.py:227] Creating the session ...
2018-11-15 03:43:32.986913: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986975: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986998: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.987022: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "train1.py", line 78, in
train(args, logdir=logdir_train1)
File "train1.py", line 60, in train
launch_train_with_config(train_conf, trainer=trainer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/interface.py", line 97, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 341, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 312, in train
self.initialize(session_creator, session_init)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 144, in initialize
super(TowerTrainer, self).initialize(session_creator, session_init)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 229, in initialize
self.sess = session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/sesscreate.py", line 43, in create_session
sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' with these attrs. Registered devices: [CPU], Registered kernels:
  <no registered kernels>
 [[Node: AllReduceGrads/NcclAllReduce_105 = NcclAllReduce[T=DT_FLOAT, num_devices=2, reduction="sum", shared_name="c52", _device="/device:GPU:1"](tower1/gradients/tower1/net1/cbhg/gru/bidirectional_rnn/bw/bw/while/bw/gru_cell/gates/gates/MatMul/Enter_grad/b_acc_3)]]

Caused by op u'AllReduceGrads/NcclAllReduce_105', defined at:
File "train1.py", line 78, in
train(args, logdir=logdir_train1)
File "train1.py", line 60, in train
launch_train_with_config(train_conf, trainer=trainer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/interface.py", line 87, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 204, in setup_graph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/trainers.py", line 186, in _setup_graph
self._make_get_grad_fn(input, get_cost_fn, get_opt_fn), get_opt_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/graph_builder/training.py", line 244, in build
all_grads = allreduce_grads(all_grads, average=self._average) # #gpu x #param
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/scope_utils.py", line 94, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/graph_builder/utils.py", line 157, in allreduce_grads
summed = nccl.all_sum(grads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 48, in all_sum
return _apply_all_reduce('sum', tensors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 154, in _apply_all_reduce
shared_name=shared_name))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/ops/gen_nccl_ops.py", line 43, in nccl_all_reduce
shared_name=shared_name, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'NcclAllReduce' with these attrs. Registered devices: [CPU], Registered kernels:
  <no registered kernels>
 [[Node: AllReduceGrads/NcclAllReduce_105 = NcclAllReduce[T=DT_FLOAT, num_devices=2, reduction="sum", shared_name="c52", _device="/device:GPU:1"](tower1/gradients/tower1/net1/cbhg/gru/bidirectional_rnn/bw/bw/while/bw/gru_cell/gates/gates/MatMul/Enter_grad/b_acc_3)]]
```
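Two separate failures show up in this log. First, every worker process dies with `IndexError` because `random.choice(self.wav_files)` in data_load.py is called on an empty list, meaning the glob for the TIMIT wav files matched nothing. Second, the session fails to build because the trainer was configured for 2 GPU towers (`NcclAllReduce` with `num_devices=2`) on a TensorFlow build that only registers CPU kernels. Below is a minimal sketch of the first problem with an explicit guard; the glob pattern is hypothetical and should be replaced with the data path from the project's config:

```python
import glob
import random

# Hypothetical pattern; substitute the actual TIMIT path from the project's config.
wav_files = glob.glob('/data/TIMIT/TRAIN/*/*/*.wav')

# random.choice raises IndexError on an empty sequence, which is exactly
# the "_Worker" tracebacks above. Fail early with a clearer message instead.
if not wav_files:
    raise IOError('No wav files matched the TIMIT data path; '
                  'check the dataset location in the config.')

wav_file = random.choice(wav_files)
```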
@LukeJacob

I got the same problem and am waiting for an answer. BTW, is ffmpeg necessary for this project?
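The RuntimeWarning at the top of the log is pydub failing to locate an ffmpeg binary at import time. A quick way to check what pydub can actually find, using pydub's own lookup helper (this only reports availability; whether the warning matters depends on which audio formats the code decodes):

```python
from pydub.utils import which

# which() returns the resolved binary path, or None if it is not on PATH.
print(which('ffmpeg'))
print(which('avconv'))
```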

@dheerajinampudi

@jackylee1 `-gpu=0` worked for me. @LukeJacob, ffmpeg is necessary.
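Note that the `NcclAllReduce` failure comes from the multi-GPU data-parallel trainer, so changing the `-gpu` argument alone may not be enough on a CPU-only machine. One untested approach, assuming you can edit train1.py: hide all GPUs from TensorFlow and pass `launch_train_with_config` a single-tower trainer (`SimpleTrainer` is tensorpack's plain, non-data-parallel trainer), which avoids the NCCL all-reduce op entirely:

```python
import os

# Hide every GPU before TensorFlow is imported, so only CPU kernels register.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from tensorpack import SimpleTrainer, launch_train_with_config

# train_conf is the TrainConfig already built in train1.py (see the traceback
# above, train1.py line 60). Sketch only; not the project's supported path.
launch_train_with_config(train_conf, SimpleTrainer())
```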
