python train1.py timit -gpu 0, how do I do this without a GPU? #75

Open
jackylee1 opened this issue Nov 15, 2018 · 2 comments

@jackylee1

```
/usr/local/lib/python2.7/dist-packages/pydub/utils.py:165: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
case: timit, logdir: /data/private/vc/logdir/timit/train1
[1115 03:42:59 @logger.py:108] WRN Log directory /data/private/vc/logdir/timit/train1 exists! Use 'd' to delete it.
[1115 03:42:59 @logger.py:111] WRN If you're resuming from a previous run, you can choose to keep it.
Press any other key to exit.
Select Action: k (keep) / d (delete) / q (quit):k
[1115 03:43:24 @logger.py:66] Existing log file '/data/private/vc/logdir/timit/train1/log.log' backuped to '/data/private/vc/logdir/timit/train1/log.log.1115-034324'
[1115 03:43:24 @logger.py:73] Argv: train1.py timit -gpu 0
[1115 03:43:24 @parallel.py:186] [MultiProcessPrefetchData] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[1115 03:43:24 @argtools.py:146] WRN Install python-prctl so that processes can be cleaned with guarantee.
Process _Worker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
[1115 03:43:24 @config.py:165] WRN TrainConfig.nr_tower was deprecated! Set the number of GPUs on the trainer instead!
IndexError: list index out of range
[1115 03:43:24 @config.py:166] WRN See tensorpack/tensorpack#458 for more information.
[1115 03:43:24 @training.py:52] [DataParallel] Training a model of 2 towers.
[1115 03:43:24 @training.py:54] ERR [DataParallel] TensorFlow was not built with CUDA support!
[1115 03:43:24 @interface.py:46] Automatically applying StagingInput on the DataFlow.
[1115 03:43:24 @develop.py:96] WRN [Deprecated] ModelDescBase._get_inputs() interface will be deprecated after 30 Mar. Use inputs() instead!
Process _Worker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
[1115 03:43:24 @input_source.py:220] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
Process _Worker-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
Process _Worker-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/parallel.py", line 163, in run
for dp in self.ds:
File "/usr/local/lib/python2.7/dist-packages/tensorpack/dataflow/common.py", line 116, in iter
for data in self.ds:
File "/content/drive/app/deep-voice-conversion-master/data_load.py", line 34, in get_data
wav_file = random.choice(self.wav_files)
File "/usr/lib/python2.7/random.py", line 277, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
[1115 03:43:24 @training.py:112] Building graph for training tower 0 on device /gpu:0 ...
[1115 03:43:24 @develop.py:96] WRN [Deprecated] ModelDescBase._build_graph() interface will be deprecated after 30 Mar. Use build_graph() instead!
[1115 03:43:25 @develop.py:96] WRN [Deprecated] get_cost() and self.cost will be deprecated after 30 Mar. Return the cost tensor directly in build_graph() instead!
[1115 03:43:25 @develop.py:96] WRN [Deprecated] ModelDescBase._get_optimizer() interface will be deprecated after 30 Mar. Use optimizer() instead!
[1115 03:43:26 @training.py:112] Building graph for training tower 1 on device /gpu:1 ...
[1115 03:43:26 @develop.py:96] WRN [Deprecated] ModelDescBase._build_graph() interface will be deprecated after 30 Mar. Use build_graph() instead!
[1115 03:43:27 @develop.py:96] WRN [Deprecated] get_cost() and self.cost will be deprecated after 30 Mar. Return the cost tensor directly in build_graph() instead!
[1115 03:43:29 @collection.py:164] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 3->5)
[1115 03:43:30 @training.py:322] 'sync_variables_from_main_tower' includes 174 operations.
[1115 03:43:30 @model_utils.py:64] Trainable Variables:
name                                                              shape       dim
----------------------------------------------------------------------------------
net1/prenet/dense1/kernel:0 [40, 128] 5120
net1/prenet/dense1/bias:0 [128] 128
net1/prenet/dense2/kernel:0 [128, 64] 8192
net1/prenet/dense2/bias:0 [64] 64
net1/cbhg/conv1d_banks/num_1/conv1d/conv1d/kernel:0 [1, 64, 64] 4096
net1/cbhg/conv1d_banks/num_1/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_1/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_2/conv1d/conv1d/kernel:0 [2, 64, 64] 8192
net1/cbhg/conv1d_banks/num_2/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_2/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_3/conv1d/conv1d/kernel:0 [3, 64, 64] 12288
net1/cbhg/conv1d_banks/num_3/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_3/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_4/conv1d/conv1d/kernel:0 [4, 64, 64] 16384
net1/cbhg/conv1d_banks/num_4/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_4/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_5/conv1d/conv1d/kernel:0 [5, 64, 64] 20480
net1/cbhg/conv1d_banks/num_5/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_5/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_6/conv1d/conv1d/kernel:0 [6, 64, 64] 24576
net1/cbhg/conv1d_banks/num_6/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_6/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_7/conv1d/conv1d/kernel:0 [7, 64, 64] 28672
net1/cbhg/conv1d_banks/num_7/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_7/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_banks/num_8/conv1d/conv1d/kernel:0 [8, 64, 64] 32768
net1/cbhg/conv1d_banks/num_8/normalize/beta:0 [64] 64
net1/cbhg/conv1d_banks/num_8/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_1/conv1d/kernel:0 [3, 512, 64] 98304
net1/cbhg/normalize/beta:0 [64] 64
net1/cbhg/normalize/gamma:0 [64] 64
net1/cbhg/conv1d_2/conv1d/kernel:0 [3, 64, 64] 12288
net1/cbhg/highwaynet_0/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_0/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_0/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_0/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_1/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_1/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_1/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_1/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense2/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/bias:0 [64] 64
net1/dense/kernel:0 [128, 61] 7808
net1/dense/bias:0 [61] 61
Total #vars=58, #params=363389, size=1.39MB
[1115 03:43:30 @base.py:209] Setup callbacks graph ...
[1115 03:43:31 @summary.py:38] Maintain moving average summary of 0 tensors in collection MOVING_SUMMARY_OPS.
[1115 03:43:31 @summary.py:75] Summarizing collection 'summaries' of size 3.
[1115 03:43:32 @base.py:227] Creating the session ...
2018-11-15 03:43:32.986913: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986975: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.986998: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-15 03:43:32.987022: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "train1.py", line 78, in
train(args, logdir=logdir_train1)
File "train1.py", line 60, in train
launch_train_with_config(train_conf, trainer=trainer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/interface.py", line 97, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 341, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 312, in train
self.initialize(session_creator, session_init)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 144, in initialize
super(TowerTrainer, self).initialize(session_creator, session_init)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/base.py", line 229, in initialize
self.sess = session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/sesscreate.py", line 43, in create_session
sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' with these attrs. Registered devices: [CPU], Registered kernels:
  <no registered kernels>
 [[Node: AllReduceGrads/NcclAllReduce_105 = NcclAllReduce[T=DT_FLOAT, num_devices=2, reduction="sum", shared_name="c52", _device="/device:GPU:1"](tower1/gradients/tower1/net1/cbhg/gru/bidirectional_rnn/bw/bw/while/bw/gru_cell/gates/gates/MatMul/Enter_grad/b_acc_3)]]

Caused by op u'AllReduceGrads/NcclAllReduce_105', defined at:
File "train1.py", line 78, in
train(args, logdir=logdir_train1)
File "train1.py", line 60, in train
launch_train_with_config(train_conf, trainer=trainer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/interface.py", line 87, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/tower.py", line 204, in setup_graph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/train/trainers.py", line 186, in _setup_graph
self._make_get_grad_fn(input, get_cost_fn, get_opt_fn), get_opt_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/graph_builder/training.py", line 244, in build
all_grads = allreduce_grads(all_grads, average=self._average) # #gpu x #param
File "/usr/local/lib/python2.7/dist-packages/tensorpack/tfutils/scope_utils.py", line 94, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorpack/graph_builder/utils.py", line 157, in allreduce_grads
summed = nccl.all_sum(grads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 48, in all_sum
return _apply_all_reduce('sum', tensors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 154, in _apply_all_reduce
shared_name=shared_name))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/nccl/ops/gen_nccl_ops.py", line 43, in nccl_all_reduce
shared_name=shared_name, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'NcclAllReduce' with these attrs. Registered devices: [CPU], Registered kernels:
  <no registered kernels>
 [[Node: AllReduceGrads/NcclAllReduce_105 = NcclAllReduce[T=DT_FLOAT, num_devices=2, reduction="sum", shared_name="c52", _device="/device:GPU:1"](tower1/gradients/tower1/net1/cbhg/gru/bidirectional_rnn/bw/bw/while/bw/gru_cell/gates/gates/MatMul/Enter_grad/b_acc_3)]]
```
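Two separate failures show up in this log. First, every worker process dies with `IndexError` because `random.choice(self.wav_files)` in data_load.py is called on an empty list, meaning the glob for the TIMIT wav files matched nothing. Second, the session fails to build because the trainer was configured for 2 GPU towers (`NcclAllReduce` with `num_devices=2`) on a TensorFlow build that only registers CPU kernels. Below is a minimal sketch of the first problem with an explicit guard; the glob pattern is hypothetical and should be replaced with the data path from the project's config:

```python
import glob
import random

# Hypothetical pattern; substitute the actual TIMIT path from the project's config.
wav_files = glob.glob('/data/TIMIT/TRAIN/*/*/*.wav')

# random.choice raises IndexError on an empty sequence, which is exactly
# the "_Worker" tracebacks above. Fail early with a clearer message instead.
if not wav_files:
    raise IOError('No wav files matched the TIMIT data path; '
                  'check the dataset location in the config.')

wav_file = random.choice(wav_files)
```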
@LukeJacob

I got the same problem and am waiting for an answer. BTW, is ffmpeg necessary for this project?
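The RuntimeWarning at the top of the log is pydub failing to locate an ffmpeg binary at import time. A quick way to check what pydub can actually find, using pydub's own lookup helper (this only reports availability; whether the warning matters depends on which audio formats the code decodes):

```python
from pydub.utils import which

# which() returns the resolved binary path, or None if it is not on PATH.
print(which('ffmpeg'))
print(which('avconv'))
```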

@dheerajinampudi

@jackylee1 `-gpu=0` worked for me. @LukeJacob, ffmpeg is necessary.
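Note that the `NcclAllReduce` failure comes from the multi-GPU data-parallel trainer, so changing the `-gpu` argument alone may not be enough on a CPU-only machine. One untested approach, assuming you can edit train1.py: hide all GPUs from TensorFlow and pass `launch_train_with_config` a single-tower trainer (`SimpleTrainer` is tensorpack's plain, non-data-parallel trainer), which avoids the NCCL all-reduce op entirely:

```python
import os

# Hide every GPU before TensorFlow is imported, so only CPU kernels register.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from tensorpack import SimpleTrainer, launch_train_with_config

# train_conf is the TrainConfig already built in train1.py (see the traceback
# above, train1.py line 60). Sketch only; not the project's supported path.
launch_train_with_config(train_conf, SimpleTrainer())
```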
