
Fluid benchmark support recordio reader #11121

Conversation

typhoonzero (Contributor):

This can also fix the issue when running with `--gpus > 1`.

```python
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_reader_op:
    filelist = [
        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
    ]
```
Contributor:

We can use glob to specify the files.
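A minimal sketch of that suggestion, using `glob` instead of listing the whole directory. The `*.recordio` pattern and the helper name are assumptions, not the PR's actual code:

```python
import glob
import os

# Hypothetical helper: select only the recordio shards under data_path
# via a glob pattern, rather than os.listdir() on the whole directory.
# The "*.recordio" suffix is an assumption about the file naming.
def list_data_files(data_path):
    return sorted(glob.glob(os.path.join(data_path, "*.recordio")))
```

Sorting keeps the shard order deterministic across runs, which `os.listdir` does not guarantee.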

and the `batch_size` you choose:

```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 32)'
```
Contributor:

It's better to set `batch_size=1`; we can set the batch size in the trainer reader.

Contributor (Author):

Done.

@luotao1 mentioned this pull request on Jun 4, 2018.

```diff
 iters, num_samples, start_time = 0, 0, time.time()
 for pass_id in range(args.pass_num):
     train_losses = []
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Contributor:

`reader_generator = train_reader()` should become:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```

```python
num_samples += len(data)
batch_id += 1
# FIXME(wuyi): last batch size maybe different
num_samples += len(args.batch_size)
```
Contributor:

For `use_reader_op`, if the current pass is not the last one, the last batch of this pass is also equal to `args.batch_size`.

```diff
 for pass_id in range(args.pass_num):
     num_samples = 0
     iters = 0
     start_time = time.time()
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Contributor:

`reader_generator = train_reader()` should become:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```

```python
    thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Contributor:

For `use_reader_op`, the `batch_size` passed to `fluid.layers.batch` is per card. That is to say, if the batch size is 256 when training VGG and the machine has 4 cards, the `batch_size` for `fluid.layers.batch` should be 64.
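The arithmetic behind this comment can be sketched as a small helper; the function name and the divisibility check are my own additions, not part of the PR:

```python
# Hypothetical helper illustrating the reviewer's point: when the reader op
# batches per device, fluid.layers.batch must receive the per-card batch size.
def per_card_batch_size(total_batch_size, num_gpus):
    # the total batch must split evenly across cards,
    # otherwise cards would see unequal batch sizes
    assert total_batch_size % num_gpus == 0, "batch size must divide evenly"
    return total_batch_size // num_gpus

# e.g. a total batch of 256 on a 4-card machine gives 64 per card
```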

```python
    thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Contributor:

Same as above.

```diff
@@ -296,9 +331,10 @@ def train_parallel(avg_loss, infer_prog, optimizer, train_reader, test_reader,
 if iters == args.skip_batch_num:
     start_time = time.time()
     num_samples = 0
-if iters == args.iterations:
+# NOTE: if use reader ops, the input data is not splited to multiple cards
+if args.use_reader_op and iters >= args.iterations / args.gpus:
```
Contributor:

I don't think `iters >= args.iterations / args.gpus` is appropriate. The model's accuracy is highly related to the parameters it has learned, and those parameters depend on the number of parameter updates performed, so maybe we should not do that.

Contributor (Author):

Well, `args.iterations` is intended to let the benchmark finish quickly, so there is no concern for model accuracy. To run full model training, we can set `args.iterations` to -1 so that it runs until all training data has been fed.
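The stopping rule the two comments are debating can be sketched as a standalone function. The function name, signature, and the `-1` branch placement are my own framing of the behavior described above, not code from the PR:

```python
# Hypothetical sketch of the benchmark's stopping rule. With reader ops the
# input is not split across cards, so each iteration consumes `gpus`
# mini-batches of the requested total; iterations == -1 means run over the
# full dataset (the benchmark-speed mode the author describes).
def should_stop(iters, iterations, use_reader_op, gpus):
    if iterations == -1:
        return False  # run until the training data is exhausted
    if use_reader_op:
        return iters >= iterations / gpus
    return iters >= iterations
```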

```diff
@@ -266,7 +266,10 @@ def train(avg_loss, infer_prog, optimizer, train_reader, test_reader, batch_acc,
 # FIXME(wuyi): For use_reader_op, if the current
 # pass is not the last, the last batch of this pass
 # is also equal to args.batch_size.
-num_samples += len(args.batch_size)
+if args.use_reader_op:
+    num_samples += args.batch_size
```
@chengduoZH (Contributor), Jun 6, 2018:

`args.batch_size` is now the batch size on each GPU, so it should be `num_samples += args.batch_size * args.gpus`.
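The corrected counting can be illustrated with a small sketch; the helper and its example values are hypothetical, not from the PR:

```python
# Hypothetical sketch of the corrected sample counting: args.batch_size is
# per-GPU, so each iteration processes batch_size * gpus samples in total.
def count_samples(iterations, batch_size_per_gpu, gpus):
    num_samples = 0
    for _ in range(iterations):
        num_samples += batch_size_per_gpu * gpus
    return num_samples

# e.g. 10 iterations at 64 samples per card on 4 cards counts 2560 samples
```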

Contributor (Author):

Thanks very much! Done. A currently known issue: if `--use_reader_op` is set, we must also set `--no_test`; this will be fixed in the next PR.

@typhoonzero typhoonzero merged commit 635099c into PaddlePaddle:develop Jun 7, 2018
@typhoonzero typhoonzero deleted the fluid_benchmark_support_recordioreader branch June 7, 2018 12:27