Problem with gluon.utils.split_data() #17117

zburning · 2019-12-19T03:12:36Z

Description

The current gluon.utils.split_data() has:

step = size // num_slice

# If size < num_slice, make fewer slices
if not even_split and size < num_slice:
    step = 1
    num_slice = size

if batch_axis == 0:
    slices = [data[i*step:(i+1)*step] if i < num_slice - 1 else data[i*step:size]
              for i in range(num_slice)]

Consider an example:
we have a tensor of shape (31, *) and want to split it into 8 slices. According to the function, step will be 31 // 8 = 3, so the tensor will be split into 8 tensors of sizes [3, 3, 3, 3, 3, 3, 3, 10], where the last tensor is excessively large. A better result would be [4, 4, 4, 4, 4, 4, 4, 3], where the slice sizes differ by at most one.
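The slice sizes in this example can be checked without MXNet. The sketch below re-implements only the size arithmetic of the excerpt above in plain Python; the rest of split_data() is not reproduced, and `current_slice_sizes` / `balanced_slice_sizes` are illustrative names, not MXNet functions.

```python
def current_slice_sizes(size, num_slice, even_split=False):
    """Slice lengths produced by the split_data() excerpt quoted above."""
    step = size // num_slice
    # If size < num_slice, make fewer slices (same rule as the excerpt)
    if not even_split and size < num_slice:
        step = 1
        num_slice = size
    # Every slice but the last has `step` rows; the last takes the remainder.
    return [step if i < num_slice - 1 else size - i * step
            for i in range(num_slice)]


def balanced_slice_sizes(size, num_slice):
    """Slice lengths that differ by at most one, as np.array_split produces."""
    base, rest = divmod(size, num_slice)
    # The first `rest` slices get one extra row.
    return [base + 1 if i < rest else base for i in range(num_slice)]


print(current_slice_sizes(31, 8))   # [3, 3, 3, 3, 3, 3, 3, 10]
print(balanced_slice_sizes(31, 8))  # [4, 4, 4, 4, 4, 4, 4, 3]
```

Both lists sum to 31; only the distribution of the remainder changes.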

Maybe we can follow np.array_split()?
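For comparison, np.array_split already produces the balanced split suggested above. The small NumPy sketch below uses a stand-in (31, 2) array. Note one behavioral difference: when there are fewer items than slices, np.array_split returns empty trailing slices, whereas the excerpt above makes fewer slices, so a port would have to choose which behavior to keep.

```python
import numpy as np

# A stand-in batch of 31 samples with 2 features each (shape (31, 2)).
data = np.arange(62).reshape(31, 2)

# np.array_split allows an uneven split; sizes differ by at most one.
slices = np.array_split(data, 8, axis=0)
print([s.shape[0] for s in slices])  # [4, 4, 4, 4, 4, 4, 4, 3]

# The slices are contiguous and cover the whole batch in order.
assert np.array_equal(np.concatenate(slices, axis=0), data)

# With fewer samples than slices, the trailing slices are empty.
print([s.shape[0] for s in np.array_split(np.arange(5), 8)])  # [1, 1, 1, 1, 1, 0, 0, 0]
```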


wkcn · 2019-12-19T03:23:22Z

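# Bounds [start, end) of slice `index` when splitting `length` items into
# `num_slice` parts; the first `rest` slices each get one extra item.
slice_len = length // num_slice
rest = length % num_slice
start = slice_len * index + min(index, rest)
end = start + slice_len + (index < rest)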
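Wrapped into a function, the formula above gives the half-open bounds of each slice. The sketch below uses it to split a NumPy array along an arbitrary axis and checks the result against np.array_split. `slice_bounds` and `split_along_axis` are hypothetical helper names for illustration, not part of MXNet, and this is not the code merged in the eventual PR.

```python
import numpy as np


def slice_bounds(length, num_slice, index):
    """Half-open [start, end) bounds of slice `index` (formula from the comment above)."""
    slice_len = length // num_slice
    rest = length % num_slice
    # Every earlier slice with index < rest was one element longer.
    start = slice_len * index + min(index, rest)
    # This slice is one element longer if it is among the first `rest` slices.
    end = start + slice_len + (index < rest)
    return start, end


def split_along_axis(data, num_slice, axis=0):
    """Split `data` into `num_slice` contiguous parts along `axis`."""
    length = data.shape[axis]
    parts = []
    for index in range(num_slice):
        start, end = slice_bounds(length, num_slice, index)
        # Build an indexer that slices only along `axis`.
        indexer = [slice(None)] * data.ndim
        indexer[axis] = slice(start, end)
        parts.append(data[tuple(indexer)])
    return parts


data = np.arange(31 * 3).reshape(31, 3)
parts = split_along_axis(data, 8, axis=0)
print([p.shape[0] for p in parts])  # [4, 4, 4, 4, 4, 4, 4, 3]

# Same slices as np.array_split, including along a non-zero axis.
assert all(np.array_equal(p, e)
           for p, e in zip(parts, np.array_split(data, 8, axis=0)))
assert all(np.array_equal(p, e)
           for p, e in zip(split_along_axis(data.T, 8, axis=1),
                           np.array_split(data.T, 8, axis=1)))
```

Because the bounds are computed per index, each device slice can be located without materializing the other slices.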

zburning · 2019-12-19T03:30:18Z

Thank you, this is a clean solution.

leezu · 2019-12-19T04:41:59Z

Following np.array_split is a good idea. It should have been done from the beginning. Would you like to create a PR?

zburning · 2019-12-19T06:12:05Z

@leezu Yes

sxjscience · 2019-12-19T19:33:35Z

@leezu @zburning How about labeling it as a performance issue?

zburning added the Bug label Dec 19, 2019

wkcn added the Gluon label Dec 19, 2019

leezu added API change Feature request Bug and removed Bug labels Dec 19, 2019

leezu assigned zburning Dec 19, 2019

zburning mentioned this issue Dec 19, 2019

refactor gluon.utils.split_data() following np.array_split() #17123

Merged


sxjscience added Performance and removed Bug labels Dec 20, 2019

haojin2 closed this as completed in #17123 Jan 6, 2020
