This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

pruned model size no change and inference time is even longer #2225

Closed
misslibra opened this issue Mar 23, 2020 · 19 comments · Fixed by #2579

Comments

@misslibra

nni Environment: pytorch

  • nni version: 1.4.1
  • nni mode (local|pai|remote): local
  • OS: ubuntu 16.04
  • python version: 3.6
  • is conda or virtualenv used?: conda
  • is running in docker?: no

I ran the example code model_prune_torch.py. The pretrain_naive model is 1.7M and the pruned model is also 1.7M, the same as the mask.
The inference time with the pretrained model is 0.4ms, but with the pruned model it increases to 1.5ms.
I am confused about what the example is for. Isn't it supposed to downscale the model and speed it up?
I also tried the speedup method, following the example, on my own model based on YOLOv3, and the result is still the same.
Please help me find out what is going wrong.
Thx!
@QuanluZhang QuanluZhang self-assigned this Mar 23, 2020
@QuanluZhang
Contributor

@misslibra thanks for reporting this issue. It is expected that the pruned model is also 1.7M, because the pruners are only responsible for finding weight masks under which the model still performs reasonably well. ModelSpeedup is responsible for making the model smaller based on the generated masks.

For your case, could you tell us how you measured the 1.5ms? Was it with the pruner applied, or after loading the saved weight checkpoint into the original model (i.e., the model before pruning)? If the former, inference latency should be higher, because the weights are multiplied by the masks in forward. If the latter, the inference latency should not be different.

For ModelSpeedup, it would be great if you can share the code with us, so that we can check whether your model is really compressed.
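
To illustrate the distinction between masking and actually shrinking a layer, here is a minimal framework-free sketch in plain Python. It does not call NNI or PyTorch; the helper names (`filter_mask`, `apply_mask`, `remove_masked`) and the toy weights are hypothetical and only mimic the idea of an L1-norm filter mask. It shows that multiplying by a mask leaves the number of stored parameters unchanged, while physically dropping the masked filters reduces it.

```python
def l1_norm(filt):
    """Sum of absolute values of one filter's weights."""
    return sum(abs(w) for w in filt)


def filter_mask(filters, sparsity):
    """Return a 0/1 mask per filter, zeroing the lowest-L1-norm fraction."""
    n_prune = int(len(filters) * sparsity)
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


def apply_mask(filters, mask):
    """Masking: multiply each filter by its mask. The shape stays the same."""
    return [[w * m for w in filt] for filt, m in zip(filters, mask)]


def remove_masked(filters, mask):
    """Compaction: physically drop the filters whose mask is 0."""
    return [filt for filt, m in zip(filters, mask) if m == 1]


def n_params(filters):
    """Number of stored weights, zero or not."""
    return sum(len(filt) for filt in filters)


# Toy layer with five filters of three weights each (illustrative values).
filters = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [-0.03, 0.0, 0.02],
    [1.0, 1.1, -1.2],
]

mask = filter_mask(filters, sparsity=0.4)
masked = apply_mask(filters, mask)
compact = remove_masked(masked, mask)

print("mask:", mask)
print("params after masking:", n_params(masked))
print("params after removal:", n_params(compact))
```

In this sketch the masked layer still stores every weight, which is consistent with a saved checkpoint keeping its size, and the extra multiplication is additional work rather than a saving.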

@misslibra
Author

misslibra commented Mar 24, 2020

Thanks for your support!
I added timing code to the test:
s_time = time.time()
output = model(data)
print('inference time is : ', (time.time() - s_time)*1000 )
Before pruning the time is 0.7ms, and after pruning it is 1.5ms.
Now I understand that the mask multiplication takes time.
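
A single `time.time()` pair around one forward pass can be noisy, since the first call often includes one-time setup costs. The following is a minimal sketch of a more stable measurement using only the standard library; `time_call`, its parameters, and `toy_model` are hypothetical names introduced here for illustration and are not part of NNI or the example script.

```python
import time


def time_call(fn, *args, warmup=3, repeats=20, sync=None):
    """Return the mean wall-clock time of fn(*args) in milliseconds.

    warmup:  untimed calls made first, to absorb one-time setup costs.
    repeats: number of timed calls to average over.
    sync:    optional zero-argument callable invoked before the timer
             starts and before it stops (useful for asynchronous devices).
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000 / repeats


def toy_model(n):
    """Stand-in workload; replace with the real forward pass."""
    return sum(i * i for i in range(n))


ms = time_call(toy_model, 10_000)
print(f"mean time: {ms:.3f} ms")
```

When the model runs on a CUDA device, kernels are launched asynchronously, so passing `sync=torch.cuda.synchronize` would make the timer wait for the GPU work to finish. Whether this changes the numbers reported in this thread is not something the thread confirms.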

@misslibra
Author

misslibra commented Mar 24, 2020

```python
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_folder", type=str, default="data/demo_data/image_ori/", help="path to dataset")
    parser.add_argument("--model_def", type=str, default="config/geely_yolo3d.cfg", help="path to model definition file")
    # parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/geely.names", help="path to class label file")
    parser.add_argument("--conf_thres", type=float, default=0.5, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou thresshold for non-maximum suppression")
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=(192, 640), help="size of each image dimension")
    parser.add_argument("--weights_path", type=str, default='./weights/best_model_Epoch_1060_step_619624_mAP_0.7210_lr_0.0001', help="path to checkpoint model")
    opt = parser.parse_args()
    # print(opt)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # config_list = [{'sparsity': 1, 'op_types': ['Conv2d']}]
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model.load_state_dict(torch.load(opt.weights_path))

    if compression == 'prune':
        print('do prune')
        config_list = [{ 'sparsity': 0.2, 'op_types': ['default'] }]##Conv2d
        pruner = L1FilterPruner(model, config_list)
        # pruner = ActivationMeanRankFilterPruner(model, config_list)
        pruner.compress()
        pruner.export_model('model.pth', 'mask.pth')

    """model inference time"""
    if do_speedup_detection:
        # Get dataloader
        dataloader = DataLoader(
            ImageFolder(opt.image_folder, img_size=opt.img_size),
            batch_size=opt.batch_size,
            shuffle=False,
            num_workers=opt.n_cpu,
        )

        Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
        model.eval()
        masks_file = './mask.pth'
        apply_compression_results(model, masks_file)
        for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
            input_imgs = Variable(input_imgs.type(Tensor))
            input_imgs = input_imgs.to(device)
            with torch.no_grad():
                start = time.time()
                detections = model(input_imgs)
                durable = time.time()
                print('inference time : ', 1000*(durable - start))
            # break
```

@misslibra
Author

this is my code to use speedup

@misslibra
Author

misslibra commented Mar 24, 2020

I tried loading the new model exported by pruner.export_model and using the use_mask logic, but the inference time still does not go down.

```python
"""model inference time"""
if do_speedup_detection:
    print('------')
    model_1 = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model_1.load_state_dict(torch.load('model.pth'))

    # Get dataloader
    dataloader = DataLoader(
        ImageFolder(opt.image_folder, img_size=opt.img_size),
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )

    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    model_1.eval()

    # masks_file = './mask.pth'
    if use_mask:
        apply_compression_results(model, masks_file)
    # else:
    #     m_speedup = ModelSpeedup(model, input_imgs, masks_file)
    #     m_speedup.speedup_model()

    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
        input_imgs = Variable(input_imgs.type(Tensor))
        input_imgs = input_imgs.to(device)
        with torch.no_grad():
            start = time.time()
            detections = model_1(input_imgs)
            durable = time.time()
            print('inference time : ', 1000*(durable - start))
        # break
```

@misslibra
Author

And how do I use ModelSpeedup to get a smaller model?
When I use ModelSpeedup, I get this error:

/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:291: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if grid_size[0] != self.grid_size[0]:
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert self.stride == self.img_dim[1] / self.grid_size[1]
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:262: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:299: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
pred_boxes = FloatTensor(prediction[..., :4].shape)
Traceback (most recent call last):
  File "/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/model_compression.py", line 111, in <module>
    m_speedup = ModelSpeedup(model_1, input_imgs, masks_file)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 91, in __init__
    self.trace_graph = torch.jit.trace(model, dummy_input)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 858, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 1007, in trace_module
    check_tolerance, _force_outplace, True, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 676, in _check_trace
    raise TracingCheckError(*diag_info)
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
graph(%self : ClassType,
%x.1 : Tensor):
%2 : ClassType = prim::GetAttrname="module_list"
%3 : ClassType = prim::GetAttrname="0"
%4 : ClassType = prim::GetAttrname="conv_0"

@QuanluZhang
Contributor

@misslibra there are two issues in your code. First, after calling pruner.compress() you should fine-tune your model. pruner.compress() generates masks based on, for example, the model weights, but it does not fine-tune the model for you; you still need to write the fine-tuning logic after calling pruner.compress(). Second, apply_compression_results is expected to make inference slower in v1.4.1. Please try the master branch, which does not make inference slower.

@misslibra
Author

@QuanluZhang Thank you so much ! I will try now and update the result

@QuanluZhang
Contributor

QuanluZhang commented Mar 24, 2020

apply_compression_results simply multiplies the generated masks into the weights; it does not speed up model inference. ModelSpeedup does, but ModelSpeedup is still in alpha release and only supports torch 1.3.1. Please refer to https://nni.readthedocs.io/en/latest/Compressor/ModelSpeedup.html for details.

BTW, the following two examples might help:
https://github.com/microsoft/nni/blob/master/examples/model_compress/model_speedup.py
https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py

@misslibra
Author

My torch version is 1.3.1

@QuanluZhang
Contributor

The TracingCheckError above looks like a bug in torch.jit. Some related issues in PyTorch:
pytorch/pytorch#23993
pytorch/pytorch#33491

@misslibra
Author

The tracing error above can be solved by following this notice from the source code:
ModelSpeedup(model, dummy_input, masks_file)
dummy_input: the dummy input for jit.trace; users should put it on the right device before passing it in.

@QuanluZhang
Contributor

@misslibra thanks for sharing the cause.

@misslibra
Author

misslibra commented Mar 31, 2020

@QuanluZhang hi, when I try to apply ModelSpeedup to the PyTorch resnet50 model:

```python
import torchvision.models as models

model = models.resnet50(pretrained=False)
m_speedup = ModelSpeedup(model, input_imgs, masks_file)
m_speedup.speedup_model()
```

I hit this error:

File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
self.infer_module_mask(_module_name, in_shape=output_cmask)
File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
self.infer_module_mask(_module_name, in_shape=output_cmask)
File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 474, in infer_module_mask
.format(m_type, module_name))
RuntimeError: Has not supported infering output shape from input shape for module/function: aten::_convolution, ResNet/Sequential[layer1]/Bottleneck[1]/Conv2d[conv1].aten::_convolution.1

I tried adding "aten::_convolution" to the infer_from_inshape map in infer_shape.py, but the error then occurs for another item, "aten::_add"...
Is this because the function is not suitable for ResNet, or do I still need to make other modifications?

@QuanluZhang
Copy link
Contributor

@misslibra ModelSpeedup relies on shape inference of operations to figure out which modules should be replaced and how. In the current alpha release, we only support a limited set of operations/modules for shape inference. We are working on simplifying the process and interface for adding support for new operations/modules; this will be included in a future release.

The specific error you encountered seems to be caused by a bug that has already been fixed. Could you pull the latest master branch, install from source, and try ModelSpeedup again?

@TangChangcheng

Hello, I also encountered problems when I tried to speed up ResNet. It seems that a conflict occurs between two shortcuts. For example, with A -> conv_bn_relu_1 -> B, out1 = A + B, then out1 -> conv_bn_relu_2 -> C, out2 = out1 + C, the mask of B should be applied to out1 and out2 because of the successor relationships, but it conflicts with the mask of C...
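
The conflict described above can be sketched with plain Python sets of channel indices. This is only an illustration of why masks feeding an elementwise add have to agree; it is not NNI's algorithm and not the fix in #2579. The helper `kept_after_add` and the example channel sets are hypothetical. The underlying fact is that a channel of a sum is guaranteed to be zero only if it is zero in every input, so the kept channels of the sum are the union of the inputs' kept channels.

```python
def kept_after_add(*kept_sets):
    """Channels that can be nonzero in an elementwise sum of the inputs."""
    result = set()
    for kept in kept_sets:
        result |= kept
    return result


all_channels = set(range(8))

# Hypothetical kept-channel sets for the tensors A, B and C in the example.
kept_a = {0, 1, 2, 3, 4, 5}
kept_b = {0, 1, 2, 5}
kept_c = {0, 3, 4, 5}

kept_out1 = kept_after_add(kept_a, kept_b)     # out1 = A + B
kept_out2 = kept_after_add(kept_out1, kept_c)  # out2 = out1 + C

# Channels that are safe to remove from out2.
removable = all_channels - kept_out2

# Channels that would be lost if B's mask alone were applied to out1.
wrongly_dropped = kept_out1 - kept_b

print("kept in out1:", sorted(kept_out1))
print("kept in out2:", sorted(kept_out2))
print("removable from out2:", sorted(removable))
print("dropped by B's mask alone:", sorted(wrongly_dropped))
```

With these toy sets, applying B's mask directly to out1 would discard channels that A still carries, and the same holds for C's mask on out2, which matches the kind of mismatch the comment describes.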

@QuanluZhang
Contributor

@TangChangcheng could you provide an executable python script along with the mask file you use, so that we can reproduce the problem?

@QuanluZhang
Contributor

@TangChangcheng @misslibra your issue may be resolved by PR #2579

@suyashhchougule

suyashhchougule commented May 21, 2021

hi @misslibra which PyTorch version were you using for L1FilterPruner? I am getting an import error with PyTorch 1.8.1.

ImportError: cannot import name 'L1FilterPruner' from 'nni.compression.pytorch'
