Pruning/Sparsity Tutorial #304
@glenn-jocher why doesn't the speed change at all after pruning? Does it only zero the conv weights without actually changing the structure? How do I save the pruned model and its architecture for retraining?
Is there a guideline on how much we should prune? What are the benefits of doing this?
@jinfagang yes, the structure is not changed at all, and the parameter count is the same; it's just that some of the weights are now exactly 0 instead of near zero as they were before. I suppose this would allow for effective k-means quantization to lower bit widths (for smaller file sizes), but I'm not sure about any possible speed improvement. I think as long as the parameter count remains the same, the speed will remain the same. @NanoCode012 there are no real guidelines; it's just an experiment to see how many of the weights you can remove and what effect that has on performance. Honestly, I don't see any great applications at the moment based on my results above, but it's there in case anyone would like to explore it further.
@glenn-jocher Looks like prune has a .remove() method, and all weights and params are saved in module.state_dict, which can be used for a new pruned model.
@jinfagang yes, this .remove() method deletes the original weights, since there is also a pruned copy in the model. So before applying remove, the model/module holds 2X the normal weights; after using it, it is back to its normal parameter count. You have to consider the shapes of the operations in the forward pass. For a 1x1 convolution from, say, shape (1,128,20,20) to shape (1,256,20,20), you must have a weight matrix of shape 128x256. It's not possible to remove elements from a normal matrix or tensor, as it will always need 128*256 weights inside it. Some packages/languages have special support for sparse matrices, and it may be possible that PyTorch converts the original tensor to a sparse tensor with the same shape, though I'm not sure if this is the case. Even if it did, any exported models (e.g. ONNX, CoreML, TensorRT) using these sparse matrices would need special support for them, or they would be handled as normal matrices.
The current pruning method already incorporates the line of code you mention as well: lines 88 to 97 in 121d90b.
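A minimal sketch of what happens to a single layer, calling PyTorch's torch.nn.utils.prune directly on a standalone 1x1 nn.Conv2d mapping 128 to 256 channels (a toy layer chosen to match the example above, not a layer taken from YOLOv5). It shows the weight_orig/weight_mask pair that exists until remove() is called, and that the weight shape and parameter count are identical afterwards; only the number of exact zeros changes.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy 1x1 convolution mapping 128 -> 256 channels (assumption: bias omitted for clarity).
conv = nn.Conv2d(128, 256, kernel_size=1, bias=False)
n_params_before = sum(p.numel() for p in conv.parameters())

# Zero the 30% of weights with the smallest absolute value (L1 unstructured pruning).
prune.l1_unstructured(conv, name="weight", amount=0.3)

# While the reparametrization is active, the module stores the original weights as a
# parameter ("weight_orig") plus a binary mask buffer ("weight_mask") of the same shape.
state_keys_pruned = sorted(conv.state_dict().keys())

# Make the pruning permanent: "weight" becomes an ordinary parameter again.
prune.remove(conv, "weight")
state_keys_final = sorted(conv.state_dict().keys())

n_params_after = sum(p.numel() for p in conv.parameters())
sparsity = (conv.weight == 0).float().mean().item()

print(state_keys_pruned)  # ['weight_mask', 'weight_orig']
print(state_keys_final)   # ['weight']
print(tuple(conv.weight.shape), n_params_before, n_params_after)
print(f"sparsity: {sparsity:.3f}")
```

The shape stays (256, 128, 1, 1) and the parameter count is unchanged, which is why speed and file size do not change without sparse-aware storage or kernels.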
@glenn-jocher Nice. Did you figure out how to obtain the pruned model architecture?
@jinfagang well, that's what I was saying: the architecture does not change. In my example above, the 128x256 convolution weights are still 128x256 weights; it's just that some of their values that were previously near zero have been set to exactly zero during pruning. The 128x256 matrix may or may not then be stored as a sparse matrix, which is a special type of matrix intended for data that contains mostly zeros; it saves memory (and may or may not also save processing time). TL;DR: the architecture is exactly the same when pruning. No layers are removed as far as I know, and the input and output shapes (and the shapes of all intermediate layers) remain the same.
@glenn-jocher so the simplified model cannot get its new channel count and shape automatically? Is there any way to make that happen?
@glenn-jocher First of all, thanks for your work! May I ask which paper or project your pruning is based on?
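@Lornatang I based this pruning implementation off of the original PyTorch pruning tutorial at the link below, but the idea to apply pruning here originally came from @jinfagang. I don't actually have any experience pruning models. @jinfagang I modified detect.py to prune and save the model, and to print updated model info:

```python
import torch
from models.experimental import attempt_load  # YOLOv5 repo modules
from utils import torch_utils

# Load model (weights and device are already defined in detect.py)
model = attempt_load(weights, map_location=device)  # load FP32 model; later versions name this argument device=
torch_utils.model_info(model)  # print layer/parameter summary before pruning
torch.save({'model': model}, 'model_normal.pt')
torch_utils.prune(model, 0.3)  # zero 30% of the weights in each nn.Conv2d
torch_utils.model_info(model)  # summary after pruning (shapes unchanged)
torch.save({'model': model}, 'model_pruned.pt')
```

Output: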
So maybe layer pruning or channel-level sparsity works better, since it changes the architecture of the network?
@HenryWang628 I see, thanks for the link. The TensorBoard histograms are very nice. So it seems a more useful method would be channel pruning: prune channels (mAP drops), fine-tune for x epochs, and recover some of the lost mAP. This all raises the question, though: if you are going to go through all of this effort on a large model like YOLOv5x, why not just train a smaller model like YOLOv5s? The training time will be much faster, and you don't need the extra pruning and fine-tuning steps.
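The prune, fine-tune, recover loop described above can be sketched in plain PyTorch with torch.nn.utils.prune.ln_structured, which zeroes whole output filters (channel-level pruning). The two-layer toy model, random data, learning rate and step count are assumptions standing in for YOLOv5 and a real fine-tuning run. Keeping the mask active during fine-tuning keeps the pruned filters at zero; note that they are only zeroed, so tensor shapes (and speed) are still unchanged, and the layer biases are not pruned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

torch.manual_seed(0)
# Hypothetical toy model standing in for a detector backbone.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 8, 3, padding=1))
convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]

# 1) Channel-level pruning: zero the 25% of output filters (dim=0) with the smallest L2 norm.
#    The reparametrization (weight = weight_orig * weight_mask) is kept for now.
for m in convs:
    prune.ln_structured(m, name="weight", amount=0.25, n=2, dim=0)

# 2) Fine-tune (stand-in for "x epochs" on real data). Gradients reach weight_orig,
#    but the mask keeps the pruned filters at zero in the effective weight.
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
x, y = torch.randn(4, 3, 16, 16), torch.randn(4, 8, 16, 16)
for _ in range(5):
    opt.zero_grad()
    F.mse_loss(model(x), y).backward()
    opt.step()

# 3) Make the pruning permanent and count fully zeroed filters per layer.
for m in convs:
    prune.remove(m, "weight")
zero_filters = [int((m.weight.detach().flatten(1).abs().sum(dim=1) == 0).sum()) for m in convs]
shapes = [tuple(m.weight.shape) for m in convs]
print(zero_filters, shapes)
```

Calling prune.remove() before fine-tuning instead would turn the zeros into ordinary weights that gradients can move away from zero again.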
For anyone interested, there is a detailed discussion on this in pytorch/tutorials#1054 (comment). The author there says this:
More info in pytorch/tutorials#605 (comment).
@glenn-jocher I think you can refer to https://github.com/vainf/torch-pruning; the author has implemented this functionality in detail.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, thank you everyone for the informative comments. Thanks Glenn for this super-cool library. Is there a way to implement a line like "sparsified = pruned.to_sparse()" (pytorch/tutorials#605 (comment)) for nn.Conv2d? I am trying to reduce the overall model weights, since I eventually want to port this to a Jetson Nano. My understanding is that a smaller model yields faster speeds. Please correct me if my understanding is wrong. Thanks.
@shoebNTU any speed benefits would depend on the capability of your hardware and drivers to exploit sparse matrices, so there is no single answer to your question. |
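One way to see why there is no single answer is to compare storage. A minimal sketch, assuming a standalone 3x3 nn.Conv2d (128 to 256 channels, an arbitrary toy size) pruned to the same 0.3 sparsity used in the tutorial: Tensor.to_sparse() produces a COO tensor that stores each non-zero value plus one int64 index per dimension, so at this sparsity level the sparse copy is larger than the dense one. As far as we know, nn.Conv2d's forward pass expects a dense weight, so this is a storage comparison only, not a faster convolution.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(128, 256, kernel_size=3, bias=False)
prune.l1_unstructured(conv, name="weight", amount=0.3)  # zero 30% of the weights
prune.remove(conv, "weight")                            # make the zeros permanent

dense = conv.weight.detach()
sparse = dense.to_sparse().coalesce()  # COO: non-zero values + a (4, nnz) int64 index tensor

dense_bytes = dense.numel() * dense.element_size()
sparse_bytes = (sparse.values().numel() * sparse.values().element_size()
                + sparse.indices().numel() * sparse.indices().element_size())

# The conversion is lossless...
assert torch.equal(sparse.to_dense(), dense)
# ...but at 30% sparsity the index overhead dominates.
print(f"dense: {dense_bytes} B, sparse COO: {sparse_bytes} B")
```

With float32 values and four int64 indices, each stored non-zero costs 36 bytes instead of 4, so this COO layout only breaks even above roughly 89% sparsity; other sparse formats or hardware-specific sparsity support change that arithmetic.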
@glenn-jocher I refer to the following projects: As sparse training progresses, more and more BN gamma values approach 0, which you can see by looking at the BN histograms in TensorBoard. After training, pruning can be performed. A basic principle is that the threshold cannot be greater than the maximum gamma of any BN layer (otherwise every channel in that layer would be pruned). Then prune according to the chosen percentage.
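The procedure above resembles the "network slimming" approach (an L1 penalty on BatchNorm scale factors during sparse training, then removing channels with small |gamma|); the referenced projects are not listed here, so that connection is our assumption. A minimal sketch of the two pieces: a penalty term to add to the training loss, and a global percentile threshold capped at the smallest per-layer maximum |gamma|, so no BN layer loses all of its channels. The function names, the default penalty weight and the hand-set gammas are hypothetical.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1 penalty on BN scale factors (gamma); add to the task loss during sparse training."""
    gammas = [m.weight for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    return lam * torch.cat([g.abs().flatten() for g in gammas]).sum()

def bn_prune_threshold(model: nn.Module, percent: float) -> float:
    """Global |gamma| threshold at `percent`, capped so every BN layer keeps >= 1 channel."""
    per_layer = [m.weight.detach().abs() for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    all_gammas = torch.cat(per_layer).sort().values
    idx = min(int(len(all_gammas) * percent), len(all_gammas) - 1)
    threshold = all_gammas[idx].item()
    # The threshold must not exceed any layer's largest |gamma|.
    upper = min(g.max().item() for g in per_layer)
    return min(threshold, upper)

def bn_keep_masks(model: nn.Module, threshold: float) -> dict:
    """Per-layer boolean masks: True for channels to keep (|gamma| >= threshold)."""
    return {name: m.weight.detach().abs() >= threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

# Toy example with hand-set gammas (illustrative values only).
model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU(),
                      nn.Conv2d(4, 8, 3), nn.BatchNorm2d(8))
with torch.no_grad():
    model[1].weight.copy_(torch.tensor([0.01, 0.02, 0.5, 0.9]))
    model[4].weight.copy_(torch.tensor([0.001] * 7 + [0.05]))

t = bn_prune_threshold(model, percent=0.9)  # the 90th percentile would be 0.5; capped at 0.05
masks = bn_keep_masks(model, t)
print(t, {k: v.tolist() for k, v in masks.items()})
```

Without the cap, a 0.5 threshold would remove every channel of the second BN layer, which is exactly the failure the principle above guards against.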
@jayer95 our tutorial is in need of updating! I wrote it myself a while ago. If you'd like to propose updates/fixes, that would be awesome and would help everyone :)
@glenn-jocher Sure, I got it :)
Hello, is it possible to retrain a pruned model? We have trained YOLOv5 on our custom data, then pruned the model, and would like to retrain it on the same custom data. Our naive attempt to perform normal training on the pruned model was not successful, and the following error was raised:
Hi, thanks a lot for the tutorial and the very insightful conversation. I have successfully managed to prune and save yolov5s. However, when I run val.py on the saved model, I get the following error:
Note that val.py works fine when I run it with the yolov5s.pt model, but throws the error above when running the saved pruned model. I used the code provided earlier in this conversation to save the model (https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445). I think the issue might be in how the model gets saved rather than in the pruning, because I also tried simply saving the yolov5s.pt model without pruning, using the save code provided here https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445, and running val.py on it resulted in the same error. I have been looking at this for a while and cannot find what is causing this error or what the issue with the saving method is. The only thing I was able to spot is that the files inside yolov5s.pt/data/ and yolov5s_fp_32_pruned.pt/data/ have different numerals. See the attached screenshots below. Could this be the issue? If so, any idea what is causing it and how to correct it? Thanks
I have the same problem. In YOLOv5, the .pt file is a checkpoint (ckpt), not just the model. My ugly solution is to create a new ckpt, copy all entries except the model from the original ckpt into the new ckpt, and set the pruned model in the new ckpt.
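A minimal sketch of that workaround, written against a plain dict so it does not depend on YOLOv5 internals. The function name is hypothetical, and clearing an "ema" entry is our own assumption: if the original checkpoint still holds an EMA copy of the model and the loader prefers it, the unpruned weights would be loaded instead of the pruned ones.

```python
import copy
import torch
import torch.nn as nn

def rebuild_checkpoint(original_ckpt: dict, pruned_model: nn.Module) -> dict:
    """Copy every entry except the model from the original checkpoint, then add the pruned model."""
    new_ckpt = {k: v for k, v in original_ckpt.items() if k != "model"}
    new_ckpt["model"] = copy.deepcopy(pruned_model)
    # Assumption: a stale EMA copy could be loaded in place of the pruned model, so drop it.
    if new_ckpt.get("ema") is not None:
        new_ckpt["ema"] = None
    return new_ckpt

# Hypothetical usage inside the YOLOv5 repo (file names are placeholders):
#   ckpt = torch.load("best.pt", map_location="cpu")  # newer PyTorch may need weights_only=False
#   model = ckpt["model"].float()
#   prune(model, 0.3)                                  # from utils/torch_utils.py
#   torch.save(rebuild_checkpoint(ckpt, model), "best_pruned.pt")
```

The original dict is left untouched, so the same loaded checkpoint can be reused to produce several pruned variants.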
@relaxtheo hi, the error may be caused by how the model is saved in detect.py. In YOLOv5, the .pt file is a checkpoint that contains more than just the model. Therefore, when you save a pruned model, you may be saving a checkpoint file that still contains the original unpruned parameters, which can cause issues when loading the pruned model. One solution could be to create a new checkpoint file and manually copy all options except the model from the original checkpoint to the new checkpoint, then set the pruned model in the new checkpoint. This could help ensure that the pruned model is loaded correctly in val.py. Alternatively, you could try the latest version of YOLOv5, which may have updates related to model pruning and loading. You can also check the saved model and make sure that it only contains the pruned weights and not the original unpruned weights. I hope this helps! Let me know if you have any further questions.
Thank you very much for the reply. I am currently using v6.2. Compared to the latest code, the prune method has not changed, and the attempt_load function seems to have changed slightly. But what confuses me is what I can gain from this pruning. The model file size does not change, the number of parameters stays the same, and inference speed does not change, so it seems I only get a worse model (lower accuracy) without any gain.
@relaxtheo thank you for your response. Model pruning can help reduce the computation required for inference by removing redundant and unnecessary parameters from the model. Although the file size and number of parameters may not change significantly, inference speed can improve if the pruning is structured (so that layers actually shrink) or if the hardware and runtime can exploit the resulting sparsity. However, the effectiveness of pruning may depend on the specific model architecture and the amount of pruning applied. It's possible that in your case, the pruning method didn't achieve significant improvements in speed or performance. If you're looking to improve the performance of your model, you may want to try other optimization techniques such as quantization or knowledge distillation. These methods can help reduce the size and computation required for inference, resulting in faster and more efficient models. I hope this helps! If you have any further questions or concerns, please let me know.
Thank you very much!
@relaxtheo hi there, thanks for sharing your experience with model pruning in YOLOv5. While model pruning aims to reduce the computation required for inference by removing redundant and unnecessary parameters, its effectiveness may depend on various factors, including the specific model architecture and the amount of pruning applied. Therefore, it's possible that in your case, the pruning method you used didn't achieve significant improvements in speed or performance. If you're looking to further optimize your model, you may want to consider other approaches such as quantization or knowledge distillation. These optimization techniques can help reduce the size and computation required for inference, resulting in faster and more efficient models. Please let us know if you have any further questions or concerns. We're here to help!
@relaxtheo I think the current pruning method is specifically "unstructured pruning" (correct me if I am wrong), where individual weights with small magnitudes are set to 0, but they are still stored in the model weight file (e.g. <model>.pth). Those zero values are not actually removed and still take up space on disk, which is why the model file size does not change. During inference, unless the code has an explicit way to accelerate, such as skipping those zeros, it will still do the same amount of computation on the zero-valued parameters. The advantage, as I see it, is that it is an efficient way to estimate how well model performance is preserved and how much potential there is for acceleration, so that I know when to actually prune the model in the next step. What you are looking for might be "structured pruning" (https://github.com/VainF/Torch-Pruning), which actually removes the pruned weights to save both space and time, but it is not easy to implement due to the dependencies among layers in various network architectures.
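To make the dependency issue concrete, here is a minimal sketch of physically removing output channels from a Conv2d -> BatchNorm2d -> ReLU -> Conv2d chain. The helper name and toy sizes are hypothetical, and it assumes groups=1 and a plain sequential chain. Removing one channel means slicing tensors in two different layers plus the BN statistics; in YOLOv5, where layer outputs also feed concatenations, residual additions and the Detect head, those dependencies multiply, which is what a library such as Torch-Pruning tracks automatically.

```python
import torch
import torch.nn as nn

def shrink_conv_bn_conv(conv1: nn.Conv2d, bn: nn.BatchNorm2d, conv2: nn.Conv2d, keep: torch.Tensor):
    """Keep only the conv1 output channels listed in `keep`, slicing the matching BN
    entries and conv2 input channels. Returns new, smaller modules."""
    assert conv1.groups == 1 and conv2.groups == 1, "sketch only handles groups=1"
    keep = keep.sort().values
    n = len(keep)
    c1 = nn.Conv2d(conv1.in_channels, n, conv1.kernel_size, conv1.stride,
                   conv1.padding, conv1.dilation, bias=conv1.bias is not None)
    b = nn.BatchNorm2d(n, eps=bn.eps, momentum=bn.momentum)
    c2 = nn.Conv2d(n, conv2.out_channels, conv2.kernel_size, conv2.stride,
                   conv2.padding, conv2.dilation, bias=conv2.bias is not None)
    with torch.no_grad():
        c1.weight.copy_(conv1.weight[keep])  # drop conv1 output filters
        if conv1.bias is not None:
            c1.bias.copy_(conv1.bias[keep])
        for name in ("weight", "bias", "running_mean", "running_var"):
            getattr(b, name).copy_(getattr(bn, name)[keep])  # drop matching BN entries
        c2.weight.copy_(conv2.weight[:, keep])  # drop conv2 input channels
        if conv2.bias is not None:
            c2.bias.copy_(conv2.bias)
    return c1, b, c2

torch.manual_seed(0)
conv1, bn, conv2 = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.Conv2d(8, 4, 3, padding=1)
drop = [1, 4, 6]
with torch.no_grad():
    bn.running_mean.uniform_(-1, 1)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight[drop] = 0.0  # gamma = beta = 0: these channels always output 0 in eval mode
    bn.bias[drop] = 0.0
keep = torch.tensor([i for i in range(8) if i not in drop])

big = nn.Sequential(conv1, bn, nn.ReLU(), conv2).eval()
c1, b, c2 = shrink_conv_bn_conv(conv1, bn, conv2, keep)
small = nn.Sequential(c1, b, nn.ReLU(), c2).eval()

x = torch.randn(2, 3, 16, 16)
print(torch.allclose(big(x), small(x), atol=1e-5), tuple(c1.weight.shape), tuple(c2.weight.shape))
```

Because the dropped channels contribute exactly zero, the smaller chain reproduces the original outputs while its weight tensors are genuinely smaller; with unstructured zeros scattered across filters, no such slicing is possible.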
@bryanbocao hi there, thank you for reaching out. You are correct that the current pruning method in YOLOv5 uses unstructured pruning, where individual weights with small magnitudes are set to 0 while still being stored in the weight file. As a result, the model file size does not change significantly, and inference speed may not improve unless the code has an explicit way to accelerate, such as skipping those zeros. Structured pruning, on the other hand, physically removes the pruned weights to save both space and time. However, implementing structured pruning may not be easy due to the dependencies among layers in various network architectures. We appreciate your feedback on this issue, and we'll keep it in mind as we continue to improve YOLOv5. If you have any further questions or concerns, don't hesitate to let us know.
@bryanbocao @glenn-jocher Thank you all very much; I will try your recommendations.
@relaxtheo Thank you for reaching out, and we're glad to hear that our recommendations were helpful. Don't hesitate to let us know if you have any further questions or concerns. We're here to help! |
@Mary14-design it seems like there might be an issue with the image link you've provided; it's not displaying correctly. However, I'm here to help you with your training issue. Could you please provide more details about the error message you're encountering during training with YOLOv5? This will help me understand the problem better and assist you accordingly. If you can copy and paste the error message or describe the issue in more detail, that would be great. |
📚 This guide explains how to apply pruning to YOLOv5 🚀 models. UPDATED 25 September 2022.
Before You Start
Clone the repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.
Test Normally
Before pruning we want to establish a baseline performance to compare to. This command tests YOLOv5x on COCO val2017 at image size 640 pixels.
yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, ./weights/best.pt. For details on all available models please see our README table.
Output:
Test YOLOv5x on COCO (0.30 sparsity)
We repeat the above test with a pruned model by using the torch_utils.prune() function. We update val.py to prune YOLOv5x to 0.3 sparsity:
30% pruned output:
In the results we can observe that we have achieved a sparsity of 30% in our model after pruning, which means that 30% of the weight parameters in the model's nn.Conv2d layers are equal to 0. Inference time is essentially unchanged, while the model's AP and AR scores are slightly reduced.
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.