Feature Refinement to Improve High Resolution Image Inpainting #112
configs/prediction/default.yaml

@@ -12,3 +12,13 @@ dataset:

device: cuda
out_key: inpainted

refine: False # refiner will only run if this is True
refiner:
  gpu_ids: 0,1 # the GPU ids of the machine to use. If only single GPU, use: "0,"
I'd suggest using only 0 by default - or even introducing a "None" default (so the refiner would rely on the parent config).

Actually the refiner needs around 24GB of GPU memory to process 1.8-megapixel images (~1200x1500). Since most people have two 12GB GPUs instead, we decided to split the model onto two GPUs - hence the default config setting.

Hm, right, I had not thought about memory consumption. It seems that most of the consumption comes from storing activations for the backward pass... And you're splitting res-blocks between GPUs to distribute that memory - not to speed up inference - because the GPUs are called sequentially. I have a couple of ideas for how to overcome that without complex logic or a requirement to have two GPUs, among them activation checkpointing and mixed precision.

Thanks for the ideas! We were already looking at activation checkpointing, and will focus on it more now that you have also mentioned it. Will try the third idea also.
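For reference, activation checkpointing in PyTorch trades compute for memory by recomputing intermediate activations during the backward pass instead of storing them. A minimal sketch using `torch.utils.checkpoint.checkpoint_sequential`; the conv blocks below are placeholders, not the actual refiner architecture:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stand-in for the generator's middle res-blocks; plain conv
# blocks here only to illustrate the mechanism, not the real architecture.
blocks = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
    for _ in range(9)
]).cuda()

# The tensor being refined requires grad, as in the feature-refinement loop.
x = torch.randn(1, 64, 256, 256, device="cuda", requires_grad=True)

# A regular forward pass stores every intermediate activation for backward;
# checkpoint_sequential keeps only segment-boundary activations and recomputes
# the rest during backward, trading extra compute for lower peak memory.
y = checkpoint_sequential(blocks, 3, x)
y.mean().backward()
```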
torch.cuda.amp isn't working because pytorch doesn't seem to support half precision for rfftn/irfftn.

Great, thank you!

Sure, I'd forgotten that I've already tried half and failed because of that... We could wrap rfftn/irfftn with conversion to and from .float(), but I'm not sure there wouldn't be other issues..

Hello @windj007, sorry for coming back after 2 months! We picked up the experiments again and have some findings.
Can you please share your code for mixed-precision?

Sure, it's in https://github.com/geomagical/lama-with-refiner/tree/amp_float16 - you can get the code by checking out that branch. Also, I've changed the config file of the refiner to run on a single GPU. But yeah, feel free to play around with config parameters or anything :) Link to the config file in the code: https://github.com/geomagical/lama-with-refiner/blob/amp_float16/configs/prediction/default.yaml

Thank you so much 🙏
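The workaround discussed above - wrapping rfftn/irfftn with conversions to and from float32 so the rest of the network can run in half precision - could look roughly like this. This is a hedged sketch, not the actual code on the amp_float16 branch:

```python
import torch

def fft_processing_fp32(x):
    # Hypothetical helper: cast to float32 around the FFT ops, since half
    # precision rfftn/irfftn was the failure point under torch.cuda.amp.
    orig_dtype = x.dtype
    x = x.float()
    freq = torch.fft.rfftn(x, dim=(-2, -1), norm="ortho")
    # ... whatever frequency-domain processing the layer does would go here ...
    out = torch.fft.irfftn(freq, s=x.shape[-2:], dim=(-2, -1), norm="ortho")
    return out.to(orig_dtype)

# The surrounding network can then run in half precision / under autocast
# while the FFT itself stays in float32.
x = torch.randn(1, 64, 128, 128, device="cuda", dtype=torch.float16)
y = fft_processing_fp32(x)
print(y.dtype)  # torch.float16
```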
  modulo: ${dataset.pad_out_to_modulo}

@windj007 refiner padding is defined here.
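As a side note, the `${dataset.pad_out_to_modulo}` syntax is an interpolation, so the refiner pads to the same modulo as the rest of the prediction pipeline. A minimal illustration, assuming the config is loaded with OmegaConf as in the rest of the repo:

```python
from omegaconf import OmegaConf

# The refiner's `modulo` resolves to whatever the dataset section specifies.
cfg = OmegaConf.create("""
dataset:
  pad_out_to_modulo: 8
refiner:
  modulo: ${dataset.pad_out_to_modulo}
""")
print(cfg.refiner.modulo)  # 8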
  n_iters: 15 # number of iterations of refinement for each scale
  lr: 0.002 # learning rate
  min_side: 512 # all sides of image on all scales should be >= min_side / sqrt(2)
  max_scales: 3 # max number of downscaling scales for the image-mask pyramid
  px_budget: 1800000 # pixels budget. Any image will be resized to satisfy height*width <= px_budget
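The px_budget constraint amounts to an aspect-preserving downscale whenever the input exceeds ~1.8 megapixels. A small sketch of that arithmetic, under the assumption that the resize simply scales both sides by the same ratio:

```python
import math

def downscale_to_budget(height, width, px_budget=1_800_000):
    # Resize so that height * width <= px_budget, preserving aspect ratio.
    # A sketch of the constraint stated in the config comment, not the PR code.
    if height * width <= px_budget:
        return height, width
    ratio = math.sqrt(px_budget / (height * width))
    return int(height * ratio), int(width * ratio)

# e.g. a 12-megapixel 3000x4000 photo is brought down to roughly 1.8 MP:
print(downscale_to_budget(3000, 4000))  # (1161, 1549); 1161 * 1549 <= 1_800_000
```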
L84-87 should be outside the if-else - they need to be executed regardless of refinement.
L84-87 already get addressed inside the refiner. The refiner works on unpadded images (it does the necessary padding internally and then unpads the output appropriately). We can keep it this way, since the refiner covers both the case when `unpad_to_size` is not None and when `unpad_to_size` is None.
Ok, I see. I'd move padding-unpadding from the refiner to predict.py - so both parts of the code are simplified and no logic duplication is introduced. What do you think, is it possible and does it make sense?
We can let the refiner get padded input, but the refiner still needs some padding of its own. Suppose your input image is a square of size 1000. Then the original image isn't padded because 1000 % 8 == 0, but in the refiner, once we downscale the image, its size becomes 500, and 500 % 8 != 0, so we have to pad it to 504x504. So we can't get rid of lines 301 and 302 in refinement.py, but we can drop the `unpad_to_size` argument altogether.
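To make the 1000 → 500 → 504 example concrete, here is a small sketch of padding to a modulo across a downscale; `pad_to_modulo` is a hypothetical helper mirroring what the refiner does internally, not the code in refinement.py:

```python
import math
import torch
import torch.nn.functional as F

def ceil_modulo(x, mod):
    # Smallest multiple of `mod` that is >= x.
    return int(math.ceil(x / mod) * mod)

def pad_to_modulo(img, mod):
    # Pad an NCHW tensor on the bottom/right so H and W become multiples of `mod`.
    _, _, h, w = img.shape
    return F.pad(img, (0, ceil_modulo(w, mod) - w, 0, ceil_modulo(h, mod) - h),
                 mode="reflect")

# The case from the discussion: a 1000x1000 input needs no padding at the
# original scale (1000 % 8 == 0), but after one 2x downscale it is 500x500,
# and 500 % 8 != 0, so it gets padded to 504x504.
x = torch.randn(1, 3, 1000, 1000)
half = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
print(half.shape[-2:])                    # torch.Size([500, 500])
print(pad_to_modulo(half, 8).shape[-2:])  # torch.Size([504, 504])
```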
I see, thank you for the clarification! Let's just leave that piece of the code as is - and add a comment that "padding-unpadding is handled within the refiner". Padding size depends on the depth of the generator and thus needs to be configurable.
Gotcha, will add the comment. Yeah, the padding size of the refiner is not exactly 8, but exactly equal to dataset.pad_out_to_modulo in the predict config. I'll add a comment there in the PR.
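Putting the padding discussion together, the relevant branch in predict.py could read roughly as follows. `refine_predict`, `predict_config`, and the batch keys follow the names used in this thread, but the exact signatures here are assumptions, not the merged code:

```python
# Names such as refine_predict and unpad_to_size are taken from the
# conversation above; treat this as a sketch, not the actual predict.py.
if predict_config.refine:
    # Padding/unpadding is handled within the refiner: it works on unpadded
    # images and pads each pyramid scale to refiner.modulo internally.
    cur_res = refine_predict(batch, model, **predict_config.refiner)
else:
    batch = model(batch)
    cur_res = batch[predict_config.out_key][0].permute(1, 2, 0).detach().cpu().numpy()
    # L84-87: undo the modulo padding applied when the batch was built.
    unpad_to_size = batch.get("unpad_to_size", None)
    if unpad_to_size is not None:
        orig_height, orig_width = unpad_to_size
        cur_res = cur_res[:orig_height, :orig_width]
```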