variable-scale noise and noise operations #2163
In txt2img, and more globally everywhere noise is needed, the idea would be to generate the noise at a chosen resolution, then upscale or downscale the result to match the final image resolution. In img2img, or at least in the img2img alternative test script, the solution would be to first upscale or downscale the input image to a chosen resolution, run the decode part at that resolution, then upscale or downscale to the final output resolution and process to the end.
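Something like the following is what I have in mind for the txt2img case. This is only a rough sketch, assuming a PyTorch latent of shape (1, 4, H/8, W/8) as used by Stable Diffusion; the bilinear resampling mode is an arbitrary choice, not part of the proposal itself:

```python
import torch
import torch.nn.functional as F

def resample_noise(noise: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    """Resample a latent noise tensor of shape (1, 4, h, w) to a new latent size.

    target_h / target_w are *latent* dimensions, i.e. image pixels // 8.
    Bilinear is an arbitrary choice; nearest or bicubic could also be tried.
    """
    return F.interpolate(noise, size=(target_h, target_w), mode="bilinear", align_corners=False)

# Noise for a 512x512 image lives in a 64x64 latent; reuse it for 1024x1024 (128x128 latent).
torch.manual_seed(1234)
noise_512 = torch.randn(1, 4, 64, 64)
noise_1024 = resample_noise(noise_512, 128, 128)
```

One caveat: interpolation correlates neighbouring cells, so the resampled tensor is no longer unit-variance Gaussian noise; it might need re-normalizing before the sampler sees it.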
I am by no means an expert, but I'm under the impression that 1 unit of noise is "locked" to an 8x8 grid of pixels. That is, a 512x512 pixel image with 3 channels maps onto a 64x64 tensor of noise with 4 channels. I believe this ratio is intrinsic to the model's training. Do you propose changing this ratio? (I don't know whether or not it's possible, and I don't know what effect it would have; I'm just making sure we're on the same page.)

EDIT: I recall there being a "seed resize" option. Is that somehow related to this?

EDIT2: No, it looks like seed resize only "extends the borders" of the noise while preserving the ratio. Changing that ratio seems unprecedented. Considering it's universally stressed that pixel resolutions be multiples of 64, I really think changing this ratio is a non-starter, unfortunately. I had some real fun ideas that involved 1:1 mappings of pixels and noise, too, and this bummed me out when I discovered it. The closest thing I can find is this, which upscales the IMAGE and keeps the ratio of pixels to noise constant. Not really related, but it might spark some ideas.
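For reference, a rough illustration of that mapping (shapes only; the 8x factor and the 4 latent channels come from the SD VAE, the rest is just illustrative):

```python
import torch

image_h, image_w = 512, 512                      # RGB image: 3 channels, 512x512 pixels
latent_h, latent_w = image_h // 8, image_w // 8  # the VAE downsamples by 8 in each dimension

# The noise the sampler actually works with: 4 channels, 64x64 "cells".
noise = torch.randn(1, 4, latent_h, latent_w)
print(noise.shape)  # torch.Size([1, 4, 64, 64])
```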
I tried to make sense of the code behind seed resize, in commit b170755, in modules/processing.py
I think some kind of upscaling / downscaling of the initial noise image produced at step zero might do the trick? Maybe? For example: the case where some image is really cool at the default 512x512 resolution, but for some reason we wish to do the work at another resolution (maybe a lower one, for fast iteration on an animation or an X/Y/Z plot?).

Then my expectation is that the final image at 256x256 will be very similar to the one produced initially at 512x512, but much faster because of the halved resolution. Similar thinking applies to upscaling the 512x512 noise image at step zero to 1024x1024, for example, and continuing with img2img iterations from that: I'd expect the same global composition but at higher resolution. Similar to highres fix, but in one pass of 50 iterations (for example), not 50 steps at the base resolution and then 50 again at the higher resolution. But then again, maybe highres fix doesn't do the full 2x50 steps, only a "partial render" at low resolution and a "partial render" at higher resolution? I don't know; my expectations could be all wrong...
As long as we're on the same page in terms of seed resize's practical value, the specifics don't really matter. The top image's frames have nothing to do with each other, while the bottom one's resized noise preserves the content of the square. Of course, this is merely an approximation. I personally don't think it's fruitful to think of the noise as an IMAGE in any conventional sense of the term. For one thing, it's not really meant to be viewed by us directly. For another, we intuitively expect the same image at different resolutions to have persistent features. I don't believe the same can be said for this noise tensor -- or, to be more precise, for how the model interprets it. (For instance, say you choose some upsampling/downsampling method and apply it to the noise. Where will the model have learned what that means? How can it be expected to draw the relationship between the noise before and after up/downsampling?) But I, too, could be wrong.

EDIT: Here's another thing we can't necessarily take for granted: that small, directed changes to the content of the noise tensor will produce correspondingly small, correspondingly directed changes in the output. For instance, say you only change the bottom-right cell of noise. Will that mean only the bottom-right corner of the image changes? Say you translate each unit of noise one space to the left. Will the output image be translated to the left?? If you change only the bottom half of the noise tensor, will only the bottom half of the image change??? Even these seemingly basic things can't really be taken for granted.
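If someone wants to test those questions empirically, here is roughly the kind of noise edit I mean. This is only a sketch; actually feeding the edited tensor into the sampler (e.g. hooking into processing.py) is left out:

```python
import torch

torch.manual_seed(1234)
noise = torch.randn(1, 4, 64, 64)

# 1) Re-randomize only the bottom-right cell: does only that corner of the image change?
edit_corner = noise.clone()
edit_corner[:, :, -1, -1] = torch.randn(4)

# 2) Shift every cell one step to the left: does the whole image translate?
edit_shift = torch.roll(noise, shifts=-1, dims=-1)

# 3) Re-randomize only the bottom half: does only the bottom half of the image change?
edit_half = noise.clone()
edit_half[:, :, 32:, :] = torch.randn(1, 4, 32, 64)
```

Each variant would then be substituted for the original noise and the output compared against the unedited run.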
I will try, tomorrow, to find out how to save that very first noise image. I think there is a script for that. Then I will apply some transformations (upscale, downscale, translation, rotation, mask) in an image editor, run img2img with the same prompt, and post the results here.
@Ehplodor Thank you for rekindling my interest in this; I'd given up right after 1.4 was first released. But I think there could be something to this. Looking forward to seeing your results!
I got some overhead done for you, @Ehplodor, and it's not looking good. It seems the precision of PNG channels is nowhere near close enough to the noise tensors' float precision, so after normalizing, converting, saving, uploading, converting back, and denormalizing, too much information is lost. The outputs barely resemble each other. Some are better than others. But on the whole, PNG simply lacks the precision to store a noise tensor reliably. At most, there could be a tool to manipulate the noise and merely visualize it as a PNG? But manipulating the PNG itself seems like a no-go. Here's a branch with my hacky code if you want to play with what I wrote. After checkout, you'll need to navigate to here in your editor to tinker with it. Weirdly, when setting
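To make the precision problem concrete, here is roughly what the round trip through an 8-bit image channel does to the noise, versus just saving the raw tensor. The numbers and the normalization scheme are illustrative only:

```python
import torch

torch.manual_seed(1234)
noise = torch.randn(1, 4, 64, 64)

# Simulate storing the noise as an 8-bit image channel: normalize to [0, 255] and round.
lo, hi = noise.min(), noise.max()
quantized = torch.round((noise - lo) / (hi - lo) * 255.0)  # only 256 distinct levels
recovered = quantized / 255.0 * (hi - lo) + lo             # de-normalize again

print((noise - recovered).abs().max())  # worst-case error on the order of 1e-2

# A lossless alternative: skip the image format and save the raw tensor.
torch.save(noise, "noise.pt")
assert torch.equal(noise, torch.load("noise.pt"))
```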
I tried a bunch of things today. Nothing really interesting, though. I re-discovered by trial and error, thanks to the img2img alt script, that noise encoded at a given resolution effectively doesn't work at another resolution. I scraped the last noise image produced at 512x512, fed it back to "just" img2img with the same prompt, and got interesting results, albeit different ones. The global composition was thankfully similar -- not as similar as what you got with your modified processing, but still quite similar (I will try uploading some images of these tests tomorrow if I get the time). However, when I tried downsampling the scraped noise to 256x256, for example, nothing good came out: the initial composition was totally lost. I just looked at your "hacky code" and will give it a try tomorrow, I hope! It should be a lot better than "scraping the noise from img2img alt" XD. Thank you
There is a "scale latent" option in highres fix. Maybe relevant for your idea about manipulating the noise itself rather than the PNG.
This PR might be highly relevant: #4021 @timntorres
The "scale latent" commit where it is "brought back": f7ca639
Is your feature request related to a problem? Please describe.
The scale of noise features seems tied to the output image dimensions only. I may not understand the concept of noise here and might be very wrong... but I believe this limits the artistic possibilities in one way or another, especially for finer-scale image structure, but maybe also for larger-scale structure?
Describe the solution you'd like
I think it would be fruitful to be able to vary the scale of the noise using some kind of multiplier (just like when one uses Perlin noise at multiple scales for 3D modelling of mountains, for example, from the mountain itself down to the smallest rock). And maybe addition/multiplication of noise scales would be good too?
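A rough sketch of what I mean by multi-scale noise, borrowing the octave idea from Perlin/fractal noise. The weights, octave factors, and blending are purely illustrative, not a claim about what the model actually expects:

```python
import torch
import torch.nn.functional as F

def multi_scale_noise(h, w, octaves=(1, 2, 4), weights=(1.0, 0.5, 0.25), seed=1234):
    """Sum noise drawn at several coarser latent resolutions, upscaled to (h, w)."""
    gen = torch.Generator().manual_seed(seed)
    total = torch.zeros(1, 4, h, w)
    for factor, weight in zip(octaves, weights):
        coarse = torch.randn(1, 4, h // factor, w // factor, generator=gen)
        total += weight * F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=False)
    # Re-normalize so the result is still roughly unit-variance, as the sampler expects.
    return total / total.std()

noise = multi_scale_noise(64, 64)  # latent size for a 512x512 image
```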
Describe alternatives you've considered
I can "play" with the noise scale when generating img2img with varying dimensions (same ratio).
Noise resizing might help, but only when the seed is available (so NO playing with the noise via the img2img alternative script), and only for generating higher resolutions with the features of lower resolutions... so it is quite limited, actually.