variable-scale noise and noise operations #2163

Closed
Ehplodor opened this issue Oct 10, 2022 · 13 comments
Labels
enhancement New feature or request

Comments

@Ehplodor

Is your feature request related to a problem? Please describe.
The scale of noise features seems to depend only on the output image dimensions. I may not understand the concept of noise here, and might be very wrong, but I believe this limits the artistic possibilities in one way or another, especially for finer-scale image structure, but also at larger scales.

Describe the solution you'd like
I think it would be fruitful to be able to vary the scale of the noise using some kind of multiplier (just as one uses Perlin noise at multiple scales when 3D-modeling mountains, for example, from the whole mountain down to the smallest rock). Adding or multiplying noise at different scales might be good too.

Describe alternatives you've considered

I can "play" with the noise scale when generating img2img with varying dimensions (same ratio).

The noise resizing might help, but only when the seed is available (so no img2img alternative playing with noise), and only for generating higher resolutions with the features of a lower resolution, so it is quite limited actually.

Additional context
Add any other context or screenshots about the feature request here.

@Ehplodor
Author

In txt2img, and more globally everywhere where noise is needed, the idea would be to generate the noise at a select resolution, then upscale or downscale the result to match the final image resolution.
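
A minimal sketch of that idea, assuming NumPy and an integer scale factor. The function name `noise_at_scale` and its parameters are hypothetical and are not part of the webui; sizes are given in noise-grid units rather than pixels.

```python
import numpy as np

def noise_at_scale(seed, out_h, out_w, factor, channels=4):
    """Draw Gaussian noise on a coarser grid, then upscale it by block repetition.

    out_h, out_w: size of the noise grid that the sampler expects.
    factor: integer upscale factor; factor=1 gives ordinary per-cell noise.
    """
    if out_h % factor or out_w % factor:
        raise ValueError("target size must be divisible by factor")
    rng = np.random.default_rng(seed)
    coarse = rng.standard_normal((channels, out_h // factor, out_w // factor))
    # Nearest-neighbour upscale: each coarse cell becomes a factor x factor block.
    return coarse.repeat(factor, axis=1).repeat(factor, axis=2)

noise = noise_at_scale(seed=1234, out_h=64, out_w=64, factor=4)
print(noise.shape)  # (4, 64, 64)
```

Block repetition keeps each value's variance but makes neighbouring cells identical. Whether a model trained on independent per-cell noise does anything useful with such input is not something this sketch can answer.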

In img2img, or at least in the img2img alternative script, the solution would be to first upscale or downscale the input image to a selected resolution, process the decode part at that resolution, then upscale or downscale to the final output resolution and continue to the end.

@timntorres
Contributor

timntorres commented Oct 27, 2022

I am by no means an expert, but I’m under the impression that 1 unit of noise is “locked” to an 8x8 grid of pixels. That is, a 512x512 pixel image with 3 channels maps onto a 64x64 tensor of noise with 4 channels. I believe this ratio is intrinsic to the thing’s training. Do you propose changing this ratio? (I don’t know whether or not it’s possible, and I don’t know what effect it would have; I’m just making sure we’re on the same page.)
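
The arithmetic in this comment can be written out as a small sketch. The factor of 8 and the 4 channels are taken from the comment above; the helper name `latent_shape` is hypothetical.

```python
def latent_shape(width, height, channels=4, factor=8):
    """Return the (channels, rows, cols) shape of the noise tensor for an image size."""
    return (channels, height // factor, width // factor)

print(latent_shape(512, 512))   # (4, 64, 64)
print(latent_shape(1024, 512))  # (4, 64, 128)
```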

EDIT: I recall there being a “seed resize” option. Is that somehow related to this?

EDIT2: No, it looks like seed resize only “extends the borders” of the noise while preserving the ratio. Changing that ratio seems unprecedented.

Considering it’s universally stressed that pixel resolutions be multiples of 64, I really think changing this ratio is a non-starter, unfortunately. I had some real fun ideas that involved 1:1 mappings of pixel and noise, too, and this bummed me out when I discovered it.

Closest thing I can find is this, which upscales the IMAGE. Keeps the ratio of pixels to noise constant. Not really related but might spark some ideas.

[image: FA6515B3-E0B7-4CED-8FD1-EF133E189351]

@Ehplodor
Author

I tried to make sense of the code behind seed resize in commit b170755, in modules/processing.py.
But I just don't understand it for the time being. There is a floor division by 8 that catches my attention, and it must be linked to the 8x8 grid you talked about.
But really I don't understand it at the moment...
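
One possible reading, as a simplified sketch and not the code from that commit: the floor division by 8 converts pixel sizes to noise-grid sizes, and the noise the seed would give at the original size is pasted into the centre of fresh noise at the new size. The function name `resized_seed_noise` and the use of NumPy's generator are assumptions made for illustration.

```python
import numpy as np

def resized_seed_noise(seed, from_w, from_h, to_w, to_h, channels=4, factor=8):
    """Centre the noise of a (from_w x from_h) image inside noise for a (to_w x to_h) image."""
    rng = np.random.default_rng(seed)
    src_h, src_w = from_h // factor, from_w // factor  # pixels -> noise cells
    dst_h, dst_w = to_h // factor, to_w // factor
    src = rng.standard_normal((channels, src_h, src_w))  # noise at the original size
    dst = rng.standard_normal((channels, dst_h, dst_w))  # fresh noise for the borders
    # Size of the overlapping region, centred in both tensors.
    h, w = min(src_h, dst_h), min(src_w, dst_w)
    sy, sx = (src_h - h) // 2, (src_w - w) // 2
    dy, dx = (dst_h - h) // 2, (dst_w - w) // 2
    dst[:, dy:dy + h, dx:dx + w] = src[:, sy:sy + h, sx:sx + w]
    return dst

base = resized_seed_noise(7, 512, 512, 512, 512)
wide = resized_seed_noise(7, 512, 512, 768, 512)
print(wide.shape)                               # (4, 64, 96)
print(np.array_equal(wide[:, :, 16:80], base))  # True
```

The centre of the wider tensor matches the 512x512 noise cell for cell, so the ratio of pixels to noise stays the same and only the borders are new.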

@Ehplodor
Author

Ehplodor commented Oct 27, 2022

I think some kind of upscaling/downscaling of the initial noise image produced at step zero might do the trick? Maybe?

For example, take the case where some image is really cool at the default 512x512 resolution, but for some reason we wish to do the work at another resolution (maybe a lower one, for fast iteration on an animation or an X/Y/Z plot?). Then:

  • txt2img something at 512x512
  • stop at step 0 before first img2img pass. This is our noise reference.
  • downscale the noise image to 256x256
  • continue img2img steps from the 256x256 noise image all the way to the end

My expectation is that the final image at 256x256 will be very similar to the one produced initially at 512x512, but much faster to generate because of the halved resolution.

The same thinking applies to upscaling the 512x512 noise image at step zero to 1024x1024, for example, and going on with img2img iterations from that: I expect the same global composition but at higher resolution. This is similar to highres fix, but in one pass of 50 iterations (for example), not 50 steps at base resolution and then 50 again at higher resolution. But again, maybe highres fix doesn't do the full 2x50 steps, but only a "partial render" at low resolution and a "partial render" at higher resolution?

I don't know, my expectations could be all wrong...
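
One caveat about the downscaling step can be checked numerically without the model. The sketch below assumes NumPy and uses 2x2 average pooling as the downscaling method; other resampling filters behave differently. Averaging four independent unit-variance values halves the standard deviation, so the downscaled noise no longer has the statistics of freshly drawn noise unless it is rescaled.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 64, 64))  # unit-variance noise, 512x512 image in this thread's terms

def downscale_2x_mean(x):
    """Average each 2x2 block of a (channels, rows, cols) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

small = downscale_2x_mean(noise)
print(small.shape)               # (4, 32, 32)
print(noise.std(), small.std())  # close to 1.0 and close to 0.5

# Rescaling restores unit variance, though not the original sample values.
rescaled = small / small.std()
```

This only shows a change in statistics. It does not establish how the sampler reacts to it.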

@timntorres
Contributor

timntorres commented Oct 27, 2022

But really I don't understand it at the moment...

As long as we're on the same page in terms of seed resize's practical value, the specifics don't really matter.

Demo (warning: flashing colors)

Without seed resize:

[image: without-resize-BAD]

With seed resize:

[image: with-resize-GOOD]

The top image's frames have nothing to do with each other, while the bottom one's resized noise preserves the content of the square.

Of course, this is merely an approximation. I personally don't think it's fruitful to think of the noise as an IMAGE in any conventional sense of the term. For one thing, it's not really meant to be viewed by us directly. But for another, we intuitively expect the same image at different resolutions will have persistent features. I don't believe the same can be said for this noise tensor -- or, to be more precise, how the model interprets it. (For instance, let's say you choose some upsampling/downsampling method and apply it to the noise. Where will the model have learned what that means -- How can it be expected to draw the relationship between the noise before and after up/downsampling?) But I, too, could be wrong.

EDIT: Here's another thing that we can't necessarily take for granted: that small, directed changes to the content of the noise tensor will have correspondingly small, correspondingly directed changes to the output. For instance, let's say you only change the bottom right cell of noise. Will that mean only the bottom right corner of the image will change? Let's say you translate each unit of noise one space to the left. Will the output image be translated to the left?? If you change only the bottom half of the noise tensor, will only the bottom half of the image be changed??? Even these seemingly basic things can't really be taken for granted.

@Ehplodor
Author

Ehplodor commented Oct 27, 2022

I will try, tomorrow, to find how to save that very first image of the noise. I think there is a script for that. Then I will apply some transformation (upscale, downscale, translation, rotation, mask) through some image editor, then img2img with same prompt, and post the results here.

@timntorres
Contributor

@Ehplodor Thank you for rekindling my interest in this. I'd given up right after 1.4 was first released, but I think there could be something to this. Looking forward to seeing your results!

@timntorres
Contributor

timntorres commented Oct 28, 2022

I got some overhead done for you, @Ehplodor, and it's not looking good. It seems the precision of PNG channels is nowhere near close enough to the noise tensors' float precision, so after normalizing, converting, saving, uploading, converting back, and denormalizing, too much information is lost. The outputs barely resemble each other. Some are better than others:

Comparison 1:

[images: A, A2]

Comparison 2:

[images: 01136-3-Photo of a cute dog, 01137-3-Photo of a cute dog]

Comparison 3:

[images: 01132-1-Photo of a cute dog, 01133-1-Photo of a cute dog]

But on the whole, PNG simply lacks the precision to store a noise tensor reliably. At most, there could be a tool to manipulate the noise and merely visualize it as a PNG? But manipulating the PNG itself seems like a no-go.
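
The size of the rounding error from such a round trip can be measured in isolation. The sketch below assumes NumPy, min/max normalization, and 8 bits per channel. It is not the code from the branch mentioned in this comment, and it says nothing about how much the sampler amplifies the error.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 64, 64)).astype(np.float32)

lo, hi = noise.min(), noise.max()
# Normalize to 0..255 and round to 8 bits, as storing in a PNG channel would.
as_uint8 = np.round((noise - lo) / (hi - lo) * 255).astype(np.uint8)
# Convert back and denormalize.
restored = as_uint8.astype(np.float32) / 255 * (hi - lo) + lo

step = (hi - lo) / 255                    # spacing between representable values
max_err = np.abs(restored - noise).max()  # at most about half a step
print(len(np.unique(as_uint8)) <= 256)    # True: at most 256 distinct levels survive
print(step, max_err)
```

Every float in the tensor is snapped to one of at most 256 levels, so distinct noise values that fall into the same level become indistinguishable after the round trip.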

Here's a branch with my hacky code if you want to play with what I wrote. After checkout, you'll need to navigate to here in your editor to tinker with it.

Weirdly, when setting noise to uploaded_noise but inputting a different seed in the UI (i.e., hardcoding the uploaded input to the name of some extant noise PNG), the output's a glitchy green square. This tells me the raw seed value might STILL be pulling some weight in terms of resemblance; and even then, the outputs hardly match.

@Ehplodor
Author

Ehplodor commented Oct 28, 2022

I tried a bunch of things today. Nothing really interesting though.

I rediscovered, I think, by trial and error (thanks to the img2img alt script) that noise encoded at a given resolution effectively doesn't work at another resolution. I scraped the last noise image produced at 512x512, fed it back to plain img2img with the same prompt, and got interesting, albeit different, results. The global composition was similar, though: not as similar as what you got with your modified processing, but still quite similar (I will try uploading some images of these tests tomorrow if I get the time). However, when I tried downsampling the scraped noise to 256x256, for example, nothing good came out. The initial composition was totally lost.

I just looked at your "hacky code" and will give it a try tomorrow I hope ! This will hopefully be a lot better than "scraping the noise from img2img alt" XD Thank you

@Ehplodor
Author

There is a "scale latent" option to highres fix.
-> modules/processing.py#L608

Maybe relevant for your idea about manipulating the noise itself and not the png

@Ehplodor
Author

This PR might be highly relevant ? #4021 @timntorres

@Ehplodor
Author

Ehplodor commented Nov 3, 2022

There is a "scale latent" option to highres fix. -> modules/processing.py#L608

Maybe relevant for your idea about manipulating the noise itself and not the png

scale latent commit where it is "brought back" : f7ca639
High res fix with/without scale latents option : #2613
and some more info that might or might not be relevant : #1716

@mezotaken added the enhancement label Jan 12, 2023
@catboxanon
Collaborator

Implemented in #12564, #12616

Atry pushed a commit to Atry/stable-diffusion-webui that referenced this issue Jul 11, 2024