
Gradio App with Multi-GPU #59

Merged: 1 commit into jy0205:main on Oct 13, 2024
Conversation

tpc2233 (Contributor) commented Oct 13, 2024

Users can select 2 or 4 GPUs in the UI.
The app assumes the models are located in the ./pyramid_flow_model directory within the project (use the regular app for an easy download).

Due to Gradio's single-threaded nature, I decided to hand the heavy computation off to an external engine (app_multigpu_engine.sh and app_multigpu_engine.py); a minimal sketch of the pattern follows below.
Make sure to set the execute permission: chmod +x app_multigpu_engine.sh
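
For illustration, a minimal sketch of this hand-off pattern, assuming hypothetical positional arguments for the shell script and illustrative UI controls (the actual engine CLI in this PR may differ):

import subprocess
import tempfile

import gradio as gr

def generate(prompt, duration, num_gpus):
    # The engine runs in its own process (e.g. torchrun inside the
    # shell script), so the single-threaded Gradio worker stays free.
    out_path = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
    subprocess.run(
        ["./app_multigpu_engine.sh", str(num_gpus), prompt, str(duration), out_path],
        check=True,
    )
    return out_path

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    duration = gr.Slider(2, 16, value=4, step=1, label="Duration")
    num_gpus = gr.Radio([2, 4], value=2, label="GPUs")
    video = gr.Video(label="Result")
    gr.Button("Generate").click(generate, [prompt, duration, num_gpus], video)

demo.launch()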

Benchmarking the 768p Model on GPUs with 24GB VRAM:

Prompt:
A sloth with pink sunglasses lays on a donut float in a pool. The sloth is holding a tropical drink. The world is tropical. The sunlight casts a shadow

Results:
4x NVIDIA RTX 4090 (24GB VRAM) - 768p Model:

Duration 2: 51 seconds
Duration 4: 56 seconds
Duration 8: 74 seconds
Duration 16: 191 seconds
Duration 20: Out of Memory (OOM)

feifeiobama merged commit ce12046 into jy0205:main on Oct 13, 2024
feifeiobama (Collaborator) commented Oct 13, 2024

Thanks for the multi-GPU Gradio app. Just a follow-up question: should we merge app_multigpu_engine.py and inference_multigpu.py, and move the bash script into the scripts/ folder?

tpc2233 (Contributor, Author) commented Oct 13, 2024

> Thanks for the multi-GPU Gradio app. Just a follow-up question: should we merge app_multigpu_engine.py and inference_multigpu.py, and move the bash script into the scripts/ folder?

Makes total sense. Scripts updated; the merge request is here:
#77

pharrowboy commented Oct 14, 2024

It's worth noting that after moving the DiT to CUDA, you can add optimum-quanto to the script and you won't go OOM with longer generations. I'm currently using optimum-quanto to run the 768p model on a single 24GB 3090, while my other three 3090s run a full Flux model balanced across them, to create a text-to-Flux-to-Pyramid-Flow Gradio app.

from optimum.quanto import freeze, qfloat8, quantize

# Move the DiT to the GPU, quantize its weights to 8-bit float,
# then freeze so the quantized weights replace the originals.
model.dit.to("cuda:0")
quantize(model.dit, weights=qfloat8)
freeze(model.dit)
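
With qfloat8, each weight is stored in 8 bits, roughly halving the DiT's weight memory compared to bf16; freeze() then materializes the quantized weights in place, so the savings hold for the rest of the session.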

feifeiobama mentioned this pull request on Oct 20, 2024