[CI] Framework and hardware-specific CI tests #997
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
name: Build Docker images (nightly) | ||
|
||
on: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: "0 0 * * *" # every day at midnight UTC | ||
|
||
concurrency: | ||
group: docker-image-builds | ||
cancel-in-progress: false | ||
|
||
env: | ||
REGISTRY: diffusers | ||
|
||
jobs: | ||
build-docker-images: | ||
runs-on: ubuntu-latest | ||
|
||
permissions: | ||
contents: read | ||
packages: write | ||
|
||
strategy: | ||
fail-fast: false | ||
matrix: | ||
image-name: | ||
- diffusers-pytorch-cpu | ||
- diffusers-pytorch-cuda | ||
- diffusers-flax-cpu | ||
- diffusers-flax-tpu | ||
- diffusers-onnxruntime-cpu | ||
- diffusers-onnxruntime-cuda | ||
|
||
steps: | ||
- name: Checkout repository | ||
uses: actions/checkout@v3 | ||
|
||
- name: Login to Docker Hub | ||
uses: docker/login-action@v2 | ||
with: | ||
username: ${{ env.REGISTRY }} | ||
password: ${{ secrets.DOCKERHUB_TOKEN }} | ||
|
||
- name: Build and push | ||
uses: docker/build-push-action@v3 | ||
with: | ||
no-cache: true | ||
context: ./docker/${{ matrix.image-name }} | ||
push: true | ||
tags: ${{ env.REGISTRY }}/${{ matrix.image-name }}:latest |
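As a rough illustration of what the matrix above expands to, the following Python sketch derives the build context and image tag for each matrix entry. The helper `build_targets` is hypothetical and only mirrors the `context` and `tags` expressions in the workflow; it is not part of the repository.

```python
REGISTRY = "diffusers"  # mirrors env.REGISTRY in the workflow

# Same entries as matrix.image-name in the workflow.
IMAGE_NAMES = [
    "diffusers-pytorch-cpu",
    "diffusers-pytorch-cuda",
    "diffusers-flax-cpu",
    "diffusers-flax-tpu",
    "diffusers-onnxruntime-cpu",
    "diffusers-onnxruntime-cuda",
]


def build_targets(registry, image_names):
    """Return one (context, tag) pair per matrix entry (hypothetical helper)."""
    return [
        (f"./docker/{name}", f"{registry}/{name}:latest")
        for name in image_names
    ]


if __name__ == "__main__":
    for context, tag in build_targets(REGISTRY, IMAGE_NAMES):
        print(f"{context} -> {tag}")
```

Because `fail-fast` is `false`, each of these six builds runs to completion even if another one fails.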
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,19 +11,45 @@ concurrency: | |
|
||
env: | ||
DIFFUSERS_IS_CI: yes | ||
OMP_NUM_THREADS: 8 | ||
MKL_NUM_THREADS: 8 | ||
OMP_NUM_THREADS: 4 | ||
MKL_NUM_THREADS: 4 | ||
PYTEST_TIMEOUT: 60 | ||
MPS_TORCH_VERSION: 1.13.0 | ||
|
||
jobs: | ||
run_tests_cpu: | ||
name: CPU tests on Ubuntu | ||
runs-on: [ self-hosted, docker-gpu ] | ||
run_fast_tests: | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
config: | ||
- name: Fast PyTorch CPU tests on Ubuntu | ||
framework: pytorch | ||
runner: docker-cpu | ||
image: diffusers/diffusers-pytorch-cpu | ||
report: torch_cpu | ||
- name: Fast Flax CPU tests on Ubuntu | ||
framework: flax | ||
runner: docker-cpu | ||
image: diffusers/diffusers-flax-cpu | ||
report: flax_cpu | ||
- name: Fast ONNXRuntime CPU tests on Ubuntu | ||
framework: onnxruntime | ||
runner: docker-cpu | ||
image: diffusers/diffusers-onnxruntime-cpu | ||
report: onnx_cpu | ||
Comment on lines +23 to +39: This matrix defines the different combinations of frameworks, Docker images, and runners to test.
Comment: Very nice |
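To make the role of the `report` field concrete, here is a minimal Python sketch of how each matrix entry determines the names used later in the workflow (the `--make-reports` value, the failure report path, and the artifact name). The function `report_names` is a hypothetical helper written for this illustration; only the string patterns come from the workflow.

```python
# Subset of the matrix.config fields from the workflow above.
FAST_CONFIGS = [
    {"framework": "pytorch", "image": "diffusers/diffusers-pytorch-cpu", "report": "torch_cpu"},
    {"framework": "flax", "image": "diffusers/diffusers-flax-cpu", "report": "flax_cpu"},
    {"framework": "onnxruntime", "image": "diffusers/diffusers-onnxruntime-cpu", "report": "onnx_cpu"},
]


def report_names(config):
    """Derive the per-job names that depend on matrix.config.report (hypothetical helper)."""
    report = config["report"]
    return {
        "make_reports": f"tests_{report}",
        "failures_file": f"reports/tests_{report}_failures_short.txt",
        "artifact": f"pr_{report}_test_reports",
    }


if __name__ == "__main__":
    for config in FAST_CONFIGS:
        print(config["framework"], report_names(config))
```

Keeping the suffix in one matrix field means the report path and the artifact name cannot drift apart between jobs.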
||
|
||
name: ${{ matrix.config.name }} | ||
|
||
runs-on: ${{ matrix.config.runner }} | ||
|
||
container: | ||
image: python:3.7 | ||
image: ${{ matrix.config.image }} | ||
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ | ||
Comment: We don't need
Comment: Oh, these are PR tests, and only on CPU. Sorry to bother |
||
|
||
defaults: | ||
run: | ||
shell: bash | ||
|
||
steps: | ||
- name: Checkout diffusers | ||
uses: actions/checkout@v3 | ||
|
@@ -32,34 +58,56 @@ jobs: | |
|
||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
python -m pip install torch --extra-index-url https://download.pytorch.org/whl/cpu | ||
python -m pip install -e .[quality,test] | ||
python -m pip install git+https://github.com/huggingface/accelerate | ||
|
||
- name: Environment | ||
run: | | ||
python utils/print_env.py | ||
|
||
- name: Run all fast tests on CPU | ||
- name: Run fast PyTorch CPU tests | ||
if: ${{ matrix.config.framework == 'pytorch' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \ | ||
-s -v -k "not Flax and not Onnx" \ | ||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Run fast Flax CPU tests | ||
if: ${{ matrix.config.framework == 'flax' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \ | ||
-s -v -k "Flax" \ | ||
Comment: (nit) think it's a bit safer/easier to work with environment variables e.g.
Comment: Good idea, will add it soon! |
||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Run fast ONNXRuntime CPU tests | ||
if: ${{ matrix.config.framework == 'onnxruntime' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=tests_torch_cpu tests/ | ||
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \ | ||
-s -v -k "Onnx" \ | ||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Failure short reports | ||
if: ${{ failure() }} | ||
run: cat reports/tests_torch_cpu_failures_short.txt | ||
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt | ||
|
||
- name: Test suite reports artifacts | ||
if: ${{ always() }} | ||
uses: actions/upload-artifact@v2 | ||
with: | ||
name: pr_torch_cpu_test_reports | ||
name: pr_${{ matrix.config.report }}_test_reports | ||
path: reports | ||
|
||
run_tests_apple_m1: | ||
name: MPS tests on Apple M1 | ||
run_fast_tests_apple_m1: | ||
name: Fast PyTorch MPS tests on MacOS | ||
runs-on: [ self-hosted, apple-m1 ] | ||
|
||
steps: | ||
|
@@ -91,7 +139,7 @@ jobs: | |
run: | | ||
${CONDA_RUN} python utils/print_env.py | ||
|
||
- name: Run all fast tests on MPS | ||
- name: Run fast PyTorch tests on M1 (MPS) | ||
shell: arch -arch arm64 bash {0} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
|
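The three `-k` expressions in this workflow split the test suite by framework. The sketch below is a deliberately simplified model of that selection, assuming `-k` treats each bare word as a case-insensitive substring match on the test id and combines the results with `and`, `or`, and `not`; real pytest also matches markers and other keywords. The function `matches_k` and the test ids are hypothetical, made up for illustration.

```python
import re

K_EXPRESSIONS = {
    "pytorch": "not Flax and not Onnx",
    "flax": "Flax",
    "onnxruntime": "Onnx",
}

# Hypothetical test ids, not taken from the repository.
TEST_IDS = [
    "tests/test_scheduler.py::SchedulerCommonTest::test_step",
    "tests/test_scheduler_flax.py::FlaxSchedulerTest::test_step",
    "tests/test_pipelines_onnx.py::OnnxPipelineTest::test_run",
]


def matches_k(expression, test_id):
    """Simplified stand-in for pytest's -k matching (assumption, see lead-in)."""
    def to_bool(match):
        word = match.group(0)
        if word in {"and", "or", "not"}:
            return word  # keep boolean operators untouched
        # Any other word becomes the result of a case-insensitive substring test.
        return str(word.lower() in test_id.lower())

    boolean_expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", to_bool, expression)
    # The string now contains only True/False/and/or/not, so eval is safe here.
    return eval(boolean_expr, {"__builtins__": {}})


if __name__ == "__main__":
    for test_id in TEST_IDS:
        selected_by = [fw for fw, expr in K_EXPRESSIONS.items() if matches_k(expr, test_id)]
        print(test_id, "->", selected_by)
```

Under this model every test id is picked up by exactly one of the three jobs, which is the property the framework split relies on.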
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,12 +14,38 @@ env: | |
RUN_SLOW: yes | ||
|
||
jobs: | ||
run_tests_single_gpu: | ||
name: Diffusers tests | ||
runs-on: [ self-hosted, docker-gpu, single-gpu ] | ||
run_slow_tests: | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
config: | ||
- name: Slow PyTorch CUDA tests on Ubuntu | ||
framework: pytorch | ||
runner: docker-gpu | ||
image: diffusers/diffusers-pytorch-cuda | ||
report: torch_cuda | ||
- name: Slow Flax TPU tests on Ubuntu | ||
framework: flax | ||
runner: docker-tpu | ||
image: diffusers/diffusers-flax-tpu | ||
report: flax_tpu | ||
- name: Slow ONNXRuntime CUDA tests on Ubuntu | ||
framework: onnxruntime | ||
runner: docker-gpu | ||
image: diffusers/diffusers-onnxruntime-cuda | ||
report: onnx_cuda | ||
|
||
name: ${{ matrix.config.name }} | ||
|
||
runs-on: ${{ matrix.config.runner }} | ||
|
||
container: | ||
image: nvcr.io/nvidia/pytorch:22.07-py3 | ||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache | ||
image: ${{ matrix.config.image }} | ||
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ ${{ matrix.config.runner == 'docker-tpu' && '--privileged' || '--gpus 0'}} | ||
|
||
defaults: | ||
run: | ||
shell: bash | ||
|
||
steps: | ||
- name: Checkout diffusers | ||
|
@@ -28,44 +54,68 @@ jobs: | |
fetch-depth: 2 | ||
|
||
- name: NVIDIA-SMI | ||
if: ${{ matrix.config.runner == 'docker-gpu' }} | ||
run: | | ||
nvidia-smi | ||
|
||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
python -m pip uninstall -y torch torchvision torchtext | ||
python -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu117 | ||
python -m pip install -e .[quality,test] | ||
python -m pip install git+https://github.com/huggingface/accelerate | ||
|
||
- name: Environment | ||
run: | | ||
python utils/print_env.py | ||
|
||
- name: Run all (incl. slow) tests on GPU | ||
- name: Run slow PyTorch CUDA tests | ||
if: ${{ matrix.config.framework == 'pytorch' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \ | ||
-s -v -k "not Flax and not Onnx" \ | ||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Run slow Flax TPU tests | ||
if: ${{ matrix.config.framework == 'flax' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=tests_torch_gpu tests/ | ||
python -m pytest -n 0 \ | ||
Comment: Do we use
Comment: Precisely! Looks like |
||
-s -v -k "Flax" \ | ||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Run slow ONNXRuntime CUDA tests | ||
if: ${{ matrix.config.framework == 'onnxruntime' }} | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \ | ||
-s -v -k "Onnx" \ | ||
--make-reports=tests_${{ matrix.config.report }} \ | ||
tests/ | ||
|
||
- name: Failure short reports | ||
if: ${{ failure() }} | ||
run: cat reports/tests_torch_gpu_failures_short.txt | ||
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt | ||
|
||
- name: Test suite reports artifacts | ||
if: ${{ always() }} | ||
uses: actions/upload-artifact@v2 | ||
with: | ||
name: torch_test_reports | ||
name: ${{ matrix.config.report }}_test_reports | ||
path: reports | ||
|
||
run_examples_single_gpu: | ||
name: Examples tests | ||
runs-on: [ self-hosted, docker-gpu, single-gpu ] | ||
run_examples_tests: | ||
name: Examples PyTorch CUDA tests on Ubuntu | ||
|
||
runs-on: docker-gpu | ||
|
||
container: | ||
image: nvcr.io/nvidia/pytorch:22.07-py3 | ||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache | ||
image: diffusers/diffusers-pytorch-cuda | ||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ | ||
|
||
steps: | ||
- name: Checkout diffusers | ||
|
@@ -79,9 +129,6 @@ jobs: | |
|
||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
python -m pip uninstall -y torch torchvision torchtext | ||
python -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu117 | ||
python -m pip install -e .[quality,test,training] | ||
python -m pip install git+https://github.com/huggingface/accelerate | ||
|
||
|
@@ -93,11 +140,11 @@ jobs: | |
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_gpu examples/ | ||
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/ | ||
|
||
- name: Failure short reports | ||
if: ${{ failure() }} | ||
run: cat reports/examples_torch_gpu_failures_short.txt | ||
run: cat reports/examples_torch_cuda_failures_short.txt | ||
|
||
- name: Test suite reports artifacts | ||
if: ${{ always() }} | ||
|
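The container `options` line in the slow-test job uses the `condition && a || b` idiom to pick a device flag per runner. A small Python sketch of the same selection follows; Python's `and`/`or` short-circuit in the same way, so the expression can be mirrored almost literally. The function name `container_options` is an assumption for this illustration only.

```python
# Options shared by all runners, copied from the workflow.
BASE_OPTIONS = '--shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/'


def container_options(runner):
    """Mirror the workflow expression that appends a device flag (hypothetical helper)."""
    # Equivalent of: matrix.config.runner == 'docker-tpu' && '--privileged' || '--gpus 0'
    device_flag = runner == "docker-tpu" and "--privileged" or "--gpus 0"
    return f"{BASE_OPTIONS} {device_flag}"


if __name__ == "__main__":
    for runner in ("docker-gpu", "docker-tpu"):
        print(runner, "->", container_options(runner))
```

The idiom only behaves like a ternary because the middle operand (`'--privileged'`) is a non-empty, truthy string; with an empty string the expression would always fall through to the last operand.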
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
FROM ubuntu:20.04 | ||
LABEL maintainer="Hugging Face" | ||
LABEL repository="diffusers" | ||
|
||
ENV DEBIAN_FRONTEND=noninteractive | ||
|
||
RUN apt update && \ | ||
apt install -y bash \ | ||
build-essential \ | ||
git \ | ||
git-lfs \ | ||
curl \ | ||
ca-certificates \ | ||
python3.8 \ | ||
python3-pip \ | ||
python3.8-venv && \ | ||
rm -rf /var/lib/apt/lists | ||
|
||
# make sure to use venv | ||
RUN python3 -m venv /opt/venv | ||
ENV PATH="/opt/venv/bin:$PATH" | ||
|
||
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py) | ||
# follow the instructions here: https://cloud.google.com/tpu/docs/run-in-container#train_a_jax_model_in_a_docker_container | ||
RUN python3 -m pip install --no-cache-dir --upgrade pip && \ | ||
python3 -m pip install --upgrade --no-cache-dir \ | ||
clu \ | ||
"jax[cpu]>=0.2.16,!=0.3.2" \ | ||
"flax>=0.4.1" \ | ||
"jaxlib>=0.1.65" && \ | ||
python3 -m pip install --no-cache-dir \ | ||
accelerate \ | ||
datasets \ | ||
hf-doc-builder \ | ||
huggingface-hub \ | ||
modelcards \ | ||
numpy \ | ||
scipy \ | ||
tensorboard \ | ||
transformers | ||
|
||
CMD ["/bin/bash"] |
Comment: The CPU runner has 8 cores, so 2 pytest workers * 4 threads each use all of them.
This change does not slow the tests down; they only get faster because of the new Docker image.
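The arithmetic behind that comment can be checked with a short Python sketch. It assumes each pytest worker spawns as many compute threads as `OMP_NUM_THREADS`/`MKL_NUM_THREADS` allow; `thread_budget` is a hypothetical helper, and the comparison against a value of 8 threads assumes that was the earlier setting.

```python
def thread_budget(cores, workers, threads_per_worker):
    """Return (total threads requested, whether they fit on the available cores)."""
    total = workers * threads_per_worker
    return total, total <= cores


if __name__ == "__main__":
    # 8-core runner, pytest -n 2, 4 threads per worker: exactly fills the machine.
    print(thread_budget(cores=8, workers=2, threads_per_worker=4))
    # Same runner with 8 threads per worker: twice as many threads as cores.
    print(thread_budget(cores=8, workers=2, threads_per_worker=8))
```

With 4 threads per worker the two workers together match the core count, whereas 8 threads per worker would oversubscribe the runner.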