-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pytest-xdist with tensorflow tests to reduce running time of tests #795
Conversation
Changed back to draft status. On the second run it appears that maybe the 2 processes may not bring as much benefit as the first suggested (16,16,19 mins) vs first (10, 10, 10 mins). This may indicate that the 2 virtual CPUs that are visible are not completely independent of each other (and of other workflows) running at the same time. If we want to be more confident that this approach is helping, perhaps we need some more data-points on the running time. and compare with other other builds over the next few days |
Click to view CI ResultsGitHub pull request #795 of commit 4cba47465b4310319f77aa3db7eb03377395c48e, no merge conflicts. Running as SYSTEM Setting status of 4cba47465b4310319f77aa3db7eb03377395c48e to PENDING with url https://10.20.13.93:8080/job/merlin_models/1512/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/795/*:refs/remotes/origin/pr/795/* # timeout=10 > git rev-parse 4cba47465b4310319f77aa3db7eb03377395c48e^{commit} # timeout=10 Checking out Revision 4cba47465b4310319f77aa3db7eb03377395c48e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 4cba47465b4310319f77aa3db7eb03377395c48e # timeout=10 Commit message: "Use pytest-xdist params to distribute tests across processes" > git rev-list --no-walk f1134323b518a753610b9bc90180e6482ca2494f # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins18346185061155354712.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.5.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.8) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.16.1) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.16.0) Requirement already satisfied: jupyter_core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.11.1) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.4.0) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.5) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (22.1.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.9.0) Requirement already satisfied: pkgutil-resolve-name>=1.3.10; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (1.3.10) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (24.0.0) Requirement already satisfied: tornado>=6.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.2) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.1) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0 collected 749 items |
Click to view CI ResultsGitHub pull request #795 of commit 409990634fa5cabeef194f51942dd10e738edae4, no merge conflicts. Running as SYSTEM Setting status of 409990634fa5cabeef194f51942dd10e738edae4 to PENDING with url https://10.20.13.93:8080/job/merlin_models/1514/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/795/*:refs/remotes/origin/pr/795/* # timeout=10 > git rev-parse 409990634fa5cabeef194f51942dd10e738edae4^{commit} # timeout=10 Checking out Revision 409990634fa5cabeef194f51942dd10e738edae4 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 409990634fa5cabeef194f51942dd10e738edae4 # timeout=10 Commit message: "Merge branch 'main' into pytest-xdist" > git rev-list --no-walk 2afac55cefae8b6bd31ab2cd7a532e4cc6824fb0 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins14025387262714391365.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.5.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.8) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.16.1) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.16.0) Requirement already satisfied: jupyter_core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.11.1) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.4.0) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.5) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (22.1.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.9.0) Requirement already satisfied: pkgutil-resolve-name>=1.3.10; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (1.3.10) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (24.0.0) Requirement already satisfied: tornado>=6.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.2) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.1) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0 collected 770 items |
Click to view CI ResultsGitHub pull request #795 of commit 0a76fa194ddef2ec3365c7c1f26c7ff705632010, no merge conflicts. Running as SYSTEM Setting status of 0a76fa194ddef2ec3365c7c1f26c7ff705632010 to PENDING with url https://10.20.13.93:8080/job/merlin_models/1517/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/795/*:refs/remotes/origin/pr/795/* # timeout=10 > git rev-parse 0a76fa194ddef2ec3365c7c1f26c7ff705632010^{commit} # timeout=10 Checking out Revision 0a76fa194ddef2ec3365c7c1f26c7ff705632010 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 0a76fa194ddef2ec3365c7c1f26c7ff705632010 # timeout=10 Commit message: "Merge branch 'main' into pytest-xdist" > git rev-list --no-walk be11791b412cb7b04350cedcedef3aac31f6b914 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins678190253080870487.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.5.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.8) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.16.1) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.16.0) Requirement already satisfied: jupyter_core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.11.1) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.4.0) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.5) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (22.1.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.9.0) Requirement already satisfied: pkgutil-resolve-name>=1.3.10; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (1.3.10) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (24.0.0) Requirement already satisfied: tornado>=6.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.2) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.1) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0 collected 770 items |
Click to view CI ResultsGitHub pull request #795 of commit 98b88014d80b3f366da4261b27d04cda70c4d301, no merge conflicts. Running as SYSTEM Setting status of 98b88014d80b3f366da4261b27d04cda70c4d301 to PENDING with url https://10.20.13.93:8080/job/merlin_models/1528/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/795/*:refs/remotes/origin/pr/795/* # timeout=10 > git rev-parse 98b88014d80b3f366da4261b27d04cda70c4d301^{commit} # timeout=10 Checking out Revision 98b88014d80b3f366da4261b27d04cda70c4d301 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 98b88014d80b3f366da4261b27d04cda70c4d301 # timeout=10 Commit message: "Merge branch 'main' into pytest-xdist" > git rev-list --no-walk f3ed07cee6339b16df55c6227c8c6cbe711135a4 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins13396972336812324408.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.5.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.8) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.16.1) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.16.0) Requirement already satisfied: jupyter_core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.11.1) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.4.0) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.5) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (22.1.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.9.0) Requirement already satisfied: pkgutil-resolve-name>=1.3.10; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (1.3.10) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (24.0.0) Requirement already satisfied: tornado>=6.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.2) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.1) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0 collected 772 items |
The tests seem to be consistently running slightly more quickly with this setup, but has not halved the running time. |
#795) * Add pytest-xdist to dev dependencies * Use pytest-xdist params to distribute tests across processes
Goals β½
Reduce running time of tensorflow tests
Implementation Details π§
Using pytest-xdist to run tensorflow tests in parallel when more then one process is available.
Testing Details π
Based on the logs in the unittest there are 2 processes available.
Existing tests completed in ~10mins.
Which appears to be faster than the current usual completion time of ~15-20mins