Simplify the example script and rename examples #91

Merged 109 commits on Nov 29, 2024
9412dda
Simplify the example script a bit
lebrice Nov 13, 2024
7bfc1b0
Fix broken tests
lebrice Nov 13, 2024
49fbce8
Add new regression files (init is on GPU now)
lebrice Nov 13, 2024
2fbc6f1
Simplify the imports of `project/main.py`
lebrice Nov 13, 2024
3029001
Add xfail for the example on macos
lebrice Nov 14, 2024
f381acf
Fix error in main.py
lebrice Nov 14, 2024
f1a7ddb
Fix ULTRA weird bug w/ pickling and singledispatch
lebrice Nov 14, 2024
662d0e5
Fix raised exception type in example_test.py
lebrice Nov 15, 2024
957f4b7
Also move the example input array to the GPU
lebrice Nov 15, 2024
827d09f
Add a bit of a hack to fix self._device
lebrice Nov 15, 2024
b84bba7
Require 16gb vram for finetuning tests
lebrice Nov 15, 2024
d6517f7
Add mark on flaky test :(
lebrice Nov 15, 2024
0049482
Remove duplicated code in text_classification_example_test.py
lebrice Nov 15, 2024
3b1322f
text_classification_example-->text_classification
lebrice Nov 15, 2024
8a3150f
llm_finetuning_example-->llm_finetuning
lebrice Nov 15, 2024
3ecf12f
Rename regression files as well
lebrice Nov 15, 2024
63b70f8
`LearningAlgorithmTests`-->`LightningModuleTests`
lebrice Nov 15, 2024
de30db9
Remove duplicate module (?)
lebrice Nov 15, 2024
2e4c0a0
Fix minuscule typing error
lebrice Nov 15, 2024
8a0c9e2
Remove outdated todo
lebrice Nov 15, 2024
3a37ea5
Add missing regression files
lebrice Nov 18, 2024
5b63330
[HUGE] Rename examples (drop "Example" suffix)
lebrice Nov 18, 2024
fce5db6
Fix JaxImageClassifier test issues
lebrice Nov 18, 2024
e64ed06
Add fix 4 non-deterministic jax_image_classifier
lebrice Nov 18, 2024
19fc94e
Standardize ImageClassifier/JaxImageClassifier
lebrice Nov 18, 2024
e30aa39
Fix issue in `main_test.py`
lebrice Nov 18, 2024
2c6b126
Add test for the `demo` of jax_image_classifier.py
lebrice Nov 18, 2024
1e2d330
Fix / rename examples in docs
lebrice Nov 18, 2024
5695f99
Remove NETWORK_DIR from devcontainer.json
lebrice Nov 18, 2024
1358307
Fix test for `demo` of jax_image_classifier.py
lebrice Nov 18, 2024
2429cb8
Add back all regression files
lebrice Nov 18, 2024
4012858
Use temp dir for logs in test of demo
lebrice Nov 18, 2024
2446d77
Cleanup jax_ppo.yaml values
lebrice Nov 18, 2024
2546a6d
Rename jax trainer config to jax_trainer.yaml
lebrice Nov 18, 2024
1acfd91
Remove outdated comments in main.py
lebrice Nov 18, 2024
2d6d2a4
Fix test for autoref mkdocs plugin
lebrice Nov 18, 2024
6b94e76
Mark 'algorithm' as required
lebrice Nov 18, 2024
a5be470
Remove unused test in text_classification_test.py
lebrice Nov 18, 2024
3f4075b
Move `import_object` to where it is used
lebrice Nov 18, 2024
54024c0
Fix test in `remote_launcher_plugin_test.py`
lebrice Nov 18, 2024
60e08c9
Add missing regression files for RL test
lebrice Nov 18, 2024
9bab03a
Use dependency-groups instead of dev-dependencies
lebrice Nov 18, 2024
5a804b8
Remove empty test file
lebrice Nov 18, 2024
f88d25e
Remove `seeding.py`
lebrice Nov 18, 2024
c21aecc
Cleanup, remove unused code
lebrice Nov 18, 2024
6d33edb
Minor doc improvements
lebrice Nov 18, 2024
310754d
Remove more unused code
lebrice Nov 18, 2024
074cfd6
Remove unused `get_constant`
lebrice Nov 18, 2024
9881398
Remove unnecessary use of Datamodule
lebrice Nov 18, 2024
52dbf1b
Revert "Remove unused `get_constant`"
lebrice Nov 18, 2024
e4a31d4
Fix error in profiling_test.py
lebrice Nov 18, 2024
9a4c442
Add note in profiling_test.py
lebrice Nov 18, 2024
15ae76e
Fix `test_demo`
lebrice Nov 19, 2024
81b8d0d
Skip some tests on MAC in CI (instead of xfail)
lebrice Nov 19, 2024
965dfef
Don't remove normalization if normalize=False
lebrice Nov 19, 2024
bc3b1d2
Fix issue in cifar10, add note about protocol
lebrice Nov 19, 2024
6249495
Fix bug in VisionDataModule.__init__
lebrice Nov 19, 2024
c63d46e
Fix bug in remote_launcher_test.py
lebrice Nov 19, 2024
a9a8de7
Fix pre-commit issues
lebrice Nov 19, 2024
daab5ac
"fix" weird pre-commit issue?
lebrice Nov 19, 2024
a013817
Try to make tests faster
lebrice Nov 19, 2024
ec951a7
Silence some typing errors
lebrice Nov 19, 2024
8d3b65b
Add missing regression files
lebrice Nov 19, 2024
5238ef3
Fix device of example_input_array (and network!)
lebrice Nov 19, 2024
759781a
Make the timeout longer for integration tests
lebrice Nov 19, 2024
a04369c
Save correct device type in regression test
lebrice Nov 19, 2024
b4ca1b0
Add some more `type: ignore` comments
lebrice Nov 19, 2024
5ccc0f2
Update regression files (missing llm_finetuning)
lebrice Nov 19, 2024
6cbdaf2
Add skip mark for macOS tests in CI
lebrice Nov 20, 2024
035e205
Add a mark on strangely-failing test in main_test
lebrice Nov 20, 2024
547c4ac
Use a skip on macos instead of xfail (again)
lebrice Nov 20, 2024
fe1d3ce
Fix bug with tuples and lists in regression tests
lebrice Nov 20, 2024
62f7c5d
Adjust regression files, add missing files
lebrice Nov 20, 2024
668086f
Reset the simpler content for regression files
lebrice Nov 20, 2024
0fb4824
Add missing regression files
lebrice Nov 20, 2024
e468c07
Update regression files
lebrice Nov 20, 2024
c5f8e32
Add built docs directory to norecursedirs
lebrice Nov 20, 2024
ad44b9e
Remove ImageNet32 Datamodule
lebrice Nov 21, 2024
c08c935
Fix issue with display of seed in jax_ppo_test.py
lebrice Nov 21, 2024
d950a68
Make tests faster to run by skipping visualization
lebrice Nov 21, 2024
0931a3d
Fix an incorrect reason for xfail mark in test
lebrice Nov 21, 2024
c243f80
Fix broken link in FashionMNIST datamodule
lebrice Nov 21, 2024
04e7fb4
Reduce logging verbosity in hydra_config_utils.py
lebrice Nov 21, 2024
771d5fc
Remove hydra_config_utils.py
lebrice Nov 21, 2024
b4ae910
Adjust the name of regression files for ppo tests
lebrice Nov 21, 2024
2695cf8
Add an xfail mark on test failing for MacOS
lebrice Nov 21, 2024
9ed0634
Adjust xfail mark: xfail if no GPU (on CI)
lebrice Nov 21, 2024
e2a18e0
Add missing `yield` in fixture
lebrice Nov 21, 2024
f1b3167
Also set XLA_PYTHON_CLIENT_ALLOCATOR="platform"
lebrice Nov 21, 2024
08d5bf5
Add xfail on lightning test
lebrice Nov 21, 2024
8544e4a
Add missing regression files for ImageNet
lebrice Nov 21, 2024
72a77ff
Add other (?) missing ImageNet regression files
lebrice Nov 22, 2024
61ecc0c
Fix regression files (different gpu type?)
lebrice Nov 22, 2024
0391ca5
Update regression files (agAIN!)
lebrice Nov 23, 2024
cd07bfa
Adjust regression tests (again)
lebrice Nov 26, 2024
59dd92c
Increase timeout for slurm integration tests
lebrice Nov 26, 2024
74a02e8
Add xfail on failing repro test
lebrice Nov 27, 2024
111abb8
Fix try-except block in testutils.py
lebrice Nov 27, 2024
bb50f2d
Increase the number of CPUS and RAM for tests
lebrice Nov 27, 2024
f3a9477
Add xfail on flaky tests on SLURM
lebrice Nov 27, 2024
67feee0
Don't include GPU name in the regression file
lebrice Nov 27, 2024
5a2ee40
Make sure the train_dataloader is 100% seeded
lebrice Nov 28, 2024
284011c
Fix bug with default device and configure_model
lebrice Nov 28, 2024
0c40eb1
Fix bug in llm_finetuning_test.py
lebrice Nov 28, 2024
e28eedf
Update regression files
lebrice Nov 28, 2024
24f0d3c
Update regression files for jax tests
lebrice Nov 28, 2024
b7a88ce
Revert "Update regression files for jax tests"
lebrice Nov 28, 2024
3ef914c
Add another xfail on llm reproducibility test :(
lebrice Nov 28, 2024
c569dd2
Add yet another xfail mark on llm test (!)
lebrice Nov 29, 2024
4 changes: 1 addition & 3 deletions .devcontainer/devcontainer.json
@@ -51,8 +51,7 @@
 ".venv": true,
 ".pytest_cache": true,
 ".benchmarks": true,
-".ruff_cache": true,
-".regression_files": true
+".ruff_cache": true
 },
 "python.testing.unittestEnabled": false,
 "python.testing.pytestEnabled": true,
@@ -85,7 +84,6 @@
 "containerEnv": {
 "SCRATCH": "/home/vscode/scratch",
 "SLURM_TMPDIR": "/tmp",
-"NETWORK_DIR": "/network",
 "UV_LINK_MODE": "symlink",
 "UV_CACHE_DIR": "/home/vscode/.uv_cache"
 },
4 changes: 2 additions & 2 deletions .github/actions-runner-job.sh
@@ -1,8 +1,8 @@
 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --ntasks=1
-#SBATCH --cpus-per-task=1
-#SBATCH --mem=16G
+#SBATCH --cpus-per-task=4
+#SBATCH --mem=32G
 #SBATCH --gpus=rtx8000:1
 #SBATCH --time=00:30:00
 #SBATCH --dependency=singleton
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -87,7 +87,7 @@ jobs:
 local_integration_tests:
 needs: [unit_tests, check_docs]
 runs-on: self-hosted
-timeout-minutes: 20
+timeout-minutes: 30
 strategy:
 max-parallel: 1
 matrix:
@@ -150,7 +150,7 @@ jobs:
 name: Run integration tests on the ${{ matrix.cluster }} cluster in job ${{ needs.launch-slurm-actions-runner.outputs.job_id}}
 needs: [launch-slurm-actions-runner]
 runs-on: ${{ matrix.cluster }}
-timeout-minutes: 20
+timeout-minutes: 30
 strategy:
 max-parallel: 5
 matrix:

This file was deleted.
