Abra merge test (#2870)
* Bump torch to 2.1.1 version (#2717)

* Add more info when run doesn't complete (#2751)

* Lower sequence generation length on code gen to be dependent on max canonical solution length  (#2682)

* sequentialize generations_per_sample

* fix bug

* lower generation length

* lower generation length

* lower generation length

* fix gen len

* restore

* restore

* restore

* fix tests

* fix test

* Remove flatten params (#2761)

* remove flatten params

* simplify tests

* simplify tests

* clean

* fix more tests

* rerun tests

* speed up icl

* fix tests

* fix cpu tests

* add more fixtures

* fix tests

* token count

* fix vocab size

* remove logger

* remove clears

* fix mosaicml logger

* change codeowners

* clean up codeowners

* rerun tests

* shrink dataset

* fix tests

* fix test

* rerun tests

* fix tests

* fix tests

* fix seed

* set to 0

* rerun tests

* rerun tests

* change threshold

* rerun tests

* rerun tests

* logs

* remove changes

* logs

* logs

* remove logs

* rerun tests

* rerun tests

* logs

* rerun

* logs

* rerun

* rerun

* rerun tests

* many more logs

* rerun tests

* strip logs

* enable tests

* remove opt

* rerun tests

* add test

* lint

* rerun tests

* fix lint

* lint

* filter warnings

* rerun tests

* fixture

* add fixture

* change

* logs

* rerun tests

* add logs

* rerun tests

* fixture

* lint

* lint

* rerun tests

* fix ignore warning

* logs

* regex

* regex

* regex

* fix

* logs

* reformat

* fix lint (#2767)

* lint (#2768)

* Use time.tokens for speedmonitor instead of dataset length (#2762)

* change token math

* tokens

* add test

* fix tests
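
For context, a minimal sketch of attaching the speed monitor, assuming a `ComposerModel` and dataloader defined elsewhere; with this change it reads token throughput from the trainer's running token count (`state.timestamp.token`) rather than estimating it from dataset length:

```python
from composer import Trainer
from composer.callbacks import SpeedMonitor

trainer = Trainer(
    model=model,                        # assumed: a ComposerModel defined elsewhere
    train_dataloader=train_dataloader,  # assumed: yields batches with known token counts
    max_duration='100ba',
    callbacks=[SpeedMonitor(window_size=100)],  # reports samples/sec and tokens/sec over a sliding window
)
trainer.fit()
```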

* remove exception (#2759)

* time to clean up time parsing 😉 (#2770)

* time to clean up time parsing

* fix type error

* updates
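
For reference, a hedged illustration of the time-string parsing this cleans up (illustrative values, not the PR's test cases):

```python
from composer.core import Time, TimeUnit

t = Time.from_timestring('10ep')  # also accepts units like 'ba' (batches) and 'tok' (tokens)
assert t.unit == TimeUnit.EPOCH and t.value == 10
```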

* Upgrade RunConfig compute specification (#2772)

* Upgrade RunConfig compute specification

* extra cluster

* Use async logging in MLflowLogger (#2693)

* async mlflow logging

Signed-off-by: chenmoneygithub <chen.qian@databricks.com>

* small fix

Signed-off-by: chenmoneygithub <chen.qian@databricks.com>

* clean up

* fix test

* fix tests

* deflake

* pin mlflow

---------

Signed-off-by: chenmoneygithub <chen.qian@databricks.com>
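
A hedged sketch of the underlying MLflow async-logging API the logger presumably builds on (requires MLflow >= 2.8; this is not the logger's internal code):

```python
import mlflow

with mlflow.start_run():
    # synchronous=False queues the write and returns immediately instead of
    # blocking the training loop on a tracking-server round trip.
    mlflow.log_metrics({'loss': 0.42}, step=10, synchronous=False)
```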

* Fix FSDP _param_init_fn to not reinit parameters multiple times (#2765)

* Gate FSDP param init test on torch 2.1 (#2774)

* Parallelize OCI multipart download (#2750)

* [UCVolumes] Add support for list API (#2769)

* Add the memory timeline profiling support through the PyTorch profiler. (#2771)

* v1

* fix issues

* add logs

* change names

* comment

* add device

* uncomment original trace

* add custom plot

* fix pyright

* Update composer/profiler/torch_profiler.py

Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com>

* address comments

* fix code check

* fix formatting

* address comments

* add unit test

* fix check

* fix check

* fix check

* fix check

* fix print

* add test comment

* add test comment

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com>
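
A minimal sketch of the PyTorch API this support wraps (torch >= 2.1); the output filename and the `train_one_step` placeholder are illustrative, not Composer's defaults:

```python
import torch
from torch.profiler import ProfilerActivity, profile

with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True,
        record_shapes=True,
        with_stack=True,
) as prof:
    train_one_step()  # placeholder for one training iteration

# Writes an HTML plot of allocations over time for the given device.
prof.export_memory_timeline('memory_timeline.html', device='cuda:0')
```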

* Improve torch memory profiling arguments processing (#2777)

* improve torch profile args

* improve torch profile args

* change default torch_prof_memory_filename

* add memory profiling arg test

* fix check

* fix check

* fix check

* fix check

* fix check

* fix check

* Add platform AWS and bump aws ofi nccl version (#2776)

* Extend checkpoint loading to accept a validation function (#2726)

* Fix checkpoint validation tests for torch 1.13 (#2779)

* fix checkpoint validation tests for torch 1.13

* Fix

* Bump version to 0.17.2 (#2780)

* bump version

* 0.17.2

* update matrix

* bump transformers version (#2781)

* Bump sphinxext-opengraph from 0.9.0 to 0.9.1 (#2784)

Bumps [sphinxext-opengraph](https://github.com/wpilibsuite/sphinxext-opengraph) from 0.9.0 to 0.9.1.
- [Release notes](https://github.com/wpilibsuite/sphinxext-opengraph/releases)
- [Commits](wpilibsuite/sphinxext-opengraph@v0.9.0...v0.9.1)

---
updated-dependencies:
- dependency-name: sphinxext-opengraph
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump coverage[toml] from 7.3.0 to 7.3.3 (#2783)

Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.3.0 to 7.3.3.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@7.3.0...7.3.3)

---
updated-dependencies:
- dependency-name: coverage[toml]
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 (#2785)

Updates the requirements on [torch](https://github.com/pytorch/pytorch) to permit the latest version.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](pytorch/pytorch@v1.13.1...v2.1.2)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [UCVolumes] Rely on databricks-sdk auth for the right requirements (#2789)

* Enable system metrics in mosaic mlflow logger (#2775)

* Enable system metrics in mosaic mlflow logger

* remove fixture

* Update composer/loggers/mlflow_logger.py

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Update composer/loggers/mlflow_logger.py

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Update composer/loggers/mlflow_logger.py

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Update parse_uri (#2787)
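
For context, a hedged sketch of the helper being updated; the URI and the return order shown are illustrative:

```python
from composer.utils import parse_uri

backend, bucket, path = parse_uri('s3://my-bucket/checkpoints/ep1.pt')
# e.g. backend == 's3', bucket == 'my-bucket', path == 'checkpoints/ep1.pt'
```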

* default-no-memory-timeline (#2790)

* Add eot token to ICL generate kwargs (#2782)

* add custom gen kwargs and stopping on eos token

* modify test

* modify test

* finish

* finish

* finish

* finish

* Add nightly image for torch 2.2.0 12-20-23 (#2791)

* Add torch nightly 12-13 (#2792)

* Add process group as arg to FSDP (#2794)

* add test

* only cast if PG is specified

* add to docstring

* filter warning

* filter warning

* docs

* support lists

* remove warnings

* lint

* hsdp monkeypatch

* logs

* change log

* fix patch

* typo

* clean up logs
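
A hedged sketch of passing a process group through the FSDP config; the `'node'` alias and the surrounding keys are assumptions about the accepted values, not a definitive reference:

```python
from composer import Trainer

trainer = Trainer(
    model=model,                        # assumed: a ComposerModel defined elsewhere
    train_dataloader=train_dataloader,  # assumed
    max_duration='1ep',
    fsdp_config={
        'sharding_strategy': 'SHARD_GRAD_OP',
        'process_group': 'node',  # assumed alias: restrict sharding to ranks on the same node
    },
)
```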

* Bump coverage[toml] from 7.3.3 to 7.3.4 (#2798)

Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.3.3 to 7.3.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@7.3.3...7.3.4)

---
updated-dependencies:
- dependency-name: coverage[toml]
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix load_ignore_keys with rng (#2803)

* fix rng load

* lint

* Bump ipykernel from 6.26.0 to 6.28.0 (#2806)

Bumps [ipykernel](https://github.com/ipython/ipykernel) from 6.26.0 to 6.28.0.
- [Release notes](https://github.com/ipython/ipykernel/releases)
- [Changelog](https://github.com/ipython/ipykernel/blob/main/CHANGELOG.md)
- [Commits](ipython/ipykernel@v6.26.0...v6.28.0)

---
updated-dependencies:
- dependency-name: ipykernel
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump junitparser from 3.1.0 to 3.1.1 (#2805)

Bumps [junitparser](https://github.com/weiwei/junitparser) from 3.1.0 to 3.1.1.
- [Changelog](https://github.com/weiwei/junitparser/blob/master/CHANGELOG.md)
- [Commits](weiwei/junitparser@3.1.0...3.1.1)

---
updated-dependencies:
- dependency-name: junitparser
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pytest from 7.4.3 to 7.4.4 (#2807)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.3 to 7.4.4.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.4.3...7.4.4)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Avoid futures on close for MosaicML logger (#2804)

* avoid futures on close

* typo

* logs

* logs

* check (#2812)

* Better communication computation overlap (#2811)

* patched torch

* fixed torch imports

* fixed torch imports

* fixed torch imports

* patching through composer

* patching through composer

* patching typing

* comment added

* don't patch torch 2.1.0

* patch torch 2.1.1 and 2.2.0

* linting fix

* Improve error message for speed monitor (#2801)

* fix flops

* stacklevel

* bump torch version (#2814)

* bump vision (#2815)

* fix rng load (#2816)

* Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. (#2817)

* patched torch

* fixed torch imports

* fixed torch imports

* fixed torch imports

* patching through composer

* patching through composer

* patching typing

* comment added

* don't patch torch 2.1.0

* patch torch 2.1.1 and 2.2.0

* linting fix

* waiting on computation stream from unshard stream

* waiting on computation stream from unshard stream

* less waiting

* no waiting

* all unshard streams wait on computation stream now

* 2.2.0 dev change

* fix profiler (#2818)

* Bump traitlets from 5.13.0 to 5.14.1 (#2822)

Bumps [traitlets](https://github.com/ipython/traitlets) from 5.13.0 to 5.14.1.
- [Release notes](https://github.com/ipython/traitlets/releases)
- [Changelog](https://github.com/ipython/traitlets/blob/main/CHANGELOG.md)
- [Commits](ipython/traitlets@v5.13.0...v5.14.1)

---
updated-dependencies:
- dependency-name: traitlets
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* All unshard streams wait on computation every step (#2823)

* patched torch

* fixed torch imports

* fixed torch imports

* fixed torch imports

* patching through composer

* patching through composer

* patching typing

* comment added

* don't patch torch 2.1.0

* patch torch 2.1.1 and 2.2.0

* linting fix

* waiting on computation stream from unshard stream

* waiting on computation stream from unshard stream

* less waiting

* no waiting

* all unshard streams wait on computation stream now

* 2.2.0 dev change

* correct waiting on computation stream

* fsdp state typing

* patching root pre forward

* patching root pre forward

* fsdp state typing

* patch forward

* correct waiting

* linting

* Add encoding=utf-8 (#2824)

* Fix import for daily test (#2826)

* patched torch

* fixed torch imports

* fixed torch imports

* fixed torch imports

* patching through composer

* patching through composer

* patching typing

* comment added

* don't patch torch 2.1.0

* patch torch 2.1.1 and 2.2.0

* linting fix

* waiting on computation stream from unshard stream

* waiting on computation stream from unshard stream

* less waiting

* no waiting

* all unshard streams wait on computation stream now

* 2.2.0 dev change

* correct waiting on computation stream

* fsdp state typing

* patching root pre forward

* patching root pre forward

* fsdp state typing

* patch forward

* correct waiting

* linting

* daily test change

* daily test fix

* [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore (#2802)

* Implementation of MLFlowObjectStore

* Update object store test settings

* Import mlflow dependencies inline

* Fix tests and ignore some pyright

* Bugfix

* Enforce experiment and run in get_artifact_path

* Update placeholders

* Make logs debug instead of info

* Minor PR comments

* MLflow casing

* tracking_uri fixes

* Update comments

* Update placeholders

* Fix tests

* Fix pyright

* Use tempfile for temp dirs

* Read tracking uri env var directly

* Remove dist from MLFlowObjectStore

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Remove fused layernorm (already deprecated for 2 versions) (#2827)

* remove fused layernorm

* remove import

* remove import

* remove

* fix

* remove docs

* all

* fix

* filter warnings

* norm

* lint

* refactor

---------

Co-authored-by: Your Name <you@example.com>

* checkpoint saver tracks all checkpoints/intervals in state (#2819)

* checkpoint tracking state

* fix some tests

* Update tests/callbacks/test_checkpoint_saver.py

* Checkpoint itself should be included in state, don't pickle timestamp object

* patch the key error (doesn't fix the bug though :sad:)

* avoid slashes in state, adjust tests

* fix gpu test, probably

* formatting

* feedback

* add a comment

* Apply suggestions from code review

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* code-quality timeout update (#2830)

Timed out after 10 minutes here https://github.com/mosaicml/composer/actions/runs/7465107219/job/20313553654?pr=2819 

Bumps the timeout up to 15 minutes.

* [S] Fix how single value tensors are logged (#2831)

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Adds DTensor Support (#2821)

* fixes to get dtensor to work

* more fixes

* Change state dict materialization for new version of torch

* get load working for new set_state_dict api

* use device_mesh

* Add fsdp init monkeypatch for DTensor

* Add checkpoint profiling logs

* attempt

* working single node

* fix optimizer

* allow 3d device mesh

* attempt to use different pg during 3d mesh save

* undo 3d mesh changes

* load_state_dict -> load

* allow parent mesh in FSDP init

* allow override of force_sync_module_states

* remove unnecessary exit

* ignore _validate_and_get_shard_state()

* save/load hsdp-moe working

* remove prints

* v1

* v2

* lint

* add more tests

* switch to PRs

* ignore warning

* fix lint

* version error

* fix version

* fix state dict

* update versions

* lint

* lint

* disable lint for mosaic fsdp utils

* remove bad line

* move around for legacy

* device mesh

* ignore warning

* fix import

* always init

* fix error

* fix load planner

* remove

* fix lint

* lint

* delay state dict

* test checkpoint

* checkpoint

* fix cpu tests

* fix rotate tests

* fix precision

* lint

* fix alibi

* cleanup

* cleanup

* remove force sync

* fix type

* merge

* lint

* fix gpt

* comment

* fix test

* lint

* minor optimizations

* Update composer/core/state.py

Co-authored-by: Evan Racah <evan@mosaicml.com>

* revert tests

---------

Co-authored-by: Evan Racah <ejracah@gmail.com>
Co-authored-by: Abhinav Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: root <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Abhinav Venigalla <abhi@mosaicml.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Evan Racah <evan@mosaicml.com>
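
A hedged sketch of the 2-D device mesh that DTensor/HSDP-style sharding builds on (torch >= 2.2 API; dimension sizes and names are illustrative):

```python
from torch.distributed.device_mesh import init_device_mesh

# 2 replica groups x 4 shards = 8 ranks: FSDP shards parameters along the second
# dimension and replicates across the first, producing DTensor-backed state dicts.
mesh = init_device_mesh('cuda', (2, 4), mesh_dim_names=('replicate', 'shard'))
```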

* Remove duplicate checkpoint verifications (#2828)

* Fix seed for FSDP wrap (#2833)

* first try

* add context

* lint

* more lint

* remove comment

---------

Co-authored-by: Daniel King <daniel@mosaicml.com>
Co-authored-by: Your Name <you@example.com>

* Remove fsdp patch for comm overlap (#2836)

* allow hsdp (#2838)

* Bump torch 2.1.2 (#2840)

* bump torch

* bump

* bump

* Upgrade pyright to 1.1.310 (#2841)

* [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow (#2810)

* Support checkpoint uploads to MLFlow (untested)

Use MLFlow run tag for autoresume

Add MLFlowLogger test for existing composer run tag

* Try formatting mlflow save folder after INIT

Make MLFlow experiment and run ID available on all ranks

Fix path issue

Format mlflow placeholders in remote filenames

* Unit tests for partial_format

* Log mlflow info as hyperparams

* partial_format doc update

* Fix formatting

* Pull distributed logic out of MLFlowObjectStore

Add debug tracebacks

Bugfix

Add path to debug info

Try fixing RUD object store init

Pyright

* Partial format in format_name helpers

* Fix import

* Add extra partial_format test

* Fix mlflow RUD check

* Fix test

pyright

No longer expect KeyError for format_with_dist using partial_format

Refactor partial_format for readability

* Max iters on partial_format

* Fix partial_format

* Clean up

* fix test import

* Fix test

* update nightly to torch 2.3 (#2842)

* update nightly to torch 2.3

* tighten

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Pin sphinxcontrib applehelp (#2854)

* pin release

* bump

* break pypi

* tighter pin

* pin

* pin

* pin

* Update setup.py (#2855)

* Torch 2.3 patch (#2849)

* add monkeypatch for verify_options

* patch

* fix

* fix

* partial precommit

* bit of cleanup

* doc

* debug

* fix version pinning

* precommit

* checkdown

* lint

---------

Co-authored-by: Evan Racah <ejracah@gmail.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 (#2866)

Updates the requirements on [mosaicml-cli](https://github.com/mosaicml/mosaicml-cli) to permit the latest version.
- [Commits](https://github.com/mosaicml/mosaicml-cli/commits)

---
updated-dependencies:
- dependency-name: mosaicml-cli
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Rewrite to use individual state functions (#2860)

* checkdown

* checkdown

* lint

* fix

* load ignore keys

* fix

* resolve comments

* fix load ignore keys

* offload

* fix gate

* merge

* lint

* use flag

* force trye

* Add custom stopping criteria to ICL generate tasks (#2800)

* add custom gen kwargs and stopping on eos token

* modify test

* modify test

* finish

* finish

* finish

* finish

* finish pr

* implement early stop

* add test

* fix bug

* bug fix

* add keys

* diff split

* fix typo

* fix precommit

* fix precommit

* fix precommit

* fix precommit

* fix precommit

* fix precommit

* fix conditional import

* add nlp metrics

* remove code gen changes

* fix nits

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
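
A hedged sketch of the Hugging Face stopping-criteria mechanism this feature plugs into; the class name and stop-token handling are illustrative, not the ICL helper itself:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop as soon as any sequence in the batch ends with `stop_ids`."""

    def __init__(self, stop_ids):
        self.stop_ids = stop_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop = torch.tensor(self.stop_ids, device=input_ids.device)
        return bool((input_ids[:, -len(self.stop_ids):] == stop).all(dim=-1).any())

# Typical use with generate():
# model.generate(**batch, stopping_criteria=StoppingCriteriaList([StopOnTokens(stop_ids)]))
```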

* Add save_ignore_keys (#2868)

* comment

* add it

* debug

* add the keys

* debug

* debug

* remove print statement

* docs and tests

* fix tests

---------

Co-authored-by: Daniel King <daniel@mosaicml.com>
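
A hedged sketch of the new option, mirroring `load_ignore_keys`; the glob-style key path below is an assumption about the checkpoint layout:

```python
from composer import Trainer

trainer = Trainer(
    model=model,                        # assumed: a ComposerModel defined elsewhere
    train_dataloader=train_dataloader,  # assumed
    max_duration='1ep',
    save_folder='checkpoints',
    save_interval='1ep',
    save_ignore_keys=['state/optimizers/*'],  # assumed path: drop optimizer state from saved checkpoints
)
```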

---------

Signed-off-by: chenmoneygithub <chen.qian@databricks.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com>
Co-authored-by: Anna <anna@mosaicml.com>
Co-authored-by: Jeremy D <115047575+bmosaicml@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Chen Qian <chenmoney@google.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: coryMosaicML <83666378+coryMosaicML@users.noreply.github.com>
Co-authored-by: Harsh Panchal <68880048+panchalhp-db@users.noreply.github.com>
Co-authored-by: willgleich <22464726+willgleich@users.noreply.github.com>
Co-authored-by: Irene Dea <deaairene@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: snarayan21 <saaketh@mosaicml.com>
Co-authored-by: Jerry Chen <jerry.chen@databricks.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Evan Racah <ejracah@gmail.com>
Co-authored-by: Abhinav Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: root <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Abhinav Venigalla <abhi@mosaicml.com>
Co-authored-by: Evan Racah <evan@mosaicml.com>
Co-authored-by: Daniel King <daniel@mosaicml.com>
21 people authored Jan 17, 2024
1 parent cf6d9e9 commit 76a0e43
Showing 154 changed files with 4,866 additions and 1,820 deletions.
28 changes: 19 additions & 9 deletions .github/mcli/mcli_pytest.py
@@ -6,7 +6,7 @@
import argparse
import time

from mcli import RunConfig, RunStatus, create_run, follow_run_logs, stop_run, wait_for_run_status
from mcli import RunConfig, RunStatus, create_run, follow_run_logs, wait_for_run_status

if __name__ == '__main__':

@@ -67,8 +67,6 @@
export COMMON_ARGS="-v --durations=20 -m '{args.pytest_markers}' {s3_bucket_flag} {clear_tmp_path_flag}"
export PYTHONUNBUFFERED=1
make test PYTEST='{args.pytest_command}' EXTRA_ARGS="$COMMON_ARGS --codeblocks"
make test-dist PYTEST='{args.pytest_command}' EXTRA_ARGS="$COMMON_ARGS" WORLD_SIZE=2
@@ -79,13 +77,25 @@
'''
config = RunConfig(
name=name,
cluster=args.cluster,
gpu_type=args.gpu_type,
gpu_num=args.gpu_num,
compute={
'cluster': args.cluster,
'gpu_type': args.gpu_type,
'gpus': args.gpu_num
},
image=args.image,
integrations=[git_integration],
command=command,
scheduling={'max_duration': args.timeout / 60 / 60},
env_variables=[
{
'key': 'MOSAICML_PLATFORM',
'value': 'False',
},
{
'key': 'PYTHONUNBUFFERED',
'value': '1',
},
],
)

# Create run
@@ -102,7 +112,7 @@
print(line, end='')

print('[GHA] Run completed. Waiting for run to finish...')
run = wait_for_run_status(run, status='completed')
run = wait_for_run_status(run, status=RunStatus.COMPLETED)

# Fail if command exited with non-zero exit code or timed out
assert run.status == RunStatus.COMPLETED
# Fail if command exited with non-zero exit code or timed out (didn't reach COMPLETED)
assert run.status == RunStatus.COMPLETED, f'Run {run.name} did not complete: {run.status} ({run.reason})'
2 changes: 1 addition & 1 deletion .github/workflows/code-quality.yaml
@@ -18,7 +18,7 @@ defaults:
jobs:
code-quality:
runs-on: ubuntu-20.04
timeout-minutes: 10
timeout-minutes: 15
strategy:
matrix:
python_version:
5 changes: 5 additions & 0 deletions .github/workflows/pr-cpu.yaml
@@ -27,6 +27,11 @@ jobs:
markers: 'not daily and not remote and not gpu and not vision and not doctest'
pytest_command: 'coverage run -m pytest'
composer_package_name: 'mosaicml'
# - name: 'cpu-3.10-2.2'
# container: mosaicml/pytorch:2.2.0_cu121-nightly20231213-python3.10-ubuntu20.04
# markers: 'not daily and not remote and not gpu and not vision and not doctest'
# pytest_command: 'coverage run -m pytest'
# composer_package_name: 'mosaicml'
- name: 'cpu-vision'
container: mosaicml/pytorch_vision:1.13.1_cpu-python3.10-ubuntu20.04
markers: 'not daily and not remote and not gpu and vision and not doctest'
5 changes: 5 additions & 0 deletions .github/workflows/pr-gpu.yaml
@@ -17,6 +17,11 @@ jobs:
markers: 'not daily and not remote and gpu and (doctest or not doctest)'
pytest_command: 'coverage run -m pytest'
composer_package_name: 'mosaicml'
# - name: 'gpu-3.10-2.2'
# container: mosaicml/pytorch:2.2.0_cu121-nightly20231213-python3.10-ubuntu20.04
# markers: 'not daily and not remote and gpu and (doctest or not doctest)'
# pytest_command: 'coverage run -m pytest'
# composer_package_name: 'mosaicml'
name: ${{ matrix.name }}
if: github.repository_owner == 'mosaicml'
with:
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -110,7 +110,7 @@ repos:
types: [python]
pass_filenames: false
args: [--warnings]
additional_dependencies: ["pyright@1.1.256"]
additional_dependencies: ["pyright@1.1.310"]
- repo: https://github.com/trufflesecurity/trufflehog.git
rev: v3.40.0
hooks:
6 changes: 3 additions & 3 deletions CODEOWNERS
@@ -17,11 +17,11 @@
# as an owner for all sections, so anyone on Composer Eng can approve any Composer PR
# According to the CODEOWNER docs, the last match takes precedence, so @mosaicml/composer-team-eng
# must be mentioned for each rule below.
/composer/algorithms/ @dskhudia @mvpatel2000 @nik-mosaic
/composer/algorithms/ @mosaicml/composer-team-eng
/composer/cli/ @mosaicml/composer-team-eng
/composer/datasets/ @mosaicml/composer-team-eng
/composer/functional/ @dblalock @mvpatel2000
/composer/loggers/ @eracah @dakinggg
/composer/functional/ @mosaicml/composer-team-eng @dblalock
/composer/loggers/ @mosaicml/composer-team-eng @eracah @dakinggg
/composer/loss/ @mosaicml/composer-team-eng
/composer/metrics/ @mosaicml/composer-team-eng
/composer/models/ @mosaicml/composer-team-eng
2 changes: 0 additions & 2 deletions composer/algorithms/__init__.py
@@ -46,7 +46,6 @@ def apply(self, state: State, event: Event, logger: Logger):
from composer.algorithms.cutout import CutOut
from composer.algorithms.ema import EMA
from composer.algorithms.factorize import Factorize
from composer.algorithms.fused_layernorm import FusedLayerNorm
from composer.algorithms.gated_linear_units import GatedLinearUnits
from composer.algorithms.ghost_batchnorm import GhostBatchNorm
from composer.algorithms.gradient_clipping import GradientClipping
@@ -79,7 +78,6 @@ def apply(self, state: State, event: Event, logger: Logger):
'CutOut',
'EMA',
'Factorize',
'FusedLayerNorm',
'GatedLinearUnits',
'GhostBatchNorm',
'GradientClipping',
composer/algorithms/alibi/attention_surgery_functions/__init__.py
@@ -6,7 +6,8 @@
from composer.utils import MissingConditionalImportError

try:
from composer.algorithms.alibi.attention_surgery_functions import _bert, _gpt2 # pyright: reportUnusedImport=none
from composer.algorithms.alibi.attention_surgery_functions import _bert # pyright: ignore[reportUnusedImport]
from composer.algorithms.alibi.attention_surgery_functions import _gpt2 # pyright: ignore[reportUnusedImport]
from composer.algorithms.alibi.attention_surgery_functions.utils import policy_registry
except ImportError as e:
raise MissingConditionalImportError(extra_deps_group='nlp', conda_package='transformers') from e
10 changes: 6 additions & 4 deletions composer/algorithms/alibi/attention_surgery_functions/_bert.py
@@ -1,6 +1,7 @@
# Copyright 2022 MosaicML Composer authors
# SPDX-License-Identifier: Apache-2.0

import copy
import math
from types import MethodType
from typing import Optional, Tuple
@@ -20,13 +21,14 @@ def bert_embedding_converter(module: torch.nn.Module, module_index: int, max_seq
"""
assert isinstance(module, (BertEmbeddings, RobertaEmbeddings))
del module_index # unused
zero_and_freeze_expand_position_embeddings(module,
new_module = copy.deepcopy(module)
zero_and_freeze_expand_position_embeddings(new_module,
max_sequence_length,
position_embedding_attribute='position_embeddings')

module_device = next(module.parameters()).device
module.register_buffer('position_ids', torch.arange(max_sequence_length).expand((1, -1)).to(module_device))
return module
module_device = next(new_module.parameters()).device
new_module.register_buffer('position_ids', torch.arange(max_sequence_length).expand((1, -1)).to(module_device))
return new_module


@policy_registry.register(BertSelfAttention, RobertaSelfAttention)
15 changes: 10 additions & 5 deletions composer/algorithms/colout/colout.py
@@ -29,10 +29,12 @@
__all__ = ['ColOut', 'ColOutTransform', 'colout_batch']


def colout_batch(sample: Union[ImgT, Tuple[ImgT, ImgT]],
p_row: float = 0.15,
p_col: float = 0.15,
resize_target: Union[bool, str] = 'auto') -> Union[ImgT, Tuple[ImgT, ImgT]]:
def colout_batch(
sample: Union[ImgT, Tuple[ImgT, ImgT]],
p_row: float = 0.15,
p_col: float = 0.15,
resize_target: Union[bool,
str] = 'auto') -> Union[torch.Tensor, ImgT, Tuple[Tensor, Tensor], Tuple[ImgT, ImgT]]:
"""Applies ColOut augmentation to a batch of images and (optionally) targets,
dropping the same random rows and columns from all images and targets in a batch.
@@ -136,7 +138,10 @@ def __init__(self, p_row: float = 0.15, p_col: float = 0.15, resize_target: Unio
self.p_col = p_col
self.resize_target = resize_target

def __call__(self, sample: Union[ImgT, Tuple[ImgT, ImgT]]) -> Union[ImgT, Tuple[ImgT, ImgT]]:
def __call__(
self, sample: Union[ImgT,
Tuple[ImgT,
ImgT]]) -> Union[torch.Tensor, ImgT, Tuple[Tensor, Tensor], Tuple[ImgT, ImgT]]:
"""Drops random rows and columns from up to two images.
Args:
17 changes: 9 additions & 8 deletions composer/algorithms/factorize/factorize_modules.py
@@ -327,8 +327,8 @@ def solution_for_rank(self, input: torch.Tensor, rank: int) -> LowRankSolution:

def apply_solution(self, solution: LowRankSolution):
self.latent_size = solution.rank
self.module0.out_channels = solution.rank
self.module1.in_channels = solution.rank
self.module0.out_channels = solution.rank # pyright: ignore[reportGeneralTypeIssues]
self.module1.in_channels = solution.rank # pyright: ignore[reportGeneralTypeIssues]
_apply_solution_to_module_parameters(solution, self.module0, self.module1, transpose=False)

@staticmethod
@@ -452,8 +452,8 @@ def solution_for_rank(self, input: torch.Tensor, rank: int) -> LowRankSolution:

def apply_solution(self, solution: LowRankSolution) -> None:
self.latent_size = solution.rank
self.module0.out_features = solution.rank
self.module1.in_features = solution.rank
self.module0.out_features = solution.rank # pyright: ignore[reportGeneralTypeIssues]
self.module1.in_features = solution.rank # pyright: ignore[reportGeneralTypeIssues]
_apply_solution_to_module_parameters(solution, self.module0, self.module1, transpose=True)

@staticmethod
@@ -471,9 +471,10 @@ def max_allowed_latent_channels(in_features: int, out_features: int) -> int:

@staticmethod
def from_linear(module: torch.nn.Linear, module_ix: int = -1, **kwargs) -> FactorizedLinear:
ret = FactorizedLinear(in_features=module.in_features,
out_features=module.out_features,
bias=((module.bias is not None) and (module.bias is not False)),
**kwargs)
ret = FactorizedLinear(
in_features=module.in_features,
out_features=module.out_features,
bias=(module.bias is not None and module.bias is not False), # pyright: ignore[reportUnnecessaryComparison]
**kwargs)
ret.reset_parameters()
return ret
92 changes: 0 additions & 92 deletions composer/algorithms/fused_layernorm/README.md

This file was deleted.

14 changes: 0 additions & 14 deletions composer/algorithms/fused_layernorm/__init__.py

This file was deleted.

