Migrate to pydantic >= 2.0 #613

Merged · 38 commits · Jul 13, 2024

Conversation

@Vectorrent (Contributor)

This is a first pass at upgrading Pydantic to 2.7.3, as discussed in #597. Fortunately, Pydantic is barely used in Hivemind, so this should be fairly straightforward.

So far, I have extended the BaseModel class in order to define a Config object with arbitrary_types_allowed = True. From what I can tell, this was the simplest way to deal with our validator's regex handling. The __get_validators__ method was removed in 2.0, and this function needed some work.
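
For context, a minimal sketch of that approach (names here are illustrative, not the actual hivemind classes; in v2, the __get_validators__ hook is replaced by __get_pydantic_core_schema__):

from pydantic import BaseModel, ConfigDict

class PermissiveModel(BaseModel):
    # shared base class: let pydantic accept field types it has no
    # built-in schema for, such as custom regex-validated strings
    model_config = ConfigDict(arbitrary_types_allowed=True)

class ExampleRecord(PermissiveModel):  # hypothetical subclass
    payload: bytes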

I also removed the _patch_schema calls, because 1) field.required was changed to field.is_required in 2.0, and 2) field.is_required is now read-only. I couldn't figure out how to set it... and the code seems to work without it. So is it really necessary here?
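
For reference, v2 computes required-ness from the annotation and default, exposing it as the read-only FieldInfo.is_required(); a small standalone example (unrelated to hivemind's actual models):

from pydantic import BaseModel

class Point(BaseModel):
    x: int      # no default, so required
    y: int = 0  # has a default, so optional

print(Point.model_fields["x"].is_required())  # True
print(Point.model_fields["y"].is_required())  # False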

DO NOT MERGE; this is an initial attempt. Within the next few days, I will test this in a multi-node environment, and I will test it with Petals as well.

Until then, this version of the code seems to work well. I wanted to open the PR to get us all started.

@Vectorrent (Contributor Author)

This PR is as far as I can reasonably take it. I fixed the linting errors introduced by my changes (but not the ones already existing in the master branch), and I reverted Pydantic to 2.5.3 (to make it compatible with the CI environment). That said, Hivemind is also compatible with 2.7.3.

I tested this change with a multi-node training run, and it works well. I also tested it with Petals and was able to run inference there as well.

I'd say this is ready for merge.

@Vectorrent (Contributor Author)

It looks like #612 might fix the errors we're seeing in tests here.

@justheuristic self-requested a review on June 6, 2024, 08:36
@justheuristic (Member)

NB: We'll need to update hivemind to fix incompatibility with the latest torch update so we can test this (see failed tests). I intend to do this as soon as I have some bandwidth, but if someone's willing to do this earlier, please do.

@Vectorrent (Contributor Author)

It appears that torch.cuda.amp was deprecated and everything was moved to torch.amp, so I fixed that.
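
For reference, the rename mostly amounts to passing the device type explicitly; a minimal sketch (the exact call sites in hivemind may differ):

import torch

# Deprecated spelling:
#   scaler = torch.cuda.amp.GradScaler()
#   with torch.cuda.amp.autocast():
#       ...
# Current spelling (torch >= 2.3):
scaler = torch.amp.GradScaler("cuda")
with torch.amp.autocast(device_type="cuda"):
    pass  # forward/backward pass goes here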

Strangely, some of these tests install torch 2.3.1, while others install 1.9.0 (and thus fail). Not sure if that is intentional, but I'll keep digging.

@samsja (Contributor) commented Jun 7, 2024

> It looks like #612 might fix the errors we're seeing in tests here.

Yes, it should fix all CI errors except the black/isort one.

@samsja (Contributor) commented Jun 7, 2024

> NB: We'll need to update hivemind to fix incompatibility with the latest torch update so we can test this (see failed tests). I intend to do this as soon as I have some bandwidth, but if someone's willing to do this earlier, please do.

FYI: #612

@Vectorrent (Contributor Author)

It looks like my PR and @samsja's are addressing the same bug, so we should only merge one.

That said, I have no idea why these tests are failing right now while theirs worked. All I'm seeing is "Error: The operation was canceled." Tests are not failing with my code; they are simply hanging forever and getting cancelled after a while.

Both PRs still have the same bug, relating to the Albert test:

#15 20.13   × python setup.py egg_info did not run successfully.
#15 20.13   │ exit code: 1
#15 20.13   ╰─> [6 lines of output]
#15 20.13       Traceback (most recent call last):
#15 20.13         File "<string>", line 2, in <module>
#15 20.13         File "<pip-setuptools-caller>", line 34, in <module>
#15 20.13         File "/tmp/pip-install-_8jw_ha0/pathtools_37faf27303d54936a074efc743740ba1/setup.py", line 25, in <module>
#15 20.13           import imp
#15 20.13       ModuleNotFoundError: No module named 'imp'
#15 20.13       [end of output]

Apparently, the "imp" module was removed in Python 3.12, so I'm sure that's where the problem lies. I was able to reproduce this bug in Docker locally, so I'll spend some time offline troubleshooting.
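
For anyone hitting the same thing: imp was deprecated since Python 3.4 and removed in 3.12. The importlib equivalent of the common imp.load_source pattern looks roughly like this (a sketch, not a patch to pathtools):

import importlib.util

def load_source(name: str, path: str):
    # modern replacement for imp.load_source(name, path)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module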

@Vectorrent (Contributor Author)

It seems that tests/test_optimizer.py is hanging at test_progress_tracker(). Since that code references BytesWithPublicKey, and my Pydantic changes touched that code, it's probably related. I just don't know why it hangs without doing anything.

If anyone with Pydantic experience wants to jump in, I'd appreciate it.

@samsja (Contributor) commented Jun 7, 2024

> It seems that tests/test_optimizer.py is hanging at test_progress_tracker(). Since that code references BytesWithPublicKey, and my Pydantic changes touched that code, it's probably related. I just don't know why it hangs without doing anything.
>
> If anyone with Pydantic experience wants to jump in, I'd appreciate it.

I just tried your PR with pydantic 2.5.3 and 2.7.3. It seems to be working fine; at least, the test is not hanging for me. Happy to help if you give me a hint on how to reproduce the hanging behavior.

@Vectorrent (Contributor Author)

The code is not hanging anymore, but we are getting some failures relating to the validator. From what I can tell, match_regex() is never executing, so it makes sense that we're getting errors about missing fields. I don't know Pydantic well enough to know how to fix this.

The easiest way to reproduce is with the docker-compose.yml I added. Just cd into that directory and run two commands:

docker compose build
docker compose up

That will execute the tests specified in docker-compose.yml. You can change that as needed.
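
One hypothesis worth checking: in Pydantic v2, a validator only runs if it is registered with the field_validator decorator; a method that merely exists on the model (as v1's __get_validators__ hook allowed) is silently ignored, which would match the "never executing" symptom. A hypothetical sketch, not hivemind's actual schema:

import re
from pydantic import BaseModel, field_validator

class ExampleSchema(BaseModel):  # hypothetical model
    key: bytes

    @field_validator("key", mode="before")
    @classmethod
    def match_regex(cls, value: bytes) -> bytes:
        # without the decorator above, this method is never called
        if not re.fullmatch(rb"[0-9a-f]+", value):
            raise ValueError("key does not match the expected pattern")
        return value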

@Vectorrent (Contributor Author) commented Jun 10, 2024

I reverted to the v1 API, since v2 would probably break Pydantic usage in Petals as well... and I don't want to deal with that.
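
Concretely, pydantic >= 2 ships the legacy 1.x API under the pydantic.v1 namespace, so the revert can be done with a guarded import; a sketch of the general pattern (the PR's exact imports may differ):

try:
    from pydantic.v1 import BaseModel, StrictFloat, conint  # pydantic >= 2
except ImportError:
    from pydantic import BaseModel, StrictFloat, conint  # pydantic < 2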

It seems that most tests are passing now. I have no idea why some of them are being cancelled, but it seems like they're timing out. And these failures are not even consistent: whereas Python 3.8 succeeded on the previous run, it timed out on this attempt. Not sure what's happening there, but it looks like a GitHub issue more than anything.

@Vectorrent (Contributor Author)

These tests keep failing with random, transient errors. What works on this run will fail on the next. Hivemind is just being unreliable right now.

@Vectorrent (Contributor Author)

I'd recommend merging if you're okay with this code. Tests keep failing, but they're failing with transient errors that were probably not introduced by the Pydantic update:

test_averaging.py::test_allreduce_once[0-0] PASSED                       [ 23%]
test_averaging.py::test_allreduce_once[0-1] PASSED                       [ 23%]
test_averaging.py::test_allreduce_once[0-2] PASSED                       [ 24%]
test_averaging.py::test_allreduce_once[1-0] PASSED                       [ 24%]
test_averaging.py::test_allreduce_once[1-1] PASSED                       [ 24%]
test_averaging.py::test_allreduce_once[1-2] PASSED                       [ 25%]
test_averaging.py::test_allreduce_once[2-0] PASSED                       [ 25%]
test_averaging.py::test_allreduce_once[2-1] PASSED                       [ 25%]
test_averaging.py::test_allreduce_once[2-2] PASSED                       [ 25%]

Stuff like this is just a quirk of Hivemind. all_reduce can execute successfully 9 times in a row before randomly hanging forever (until the test times out). There is a deeper issue here, but one that should probably be fixed in a different PR.

@justheuristic (Member) left a comment:

LGTM

@mryab (Member) left a comment:

Thanks for the contribution! However, before merging, please reduce the diff of the PR to make sure all changes are Pydantic-related.

Review threads (outdated, resolved): LICENSE, docker-compose.yml

@mryab (Member) left a comment:

Thank you again for the pull request! I believe we are close to merging, but there is still a small list of changes that need to be made before this.

Review threads (outdated, resolved): tests/test_utils/p2p_daemon.py, tests/test_dht_schema.py, examples/albert/requirements.txt, benchmarks/benchmark_optimizer.py

requirements.txt (outdated diff):

@@ -12,5 +12,5 @@ configargparse>=1.2.3
 py-multihash>=0.2.3
 multiaddr @ git+https://github.com/multiformats/py-multiaddr.git@e01dbd38f2c0464c0f78b556691d655265018cce
 cryptography>=3.4.6
-pydantic>=1.8.1,<2.0
+pydantic>=2.5.3

Member (commenting on the pydantic pin above):

This version is quite recent; can you provide any reasoning behind how it was chosen? Maybe we can bump to just 2.0? Ideally, we should even keep backwards compatibility with older versions.

@Vectorrent (Contributor Author):

I chose this version because it was the highest version I could use that was still compatible with all of the tests. Remember: the whole reason we're upgrading Pydantic is that the old version has been conflicting with other dependencies in other projects.

We can revert to 2.0.0 if you still want me to do that, though 2.5.3 seems to be working fine.

Contributor:

IMO, if the code is still compatible with Pydantic v1, it might be worth just requiring pydantic>1.8.1.

Review thread (outdated, resolved): .github/workflows/run-tests.yml

@Vectorrent (Contributor Author)

Opened #619 and #620 to fix these errors again.

@Vectorrent (Contributor Author)

@mryab - is there anything else you need from me? Between the 3 open PRs, we should be able to get past these failing tests.

@samsja (Contributor) commented Jul 5, 2024

> @mryab - is there anything else you need from me? Between the 3 open PRs, we should be able to get past these failing tests.

I have been using this PR for a couple of weeks; it seems to be stable.

Review thread (outdated, resolved): requirements.txt

codecov bot commented Jul 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.87%. Comparing base (d20e810) to head (6df0176).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #613      +/-   ##
==========================================
+ Coverage   85.39%   85.87%   +0.47%     
==========================================
  Files          81       81              
  Lines        8006     8014       +8     
==========================================
+ Hits         6837     6882      +45     
+ Misses       1169     1132      -37     

Files                                Coverage Δ
hivemind/dht/schema.py               100.00% <100.00%> (ø)
hivemind/optim/progress_tracker.py   97.80% <100.00%> (ø)

... and 3 files with indirect coverage changes

@mryab changed the title "migration to pydantic > 2.0" to "Migrate to pydantic > 2.0" on Jul 13, 2024
@mryab changed the title "Migrate to pydantic > 2.0" to "Migrate to pydantic >= 2.0" on Jul 13, 2024
@mryab merged commit 128ee90 into learning-at-home:master on Jul 13, 2024
13 checks passed