
Add ability to load multiple copies of a model across processes #31052

Merged

Conversation

@damccorm damccorm (Contributor) commented Apr 19, 2024

Sometimes loading one model per process isn't feasible, but you still want multiple copies of the model loaded (e.g. if you have a GPU that can hold 3 copies of the model). This change gives users the ability to express that.

Design doc - https://docs.google.com/document/d/1FmKrBHkb8YTYz_Dcec7JlTqXwy382ar8Gxicr_s13c0/edit
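
For illustration, here is a rough sketch of how a pipeline author might use the new option. This is a hedged example, not code from this PR: it assumes the concrete PyTorch handler forwards the new model_copies option, and the bucket path and toy model are hypothetical placeholders.

import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

# Toy model used only to make the sketch self-contained.
class LinearModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = torch.nn.Linear(10, 1)

  def forward(self, x):
    return self.linear(x)

model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://my-bucket/linear_model.pt',  # hypothetical path
    model_class=LinearModel,
    model_params={},
    model_copies=3)  # e.g. a GPU with room for 3 copies of the model

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([torch.rand(10) for _ in range(100)])
      | RunInference(model_handler))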


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md


See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@damccorm damccorm marked this pull request as ready for review April 22, 2024 19:42
@damccorm damccorm (Contributor Author) commented Apr 22, 2024

R: @tvalentyn @liferoad

Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

(Eight review threads on sdks/python/apache_beam/ml/inference/base.py, all resolved; six marked outdated.)
@@ -234,6 +235,9 @@ def __init__(
memory pressure if you load multiple copies. Given a model that
consumes N memory and a machine with W cores and M memory, you should
set this to True if N*W > M.
model_copies: The exact number of models that you would like loaded
onto your machine. This can be useful if you exactly know your CPU or
Contributor

Possible wording suggestion:

This can be useful if you exactly know your CPU or GPU capacity and want to maximize resource utilization.

If set, large_model becomes a no-op.

Maybe this should be a ValueError if a user specifies both?

Contributor Author

> Possible wording suggestion:

Updated

> Maybe this should be a ValueError if a user specifies both?

I don't think it should be a ValueError - if you set large_model to True and also set this param, that is reasonable, and a no-op makes sense IMO since we're still honoring your choice.

@tvalentyn tvalentyn (Contributor) Apr 23, 2024

Ah, my concern was not an incorrect configuration but the cognitive burden for users: would they wonder whether they should set only one param or both for their use case, when in the end it doesn't matter? But now it also seems that large_model becomes redundant, as it is equivalent to passing model_copies = 1, right?

Possibly, except that using model_copies is currently disallowed with KeyedMH, while large_model might still allow it.

Contributor Author

> But now it also seems that large_model becomes redundant, as it is equivalent to passing model_copies = 1, right?

That's right, though long term I would like for us to do smart things here (e.g. large_model becomes "pack as many models as you can fit"). There's some conversation on this general idea in the design doc.
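
To make the semantics settled above concrete, here is a hedged sketch of the precedence being described; resolve_model_copies is a hypothetical helper, not the actual base.py code.

from typing import Optional

def resolve_model_copies(large_model: bool, model_copies: Optional[int]) -> Optional[int]:
  """Hypothetical helper mirroring the behavior discussed in this thread.

  Returns how many model copies the shared multi-process model server should
  load, or None to keep the default of one model per process.
  """
  if model_copies is not None:
    return model_copies  # explicit count wins; large_model becomes a no-op
  if large_model:
    return 1  # large_model alone is equivalent to model_copies=1
  return None  # default: every process loads its own copy

# Rule of thumb from the docstring above: for a model consuming N memory on a
# machine with W cores and M memory, share the model when N*W > M.
# Example: N = 10 GB, W = 16 cores, M = 64 GB -> 10 * 16 = 160 GB > 64 GB.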

@tvalentyn tvalentyn (Contributor) left a comment

left one more comment on the design doc


@@ -952,6 +977,12 @@ def get_preprocess_fns(self) -> Iterable[Callable[[Any], Any]]:
  def should_skip_batching(self) -> bool:
    return True

  def share_model_across_processes(self) -> bool:
def share_model_across_processes(self) -> bool:
Contributor

(cleanup, can be deferred)

we can leverage reflection here and delegate calls to base via __getattr__ like in

https://github.com/apache/beam/blob/37609ba70fab2216edc338121bf2f3a056a1e490/sdks/python/apache_beam/internal/gcp/auth.py

Per https://stackoverflow.com/questions/2405590/how-do-i-override-getattr-without-breaking-the-default-behavior, explicitly defined methods should take priority.

Contributor Author

This is a good idea. I think we should do it, but I agree that deferring keeps the PR to a single purpose.
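
For readers unfamiliar with the pattern being suggested, here is a minimal sketch of __getattr__ delegation; the wrapper class name is hypothetical and this is not the actual base.py change.

class _DelegatingModelHandler:
  """Hypothetical wrapper that forwards unknown attributes to a base handler."""

  def __init__(self, base):
    self._base = base

  def share_model_across_processes(self) -> bool:
    # Explicitly defined methods take priority over __getattr__,
    # so overrides like this one are still used.
    return True

  def __getattr__(self, name):
    # Called only when normal attribute lookup fails, so every method not
    # overridden here (get_preprocess_fns, should_skip_batching, ...) is
    # transparently delegated to the wrapped handler.
    return getattr(self._base, name)

With this pattern, the wrapper would not need to re-declare each pass-through method by hand, which is the cleanup being deferred here.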

@damccorm damccorm merged commit 3c8a881 into apache:master Apr 25, 2024
73 checks passed
@damccorm damccorm deleted the users/damccorm/runInferenceMultiProcess branch April 25, 2024 15:32
damccorm added a commit that referenced this pull request Apr 25, 2024
* Add ability to load multiple copies of a model across processes

* push changes I had locally not remotely

* Lint

* naming + lint

* Changes from feedback
damccorm added a commit that referenced this pull request Apr 26, 2024
…) (#31104)

* Add ability to load multiple copies of a model across processes

* push changes I had locally not remotely

* Lint

* naming + lint

* Changes from feedback