[Go SDK] RunInference wrapper supporting Sklearn Model Handler #24497

Merged (18 commits) on Dec 14, 2022

riteshghorse commented Dec 2, 2022

This PR adds a RunInference wrapper with an Sklearn model handler. At present it supports only non-keyed input PCollections. Support for keyed PCollections and PyTorch models will be added in a later PR.

The goal of this PR is mostly to finalize the API interaction from the user's point of view.

Sample Job

(I manually uploaded the tmp model to a GCS bucket, but I see that Java uses the semi-persistent directory, if present, to load it.)

Part of #23382


@github-actions github-actions bot added the go label Dec 2, 2022

codecov bot commented Dec 2, 2022

Codecov Report

Merging #24497 (cf6d598) into master (59849d6) will increase coverage by 0.00%.
The diff coverage is 13.51%.

❗ Current head cf6d598 differs from pull request most recent head 03df75e. Consider uploading reports for the commit 03df75e to get more accurate results

@@           Coverage Diff            @@
##           master   #24497    +/-   ##
========================================
  Coverage   73.35%   73.35%            
========================================
  Files         719      719            
  Lines       97137    97246   +109     
========================================
+ Hits        71251    71333    +82     
- Misses      24539    24564    +25     
- Partials     1347     1349     +2     
Flag   Coverage Δ
go     51.57% <13.51%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                         Coverage Δ
sdks/go/pkg/beam/core/runtime/graphx/translate.go      38.05% <0.00%> (-0.38%) ⬇️
sdks/go/pkg/beam/core/runtime/xlangx/expand.go         0.00% <0.00%> (ø)
sdks/go/pkg/beam/core/runtime/xlangx/namespace.go      90.58% <50.00%> (-5.42%) ⬇️
...ache_beam/runners/dataflow/ptransform_overrides.py  90.90% <0.00%> (-7.76%) ⬇️
sdks/python/apache_beam/io/fileio.py                   96.05% <0.00%> (-0.05%) ⬇️
sdks/python/apache_beam/transforms/core.py             92.90% <0.00%> (-0.02%) ⬇️
sdks/python/apache_beam/runners/render.py              49.45% <0.00%> (ø)
...dks/python/apache_beam/options/pipeline_options.py  93.96% <0.00%> (+0.01%) ⬆️
sdks/python/apache_beam/io/filebasedsink.py            95.90% <0.00%> (+0.01%) ⬆️
... and 8 more


github-actions bot commented Dec 2, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

sdks/go/pkg/beam/transforms/xlang/xlang.go

services := integration.NewExpansionServices()
defer func() { services.Shutdown() }()
addr, err := services.GetAddr("python_transform")
Contributor

We still don't have this expansion service generated/set by the integration script, so neither this nor the dataframe test will be running.

exit_background_processes () {

inputRow := [][]int64{{0, 0}, {1, 1}}
input := beam.CreateList(s, inputRow)
kwargs := inference.SklearnKwargs{
ModelURI: "/tmp/staged/sklearn_model",
Contributor

Where/how is this staged? Where is this file?
Is it done automatically by the Python code?

Contributor Author

I was expecting it to be staged by the Python code, so that we could use the semi_persist_dir to load the model from there. It seems Java uses semi_persist_dir. For the test, I manually uploaded the model file to a GCS bucket and passed the gs://... address there. I've updated the PR description with the job link.

Tagging @chamikaramj to find out how Java loads the model. If it is loaded from semi_persist_dir, do we have to change something on the Go SDK side, or is it staged by the Python code?

@riteshghorse
Contributor Author

I have updated the API to use generic options. I have yet to try out the remaining open comments.
Please let me know how the new API looks.

@riteshghorse
Contributor Author

Made the changes for named outputs, PTAL.


@lostluck lostluck left a comment

There's a typo, but otherwise LGTM!

I like the "model basis" object for building the pipeline. I think it makes the usage clear.
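For readers following along, the "model basis" shape with generic options can be pictured roughly as below. This is only a minimal sketch and not the code from this PR: `Model`, `Sklearn`, `Describe`, `WithExpansionAddr`, and `WithExtraPackages` are hypothetical names, and `Describe` just returns a summary string where a real wrapper would build the cross-language transform.

```go
package main

import (
	"fmt"
	"strings"
)

// config collects the optional settings. Hypothetical type, not the SDK's.
type config struct {
	expansionAddr string
	extraPackages []string
}

// Option is a functional option that mutates a config.
type Option func(*config)

// WithExpansionAddr sets the address of an already running expansion service.
func WithExpansionAddr(addr string) Option {
	return func(c *config) { c.expansionAddr = addr }
}

// WithExtraPackages appends Python packages the expansion service should install.
func WithExtraPackages(pkgs ...string) Option {
	return func(c *config) { c.extraPackages = append(c.extraPackages, pkgs...) }
}

// Model is the "model basis": it holds the model-specific settings, and
// the pipeline-facing call hangs off it as a method.
type Model struct {
	handler  string
	modelURI string
}

// Sklearn returns a Model for a scikit-learn model stored at uri.
func Sklearn(uri string) Model {
	return Model{handler: "sklearn", modelURI: uri}
}

// Describe applies the options in order and summarizes the resulting setup.
// A real wrapper would construct the transform here (assumption).
func (m Model) Describe(opts ...Option) string {
	cfg := config{}
	for _, opt := range opts {
		opt(&cfg)
	}
	return fmt.Sprintf("handler=%s model=%s addr=%s packages=[%s]",
		m.handler, m.modelURI, cfg.expansionAddr, strings.Join(cfg.extraPackages, ","))
}

func main() {
	model := Sklearn("/tmp/staged/sklearn_model")
	fmt.Println(model.Describe(WithExpansionAddr("localhost:8097")))
}
```

Keeping the model settings on the basis value and everything else in variadic options means new knobs can be added later without changing the call signature.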

sdks/go/pkg/beam/transforms/xlang/inference/inference.go (outdated)
@riteshghorse
Contributor Author

riteshghorse commented Dec 14, 2022

In the last commit I added functionality to infer extra packages when starting an automated expansion service. Please let me know if you have any comments on that. Sorry for the late addition.

Sample Job with Automated Expansion Service (it uses Python release 2.43.0 and the dev version of the Go SDK tagged as 2.43.0)
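
As a rough illustration of what inferring extra packages could look like, the sketch below maps a model handler name to the Python packages an automated expansion service would need. It is an assumption for discussion, not the code in the commit: the function name `inferExtraPackages` and the package lists are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// inferExtraPackages guesses which Python packages an expansion service
// needs from the model handler's name. The mapping is an assumption made
// for this sketch; unknown handlers get no extra packages.
func inferExtraPackages(modelHandler string) []string {
	lowered := strings.ToLower(modelHandler)
	switch {
	case strings.Contains(lowered, "sklearn"):
		return []string{"scikit-learn", "pandas"}
	case strings.Contains(lowered, "pytorch"):
		return []string{"torch"}
	}
	return nil
}

func main() {
	handler := "apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy"
	fmt.Println(inferExtraPackages(handler))
}
```

Matching on a lowercased substring keeps the lookup tolerant of fully qualified handler names, at the cost of being a heuristic rather than an exact match.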

@lostluck lostluck merged commit bdfd27e into apache:master Dec 14, 2022