Upgrade the version of mxnet to 1.4.* for mxnetcontainerdockerfile. #675

Merged: 11 commits, May 14, 2019

rkooo567
Collaborator

@rkooo567 rkooo567 commented Apr 23, 2019

This PR addresses issue #674. Because of internal changes in mxnet, cloudpickle cannot deserialize a function that uses mxnet. I resolved this by updating the mxnet version to 1.4.*.
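
For readers unfamiliar with this class of failure, here is a minimal sketch of one general way a library version mismatch can break deserialization. It is not a diagnosis of #674. It uses the standard-library `pickle` rather than cloudpickle, and `fake_lib`, `helper`, `install_fake_library`, and `try_load` are hypothetical names made up for illustration. The idea: code that lives in an importable library is stored as a reference by name, so the loading side must find the same name in its own copy of the library.

```python
import pickle
import sys
import types


def install_fake_library(with_helper=True):
    """Register a stand-in module named fake_lib (hypothetical, not mxnet)."""
    module = types.ModuleType("fake_lib")
    if with_helper:
        def helper(x):
            return x * 2
        # Make the function look as if it were defined inside fake_lib,
        # so pickle stores only the reference "fake_lib.helper".
        helper.__module__ = "fake_lib"
        helper.__qualname__ = "helper"
        module.helper = helper
    sys.modules["fake_lib"] = module
    return module


def try_load(payload):
    """Return (value, None) on success or (None, error type name) on failure."""
    try:
        return pickle.loads(payload), None
    except (AttributeError, ImportError) as exc:
        return None, type(exc).__name__


# "Client side": serialize while the helper still exists.
lib = install_fake_library(with_helper=True)
payload = pickle.dumps(lib.helper)

# "Container side" with the same library version: the reference resolves.
same_version_func, same_version_error = try_load(payload)

# "Container side" with a different library version: the helper is gone.
install_fake_library(with_helper=False)
other_version_func, other_version_error = try_load(payload)
```

Under these assumptions, the first load succeeds and the second fails with an `AttributeError`, which is why keeping the library version in the container aligned with the version used at serialization time can matter.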

Test summary

https://github.com/rkooo567/clipper/tree/mxnet-version-upgrade

  1. Created a custom dockerfile with the new mxnet version.
  2. When deploying with mxmodel_deployer, pulled the custom image from my Docker Hub.
  3. Ran deploy_mxnet_models.py from integration_test.

I verified that it works for mxnet==1.4.0 and mxnet==1.1.*.
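
As a side note on the `1.4.*` pin, the following is a simplified sketch of how a trailing-wildcard pin selects versions. It compares only dot-separated release segments and is not pip's actual matching logic, which follows PEP 440 and handles more cases. The function name `matches_wildcard_pin` is hypothetical.

```python
def matches_wildcard_pin(version, pin):
    """Simplified check for pins such as '1.4.*' (release segments only)."""
    if not pin.endswith(".*"):
        # No wildcard: require an exact string match.
        return version == pin
    prefix = pin[:-2].split(".")
    parts = version.split(".")
    # The version matches when its leading segments equal the pinned prefix.
    return parts[:len(prefix)] == prefix


# Versions mentioned in this PR, checked against the 1.4.* pin.
candidates = ["1.4.0", "1.4.1", "1.1.0"]
selected = [v for v in candidates if matches_wildcard_pin(v, "1.4.*")]
```

Comparing whole segments rather than raw string prefixes avoids treating a version such as `1.40.0` as a match for `1.4.*`.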

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1907/
Test FAILed.

@rkooo567
Collaborator Author

rkooo567 commented Apr 23, 2019

The mxnet tests that pass locally (on both my Mac and a VM) fail on CI. I will add more logging to the tests and try to find the reason.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1908/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1909/
Test FAILed.

@rkooo567 rkooo567 closed this Apr 23, 2019
@rkooo567 rkooo567 reopened this Apr 23, 2019
@rkooo567
Collaborator Author

rkooo567 commented Apr 27, 2019

I pulled an image and ran it. This is the log I found ()

I am not sure why CLIPPER_MODEL_NAME is not set. It is odd, because it should have been set when add_replicas was called. I could not reproduce it locally.

-> Never mind. I had to run the image through Clipper rather than starting the container directly with the docker command, because the environment variables are set when we deploy the model. Also, I could not reproduce the failure locally on my Mac, so I will check again in a VM.

I will try to investigate it soon.
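
A small sketch of the kind of startup check that makes this situation easier to spot: it reports which expected environment variables are unset before any model code runs. Only `CLIPPER_MODEL_NAME` comes from this thread. The function names and the sample value `mxnet-model` are hypothetical, and this is not Clipper's actual container entrypoint.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def describe_startup(required, environ=None):
    """Summarize whether the container has the variables it expects."""
    missing = missing_env_vars(required, environ)
    if missing:
        return "missing: " + ", ".join(missing)
    return "ok"


# Started directly with docker and no variables passed in (simulated with an empty dict).
plain_docker = describe_startup(["CLIPPER_MODEL_NAME"], environ={})

# Started through a deployment step that sets the variable (value is illustrative).
deployed = describe_startup(
    ["CLIPPER_MODEL_NAME"],
    environ={"CLIPPER_MODEL_NAME": "mxnet-model"},
)
```

Passing the environment as an argument keeps the check testable without touching the real process environment.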

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1911/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1913/
Test FAILed.

@rkooo567
Collaborator Author

rkooo567 commented Apr 28, 2019

Interesting. It now fails at integration3_docker_metric but passes mxnet_test. Maybe it is a Jenkins issue now? Also, I don't know why the mxnet test suddenly succeeds.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1914/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1915/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1919/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1920/
Test FAILed.

@simon-mo
Contributor

@rkooo567 can you resolve the merge conflict?

@rkooo567
Collaborator Author

@simon-mo Yep. Since we merged the fluentd PR, we can now easily see the logs of broken models. The conflict is in test_utils, so instead of resolving it, I will probably close this PR and open a new one in which I use fluentd to debug why the mxnet tests break when we upgrade mxnet's version.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1940/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1957/
Test FAILed.

@simon-mo
Contributor

Jenkins test this please

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1959/
Test PASSed.

@simon-mo simon-mo merged commit 7d1d769 into ucbrise:develop May 14, 2019