-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Model container testing function to clipper admin (Vanilla python) #394
Conversation
…elop Merging after pulling
Can one of the admins verify this patch? |
jenkins ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you delete all the extra Python files that got committed?
This implementation looks good! The only thing missing now is an integration test. Add a test to [clipper_admin_tests.py
] that deploys a predict function (using deploy python closure) that sums the elements in an input and check to make sure that your newly added test function returns the same result as the deployed model.
Parameters | ||
---------- | ||
query: JSON or list of dicts | ||
Input that the user sends to the query frontend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Inputs to test the prediction function on."
query: JSON or list of dicts | ||
Input that the user sends to the query frontend | ||
func: function | ||
Predict function that the user's model is using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Predict function to test."
if input_type == "strings": | ||
for x in flattened_data: | ||
if type(x) != str: | ||
return "Invalid input type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add checks for the other two accepted input types: (floats and bytes)
try: | ||
assert reloaded_func | ||
except AssertionError as e: | ||
return 'Function does not properly serialize and reload' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log this error as well logger.error("Function ...")
except AssertionError as e: | ||
return 'Function does not properly serialize and reload' | ||
|
||
numpy_data = list(np.array(x) for x in query_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only tricky thing here is to ensure that the numpy array as the right dtype. The mapping is defined here.
Test FAILed. |
jenkins add to whitelist |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Jenkins is failing because of the format checker. Run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few minor changes.
@@ -4,6 +4,8 @@ | |||
import os | |||
import posixpath | |||
import shutil | |||
import pickle | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert the changes to this file
Predict function to test. | ||
input_type: str | ||
The input_type to be associated with the registered app and deployed model. | ||
One of "integers", "floats", "doubles", "bytes", or "strings". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an example in the method comment?
query_data = list(x for x in list(query.values())) | ||
|
||
if type(query_data[0][0]) == list: | ||
query_data = query_data[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you checking for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm checking the nesting of the query data, whether it is a list or a list of lists.
The input_type to be associated with the registered app and deployed model. | ||
One of "integers", "floats", "doubles", "bytes", or "strings". | ||
""" | ||
query_data = list(x for x in list(query.values())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be checking the JSON key as well, to ensure that their input is properly formatted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the query is in a JSON/dict structure, I'm getting the values of each key. I'm then checking to make sure the values are of the right input_type before then converting it into the respective numpy array of right dtype. What formatting of the key input needs to be checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user wants to provide a single input, the key should be "input"
. If they want to provide a list of inputs, the key should be "input_batch"
. See the website for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the clarification.
headers = {"Content-type": "application/json"} | ||
test_input = list(np.random.random(10)) | ||
pred = requests.post("http://localhost:1337/hello-world/predict", headers=headers, data=json.dumps({"input": test_input})).json() | ||
test_predict_result = test_predict_function(self.clipper_conn, query={"input": test_input}, func=predict_func, input_type="doubles") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should call the function like this self.clipper_conn.test_predict_function(query=...)
, not by passing in self.clipper_conn
as an argument.
Can you add a second test prediction that uses the batch_predict
interface? e.g.
batch_input = [list(np.random.random(10)) for _ in range(4)]
batch_pred = requests.post("http://localhost:1337/hello-world/predict", headers=headers, data=json.dumps({"input_batch": batch_input})).json()
test_batch_predict_result = self.clipper_conn.test_predict_function(query={"input_batch": test_input}, func=predict_func, input_type="doubles")
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
jenkins test this please |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
jenkins test this please |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Closes #360
Added a function test_predict_function in clipper_admin/clipper_admin/clipper_admin.py that verifies input type and function serialization/reloading for vanilla python models. Will resolve PyTorch and TensorFlow after this has been approved.