Slow inference #4
Comments
Hi! I'm happy that someone else is using tfgo, I really appreciate it (also, if you have any feedback or want to contribute, you're welcome)! Could you please show your Python and Go code? That way we can compare the two and maybe figure out together where the slowdown comes from. The first thing that comes to mind is that the slowdown could come from the session instantiation, but it's only a feeling. Post your code and let's dig through it.
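If it helps, one quick way to check the session-instantiation theory is to run a single untimed warm-up inference before the measured runs. The sketch below is only illustrative: it assumes the `Detector` type, the `imageToArray` helper, and the operation names shown in the snippets further down in this thread.

```go
// Rough timing sketch (assumes the Detector type, imageToArray helper,
// and op names shown later in this thread). The first Exec can include
// one-time graph/session setup, so it is run untimed as a warm-up.
func benchmarkInference(detector *Detector, resized image.Image) {
	input, _ := tf.NewTensor(imageToArray(resized))

	// Warm-up run: absorbs any session/graph initialisation cost.
	detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})

	// Measure steady-state inference over several consecutive runs.
	const runs = 10
	var total time.Duration
	for i := 0; i < runs; i++ {
		start := time.Now()
		detector.model.Exec([]tf.Output{
			detector.model.Op("predictions/concat", 0),
		}, map[tf.Output]*tf.Tensor{
			detector.model.Op("input_1", 0): input,
		})
		total += time.Since(start)
	}
	log.Printf("mean inference over %d runs: %s", runs, total/runs)
}
```

If the warm-up run is much slower than the later ones, the gap is mostly start-up cost rather than steady-state inference.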
Awesome, thanks for digging into this! Here's the Python version:

```python
from models import SSD300
from models.utils import BBoxUtility
import models.weights
import cv2
import time
import tensorflow as tf
import numpy as np
from keras.applications.imagenet_utils import preprocess_input

weights, classes = models.weights.ssd300()
bbox_util = BBoxUtility(len(classes))
model = SSD300((300, 300, 3), num_classes=len(classes))
model.load_weights(weights, by_name=True)
model._make_predict_function()
graph = tf.get_default_graph()

im = cv2.imread("image.jpg")
if len(im.shape) == 2:
    im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)

start = time.time()
I = cv2.resize(im, (300, 300))
after_resize = time.time()
with graph.as_default():
    after_default_graph = time.time()
    inputs = preprocess_input(np.array([I.astype('float')]))
    after_preprocess = time.time()
    preds = model.predict(inputs, batch_size=1, verbose=0)
    after_pred = time.time()
    results = bbox_util.detection_out(preds)
    after_decode = time.time()

print("resize: %f sec" % (after_resize - start))
print("set graph: %f sec" % (after_default_graph - after_resize))
print("preprocess: %f sec" % (after_preprocess - after_default_graph))
print("inference: %f sec" % (after_pred - after_preprocess))
print("box decode: %f sec" % (after_decode - after_pred))
```

The Go version is in two parts -- I build a `Detector` which loads the model, and then I have a method on that object to do inference:

```go
// build detector
func NewDetector(classes []string) *Detector {
	model := tg.LoadModel("saved_model_ssd300", []string{"serve"}, nil)
	d := &Detector{model: model, classes: classes}
	return d
}

// process an image
func (detector *Detector) Detect(img image.Image, verbose bool) Predictions {
	tick := time.Now()
	bounds := img.Bounds().Size()
	resized := resizeImage(img)
	if verbose {
		log.Printf("resize: %s", time.Since(tick))
		tick = time.Now()
	}
	input, _ := tf.NewTensor(imageToArray(resized))
	if verbose {
		log.Printf("preprocess: %s", time.Since(tick))
		tick = time.Now()
	}
	results := detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})
	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}
	raw := results[0].Value().([][][]float32)
	preds := map[string]PredList{}
	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			preds[cls] = PredList{}
		}
	}
	for _, pred := range decodeBoxes(raw[0], scale(bounds)) {
		cls := detector.classes[pred.Label]
		preds[cls] = append(preds[cls], pred)
	}
	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			sort.Sort(preds[cls])
		}
	}
	if verbose {
		log.Printf("box decode: %s", time.Since(tick))
		tick = time.Now()
	}
	return preds
}
```

There are several timing differences that I understand (image preprocessing is much faster in Python since it is vectorized; my Go box decoder "cheats" by using heuristics instead of true NMS to collapse box predictions; etc.), but I am ignoring all of those. If I just look at inference time, the Python version averages 1.7 sec over 10 consecutive runs, while the Go version averages 2.5 sec (running on my CPU). Since that part of the code is executed by TensorFlow in both cases, I would expect the timing to be similar. Hopefully I'm just doing something wrong!
The inference time should be similar, you're right. The only thing that I see is that what you measure as inference time also includes work that happens before the real inference. You're invoking:

```go
results := detector.model.Exec([]tf.Output{
	detector.model.Op("predictions/concat", 0),
}, map[tf.Output]*tf.Tensor{
	detector.model.Op("input_1", 0): input,
})
```

Try moving the two `Op` lookups out of the timed section:

```go
input, _ := tf.NewTensor(imageToArray(resized))
modelOutput := detector.model.Op("predictions/concat", 0)
modelInput := detector.model.Op("input_1", 0)
if verbose {
	log.Printf("preprocess: %s", time.Since(tick))
	tick = time.Now()
}
results := detector.model.Exec([]tf.Output{
	modelOutput,
}, map[tf.Output]*tf.Tensor{
	modelInput: input,
})
if verbose {
	log.Printf("inference: %s", time.Since(tick))
	tick = time.Now()
}
```

Let me know if this makes the inference time similar.
I changed it to:

```go
modelOutput := detector.model.Op("predictions/concat", 0)
modelInput := detector.model.Op("input_1", 0)
if verbose {
	log.Printf("load ops: %s", time.Since(tick))
	tick = time.Now()
}
results := detector.model.Exec([]tf.Output{
	modelOutput,
}, map[tf.Output]*tf.Tensor{
	modelInput: input,
})
if verbose {
	log.Printf("inference: %s", time.Since(tick))
	tick = time.Now()
}
```

The inference time looks similar, and over 10 runs, my "load ops" times were all between 20 and 35 µs.
Nice! So have we already found the problem? :D
Sorry, that was poor wording. The inference time did not change much from what it was before -- between 2.4 and 2.9 sec over 10 runs.
Ah! I assumed that the Go inference time looked similar to the Python inference time. My bad. We have to dig more into the code. Can you please share the code of the model you're using? We have to spot the differences between that code and what gets executed on the Go side.
Sure, that's just the standard Keras SSD300 implementation.
Uhm, I'm trying to figure out what the problem is, but I can't see it. Maybe we should open an issue on https://github.com/tensorflow/tensorflow/issues referencing this one? I guess the inference time difference is due to something on their side and not related to tfgo.
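One check that might help narrow it down before filing upstream (just a sketch; the SavedModel path, tags, and op names come from the snippets above, and the dummy 1x300x300x3 input shape is an assumption) is to load the same SavedModel with the plain TensorFlow Go bindings, bypassing tfgo entirely, and time `Session.Run` directly. If the numbers match the tfgo timings, the difference is in TensorFlow's own runtime rather than in this library.

```go
// Standalone sketch: time the same SavedModel via the plain TensorFlow Go
// bindings (no tfgo), to see whether the slowdown is in TF itself.
// Path, tags, op names, and the input shape are assumptions based on the
// snippets above.
package main

import (
	"log"
	"time"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	model, err := tf.LoadSavedModel("saved_model_ssd300", []string{"serve"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer model.Session.Close()

	input := model.Graph.Operation("input_1").Output(0)
	output := model.Graph.Operation("predictions/concat").Output(0)

	// Dummy zero-valued image batch; swap in a real preprocessed image.
	img := make([][][][]float32, 1)
	img[0] = make([][][]float32, 300)
	for y := range img[0] {
		img[0][y] = make([][]float32, 300)
		for x := range img[0][y] {
			img[0][y][x] = make([]float32, 3)
		}
	}
	tensor, err := tf.NewTensor(img)
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 10; i++ {
		start := time.Now()
		if _, err := model.Session.Run(
			map[tf.Output]*tf.Tensor{input: tensor},
			[]tf.Output{output},
			nil,
		); err != nil {
			log.Fatal(err)
		}
		log.Printf("run %d: %s", i, time.Since(start))
	}
}
```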
Ok, no worries -- thank you for being another set of eyes! I'll poke around a bit more and open an issue there if I can't find anything.
Keep me up to date. Thank you!
First, thanks so much for working on this library! I'm using it for serving a couple of models with good success.
I ran into something that surprised me today; it's not necessarily an issue with this library, but I wanted to get your thoughts. I'm comparing execution of an SSD detector via a Python script and a Go executable. Just comparing the elapsed wall-clock time with the unix `time` command, the Go executable is quite a bit faster -- around 4 seconds vs around 13 seconds for the Python script. However, I wanted finer-grained timing data, so I measured just the `model.Exec` call in Go vs the `model.predict` call in Python, and I found that the Go version is roughly twice as slow (3 sec vs 1.7 sec). My guess is that for a single run, parsing all the Python code accounts for all of the extra time.

Both of these are using the same model, although the Python version is defining the model in code and loading weights from an hdf5 file, while the Go version is loading model + weights from the SavedModel file (.pb format) -- not sure if that would make any difference.
Do you have any ideas about why the graph execution would be slower under go, or how I could speed it up?
Thanks!