Slow inference #4

Closed · rubysolo opened this issue Sep 13, 2017 · 11 comments

@rubysolo commented Sep 13, 2017

First, thanks so much for working on this library! I'm using it for serving a couple of models with good success.

I ran into something that surprised me today; it's not necessarily an issue with this library, but I wanted to get your thoughts. I'm comparing execution of an SSD detector via a Python script and a Go executable. Comparing just the elapsed wall-clock time with the Unix time command, the Go executable is quite a bit faster -- around 4 seconds vs. around 13 seconds for the Python script. However, I wanted finer-grained timing data, so I measured just the model.Exec call in Go against the model.predict call in Python, and found that the Go version is roughly twice as slow (3 sec vs. 1.7 sec). My guess is that for a single run, parsing all the Python code accounts for all of the extra wall-clock time.

Both are using the same model, although the Python version defines the model in code and loads weights from an HDF5 file, while the Go version loads model + weights from a SavedModel (.pb format) -- not sure if that makes any difference.

Do you have any ideas about why the graph execution would be slower under go, or how I could speed it up?

Thanks!

@galeone (Owner) commented Sep 14, 2017

Hi! I'm happy that someone else is using tfgo, I really appreciate it (also, if you have any feedback or you wanna contribute you're welcome)!

Could you please show your Python and Go code? That way we can compare the two and maybe figure out together where the slowdown comes from.

The first thing that comes to mind is that the slowdown could come from session instantiation, but that's only a hunch. Post your code and let's dig through it.
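
If session instantiation is the culprit, a quick way to rule it out is to run one throwaway inference right after loading the model, so that later timings exclude one-time startup costs. A minimal sketch (the tensor names here are placeholders, substitute whatever your model actually uses):

import (
	tg "github.com/galeone/tfgo"
	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

// warmUp runs a single throwaway inference so that subsequent timings
// exclude one-time session/graph initialization costs.
// "input_1" and "predictions/concat" are placeholder tensor names.
func warmUp(model *tg.Model, input *tf.Tensor) {
	model.Exec([]tf.Output{
		model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		model.Op("input_1", 0): input,
	})
}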

@rubysolo (Author)

Awesome, thanks for digging into this!

Here's the python version:

from models import SSD300
from models.utils import BBoxUtility
import models.weights

import cv2
import time
import tensorflow as tf
import numpy as np
from keras.applications.imagenet_utils import preprocess_input

weights, classes = models.weights.ssd300()
bbox_util = BBoxUtility(len(classes))

model = SSD300((300, 300, 3), num_classes=len(classes))
model.load_weights(weights, by_name=True)
model._make_predict_function()
graph = tf.get_default_graph()

im = cv2.imread("image.jpg")
if len(im.shape) == 2:
    im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)

start = time.time()
I = cv2.resize(im, (300, 300))
after_resize = time.time()
with graph.as_default():
    after_default_graph = time.time()
    inputs = preprocess_input(np.array([I.astype('float')]))
    after_preprocess = time.time()
    preds = model.predict(inputs, batch_size=1, verbose=0)
    after_pred = time.time()
    results = bbox_util.detection_out(preds)
    after_decode = time.time()
    print("resize: %f sec" % (after_resize - start))
    print("set graph: %f sec" % (after_default_graph - after_resize))
    print("preprocess: %f sec" % (after_preprocess - after_default_graph))
    print("inference: %f sec" % (after_pred - after_preprocess))
    print("box decode: %f sec" % (after_decode - after_pred))

The Go version is in two parts -- I build a Detector that loads the model, and then a method on that object does the inference:

// build detector
func NewDetector(classes []string) *Detector {
	model := tg.LoadModel("saved_model_ssd300", []string{"serve"}, nil)
	d := &Detector{model: model, classes: classes}
	return d
}

// process an image
func (detector *Detector) Detect(img image.Image, verbose bool) Predictions {
	tick := time.Now()

	bounds := img.Bounds().Size()
	resized := resizeImage(img)

	if verbose {
		log.Printf("resize: %s", time.Since(tick))
		tick = time.Now()
	}

	input, _ := tf.NewTensor(imageToArray(resized))

	if verbose {
		log.Printf("preprocess: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

	raw := results[0].Value().([][][]float32)

	preds := map[string]PredList{}
	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			preds[cls] = PredList{}
		}
	}

	for _, pred := range decodeBoxes(raw[0], scale(bounds)) {
		cls := detector.classes[pred.Label]
		preds[cls] = append(preds[cls], pred)
	}

	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			sort.Sort(preds[cls])
		}
	}

	if verbose {
		log.Printf("box decode: %s", time.Since(tick))
		tick = time.Now()
	}

	return preds
}

There are several timing differences that I understand (image preprocessing is much faster in Python since it's vectorized; my Go box decoder "cheats" by using heuristics instead of true NMS to collapse box predictions; etc.), but I'm ignoring all of those. Looking at inference time alone, the Python version averages 1.7 sec over 10 consecutive runs, while the Go version averages 2.5 sec (both running on my CPU). Since that part of the code is executed by TensorFlow in both cases, I would expect the timing to be similar. Hopefully I'm just doing something wrong!
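
For reference, this is roughly how the Exec call can be timed in isolation with Go's testing package -- a sketch rather than my exact harness (the class list and the zero-filled input are placeholders, and it assumes the Detector code above compiles in a package named detector):

// detector_bench_test.go -- times only the Exec call, averaged over b.N runs.
package detector

import (
	"testing"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func BenchmarkInference(b *testing.B) {
	d := NewDetector([]string{"background", "object"}) // placeholder classes

	// Zero-filled 1x300x300x3 batch standing in for a preprocessed image.
	batch := make([][][][]float32, 1)
	batch[0] = make([][][]float32, 300)
	for y := range batch[0] {
		batch[0][y] = make([][]float32, 300)
		for x := range batch[0][y] {
			batch[0][y][x] = make([]float32, 3)
		}
	}
	input, err := tf.NewTensor(batch)
	if err != nil {
		b.Fatal(err)
	}

	b.ResetTimer() // exclude model loading and tensor construction
	for i := 0; i < b.N; i++ {
		d.model.Exec([]tf.Output{
			d.model.Op("predictions/concat", 0),
		}, map[tf.Output]*tf.Tensor{
			d.model.Op("input_1", 0): input,
		})
	}
}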

@galeone (Owner) commented Sep 14, 2017

The inference time should be similar, you're right.

The only thing I see is that part of what you're measuring as inference time actually happens before the real inference.

You're invoking *Model.Op inside the timed section to extract the tensors by name, here:

	results := detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})

Try hoisting the two .Op invocations out and then measure the performance again, like this:

	input, _ := tf.NewTensor(imageToArray(resized))
	modelOutput := detector.model.Op("predictions/concat", 0)
	modelInput := detector.model.Op("input_1", 0)

	if verbose {
		log.Printf("preprocess: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		modelOutput,
	}, map[tf.Output]*tf.Tensor{
		modelInput: input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

Let me know if this makes the inference times similar.

@rubysolo (Author)

I changed it to:

	modelOutput := detector.model.Op("predictions/concat", 0)
	modelInput := detector.model.Op("input_1", 0)

	if verbose {
		log.Printf("load ops: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		modelOutput,
	}, map[tf.Output]*tf.Tensor{
		modelInput: input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

The inference time looks similar, and over 10 runs, my "load ops" times were all between 20 and 35 µs.

@galeone (Owner) commented Sep 14, 2017

Nice! So have we already found the problem? :D

@rubysolo (Author)

Sorry, that was poor wording. The inference time did not change much from what it was before -- between 2.4 and 2.9 sec over 10 runs.

@galeone (Owner) commented Sep 14, 2017

Ah! I assumed you meant the Go inference time now looked similar to the Python inference time. My bad.

We have to dig deeper into the code. Can you please share the code of the predict method used in preds = model.predict(inputs, batch_size=1, verbose=0)?

We have to spot the differences between that code and the code behind model.Exec.

@rubysolo (Author)

Sure, that's just the standard Keras predict: https://github.com/fchollet/keras/blob/2.0.5/keras/engine/training.py#L1553

@galeone (Owner) commented Sep 15, 2017

Hmm, I'm trying to figure out what the problem is, but I can't see it.
The Python code is just standard Python, nothing special, and the Go code is just this model.saved.Session.Run(feedDict, tensors, nil), which only uses operations from the github.com/tensorflow/tensorflow/tensorflow/go package.

Maybe we should open an issue on https://github.com/tensorflow/tensorflow/issues referencing this one? I guess the inference time difference is due to something on their side and is not related to tfgo.
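
For reference, stripped of the tfgo wrapper, the Go path reduces to a single Session.Run call. A self-contained sketch using only the upstream Go bindings (same model directory and tensor names as in the code above):

package main

import (
	"fmt"
	"log"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// Load the SavedModel directly with the upstream Go bindings.
	model, err := tf.LoadSavedModel("saved_model_ssd300", []string{"serve"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer model.Session.Close()

	// Zero-filled 1x300x300x3 batch standing in for a preprocessed image.
	batch := make([][][][]float32, 1)
	batch[0] = make([][][]float32, 300)
	for y := range batch[0] {
		batch[0][y] = make([][]float32, 300)
		for x := range batch[0][y] {
			batch[0][y][x] = make([]float32, 3)
		}
	}
	input, err := tf.NewTensor(batch)
	if err != nil {
		log.Fatal(err)
	}

	// tfgo's Model.Exec is essentially this one Session.Run call.
	results, err := model.Session.Run(
		map[tf.Output]*tf.Tensor{
			model.Graph.Operation("input_1").Output(0): input,
		},
		[]tf.Output{
			model.Graph.Operation("predictions/concat").Output(0),
		},
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(results[0].Shape())
}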

@rubysolo (Author)

OK, no worries -- thank you for being another set of eyes! I'll poke around a bit more and open an issue there if I can't find anything.

@galeone (Owner) commented Sep 15, 2017

Keep me up to date. Thank you!
