
TfLite Survey


Architecture

  • Architecture Introduction

    See the architecture diagram: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite

    • Lite Converter: also called the freeze-graph step; it merges the checkpoint values with the graph structure.
    • Android app
      • Java API
      • C++ API
      • Interpreter: the main execution engine
      • Android Neural Networks API
  • What is the relationship between TensorFlow and TfLite?

    There is no runtime dependency between TensorFlow and TfLite: TfLite is a separate, lightweight inference framework with its own interpreter. TensorFlow models are converted into the TfLite format (by the Lite Converter) before they can be executed.

C++ API

The basic usage is as follows:

// 1. Load the model.
tflite::FlatBufferModel model(path_to_model);

// 2. Initialize and build the Interpreter.
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);

// 3. Resize input tensors, if desired.
// Allocate tensors and fill `input`.
interpreter->AllocateTensors();
float* input = interpreter->typed_input_tensor<float>(0);

// 4. Run inference.
interpreter->Invoke();

// 5. Read the output.
float* output = interpreter->typed_output_tensor<float>(0);

Operator Pruning and BuiltinOpResolver
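
BuiltinOpResolver registers every builtin kernel, which pulls all builtin operators into the binary. To prune operators, a MutableOpResolver can instead be populated with only the kernels a given model actually needs. A minimal sketch, assuming the model uses only CONV_2D and SOFTMAX and that the per-op registration functions (Register_CONV_2D, Register_SOFTMAX) are available alongside BuiltinOpResolver:

tflite::MutableOpResolver resolver;
// Register only the kernels this model uses.
resolver.AddBuiltin(tflite::BuiltinOperator_CONV_2D,
                    tflite::ops::builtin::Register_CONV_2D());
resolver.AddBuiltin(tflite::BuiltinOperator_SOFTMAX,
                    tflite::ops::builtin::Register_SOFTMAX());

std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);

The intent is that unreferenced kernels can then be stripped from the final binary.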

Interpreter && InterpreterBuilder
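
Roughly, InterpreterBuilder parses the FlatBuffer model, looks up each operator through the supplied OpResolver, and populates the Interpreter with the corresponding tensors, nodes and registrations; the Interpreter then owns tensor memory and execution. Inputs can be resized before allocation, as in this small sketch (the shape {1, 224, 224, 3} is only an illustrative assumption):

// Resize the first input tensor, then (re)allocate all tensors.
int input_index = interpreter->inputs()[0];
interpreter->ResizeInputTensor(input_index, {1, 224, 224, 3});
interpreter->AllocateTensors();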

Optimization

How to Integrate Android Neural Networks API

The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive machine-learning operations on mobile devices. TensorFlow Lite is designed to use NNAPI to perform hardware-accelerated inference on supported devices.

For details about NNAPI, refer to the Android NN survey. How NNAPI is integrated into TfLite is described below.

  • Through the C++ API, TfLite initializes and builds the Interpreter; during this process it is determined whether NNAPI is available and should be used.
// 2. Initialize and build the Interpreter.
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);
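
The Interpreter constructor wires up the TfLiteContext callbacks and leaves NNAPI disabled by default:
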
Interpreter::Interpreter(ErrorReporter* error_reporter)
    : arena_(kDefaultArenaAlignment),
      persistent_arena_(kDefaultArenaAlignment),
      error_reporter_(error_reporter ? error_reporter
                                     : DefaultErrorReporter()) {
  context_.impl_ = static_cast<void*>(this);
  context_.ResizeTensor = ResizeTensor;
  context_.ReportError = ReportError;
  context_.AddTensors = AddTensors;
  context_.tensors = nullptr;
  context_.tensors_size = 0;
  context_.gemm_context = nullptr;
  // Reserve some space for the tensors to avoid excessive resizing.
  tensors_.reserve(kSlotsToReserve);
  nodes_and_registration_.reserve(kSlotsToReserve);
  next_allocate_node_id_ = 0;
  UseNNAPI(false);
}
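
NNAPI is therefore off by default; it can be switched on per interpreter through the same UseNNAPI setter called in the constructor above:

interpreter->UseNNAPI(true);

When NNAPI is enabled, Invoke() is dispatched to NNAPIDelegate::Invoke(), which lazily builds the NN API model on the first call, binds the input and output buffers, and then runs a blocking compute:
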
TfLiteStatus NNAPIDelegate::Invoke(Interpreter* interpreter) {
  if (!nn_model_) {
    // Adds the operations and their parameters to the NN API model.
    TF_LITE_ENSURE_STATUS(BuildGraph(interpreter));
  }

  ANeuralNetworksExecution* execution = nullptr;
  CHECK_NN(ANeuralNetworksExecution_create(nn_compiled_model_, &execution));

  // Currently perform deep copy of input buffer
  for (size_t i = 0; i < interpreter->inputs().size(); i++) {
    int input = interpreter->inputs()[i];
    // TODO(aselle): Is this what we want or do we want input instead?
    // TODO(aselle): This should be called setInputValue maybe to be cons.
    TfLiteTensor* tensor = interpreter->tensor(input);
    CHECK_NN(ANeuralNetworksExecution_setInput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Tell nn api where to place final data.
  for (size_t i = 0; i < interpreter->outputs().size(); i++) {
    int output = interpreter->outputs()[i];
    TfLiteTensor* tensor = interpreter->tensor(output);
    CHECK_NN(ANeuralNetworksExecution_setOutput(
        execution, i, nullptr, tensor->data.raw, tensor->bytes));
  }
  // Currently use blocking compute.
  ANeuralNetworksEvent* event = nullptr;
  CHECK_NN(ANeuralNetworksExecution_startCompute(execution, &event));
  CHECK_NN(ANeuralNetworksEvent_wait(event));
  ANeuralNetworksEvent_free(event);
  ANeuralNetworksExecution_free(execution);

  return kTfLiteOk;
}
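
As the inline comments note, the delegate currently performs a deep copy of the input buffers and relies on blocking compute (ANeuralNetworksExecution_startCompute followed immediately by ANeuralNetworksEvent_wait), so NNAPI execution is synchronous from the interpreter's point of view.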