Add design doc of inference API for fluid. #7315

Closed · wants to merge 12 commits into from

178 changes: 178 additions & 0 deletions in doc/design/inference.md
# Design Doc: Inference Engine

The main goal of an inference API is to make it easy to use.


In Fluid, a neural network is represented as a protobuf message called [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper for which is a [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).


Given an [inference program](#inference-program), it can be executed inside any execution environment.


In Fluid, we call the execution environment a runtime, which includes a [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h), a [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) and an [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/executor.md).



## Inference Program

A simple inference program can be defined in the Python API as follows:



```python
image = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=image,
                          size=10,
                          act='softmax')
```

After training for several epochs, the parameters can be saved using the method [fluid.io.save_inference_model](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/io.py), which will save the binary proto string of the program at the same time.

```python
fluid.io.save_inference_model(
    "./inference_model/", ["x"], [predict], exe)
```

Similar to training, there is a `main_program` and a `startup_program` for inference as well.


- The `main_program` defines the computational operators and all variables, and can be evaluated as many times as the users want. The protobuf message of the `main_program` is saved using the `fluid.io.save_inference_model` method. Thus, it can be initialized from a file or from a pre-loaded buffer.


- The `startup_program` is responsible for initializing all the parameters. Since all the parameters are saved to files, the `startup_program` is composed of `load_op`s and needs to be evaluated only once for a specified executor. There is no need to save the protobuf message of the `startup_program`, because it can be easily derived from the `main_program`.
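
As an illustration of how a `startup_program` of `load_op`s could be derived from the `main_program`, consider the following sketch. It is purely illustrative: programs are modeled as plain Python dicts rather than real `ProgramDesc` messages, and all field names are hypothetical.

```python
def derive_startup_program(main_program, model_dir):
    """Build a startup program containing one load_op per persistable
    parameter of the main program (a sketch, not the real Fluid API)."""
    load_ops = []
    for var in main_program["vars"]:
        if var.get("persistable"):  # parameters are persistable variables
            load_ops.append({
                "type": "load_op",
                "output": var["name"],
                "file_path": model_dir + "/" + var["name"],
            })
    return {"ops": load_ops}

main = {
    "vars": [
        {"name": "x", "persistable": False},
        {"name": "fc_0.w_0", "persistable": True},
        {"name": "fc_0.b_0", "persistable": True},
    ],
    "ops": [{"type": "mul"}, {"type": "elementwise_add"}, {"type": "softmax"}],
}

startup = derive_startup_program(main, "./inference_model")
# One load_op per parameter; non-persistable variables like "x" are skipped.
```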



### Introduction of ProgramBuilder



We introduce the concept of a `ProgramBuilder`, which will collect all the metadata of an inference program and support transformation and optimization of the inference program.



```cpp
class ProgramBuilder {
 public:
  // Initialize from a file
  ProgramBuilder(const std::string& filename);
  // Initialize from a pre-loaded buffer
  ProgramBuilder(const char* buffer, const size_t num_bytes);

  // Some utility interfaces that may be required by users
  std::vector<std::string>& GetFeedVarNames() const;
  std::vector<std::string>& GetFetchVarNames() const;
  std::vector<int64_t> GetFeedVarShape(const size_t index);
  std::vector<int64_t> GetFetchVarShape(const size_t index);

  void AppendFetchVariables(const std::string& var_name);
  ...
  // Perform transformation of the inference program
  ProgramBuilder* operator()(/* some optimizing strategy */);
  ProgramBuilder* operator()(const std::vector<std::string>& feed_var_names,
                             const std::vector<std::string>& fetch_var_names,
                             /* some optimizing strategy */);

 private:
  framework::ProgramDesc* main_program_;
  framework::ProgramDesc* startup_program_;
  std::vector<std::string> feed_var_names_;
  std::vector<std::string> fetch_var_names_;
};
```

In the first design, `ProgramBuilder` contains all the elements mentioned above, and is instantiated from the protobuf message of the `main_program`. The other members, `startup_program`, `feed_var_names` and `fetch_var_names`, will also be derived in the constructor.
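
The derivation of the feed/fetch variable names in the constructor could look like the following sketch. Again, the program is a plain Python dict standing in for a `ProgramDesc`, and the `feed`/`fetch` op layout is simplified for illustration.

```python
def derive_feed_fetch_names(main_program):
    """Collect feed/fetch variable names from the feed and fetch ops
    already present in the main program (illustrative, not the real API)."""
    feed_names, fetch_names = [], []
    for op in main_program["ops"]:
        if op["type"] == "feed":
            feed_names.append(op["output"])
        elif op["type"] == "fetch":
            fetch_names.append(op["input"])
    return feed_names, fetch_names

main = {"ops": [
    {"type": "feed", "output": "x"},
    {"type": "mul"},
    {"type": "softmax"},
    {"type": "fetch", "input": "predict"},
]}

feeds, fetches = derive_feed_fetch_names(main)
```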

> **@sidgoyal78** (Jan 11, 2018): Will this protobuf of `main_program` (which will be used to instantiate the `ProgramBuilder`) have feed/fetch ops added to the original program-desc?
>
> **@Xreki** (Jan 11, 2018): I think the `main_program` should have `feed_op`s and `fetch_op`s, or we'll need to clone the `main_program` and insert `feed_op`s and `fetch_op`s into the copy in `Run()`, like in the Python implementation. I think that is redundant. However, where the `feed_op`s and `fetch_op`s come from depends on the storing format. They may be inserted in the C++ code, or may be initialized from the protobuf message file.



There are two advantages of introducing an independent concept of a `ProgramBuilder`:


- It is easy to add utility interfaces to support other requirements.
For example,
  - `GetFeed/FetchVarNames`. It can be used to help users verify how many inputs and outputs are required and what their names are.


  - `GetFeed/FetchVarShape`. It can be used to help users verify the size of each input and output.
  - `AppendFetchVariables`. Normally, the names of all the variables to be fetched should be included in the protobuf message of the `main_program`. However, sometimes users may want to fetch extra variables for other uses or for debugging purposes; they can use this interface directly, and there would be no need to regenerate the protobuf message. Note that `main_program` may be modified by this interface.


- It is possible to support online optimization of the inference program.
  We will design an inference transpiler to perform offline optimization for inference, which will produce an optimized inference `ProgramDesc` for a given `ProgramDesc`. However, some optimizations can be done online, for example:


  - changing the layout from `NCHW` to `NHWC`
  - merging the computation of a batch normalization layer into the preceding fc or conv layer

> **@sidgoyal78**: @Xreki: Can you explain this merging of computation for batch norm (may not be needed for this doc)?
>
> **@Xreki** (Jan 11, 2018): Here I just list some things we may do in the future. About merging the computation of the batch norm layer, you can find some detail here and here. After merging the batch norm layer, MobileNet can get a 30% speedup, without loss of precision.
>
> **@sidgoyal78**: Oh nice. Thank you.
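
The batch-norm merging discussed in this thread can be checked numerically. The sketch below (pure Python with made-up values, not Fluid code) folds the inference-time batch-norm statistics into the weights and bias of the preceding fc layer and verifies that the folded layer produces the same output.

```python
import math

def fc(w, b, x):
    # A single-output fully connected layer: y = w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bn(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-time batch norm with fixed (saved) statistics
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Made-up parameters and input for illustration
w, b = [0.5, -1.0, 2.0], 0.3
gamma, beta, mean, var = 1.5, -0.2, 0.1, 0.8
x = [1.0, 2.0, 3.0]

# Fold the batch-norm statistics into new fc weights and bias:
#   w' = w * gamma / sqrt(var + eps)
#   b' = (b - mean) * gamma / sqrt(var + eps) + beta
scale = gamma / math.sqrt(var + 1e-5)
w_folded = [wi * scale for wi in w]
b_folded = (b - mean) * scale + beta

unfolded = bn(fc(w, b, x), gamma, beta, mean, var)
folded = fc(w_folded, b_folded, x)
assert abs(unfolded - folded) < 1e-9  # identical up to rounding
```

At inference time the statistics are constants, so the fold removes the batch-norm op entirely, which is why it can speed up models without any loss of precision.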


`ProgramBuilder` overrides the `()` operator to support this kind of optimization, in which both `main_program` and `startup_program` may be modified. Thus, users may specify some optimizing strategy and will get a new instance of `ProgramBuilder`.
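
The `AppendFetchVariables` utility described earlier can be sketched in the same illustrative style: a plain Python dict stands in for the program, and the method and field names merely mirror the proposed C++ interface.

```python
class ProgramBuilderSketch:
    """Illustrative stand-in for the proposed C++ ProgramBuilder."""

    def __init__(self, main_program, fetch_var_names):
        self.main_program = main_program
        self.fetch_var_names = list(fetch_var_names)

    def append_fetch_variables(self, var_name):
        """Register an extra variable to fetch by inserting a fetch op
        into main_program, so the protobuf message need not be
        regenerated."""
        if var_name in self.fetch_var_names:
            return  # already fetched, nothing to do
        col = len(self.fetch_var_names)
        self.fetch_var_names.append(var_name)
        self.main_program["ops"].append(
            {"type": "fetch", "input": var_name, "col": col})

builder = ProgramBuilderSketch(
    {"ops": [{"type": "fetch", "input": "predict", "col": 0}]},
    ["predict"])
# Fetch an intermediate variable for debugging purposes
builder.append_fetch_variables("fc_0.tmp_0")
```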

## Execution Runtime

There are three key concepts in Fluid: `Place`, `Scope` and `Executor`.
- `Place` is used to decide which device the program will run on. There are two types of `Place` in the current framework, `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU.
- `Scope` in Fluid is similar to the concept of a scope in programming languages. It is an association of names to variables. Global variables in the same `Scope` must have different names. However, there are no restrictions on the names of variables in different local scopes. Users have to specify a `Scope` to run a program.


- `Executor` can be constructed from a user-specified place, and provides a unified way to execute a `ProgramDesc` in a `Scope`.

All three concepts compose the execution environment, that is, a `Runtime` for inference.



```c++
class Runtime {
 public:
  Runtime(/* CPU or GPU */);

 private:
  platform::Place* place;
  framework::Scope* scope;
  framework::Executor* executor;
};
```
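
The `Scope` part of the runtime can be sketched as an association of names to variables, with local scopes that fall back to their parent. This is a minimal plain-Python illustration, not the real `framework::Scope`.

```python
class Scope:
    """A name-to-variable map; local scopes search their parent on miss."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def var(self, name):
        # Create the variable in this scope, or reuse it if it exists
        return self.vars.setdefault(name, {"name": name})

    def find_var(self, name):
        # Search this scope first, then the enclosing scopes
        if name in self.vars:
            return self.vars[name]
        return self.parent.find_var(name) if self.parent else None

    def new_scope(self):
        return Scope(parent=self)

global_scope = Scope()
global_scope.var("fc_0.w_0")      # parameters live in the global scope
local = global_scope.new_scope()
local.var("x")                    # per-run inputs live in a local scope
assert local.find_var("fc_0.w_0") is not None  # visible through the parent
assert global_scope.find_var("x") is None      # locals do not leak upward
```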

1. A program can run on different `Runtime`s.
   Users can define a runtime for CPU and another runtime for CUDA GPU, and the inference program can run on the two runtimes at the same time. Alternatively, users can define two runtimes for CUDA GPU to run the inference program on different GPU devices.
1. It is possible to share parameters among different programs.


   Different programs can run on the same `Runtime`, so that parameters with the same name will be shared.


1. Programs running on different threads can share parameters.


   Multiple threads can be launched to run an inference program in parallel on the same `Runtime`.
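
The multi-threaded case can be sketched as follows: several Python threads run a toy inference function against one shared, read-only parameter store, mirroring how programs on different threads can share one `Runtime`'s scope. The network and names are made up for illustration; real inference would go through the `Executor`.

```python
import threading

# Shared, read-only parameters (the analogue of one Runtime's Scope)
shared_scope = {"fc_0.w_0": [0.5, -1.0, 2.0], "fc_0.b_0": 0.3}
results = [None] * 4

def infer(thread_id, x):
    # Every thread reads the same parameters; no copies are made
    w, b = shared_scope["fc_0.w_0"], shared_scope["fc_0.b_0"]
    results[thread_id] = sum(wi * xi for wi, xi in zip(w, x)) + b

threads = [threading.Thread(target=infer, args=(i, [1.0, 2.0, 3.0]))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads computed the same result from the shared parameters
assert all(r == results[0] for r in results)
```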

## Inference Engine

### Why do we need an Inference Engine?



Using a `ProgramBuilder` and a `Runtime`, users can write code to perform inference.


Apart from the concepts introduced specially for inference in this design doc, users need to handle the details of feeding and fetching data by calling `framework::SetFeedVariable` and `framework::GetFetchVariable`.


In addition, users need to run the `startup_program` manually to load parameters for each runtime.
A simple example is listed as follows:



```cpp
ProgramBuilder builder("mnist.paddle");
Runtime runtime("CPU");

// Run the startup_program once to load all the parameters for the specified runtime
runtime.Executor()->Run(builder.StartupProgram(), runtime.Scope(), 0, true, true);

// Run the main_program multiple times
for (...) {
  framework::LoDTensor input;
  framework::SetFeedVariable(runtime.Scope(), input, ...);
  runtime.Executor()->Run(builder.MainProgram(), runtime.Scope(), 0, true, true);
  framework::LoDTensor output;
  framework::GetFetchVariable(runtime.Scope(), output, ...);
}
```

To simplify the interfaces, we design a new structure, `InferenceEngine`.

### Design of Inference Engine

1. An `InferenceEngine` can be constructed using a `ProgramBuilder`.


1. An `InferenceEngine` also holds a pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize the parameters for this runtime.


1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they require.


1. The data structures [framework::Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md) and [framework::LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) are used in the user implementation to feed input data and fetch output data.



```c++
class InferenceEngine {
 public:
  InferenceEngine(const ProgramBuilder& builder);

  void SetRuntime(Runtime* runtime);

  void Run(const std::vector<framework::Tensor>& feeds,
           std::vector<framework::Tensor>& fetchs);

 private:
  ProgramBuilder builder;
  Runtime* runtime;
};
```

### Example

Here is the simplest example of using `InferenceEngine` to build an inference program directly from a file and run it on a single CPU.



```cpp
ProgramBuilder builder("mnist.paddle");
Runtime runtime("CPU");

InferenceEngine engine(builder);
// Set the runtime, in which the startup_program will be run to initialize parameters for the runtime
engine.SetRuntime(&runtime);

// Run the main_program multiple times
for (...) {
  framework::LoDTensor input;
  framework::LoDTensor output;
  engine.Run({input}, {output});
}
```