Add design doc of inference API for fluid. #7315

Closed
wants to merge 12 commits into from
105 changes: 105 additions & 0 deletions doc/design/inference.md
@@ -0,0 +1,105 @@
# Design Doc: Inferencer


Inferencer => Let's decide on a new term for this, maybe Inference Engine?
I am replacing this with Inference Engine in my review right now, but if people decide on something else, we can replace it later.

Contributor

I vote for Inference Engine.

Contributor Author

I am thinking about it and will improve it in a following commit.


In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.


typo "nueral" => neural

Contributor Author

Done.

Collaborator

fluid => Fluid

This mistake appears in many places in this document.

Contributor Author

Done.

Collaborator

Please be aware that Fluid doesn't represent a network at all. The protobuf message represents the program, not the network.

Contributor Author

There is no network in Fluid. However, "neural network" seems to be a common phrase in deep learning. I'll think about a better expression.

Collaborator

python => Python

This mistake appears in many other places in the document.

Contributor Author

Done.
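
For illustration, the C++ side can rebuild the same protobuf message from its serialized form. The sketch below is only illustrative: `LoadFileToString` and `model_path` are hypothetical, and the deserializing constructor is assumed to mirror the framework's existing `ProgramDesc(const std::string& binary_str)`.

```c++
// Sketch only: model_path is assumed to point at the file written when the
// Python `Program` is serialized; LoadFileToString is a hypothetical helper
// that reads the whole file into a std::string.
const std::string model_path = "./inference_model/model";
std::string binary_str = LoadFileToString(model_path);
// Assumed to mirror the existing ProgramDesc(const std::string&) constructor.
framework::ProgramDesc program_desc(binary_str);
```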

Given a `ProgramDesc`, it can be run on any execution environment.


on => inside

Contributor Author

Done.

Collaborator

Runtime => runtime

Contributor Author

Done.

In fluid, we call the execution environment `Runtime`, which includes `Place`, `Scope` and `Executor`.

## Representation of the Inference Network
Collaborator

Again, here it is the inference program, not the inference network.


In python, an inference network is defined as:

```python
image = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=image,
                          size=10,
                          act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.


type "serval" => several
"use" => using the method

Contributor Author

Done.

Collaborator

pass => epoch

Contributor Author

Done.

Collaborator

Don't use passive voice, which is strongly discouraged in English writing, unless it is really necessary.

Contributor Author

Will do in following commits.

```python
fluid.io.save_inference_model(
    "./inference_model/", ["x"], [predict],
    exe)
```

Contributor Author

Will do in following commits.


The saved model contains everything of the inference network, including all operators and variables. Thus, the `inference_program` should be initialized by the model file or a pre-loaded buffer.


"everything of the" => everything required by the

Contributor Author

Done.


Given a `inference_program`, it is easy to derive a `load_program` which is composed of `load_op` and is responsible for initializing all the parameter variables in `inference_program`. `load_program` will be executed once and `inference_program` will be executed as many times as you need.


"a" => an
"as many times as you need." => as many times as the user needs.

Contributor

"will" => can

Contributor Author

Done.
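
As an illustration of this flow (not part of the proposal itself), the sketch below assumes an `Executor::Run(program, scope, block_id)` entry point similar to the one used for training, and that `place`, `scope` and the two programs already exist:

```c++
// Sketch only: load_program runs once to load parameters, then
// inference_program runs as many times as needed.
framework::Executor executor(place);
executor.Run(*load_program, &scope, 0 /* block_id */);
for (size_t i = 0; i < num_requests; ++i) {
  executor.Run(*inference_program, &scope, 0 /* block_id */);
}
```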


To summerize, a inferencer should:


typo "summerize" => summarize
"inferencer" => inference engine
ProgramDesc => ProgramDescs

Contributor Author

Done, except for the inference engine, which will be improved in following commits.

Contributor

a inferencer -> an inference module

Contributor Author

Done.

- be initialized from files or from buffers
- be composed of two ProgramDesc, namely the `inference_program` and `load_program`

All the initialization is designed to be done in constructor.


"All the initialization is designed to be done in constructor." => All the initialization should be done in the constructor.

Contributor Author

Done.
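
For example, both construction paths could look like the sketch below, using the two constructors proposed later in this document; `ReadFileToString` is a hypothetical helper:

```c++
// Construct directly from the saved model file.
Inferencer from_file("./inference_model/");

// Construct from a pre-loaded buffer; ReadFileToString is a hypothetical helper.
std::string buffer = ReadFileToString("./inference_model/");
Inferencer from_buffer(buffer.data(), buffer.size());
```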


## Support of Switching Runtime


of => for

Contributor Author

Done.


In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.

Contributor

Can you add some links to Place, Scope and Executor, if users want to know more details of these three key concepts?

Contributor Author

Done. For Scope and Executor, I added links to the design docs. For Place, there is no design doc, so I added a link to the C++ header file.


There are two types of Place in current framework, `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU. `Scope` is independent to `Place`. Given the place, you need to define a `Executor`, and run the `Executor` among the `Scope`.


Place => `Place`
in => in the
independent to => independent of
a Executor => an Executor
among the => in the

Contributor Author

Done.

Contributor

a Executor -> an Executor

Contributor Author

Done.
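
For reference, assembling such an environment might look like the sketch below; the exact constructor arguments are assumptions based on the current framework:

```c++
platform::CPUPlace place;             // or platform::CUDAPlace place(0); for a CUDA GPU
framework::Scope scope;               // holds parameters, inputs and outputs
framework::Executor executor(place);  // runs a ProgramDesc in the given scope
```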


In Inferencer, the `Runtime` is declared as follows:


Inferencer => Inference Engine


```c++
class Runtime {
  platform::Place* place;
  framework::Scope* scope;
  framework::Executor* executor;
};
```

With the definition of `Runtime`, the `Inferencer` will has following features:


Inferencer => Inference Engine
Let's use `` around this term consistently. I am open to both: using it or not using it. Whichever we decide, we should be consistent.

will has => will have the

Contributor Author

Done.

- **Switch runtime**. Different `Runtime` can have either different of the same type of `Place`, with different `Scope` and `Executor`. An `Inferencer` can run on different `Runtime` at the same time independently.


Rephrase: Switch runtime. Different Runtime can have different attributes of the same type of Place, with different Scope and Executor. An Inference Engine can run on different Runtimes at the same time independently.

Contributor Author

Done.

- **Share parameters among different networks**. Users can run different `Inferencer`, which means different network, on the same `Runtime`, parameters with the same name will be shared.


Inferencer => Inference Engine
which means different network => which represents different networks

Contributor Author

Done.

- **Share parameters among different threads**. Multiple threads can be launched to run an `Inferencer` in parallel on the same `Runtime`.


Let's decide on a standard term for Inferencer and be consistent with it throughout
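
A usage sketch of the features above, based on the interfaces proposed in the next section; it assumes a `Runtime` constructor taking the three pointers, which the declaration above leaves unspecified:

```c++
Inferencer inferencer("./inference_model/");

platform::CPUPlace cpu_place;
framework::Scope scope;
framework::Executor executor(cpu_place);
Runtime cpu_runtime(&cpu_place, &scope, &executor);  // assumed constructor

inferencer.SetRuntime(&cpu_runtime);  // load_program runs once here to load parameters
std::vector<framework::Tensor> feeds(1), fetchs(1);
inferencer.Run(feeds, fetchs);        // inference_program can run repeatedly afterwards

// Threads sharing the same Runtime reuse the parameters already in its Scope.
std::thread worker([&] { inferencer.Run(feeds, fetchs); });
worker.join();
```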


## Overview of the Inference API

In a simple design, users can use the core data structures, `Tensor` and `LoDTensor`, to feed input data and fetch output data.
An `Inferencer` should enable the following members and public interfaces:


enable => support

Contributor Author

Done.

- Members:


pointer of the => pointer to the

Contributor Author

Done.

- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`


to record the => to store the

Contributor Author

Done.

- the pointer of current `Runtime`


see above.

Contributor Author

Done.

- Important interfaces:
- constructor, to initialize the `inference_program` and `load_program`. Once initialized, they cannot be changed.
- `Run`, to run the inference based on the current runtime.
- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.


to help users to debug => to help users debug

Contributor Author

Done.

- `GetFeed/FetchVarShape`, to help users to verify the size of input and output data.


to help users to verify => to help users verify

Contributor Author

Done.


```c++
class Inferencer {
 public:
  // Initialize from file
  Inferencer(const std::string& filename);
  // Initialize from buffer
  Inferencer(const char* buffer, const size_t num_bytes);

  void SetRuntime(Runtime* runtime);

  void Run(const std::vector<framework::Tensor>& feeds,
           std::vector<framework::Tensor>& fetchs);

  // utility interfaces
  const std::vector<std::string>& GetFeedVarNames() const;
  const std::vector<std::string>& GetFetchVarNames() const;
  std::vector<int64_t> GetFeedVarShape(const size_t index);
  std::vector<int64_t> GetFetchVarShape(const size_t index);

 private:
  framework::ProgramDesc* inference_program_;
  framework::ProgramDesc* load_program_;
  std::vector<std::string> feed_var_names_;
  std::vector<std::string> fetch_var_names_;

  Runtime* runtime_;
};
```
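
To connect the declaration with the described semantics of `SetRuntime`, a possible implementation sketch follows; it assumes that `Runtime`'s members are accessible to `Inferencer` and that `Executor` provides a `Run(program, scope, block_id)` entry point, neither of which is fixed by this design:

```c++
void Inferencer::SetRuntime(Runtime* runtime) {
  runtime_ = runtime;
  // Execute load_program exactly once to load parameters into the new runtime's scope.
  runtime_->executor->Run(*load_program_, runtime_->scope, 0 /* block_id */);
}
```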

### Issues

- Normally, all fetching variables' names should be written in the ProgramDesc and read from file. If users want to add some extra fetching variables for debug, or for some other use, they need to regenerate the file again. Do we need to allow user to append extra fetching variables?


for debug => for debugging purposes
extra fetching => extra fetch

Contributor Author

Done.

- How to support multiple devices?