
Add design doc of inference API for fluid. #7315

Closed
wants to merge 12 commits

Conversation

Contributor

@Xreki Xreki commented Jan 8, 2018

Fix #7314

@Xreki Xreki added the 预测 label (Inference; formerly named "Inference", covers C-API inference issues) Jan 8, 2018
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

fluid => Fluid

This mistake appears in many places in this document.

Contributor Author:

Done.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

Please be aware that Fluid doesn't represent a network at all. The protobuf message represents the program, not the network.

Contributor Author:

There is no network in Fluid. However, "neural network" seems to be a common phrase in deep learning. I'll think about a better expression.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

python => Python

This mistake appears in many other places in the document.

Contributor Author:

Done.

# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.
Collaborator:

Runtime => runtime

Contributor Author:

Done.

Given a `ProgramDesc`, it can be run on any execution environment.
In fluid, we call the execution environment `Runtime`, which includes `Place`, `Scope` and `Executor`.

## Representation of the Inference Network
Collaborator:

Again, here it is the inference program, not the inference network.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.
Collaborator:

pass => epoch

Contributor Author:

Done.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.
Collaborator:

Don't use the passive voice, which is strongly discouraged in English writing, unless it is really necessary.

Contributor Author:

Will do in following commits.


Given a `inference_program`, it is easy to derive a `load_program` which is composed of `load_op` and is responsible for initializing all the parameter variables in `inference_program`. `load_program` will be executed once and `inference_program` will be executed as many times as you need.

To summerize, a inferencer should:
Contributor:

a inferencer -> an inference module

Contributor Author:

Done.
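As a side note for readers: the `load_program` described in the quoted passage has a direct Python-side analogue in `fluid.io`, which builds a small program of `load` ops (one per persistable parameter) and runs it once through the executor. A minimal sketch, assuming the `paddle.v2.fluid` API of early 2018; the layer shapes, parameter names and the model directory are assumptions, and the `load_persistables` signature is recalled from memory:

```python
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Rebuild the inference network in Python (shapes are made up for illustration,
# and the parameter names are assumed to match the saved model), then
# initialize its parameters from disk by running the load ops once.
x = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=x, size=10, act='softmax')

# Runs the load ops once; afterwards the inference program can be executed
# as many times as needed.
fluid.io.load_persistables(exe, "./inference_model/")
```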


In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.

There are two types of Place in current framework, `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU. `Scope` is independent to `Place`. Given the place, you need to define a `Executor`, and run the `Executor` among the `Scope`.
Contributor:

a Executor -> an Executor

Contributor Author:

Done.

@kavyasrinet kavyasrinet left a comment

Thanks for doing this. I have added my review comments. Most of it is rephrasing and typos.
We should decide on a new term for Inferencer. I have proposed Inference Engine in my review, but I am open to other suggestions.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer


Inferencer => Let's decide on a new term for this, maybe Inference Engine ?
I am replacing this with Inference Engine in my review right now, but if people decide on something else, we can replace it later.

Contributor:

I vote for Inference Engine.

Contributor Author:

I am thinking about it and will improve it in following commit.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.


typo "nueral" => neural

Contributor Author:

Done.

# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.


on => inside

Contributor Author:

Done.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.


typo "serval" => several
"use" => using the method

Contributor Author:

Done.
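For context, a minimal sketch of the training-and-saving flow being described, written against the `paddle.v2.fluid` Python API of that period. The feed name `"x"` and the target `predict` come from the snippet quoted later in this thread; the network shape and the training loop are assumptions:

```python
import paddle.v2.fluid as fluid

x = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=x, size=10, act='softmax')

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# ... train for several epochs ...

# Saves both the pruned inference ProgramDesc (as a binary proto string)
# and the parameter files under the given directory.
fluid.io.save_inference_model("./inference_model/", ["x"], [predict], exe)
```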

exe)
```

The saved model contains everything of the inference network, including all operators and variables. Thus, the `inference_program` should be initilized by the model file or a pre-loaded buffer.


"everything of the" => everything required by the

Contributor Author:

Done.
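The design doc targets a C++ loading path; for reference, a hedged sketch of the existing Python-side counterpart that initializes the inference program from the saved files (the `load_inference_model` signature is recalled from the `paddle.v2.fluid` API of that period and may differ slightly):

```python
import numpy as np
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Recreates the pruned inference program from the saved model directory and
# returns the recorded feed/fetch targets alongside it.
[inference_program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

x_data = np.random.random((1, 784)).astype('float32')   # one fake example
results = exe.run(inference_program,
                  feed={feed_names[0]: x_data},
                  fetch_list=fetch_targets)
```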

- Members:
- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`


to record the => to store the

Contributor Author:

Done.

- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`
- the pointer of current `Runtime`


see above.

Contributor Author:

Done.

- `Run`, to run the inference based on the current runtime.
- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.


to help users to debug => to help users debug

Contributor Author:

Done.

- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.
- `GetFeed/FetchVarShape`, to help users to verify the size of input and output data.


to help users to verify => to help users verify

Contributor Author:

Done.


### Issues

- Normally, all fetching variables' names should be written in the ProgramDesc and read from file. If users want to add some extra fetching variables for debug, or for some other use, they need to regenerate the file again. Do we need to allow user to append extra fetching variables?


for debug => for debugging purposes
extra fetching => extra fetch

Contributor Author:

Done.
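On the question raised in the quoted passage, a hedged sketch of the Python-side workaround: an extra variable can be looked up in the loaded program by name and appended to the fetch list without regenerating the saved file. The variable name `'fc_0.tmp_1'` is purely hypothetical, and the signatures are recalled from the `paddle.v2.fluid` API of that period:

```python
import numpy as np
import paddle.v2.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
[program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

# Fetch an intermediate variable for debugging in addition to the saved targets.
extra_var = program.global_block().var('fc_0.tmp_1')
outs = exe.run(program,
               feed={feed_names[0]: np.random.random((1, 784)).astype('float32')},
               fetch_list=fetch_targets + [extra_var])
```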

abhinavarora
abhinavarora previously approved these changes Jan 8, 2018
Contributor

@abhinavarora abhinavarora left a comment

LGTM! Apart from the English corrections suggested by Kavya and Yi, the first draft of the design doc looks good and is a good starting point. Thank you for the great work.

@sidgoyal78
Contributor

Thanks for the PR, this is helpful.

```python
fluid.io.save_inference_model(
"./inference_model/", ["x"], [predict],
exe)
Contributor:

Contributor Author:

Will do in following commits.


## Support of Switching Runtime

In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.
Contributor:

Can you add some links to Place, Scope and Executor, if users want to know more details of these three key concepts?

Contributor Author:

Done. For Scope and Executor, I added links to their design docs. For Place, there is no design doc, so I added a link to the C++ header file.
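For readers who want a concrete picture of those three concepts, a hedged Python-side sketch (the C++ classes mirror these; passing `scope=` explicitly to `Executor.run` is an assumption, by default the global scope is used):

```python
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()        # or fluid.CUDAPlace(0) for a CUDA GPU
scope = fluid.core.Scope()      # a Scope is independent of the Place
exe = fluid.Executor(place)

# Run the startup program once to create and initialize variables in `scope`,
# then run the main program as many times as needed against the same scope.
exe.run(fluid.default_startup_program(), scope=scope)
exe.run(fluid.default_main_program(), scope=scope)
```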

Contributor Author

@Xreki Xreki left a comment

Thanks for all of your reviews. I fixed all the typos and applied the English corrections.
After discussing with @qingqing01, we may introduce the concept of a ProgramBuilder, which will support the transpiler under development. I'll update the design doc as soon as possible.

Several issues are listed here to remind me:

  • There is no network in Fluid.
  • No passive voice in the design doc.
  • Rename Inferencer to Inference Engine.


Contributor

@sidgoyal78 sidgoyal78 left a comment

Thanks very much for adding more details. This is helpful; I have two questions as of now. I will probably have more tomorrow :)

- It is possible to support online optimization of the inference program.
We will design an inference transpiler to do offline optimization for inference, which produce an optimized inference `ProgramDesc` for a given `ProgramDesc`. However, some optimization can be done online, such as
- changing the layout from `NCHW` to `NHWC`
- merging the computation of batch normalization layer to the front fc layer or conv layer
Contributor:

@Xreki : Can you explain this merging of computation for batch norm (may not be needed for this doc)?

Contributor Author

@Xreki Xreki Jan 11, 2018

Here I just list some things we may do in the future.
About merging the computation of the batch norm layer, you can find some details here and here. After merging the batch norm layer, MobileNet gets about a 30% speedup, without loss of precision.

Contributor:

Oh nice. Thank you.
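For readers unfamiliar with the trick: at inference time batch norm is a fixed per-channel affine transform (using the saved moving mean and variance), so it folds algebraically into the preceding conv or fc layer. This is the standard batch-norm folding, not something specific to this PR:

```latex
z  = \gamma \cdot \frac{(W x + b) - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta = W' x + b'
W' = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \, W
b' = \frac{\gamma \, (b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta
```

Because the transform is exact, the merged conv/fc layer produces the same outputs, which is why there is no loss of precision.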

To summarize, an inferencer module should:
- be initialized from files or from buffers
- be composed of two `ProgramDesc`s, namely the `inference_program` and `load_program`
In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor.
Contributor

@sidgoyal78 sidgoyal78 Jan 11, 2018

Will this protobuf of main_program (which will be used to instantiate the ProgramBuilder) have feed/fetch ops added to the original program desc?

Contributor Author

@Xreki Xreki left a comment

@sidgoyal78 your questions are welcome. There may be some weak points in this design doc; please point them out. Any proposal will be appreciated and helpful for me.


};
```

In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor.
Contributor Author

@Xreki Xreki Jan 11, 2018

I think the main_program should have feed_ops and fetch_ops; otherwise we'll need to clone the main_program and insert feed_ops and fetch_ops into the copy in Run(), like in the Python implementation, which I think is redundant.

However, where the feed_ops and fetch_ops come from depends on the storage format. They may be inserted in the C++ code or initialized from the protobuf message file.
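A hedged way to check which of the two storage formats a saved model actually uses, from the Python side: load it and list the op types of the global block. Whether `'feed'`/`'fetch'` show up depends on how `save_inference_model` wrote the file; the printed list below is only an illustrative example, not a guaranteed output:

```python
import paddle.v2.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
[program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

# Inspect the operators of the loaded inference ProgramDesc.
print([op.type for op in program.global_block().ops])
# e.g. ['feed', 'mul', 'elementwise_add', 'softmax', 'fetch']
```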

Collaborator

@wangkuiyi wangkuiyi left a comment

It looks to me that this design separates inference from training -- I don't see the necessity of having startup and main programs for inference as there are for training.

Please make sure that we can write an online training program, which means a training program can also provide the inference serving at the same time.

@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine
Collaborator:

InferenceEngine => Inference Engine

Contributor Author:

Done.

@kavyasrinet kavyasrinet left a comment

Thank you so much for revising the design doc. I have added a few comments for certain parts. I might have a few design questions too; I will post them in a separate review so it isn't too cluttered.

@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine

The main goal of inference API is easy to use.


The main goal of inference API is easy to use. => The main goal of an inference API is to make it easy to use.

Contributor Author:

Done.

# Design Doc: InferenceEngine

The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).


protobuf message => protobuf message called
the Python wrapper of which is => the Python wrapper for which is a

Contributor Author:

Done.


The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.


a => an
it can run inside => it can be executed inside

Contributor Author:

Done.

The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.
In Fluid, we call the execution environment runtime, which includes [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h), [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) and [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/executor.md).


environment runtime => environment a runtime
which includes => which includes a
[Scope] => a [Scope]
[Executor] => an [Executor]

Contributor Author:

Done.


## Inference Program

A simple inference program may be defined in Python API as:


may be => can be
as => as the

Contributor Author:

Done.


1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.


required => require

Contributor Author:

Done.

1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.
1. Data structure, [framework::Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md) and [framework::LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), are used in user codes to feed input data and fetch output data.


user codes => user implementation

Contributor Author:

Done.


### Example

Here is the simplest example to use `InferenceEngine` to build a inference program directly from file and run on a single CPU.


a inference => an inference

Contributor Author:

Done.

Runtime runtime("CPU");

InferenceEngine engine(builder);
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime


ran => run

Contributor Author:

Done.

// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime
engine.SetRuntime(&runtime);

// Run the main_program many times


many => multiple

Contributor Author:

Done.

Contributor Author

@Xreki Xreki left a comment

Thanks to @kavyasrinet for correcting the English again.


@Xreki
Contributor Author

Xreki commented Jan 16, 2018

@wangkuiyi

> It looks to me that this design separates inference from training -- I don't see the necessity of having startup and main programs for inference as there are for training.

For an inference system where there is no training program and the inference program is initialized from a file, a main_program is needed, and a startup_program is kept for optimization.

> Please make sure that we can write an online training program, which means a training program can also provide the inference serving at the same time.

I updated the PR. I think the API can easily be extended to a common C++ API which supports online training and inference in the future.

@Xreki Xreki force-pushed the core_inference_api_design_doc branch from ca97606 to baf2802 Compare January 16, 2018 12:09
There are three ways to define an inference program.
- **Case 1**, split from a training program. A training program can provide the inference serving at the same time, in which case the inference program is part of the training program, and all the parameters have been set correctly. There is no need of an extra `startup_program` for this kind of inferencing now and the need of an separate `main_program` for inference may be removed in the future which depends on the implementation of `Executor.Run()`.
- **Case 2**, write an inference program directly using API. In this case, parameters are stored in files.
- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and parameters are stored in files. We can get a complete `ProgramDesc` straightway and keeping a `main_program` and a `startup_program` make it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).
Contributor:

Is this saved inference ProgramDesc exactly the same as the training ProgramDesc (in this case, we can let the user specify the pruning targets and feed/fetch var names on the C++ side), or is it obtained after applying prune, inference_optimize and prepending/appending feed/fetch operators to the training ProgramDesc (since we don't want to change framework.proto to add new fields to ProgramDesc, we directly prepend/append feed/fetch ops before saving the model)?

Contributor Author:

I think we can support both the inference and the training ProgramDesc.

  • If supporting an inference ProgramDesc, then we need to prepend/append feed_op and fetch_op in fluid.io.save_inference_model.
  • If supporting a training ProgramDesc, we can call operator()(std::vector<std::string>& feed_var_names, std::vector<std::string>& fetch_var_names) to get an inference program, and users need to specify the feed var names and fetch var names.

- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and parameters are stored in files. We can get a complete `ProgramDesc` straightway and keeping a `main_program` and a `startup_program` make it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).

In this design doc, we mainly detail the interfaces for the **Case 3**.
- The protobuf message of the `main_program` is saved using `fluid.io.save_inference_model` method. Thus, it can be initilized from file or from a pre-loaded buffer.
Contributor:

@Xreki: I discussed something with @kexinzhao regarding the protobuf message and wrote it up here: https://github.com/sidgoyal78/paddle_notes/blob/master/inference.md
Can you please take a look and maybe pick one of the two approaches described?

Contributor Author:

Great. I see you posted this thought in #7580 . I'll have a look.
So, @sidgoyal78 @kexinzhao, I wonder if you have any ideas about the design doc? In fact, I need some suggestions.

Contributor:

I think the ProgramBuilder class is necessary (maybe we can think of a better name); it is just analogous to the Program class in Python. The same is the case with the Resolver class (again, a better name could be chosen, maybe MetaExecutor or something); I think it is necessary too.

Contributor:

After discussing with @kexinzhao, it seems that the Runtime class could be avoided, and we could get away with just the Builder and Resolver classes.

Contributor:

For names, a few suggestions:
ProgramBuilder -> ProgramMaker / ProgramFactory
ProgramResolver -> ProgramRunner

Contributor

@sidgoyal78 sidgoyal78 Jan 17, 2018

Other: ProgramBuilder -> InferenceEngineInitializer
ProgramResolver -> InferenceEngineRunner

Contributor Author:

@sidgoyal78 Thanks very much. I introduced Runtime so that users just need to know Runtime and don't need to care about Place, Executor and Scope. However, we can remove Runtime and use the core concepts directly, just like in Python.

@Xreki Xreki force-pushed the core_inference_api_design_doc branch from 46bbd7d to 339f4ed Compare February 11, 2018 03:03
@luotao1
Contributor

luotao1 commented Feb 1, 2019

Thanks for contributing to PaddlePaddle! Since documents have been moved to the FluidDoc repo, we are closing this PR. Welcome to contribute to the FluidDoc repo.

@luotao1 luotao1 closed this Feb 1, 2019
@Xreki Xreki deleted the core_inference_api_design_doc branch October 29, 2019 00:41