
Add design doc of inference API for fluid. #7315

Closed
wants to merge 12 commits

Conversation

Contributor

@Xreki Xreki commented Jan 8, 2018

Fix #7314

@Xreki Xreki added the 预测 label (Inference; formerly named "Inference", covers C-API inference issues) Jan 8, 2018
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

fluid => Fluid

This mistake appears in many places in this document.

Contributor Author:

Done.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

Please be aware that Fluid doesn't represent a network at all. The protobuf message represents the program, not the network.

Contributor Author:

There is no network in Fluid. However, "neural network" seems to be a common phrase in deep learning. I'll think about a better expression.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Collaborator:

python => Python

This mistake appears in many other places in the document.

Contributor Author:

Done.

# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.
Collaborator:

Runtime => runtime

Contributor Author:

Done.

Given a `ProgramDesc`, it can be run on any execution environment.
In fluid, we call the execution environment `Runtime`, which includes `Place`, `Scope` and `Executor`.

## Representation of the Inference Network
Collaborator:

Again, here it is the inference program, not the inference network.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.
Collaborator:

pass => epoch

Contributor Author:

Done.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.
Collaborator:

Don't use the passive voice, which is strongly discouraged in English writing, unless it is really necessary.

Contributor Author:

Will do in following commits.


Given a `inference_program`, it is easy to derive a `load_program` which is composed of `load_op` and is responsible for initializing all the parameter variables in `inference_program`. `load_program` will be executed once and `inference_program` will be executed as many times as you need.

To summerize, a inferencer should:
Contributor:

a inferencer -> an inference module

Contributor Author:

Done.
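As a side note for readers: the `load_program` described in the quoted passage has a direct Python-side analogue in `fluid.io`, which builds a small program of `load` ops (one per persistable parameter) and runs it once through the executor. A minimal sketch, assuming the `paddle.v2.fluid` API of early 2018; the layer shapes, parameter names and the model directory are assumptions, and the `load_persistables` signature is recalled from memory:

```python
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Rebuild the inference network in Python (shapes are made up for illustration,
# and the parameter names are assumed to match the saved model), then
# initialize its parameters from disk by running the load ops once.
x = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=x, size=10, act='softmax')

# Runs the load ops once; afterwards the inference program can be executed
# as many times as needed.
fluid.io.load_persistables(exe, "./inference_model/")
```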


In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.

There are two types of Place in current framework, `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU. `Scope` is independent to `Place`. Given the place, you need to define a `Executor`, and run the `Executor` among the `Scope`.
Contributor:

a Executor -> an Executor

Contributor Author:

Done.

@kavyasrinet kavyasrinet left a comment

Thanks for doing this. I have added my review comments. Most of it is rephrasing and typos.
We should decide on a new term for Inferencer. I have proposed Inference Engine in my review, but I am open to other suggestions.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer


Inferencer => Let's decide on a new term for this, maybe Inference Engine ?
I am replacing this with Inference Engine in my review right now, but if people decide on something else, we can replace it later.

Contributor:

I vote for Inference Engine.

Contributor Author:

I am thinking about it and will improve it in following commit.

@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.


typo "nueral" => neural

Contributor Author:

Done.

# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.


on => inside

Contributor Author:

Done.

act='softmax')
```

After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time.


typo "serval" => several
"use" => using the method

Contributor Author:

Done.
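For context, a minimal sketch of the training-and-saving flow being described, written against the `paddle.v2.fluid` Python API of that period. The feed name `"x"` and the target `predict` come from the snippet quoted later in this thread; the network shape and the training loop are assumptions:

```python
import paddle.v2.fluid as fluid

x = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=x, size=10, act='softmax')

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# ... train for several epochs ...

# Saves both the pruned inference ProgramDesc (as a binary proto string)
# and the parameter files under the given directory.
fluid.io.save_inference_model("./inference_model/", ["x"], [predict], exe)
```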

exe)
```

The saved model contains everything of the inference network, including all operators and variables. Thus, the `inference_program` should be initilized by the model file or a pre-loaded buffer.


"everything of the" => everything required by the

Contributor Author:

Done.
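The design doc targets a C++ loading path; for reference, a hedged sketch of the existing Python-side counterpart that initializes the inference program from the saved files (the `load_inference_model` signature is recalled from the `paddle.v2.fluid` API of that period and may differ slightly):

```python
import numpy as np
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Recreates the pruned inference program from the saved model directory and
# returns the recorded feed/fetch targets alongside it.
[inference_program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

x_data = np.random.random((1, 784)).astype('float32')   # one fake example
results = exe.run(inference_program,
                  feed={feed_names[0]: x_data},
                  fetch_list=fetch_targets)
```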

- Members:
- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`


to record the => to store the

Contributor Author:

Done.

- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`
- the pointer of current `Runtime`


see above.

Contributor Author:

Done.

- `Run`, to run the inference based on the current runtime.
- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.


to help users to debug => to help users debug

Contributor Author:

Done.

- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.
- `GetFeed/FetchVarShape`, to help users to verify the size of input and output data.


to help users to verify => to help users verify

Contributor Author:

Done.


### Issues

- Normally, all fetching variables' names should be written in the ProgramDesc and read from file. If users want to add some extra fetching variables for debug, or for some other use, they need to regenerate the file again. Do we need to allow user to append extra fetching variables?


for debug => for debugging purposes
extra fetching => extra fetch

Contributor Author:

Done.
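On the question raised in the quoted passage, a hedged sketch of the Python-side workaround: an extra variable can be looked up in the loaded program by name and appended to the fetch list without regenerating the saved file. The variable name `'fc_0.tmp_1'` is purely hypothetical, and the signatures are recalled from the `paddle.v2.fluid` API of that period:

```python
import numpy as np
import paddle.v2.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
[program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

# Fetch an intermediate variable for debugging in addition to the saved targets.
extra_var = program.global_block().var('fc_0.tmp_1')
outs = exe.run(program,
               feed={feed_names[0]: np.random.random((1, 784)).astype('float32')},
               fetch_list=fetch_targets + [extra_var])
```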

abhinavarora
abhinavarora previously approved these changes Jan 8, 2018
Contributor

@abhinavarora abhinavarora left a comment

LGTM! Apart from the English corrections suggested by Kavya and Yi, the first draft of the design doc looks good and is a good starting point. Thank you for the great work.

@sidgoyal78
Contributor

Thanks for the PR, this is helpful.

```python
fluid.io.save_inference_model(
"./inference_model/", ["x"], [predict],
exe)
Contributor:

Contributor Author:

Will do in following commits.


## Support of Switching Runtime

In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.
Contributor:

Can you add some links to Place, Scope and Executor, if users want to know more details of these three key concepts?

Contributor Author:

Done. For Scope and Executor, I added links to their design docs. For Place, there is no design doc, so I added a link to the C++ header file.
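For readers who want a concrete picture of those three concepts, a hedged Python-side sketch (the C++ classes mirror these; passing `scope=` explicitly to `Executor.run` is an assumption, by default the global scope is used):

```python
import paddle.v2.fluid as fluid

place = fluid.CPUPlace()        # or fluid.CUDAPlace(0) for a CUDA GPU
scope = fluid.core.Scope()      # a Scope is independent of the Place
exe = fluid.Executor(place)

# Run the startup program once to create and initialize variables in `scope`,
# then run the main program as many times as needed against the same scope.
exe.run(fluid.default_startup_program(), scope=scope)
exe.run(fluid.default_main_program(), scope=scope)
```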

Contributor Author

@Xreki Xreki left a comment

Thanks for all of your reviews. I fixed all the typos and applied the English corrections.
After discussing with @qingqing01, we may introduce the concept of a ProgramBuilder, which will support the transpiler under development. I'll update the design doc as soon as possible.

Several issues are listed here to remind me:

  • There is no network in Fluid.
  • No passive voice in the design doc.
  • Rename Inferencer to Inference Engine.


Contributor

@sidgoyal78 sidgoyal78 left a comment

Thanks very much for adding more details. This is helpful; I have two questions as of now. I will probably have more tomorrow :)

- It is possible to support online optimization of the inference program.
We will design an inference transpiler to do offline optimization for inference, which produce an optimized inference `ProgramDesc` for a given `ProgramDesc`. However, some optimization can be done online, such as
- changing the layout from `NCHW` to `NHWC`
- merging the computation of batch normalization layer to the front fc layer or conv layer
Contributor:

@Xreki : Can you explain this merging of computation for batch norm (may not be needed for this doc)?

Contributor Author

@Xreki Xreki Jan 11, 2018

Here I just list some things we may do in the future.
About merging the computation of the batch norm layer, you can find some details here and here. After merging the batch norm layer, MobileNet gets about a 30% speedup, without loss of precision.

Contributor:

Oh nice. Thank you.
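For readers unfamiliar with the trick: at inference time batch norm is a fixed per-channel affine transform (using the saved moving mean and variance), so it folds algebraically into the preceding conv or fc layer. This is the standard batch-norm folding, not something specific to this PR:

```latex
z  = \gamma \cdot \frac{(W x + b) - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta = W' x + b'
W' = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \, W
b' = \frac{\gamma \, (b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta
```

Because the transform is exact, the merged conv/fc layer produces the same outputs, which is why there is no loss of precision.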

To summarize, an inferencer module should:
- be initialized from files or from buffers
- be composed of two `ProgramDesc`s, namely the `inference_program` and `load_program`
In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor.
Contributor

@sidgoyal78 sidgoyal78 Jan 11, 2018

Will this protobuf of main_program (which will be used to instantiate the ProgramBuilder) have feed/fetch ops added to the original program desc?

Contributor Author

@Xreki Xreki left a comment

@sidgoyal78 your questions are welcome. There may be some weak points in this design doc; please point them out. Any proposal will be appreciated and helpful for me.


};
```

In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor.
Contributor Author

@Xreki Xreki Jan 11, 2018

I think the main_program should have feed_ops and fetch_ops; otherwise we'll need to clone the main_program and insert feed_ops and fetch_ops into the copy in Run(), like in the Python implementation, which I think is redundant.

However, where the feed_ops and fetch_ops come from depends on the storage format. They may be inserted in the C++ code or initialized from the protobuf message file.
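A hedged way to check which of the two storage formats a saved model actually uses, from the Python side: load it and list the op types of the global block. Whether `'feed'`/`'fetch'` show up depends on how `save_inference_model` wrote the file; the printed list below is only an illustrative example, not a guaranteed output:

```python
import paddle.v2.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
[program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model/", exe)

# Inspect the operators of the loaded inference ProgramDesc.
print([op.type for op in program.global_block().ops])
# e.g. ['feed', 'mul', 'elementwise_add', 'softmax', 'fetch']
```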

Collaborator

@wangkuiyi wangkuiyi left a comment

It looks to me that this design separates inference from training -- I don't see the necessity of having startup and main programs for inference as there are for training.

Please make sure that we can write an online training program, which means a training program can also provide the inference serving at the same time.

@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine
Collaborator:

InferenceEngine => Inference Engine

Contributor Author:

Done.

@kavyasrinet kavyasrinet left a comment

Thank you so much for revising the design doc. I have added a few comments for certain parts. I might have a few design questions too; I will post them in a separate review so it isn't too cluttered.

@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine

The main goal of inference API is easy to use.


The main goal of inference API is easy to use. => The main goal of an inference API is to make it easy to use.

Contributor Author:

Done.

# Design Doc: InferenceEngine

The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).


protobuf message => protobuf message called
the Python wrapper of which is => the Python wrapper for which is a

Contributor Author:

Done.


The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.


a => an
it can run inside => it can be executed inside

Contributor Author:

Done.

The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.
In Fluid, we call the execution environment runtime, which includes [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h), [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) and [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/executor.md).


environment runtime => environment a runtime
which includes => which includes a
[Scope] => a [Scope]
[Executor] => an [Executor]

Contributor Author:

Done.


## Inference Program

A simple inference program may be defined in Python API as:


may be => can be
as => as the

Contributor Author:

Done.


1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.


required => require

Contributor Author:

Done.

1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.
1. Data structure, [framework::Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md) and [framework::LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), are used in user codes to feed input data and fetch output data.


user codes => user implementation

Contributor Author:

Done.


### Example

Here is the simplest example to use `InferenceEngine` to build a inference program directly from file and run on a single CPU.


a inference => an inference

Contributor Author:

Done.

Runtime runtime("CPU");

InferenceEngine engine(builder);
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime


ran => run

Contributor Author:

Done.

// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime
engine.SetRuntime(&runtime);

// Run the main_program many times


many => multiple

Contributor Author:

Done.

Contributor Author

@Xreki Xreki left a comment

Thanks to @kavyasrinet for correcting the English again.


@Xreki
Contributor Author

Xreki commented Jan 16, 2018

@wangkuiyi

> It looks to me that this design separates inference from training -- I don't see the necessity of having startup and main programs for inference as there are for training.

For an inference system where there is no training program and the inference program is initialized from a file, a main_program is needed, and a startup_program is kept for optimization.

> Please make sure that we can write an online training program, which means a training program can also provide the inference serving at the same time.

I updated the PR. I think the API can easily be extended to a common C++ API which supports online training and inference in the future.

@Xreki Xreki force-pushed the core_inference_api_design_doc branch from ca97606 to baf2802 Compare January 16, 2018 12:09
There are three ways to define an inference program.
- **Case 1**, split from a training program. A training program can provide the inference serving at the same time, in which case the inference program is part of the training program, and all the parameters have been set correctly. There is no need of an extra `startup_program` for this kind of inferencing now and the need of an separate `main_program` for inference may be removed in the future which depends on the implementation of `Executor.Run()`.
- **Case 2**, write an inference program directly using API. In this case, parameters are stored in files.
- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and parameters are stored in files. We can get a complete `ProgramDesc` straightway and keeping a `main_program` and a `startup_program` make it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).
Contributor:

Is this saved inference ProgramDesc exactly the same as the training ProgramDesc (in this case, we can let the user specify the pruning targets and feed/fetch var names on the C++ side), or is it obtained after applying prune, inference_optimize and prepending/appending feed/fetch operators to the training ProgramDesc (since we don't want to change framework.proto to add new fields to ProgramDesc, we directly prepend/append feed/fetch ops before saving the model)?

Contributor Author:

I think we can support both the inference and the training ProgramDesc.

  • If supporting an inference ProgramDesc, then we need to prepend/append feed_op and fetch_op in fluid.io.save_inference_model.
  • If supporting a training ProgramDesc, we can call operator()(std::vector<std::string>& feed_var_names, std::vector<std::string>& fetch_var_names) to get an inference program, and users need to specify the feed var names and fetch var names.

- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and parameters are stored in files. We can get a complete `ProgramDesc` straightway and keeping a `main_program` and a `startup_program` make it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).

In this design doc, we mainly detail the interfaces for the **Case 3**.
- The protobuf message of the `main_program` is saved using `fluid.io.save_inference_model` method. Thus, it can be initilized from file or from a pre-loaded buffer.
Contributor:

@Xreki: I discussed something with @kexinzhao regarding the protobuf message and wrote it up here: https://github.com/sidgoyal78/paddle_notes/blob/master/inference.md
Can you please take a look and maybe pick one of the two approaches described?

Contributor Author:

Great. I see you posted this thought in #7580 . I'll have a look.
So, @sidgoyal78 @kexinzhao, I wonder if you have any ideas about the design doc? In fact, I need some suggestions.

Contributor:

I think the ProgramBuilder class is necessary (maybe we can think of a better name); it is just analogous to the Program class in Python. The same is the case with the Resolver class (again, a better name could be chosen, maybe MetaExecutor or something); I think it is necessary too.

Contributor:

After discussing with @kexinzhao, it seems that the Runtime class could be avoided, and we could get away with just the Builder and Resolver classes.

Contributor:

For names, a few suggestions:
ProgramBuilder -> ProgramMaker / ProgramFactory
ProgramResolver -> ProgramRunner

Contributor

@sidgoyal78 sidgoyal78 Jan 17, 2018

Other: ProgramBuilder -> InferenceEngineInitializer
ProgramResolver -> InferenceEngineRunner

Contributor Author:

@sidgoyal78 Thanks very much. I introduced Runtime so that users just need to know Runtime and don't need to care about Place, Executor and Scope. However, we can remove Runtime and use the core concepts directly, just like in Python.

@Xreki Xreki force-pushed the core_inference_api_design_doc branch from 46bbd7d to 339f4ed Compare February 11, 2018 03:03
@luotao1
Contributor

luotao1 commented Feb 1, 2019

Thanks for contributing to PaddlePaddle! Since documents have been moved to the FluidDoc repo, we are closing this PR. Welcome to contribute to the FluidDoc repo.

@luotao1 luotao1 closed this Feb 1, 2019
@Xreki Xreki deleted the core_inference_api_design_doc branch October 29, 2019 00:41