Add design doc for lookup remote table in Fluid #9068

Merged · 12 commits · Jul 5, 2018
26 changes: 26 additions & 0 deletions doc/fluid/design/dist_train/distributed_lookup_table_design.md
@@ -119,6 +119,32 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
algorithm.

## Distributed Sparse Table in Fluid

As an alternative design, we can implement a distributed sparse table in Fluid,
so that we do not need to maintain an external storage component during training.

You may want to read the Fluid [Distributed Training Architecture](./distributed_architecture.md)
and [Parameter Server](./parameter_server.md) design docs before going on.

![fluid lookup remote table](./src/fluid_lookup_remote_table.png)

Partition a large table into multiple pserver instances:
1. `DistributeTranspiler` would split the large table into several small
   table blocks using a partitioning algorithm such as
   [RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling) or
   [Hash](https://en.wikipedia.org/wiki/Hash) (see the sketch after this list).

   > **Reviewer (Contributor):** I think we only use mod for now, but never mind, it's a design.
   >
   > **Author:** The mod is used for finding the right pserver according to the input `Id`, and we also need to initialize the shape of each table block on the PServers according to the split.

1. In some cases the range of the input `Ids` is very wide and unpredictable, so the sparse
   table should be able to fill in a new value for an id that has not appeared before, using
   zero, uniform random, or Gaussian initialization.
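
To make the routing and lazy initialization above concrete, here is a minimal Python sketch.
It assumes the mod-based routing mentioned in the review comment; the names
(`SparseTableShard`, `shard_index`, `split_ids_by_pserver`) are hypothetical and are not part
of the Fluid `DistributeTranspiler` API.

```python
import numpy as np

class SparseTableShard(object):
    """One table block held by a single PServer instance (illustrative only)."""

    def __init__(self, emb_dim, initializer="uniform"):
        self.emb_dim = emb_dim
        self.initializer = initializer
        self.rows = {}  # id -> np.ndarray of shape (emb_dim,)

    def lookup(self, ids):
        # Lazily fill a new value for an id that has never appeared before.
        out = []
        for i in ids:
            if i not in self.rows:
                if self.initializer == "zero":
                    self.rows[i] = np.zeros(self.emb_dim, dtype=np.float32)
                elif self.initializer == "gaussian":
                    self.rows[i] = np.random.normal(
                        0.0, 0.01, self.emb_dim).astype(np.float32)
                else:  # uniform
                    self.rows[i] = np.random.uniform(
                        -0.05, 0.05, self.emb_dim).astype(np.float32)
            out.append(self.rows[i])
        return np.stack(out)

def shard_index(table_id, num_pservers):
    # "mod" routing: the PServer that owns a row is id % num_pservers.
    return table_id % num_pservers

def split_ids_by_pserver(ids, num_pservers):
    """Group input ids by the PServer shard that owns them."""
    buckets = [[] for _ in range(num_pservers)]
    for i in ids:
        buckets[shard_index(i, num_pservers)].append(i)
    return buckets

# Example: 4 PServers, each holding one shard of an 8-dim embedding table.
# shards = [SparseTableShard(emb_dim=8) for _ in range(4)]
# buckets = split_ids_by_pserver([3, 7, 42, 3], num_pservers=4)
```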

For each Trainer's training process:
1. In the forward pass, instead of running the local `lookup_table` op, a `pre-fetch` op
   fetches the parameter blocks from the PServers according to the input `Ids`, and then
   merges the fetched blocks into a parameter `W`.
1. In the backward pass, compute `GRAD@W'` using the pre-fetched `W` and send it to the
   PServers, which run the optimization step (see the sketch below).
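
A minimal Python sketch of the trainer-side pre-fetch / merge / push-gradient flow, again
assuming mod-based routing. `rpc_prefetch` and `rpc_send_grad` stand in for the real
trainer-to-PServer RPC calls; they are hypothetical callbacks, not Fluid operators.

```python
import numpy as np

def prefetch_and_merge(ids, num_pservers, rpc_prefetch):
    """Forward pass: fetch the rows owned by each PServer and merge them into a
    single parameter W, aligned with the original order of the input ids."""
    buckets = [[] for _ in range(num_pservers)]
    for pos, i in enumerate(ids):
        buckets[i % num_pservers].append((pos, i))

    rows = [None] * len(ids)
    for ps, bucket in enumerate(buckets):
        if not bucket:
            continue
        # One RPC per PServer; assumed to return rows in the requested order.
        fetched = rpc_prefetch(ps, [i for _, i in bucket])
        for (pos, _), row in zip(bucket, fetched):
            rows[pos] = row
    return np.stack(rows)  # merged parameter W for this mini-batch

def send_sparse_grad(ids, grad_w, num_pservers, rpc_send_grad):
    """Backward pass: route each row of GRAD@W' back to the PServer that owns
    the corresponding id, so the optimizer runs on the PServer side."""
    buckets = {}
    for pos, i in enumerate(ids):
        buckets.setdefault(i % num_pservers, []).append((i, grad_w[pos]))
    for ps, id_grad_pairs in buckets.items():
        rpc_send_grad(ps, id_grad_pairs)
```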

## Conclusion

Let us do the "storage service does not optimize" solution first, as a
Binary file not shown.