
Mind you that this simulator is based on Google's TPU, while Eyeriss is from MIT/NVIDIA.

## The processing element

### memory

Each PE can access/store data in:

- a local register (scratchpad)
- spatial flow (accessing data from a neighboring PE)
- global memory

Each has a corresponding latency; see the sketch below.
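A minimal sketch of how those three levels and their latencies might be modeled; the cycle counts below are illustrative assumptions, not measured or simulator values:

```python
from enum import Enum

class MemLevel(Enum):
    LOCAL_REGISTER = "local register (scratchpad)"
    NEIGHBOR_PE = "spatial flow from a neighboring PE"
    GLOBAL_MEMORY = "global memory"

# Hypothetical per-access latencies in cycles, cheapest to most expensive.
ACCESS_LATENCY = {
    MemLevel.LOCAL_REGISTER: 1,
    MemLevel.NEIGHBOR_PE: 2,
    MemLevel.GLOBAL_MEMORY: 100,
}

def access_cost(level, n_accesses):
    """Total cycles spent performing n_accesses at the given level."""
    return ACCESS_LATENCY[level] * n_accesses

print(access_cost(MemLevel.GLOBAL_MEMORY, 10))  # 1000
```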

### operations

Each PE is able to:

- load data from its own register, neighboring PEs, or global memory
- multiply data
- add data
- store data in its register, pass it to neighboring PEs, or broadcast it to global memory
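A minimal sketch of a single PE with those four abilities; the class layout and method names are assumptions for illustration, not the simulator's actual API:

```python
class PE:
    """Minimal sketch of one processing element (names are illustrative)."""

    def __init__(self):
        self.register = {}   # local scratchpad: name -> scalar value
        self.neighbors = []  # adjacent PEs reachable via spatial flow

    def load(self, name, global_mem):
        """Load a value from global memory into the local register."""
        self.register[name] = global_mem[name]

    def load_from_neighbor(self, name, neighbor):
        """Load a value from a neighboring PE's register (spatial flow)."""
        self.register[name] = neighbor.register[name]

    def multiply(self, a, b, out):
        self.register[out] = self.register[a] * self.register[b]

    def add(self, a, b, out):
        self.register[out] = self.register[a] + self.register[b]

    def store(self, name, global_mem):
        """Broadcast a value from the local register to global memory."""
        global_mem[name] = self.register[name]
```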

## data

Data can be one of:

- input feature map (typically the largest)
- weights/filters (multiply the filter element-wise by the input, add the products up, save the result as output, then move by the stride)
- output feature map (smaller than the input; gets smaller still with a larger stride or larger filter)
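A minimal 1-D sketch of the filter behavior described above: multiply the filter element-wise against a window of the input, sum the products, save one output element, then move by the stride. The 1-D shapes are an assumption to keep the example short:

```python
def conv1d(input_fmap, weights, stride=1):
    """Slide the filter over the input in strides; one sum per position."""
    out = []
    for start in range(0, len(input_fmap) - len(weights) + 1, stride):
        window = input_fmap[start:start + len(weights)]
        out.append(sum(x * w for x, w in zip(window, weights)))
    return out

# A length-8 input and a length-3 filter with stride 2 give a length-3
# output: smaller than the input, as noted in the list above.
print(conv1d([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, -1], stride=2))  # [-2, -2, -2]
```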

## dataflow

The accelerator is composed of:

- PEs arranged in a big array
- global memory with dedicated regions for the input, the output, and the filters (aka weights)
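A minimal sketch of that top level, reusing the PE class sketched earlier; the grid shape, neighbor wiring, and memory-region names are assumptions:

```python
class Accelerator:
    """Minimal sketch: a 2-D array of PEs plus region-split global memory."""

    def __init__(self, rows, cols):
        self.pes = [[PE() for _ in range(cols)] for _ in range(rows)]
        # Wire each PE to its right-hand neighbor for spatial flow
        # (an illustrative topology, not necessarily the simulator's).
        for r in range(rows):
            for c in range(cols - 1):
                self.pes[r][c].neighbors.append(self.pes[r][c + 1])
        # Global memory with dedicated regions for inputs, outputs, weights.
        self.global_mem = {"input": {}, "output": {}, "weights": {}}
```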

Dataflow can be:

### Weight stationary (reuse weights, load inputs)

- weights are loaded once and stored in the local registers of different PEs
- inputs are loaded from global memory every time
- after multiplying in parallel, psums are spatially passed to neighbors for accumulation

Inputs are discarded and the final output is written back to memory; a new input is fetched from memory while the local register keeps its weight. This is also typically the least efficient dataflow as per yet another Eyeriss paper, which claims that OS is more efficient than WS.
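A minimal sketch of that weight-stationary loop; the flat 1-D layout and function name are assumptions, but the access pattern matches the list above (weights resident in registers, inputs re-fetched per step, psums summed across PEs):

```python
def weight_stationary_dot(weights, input_stream):
    """Each 'PE' i keeps weights[i] resident in its register."""
    registers = list(weights)           # weights loaded once, stay stationary
    outputs = []
    for x_vec in input_stream:          # each input fetched from global memory
        psums = [w * x for w, x in zip(registers, x_vec)]  # parallel multiplies
        outputs.append(sum(psums))      # psums passed spatially and accumulated
    return outputs

# Example: two resident weights, two input vectors streamed past them.
print(weight_stationary_dot([1, 2], [[1, 1], [2, 3]]))  # [3, 8]
```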

### Output stationary (reuse inputs, load weights)

- weights are loaded from global memory every time
- inputs are loaded once; after being used, they are spatially shared to other PEs
- after multiplying, psums are accumulated and stored in the local register
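A minimal sketch of the output-stationary pattern under the same assumptions: each 'PE' keeps one running psum in its register, inputs are loaded once and shared, and weights are re-fetched from global memory every step:

```python
def output_stationary(weight_rows, input_vec):
    """Each 'PE' i accumulates output i; weights stream past the psums."""
    psums = [0.0] * len(weight_rows)     # one stationary psum per PE
    for j, x in enumerate(input_vec):    # input loaded once, shared spatially
        for i, row in enumerate(weight_rows):
            psums[i] += row[j] * x       # weight fetched from global memory
    return psums

# Example: two outputs stay resident while a length-3 input streams in.
print(output_stationary([[1, 0, 1], [2, 1, 0]], [3, 4, 5]))  # [8.0, 10.0]
```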