- Uses the Izhikevich integrate-and-fire model to simulate neurons
- (todo...)
- (describe how everything works)
- Requires WSL or Ubuntu Linux
- Run `pip install -r requirements.txt`
- Requires Icarus Verilog; install it with `sudo apt-get install iverilog`
- Vivado Design Suite for loading FPGA programs
- Thonny for Raspberry Pi Pico
- Input model parameters through SPI (or Ethernet)
- Input adjacency matrix through SPI (or Ethernet)
- Write adjacency matrix to RAM
- (Calculate inputs if layered neural network)
- Calculate inputs based on adjacency matrix
- Run Izhikevich core calculation
- (Run STDP or R-STDP)
- (Write state to RAM)
- Repeat
- Send relevant information back through SPI (or Ethernet)
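The loop above can be prototyped in software before it is mapped to hardware. This is a minimal sketch, assuming a dense adjacency matrix `weights[pre][post]` (0.0 for no connection, negative for inhibitory), binary spikes from the previous iteration, and a hypothetical `step_neuron` callback standing in for the Izhikevich core; none of these names come from the Verilog design.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (voltage, adaptive value) of one neuron

def compute_inputs(weights: List[List[float]], spikes: List[bool],
                   external: List[float]) -> List[float]:
    """External drive plus the weighted sum of last iteration's spikes.

    weights[pre][post] is the adjacency matrix: 0.0 means no connection,
    a negative value means an inhibitory connection.
    """
    inputs = list(external)
    for pre, fired in enumerate(spikes):
        if fired:
            for post, weight in enumerate(weights[pre]):
                inputs[post] += weight
    return inputs

def run_lattice(weights: List[List[float]], external: List[float],
                step_neuron: Callable[[State, float], Tuple[State, bool]],
                states: List[State], iterations: int) -> List[List[float]]:
    """Repeat: calculate inputs, update every neuron, record the voltages."""
    spikes = [False] * len(states)
    history = []
    for _ in range(iterations):
        inputs = compute_inputs(weights, spikes, external)
        results = [step_neuron(state, i) for state, i in zip(states, inputs)]
        states = [state for state, _ in results]
        spikes = [fired for _, fired in results]
        history.append([v for v, _ in states])  # what would be sent back for visualization
    return history
```

Keeping the input calculation separate from the neuron update mirrors the split between the matrix logic and the per-neuron cores, so either half can be swapped for a hardware-accurate model.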
- Equation high level synthesis
- Fix synthesis to use wires instead of registers as intermediates
- Configurable number of bits
- Generation
- Multiplication with configurable number of bits
- Preprocessing
- Expansion of integer powers
- Simplification of constant-only expressions
- Nesting of equations according to order of operations
- Addition/Subtraction
- Multiplication
- Basic multiplication
- Optimized for LUTs (Booth's algorithm)
- Code simulation
- Integers
- Fixed point decimals (balanced)
- Fixed point decimals (unbalanced)
- Hardware simulation
- Comparison of LUT utilization
- Code simulation
- Negation
- Reciprocal
- Division
- Absolute value
- $e^x$, high precision
- $e^x$, limited range
- Preprocessing of equation with tree
- Operation machine
- Variable indexing to stack
- Doing operation and saving output
- Machine that determines which is the next operation and where to store it (state driving machine)
- High level synthesis of state driving machine
- Apply a function to a given set of numbers
- Keep track of the new number, get rid of old numbers if not needed
- Determine next function and repeat
- Multiple operation machines could read from the same state machine
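A software model of the operation machine helps pin down its behavior. This is a minimal sketch, assuming the fully parenthesized syntax used by `equation_to_module.py` restricted to binary `+`, `-`, `*`, `/` and non-negative literals. `compile_equation` plays the role of the state-driving machine (it fixes the order of operations), and `run_operations` applies one operation per step on a stack, discarding operands once used. Both names are hypothetical.

```python
import operator
import re
from typing import Dict, List, Tuple, Union

Op = Tuple[str, Union[str, float]]  # ("push_var", name), ("push_const", value) or ("apply", symbol)

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[()+\-*/])")

def tokenize(equation: str) -> List[str]:
    tokens, pos, text = [], 0, equation.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def compile_equation(equation: str) -> List[Op]:
    """Flatten a fully parenthesized equation into postfix operations."""
    tokens = tokenize(equation)
    ops: List[Op] = []

    def parse(i: int) -> int:
        tok = tokens[i]
        if tok == "(":
            i = parse(i + 1)                  # left operand
            symbol = tokens[i]
            if symbol not in OPERATORS:
                raise ValueError(f"unknown operator {symbol!r}")
            i = parse(i + 1)                  # right operand
            if tokens[i] != ")":
                raise ValueError("expected ')'")
            ops.append(("apply", symbol))     # operation runs after both operands exist
            return i + 1
        if tok[0].isdigit():
            ops.append(("push_const", float(tok)))
        else:
            ops.append(("push_var", tok))
        return i + 1

    if parse(0) != len(tokens):
        raise ValueError("trailing tokens after equation")
    return ops

def run_operations(ops: List[Op], variables: Dict[str, float]) -> float:
    """Execute one operation per step, keeping intermediates on a stack."""
    stack: List[float] = []
    for kind, arg in ops:
        if kind == "push_var":
            stack.append(variables[arg])
        elif kind == "push_const":
            stack.append(arg)
        else:
            right = stack.pop()               # operands are dropped once used
            left = stack.pop()
            stack.append(OPERATORS[arg](left, right))
    (result,) = stack
    return result
```

For `((x+y)*z)` the compiled list is push `x`, push `y`, apply `+`, push `z`, apply `*`, so the stack never holds more than two values at once.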
- Operations verification on chip
- Addition/Subtraction
- Multiplication
- Negation
- Reciprocal
- Division
- Absolute value
- $e^x$, limited range
- Izhikevich core
- Voltage change
- Adaptive value change
- Spiking signal indicator
- Clocked operations
- Using one operator at a time (use finite state machine to keep track of which operators to use, keep track of numbers in a stack)
- 16-bit processor (with slower clock speed) (found to be inaccurate)
- 8|8 split
- 10|6 split
- 18-bit processor (found to be inaccurate)
- 18-bit processor with scaling down to 0 to 1 range (to prevent overflow)
- Equations reference
- If inaccuracies persist, redo it using the method from the Cornell documentation: each term of the calculation is multiplied by one fourth before the step is applied, and the step itself is one fourth, for a total step of 1/16
- Verilog
- Finding correct parameters
- Currently the w parameter will reach 0 and then never change, need to find a set of parameters that does not engage in this behavior
- Valid parameters could be generated computationally
- Could be that some of the parameters are so small the fixed point approximation is 0
- Simulate the fixed point operations (do add and mult and then calculate overflow) to see when the equation starts to fail
- Synthesis
- Place and route
- Timing constraints (lower clock speed until timing constraints met, maybe 50 MHz)
- 20-bit processor
- 24-bit processor (found to be more accurate, but not enough)
- Verification on chip
- Vivado synthesis
- Without pins
- With pins
- Cornell processor
- Code verification
- Schematic
- Synthesis
- Compare LUT usage
- Place and route
- Redo Vivado synthesis, specifying which module is the top module
- Preliminary core testing
- Voltage change calculation
- Adaptive value change calculation
- Plotting adaptive values and voltage values
- Code verification
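Simulating the fixed-point operations in software shows where a given integer|fraction split starts to fail. This is a minimal sketch, assuming signed two's-complement values, truncating multiplication, and the polynomial part of the voltage equation evaluated as `((0.04*v)*v) + (5*v) + 140` with the standard Izhikevich coefficients; the Verilog may use different coefficients or a different evaluation order. It reports the first voltage at which any intermediate result leaves the word.

```python
from typing import Optional

def to_fixed(x: float, frac_bits: int) -> int:
    """Raw signed integer for x with frac_bits fractional bits (rounded)."""
    return int(round(x * (1 << frac_bits)))

def fits(raw: int, total_bits: int) -> bool:
    """True if raw fits in a signed two's-complement word of total_bits."""
    return -(1 << (total_bits - 1)) <= raw < (1 << (total_bits - 1))

def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Full-precision product, then drop frac_bits (arithmetic shift floors)."""
    return (a * b) >> frac_bits

def first_overflow(int_bits: int, frac_bits: int, v_min: float = -100.0,
                   v_max: float = 40.0, v_step: float = 0.5) -> Optional[float]:
    """Scan v and return the first value where any intermediate of
    ((0.04*v)*v) + (5*v) + 140 leaves the signed word, or None."""
    total_bits = int_bits + frac_bits
    c_004, c_5, c_140 = (to_fixed(c, frac_bits) for c in (0.04, 5.0, 140.0))
    v = v_min
    while v <= v_max:
        fv = to_fixed(v, frac_bits)
        t1 = fixed_mul(c_004, fv, frac_bits)   # 0.04*v
        t2 = fixed_mul(t1, fv, frac_bits)      # (0.04*v)*v
        t3 = fixed_mul(c_5, fv, frac_bits)     # 5*v
        result = t2 + t3 + c_140
        values = (fv, c_004, c_5, c_140, t1, t2, t3, result)
        if not all(fits(x, total_bits) for x in values):
            return v
        v += v_step
    return None
```

With an 8|8 split, the constant 140 alone exceeds the largest representable value (about 127.996), so the scan fails at its first point; with a 16|16 split it finds no overflow over this range. Overflow is only half the story, since small constants such as 0.04 also lose precision when quantized.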
- Coupled Izhikevich cores
- Gap junction
- Coupling with spike train
- Potentially (8-bit) minifloat FPUs for computation, one per neuron core
- Neurotransmission
- Neurotransmitter core
- NMDA modifier
- Neurotransmission module per row of neurons to handle the entire row's calculations
- Using fit parameters of Izhikevich neuron to emulate neurotransmission
- Hodgkin-Huxley core
- Ion channels
- Poisson neuron
- Pseudo-random number generation
- Should work with a bit length parameter from 16 to 32; could probably index by multiplying the maximum bit length by certain values rather than just subtracting, to ensure a spread
- Could also try Gaussian number generation, which would require approximating the distribution function
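A Poisson neuron can be driven by a linear-feedback shift register (LFSR), which is cheap in hardware. This is a minimal sketch of a 16-bit Galois LFSR using the maximal-length tap mask `0xB400` (polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$); every other bit length from 16 to 32 would need its own maximal-length taps. The spike rule (spike when the random value falls below a rate-scaled threshold) and the function names are assumptions for illustration.

```python
from typing import Iterator, List

TAPS_16 = 0xB400  # maximal-length taps for a 16-bit Galois LFSR (period 65535)

def lfsr16(seed: int = 0xACE1) -> Iterator[int]:
    """Yield successive 16-bit pseudo-random states (never 0)."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= TAPS_16     # feed the shifted-out bit back into the taps
        yield state

def poisson_spikes(rate_hz: float, dt_s: float, steps: int,
                   seed: int = 0xACE1) -> List[bool]:
    """Spike on a step when the random value is below rate * dt scaled to 16 bits."""
    threshold = int(rate_hz * dt_s * (1 << 16))
    rng = lfsr16(seed)
    return [next(rng) < threshold for _ in range(steps)]
```

Over one full period every non-zero 16-bit value appears exactly once, so at 100 Hz with a 1 ms step the neuron spikes on 6552 of 65535 steps, close to the expected 10%.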
- Preset spike train
- Neural refractoriness function
- Exponential decay approximation or Dirac delta approximation
- Only needs to concern itself with the positive domain
- RAM interface
- BRAM
- BRAM access from different controllers
- BRAM could store weights for every neuron (until RAM interface is developed)
- If weights are 20 bits, the first 4 bits can store whether there is a connection and whether it is inhibitory or excitatory
- SDRAM
- SDRAM controller
- Try with one memory controller and multiple memory controllers
- Communication protocol
- Basic pin testing
- SPI
- Simulation
- Computer to FPGA (display incoming string on leds)
- FPGA to computer (invert incoming string)
- Synthesis
- On chip verification
- Computer to FPGA
- FPGA to computer
- FPGA side communication
- Raspberry Pi side communication
- Distributed communication
- Izhikevich matrix
- Interwoven matrix
- Reference FPGA neural networks
- Potentially could refactor with asynchronous execution of neurons
- For now, try implementing with one memory controller and see if you can get it to work with multiple memory controllers
- Could also try using a crossbar instead of storing voltages in memory
- Save each output of neuron to memory
- Calculate inputs for each neuron by row from memory and save to hardware matrix
- Run iteration of lattice
- Send voltage values back to controller for visualization/history
- Send other variables (neurotransmission values and the `a`, `b`, `c`, and `d` values)
- Feedforward network
- Fast Fourier transform
- On chip comparison of spectral analyses
- This can be done by comparing the most prominent frequencies, by binning the spectra and comparing them with mean squared error, or by approximating the earth mover's distance (or computing it exactly by solving the transport problem)
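The three comparison options can be checked in software before committing to hardware. This is a minimal sketch, assuming both spectra are magnitude lists over the same frequency bins. For one-dimensional histograms with equal bin widths, the earth mover's distance reduces to the summed absolute difference between the normalized cumulative distributions, so no transport problem has to be solved in that special case.

```python
from typing import List, Sequence

def peak_bin(spectrum: Sequence[float]) -> int:
    """Index of the most prominent frequency bin."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])

def binned_mse(a: Sequence[float], b: Sequence[float], bins: int) -> float:
    """Sum magnitudes into `bins` equal groups, then take the mean squared error."""
    if len(a) != len(b) or len(a) % bins:
        raise ValueError("spectra must have equal length divisible by bins")
    width = len(a) // bins
    sums_a: List[float] = [sum(a[k * width:(k + 1) * width]) for k in range(bins)]
    sums_b: List[float] = [sum(b[k * width:(k + 1) * width]) for k in range(bins)]
    return sum((x - y) ** 2 for x, y in zip(sums_a, sums_b)) / bins

def emd_1d(a: Sequence[float], b: Sequence[float], bin_width: float = 1.0) -> float:
    """Earth mover's distance between two 1-D spectra, each normalized to unit mass."""
    if len(a) != len(b):
        raise ValueError("spectra must have equal length")
    total_a, total_b = sum(a), sum(b)
    cdf_a = cdf_b = distance = 0.0
    for x, y in zip(a, b):
        cdf_a += x / total_a
        cdf_b += y / total_b
        distance += abs(cdf_a - cdf_b) * bin_width   # mass that must cross this bin edge
    return distance
```

The peak comparison is the cheapest but ignores everything except one bin; the earth mover's distance costs one running sum per spectrum and still rewards spectra whose energy is close in frequency, which binned MSE does not.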
- STDP
- Internals of the weight-update functionality should be fit to linear piecewise functions for specific $\tau_{-}$ and $\tau_{+}$ values
- In this scheme the linear piecewise function parameters should take the place of $\tau_{-}$ and $\tau_{+}$ as well as $a_{-}$ and $a_{+}$
- A $\Delta{t}$ of 0 should return 0 for the change in weight
- Since spikes are unlikely to occur at the same time, spikes can be handled by a single module that is fed the spike times
- There should be an iteration counter; every time a spike occurs, the spike time is set to the iteration counter's value
- The iteration counter is 15 bits plus a sign bit; when a spike time is input into any equation, it is shifted into the -1 to 1 range in a 32-bit number
- If the iteration counter resets, any spike that occurred before the reset is set to its spike time minus the maximum integer value
- If the spike time is still negative, it is set to the maximum negative value
- Coupled test
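A software sketch of the scheme above. It assumes $\Delta{t} = t_{post} - t_{pre}$ (positive means potentiation), that each side of the exponential window is replaced by a single line `m * |dt| + b` clamped at zero, and that "maximum integer value" means 32767 for the 15-bit plus sign counter. The reset rule is interpreted as: a spike time that was already negative at a later reset saturates to the most negative value. `stdp_dw` and `rebase_spike_time` are hypothetical names, not the Verilog interface.

```python
MAX_TIME = (1 << 15) - 1    # largest value of a 15-bit counter plus sign bit
MIN_TIME = -(1 << 15)       # most negative value

def stdp_dw(dt: int, m_pos: float, b_pos: float, m_neg: float, b_neg: float) -> float:
    """Weight change for dt = t_post - t_pre, one line per side of the window.

    Each line stands in for a_+ * exp(-dt / tau_+) or a_- * exp(dt / tau_-):
    b plays the role of the amplitude a, and a negative m the role of the decay.
    """
    if dt == 0:
        return 0.0                                 # simultaneous spikes: no change
    if dt > 0:
        return max(0.0, m_pos * dt + b_pos)        # potentiation, clamped at 0
    return -max(0.0, m_neg * -dt + b_neg)          # depression, clamped at 0

def rebase_spike_time(t: int) -> int:
    """Adjust a stored spike time when the iteration counter resets."""
    if t >= 0:
        return t - MAX_TIME      # spike occurred before this reset
    return MIN_TIME              # already negative from an earlier reset: saturate
```

Clamping at zero keeps each line from changing sign past the point where the exponential it replaces would have decayed to nearly nothing.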
- R-STDP
- Input values to feed forward version
- Classifier
- Fit to EEG signals
The testbench directory should be named `hardware-tb` and laid out as follows (the hardware Verilog file within the directory is optional):

```
.
├── Makefile
├── hardware.sv
└── test.py
```

When testing in cocotb, use `dut._log.info(string)` for logging during the simulation.
```verilog
module add #( parameter N=32, parameter Q=16 )( input [N-1:0] a, input [N-1:0] b, output [N-1:0] c )
```

- `[N-1:0] a`: First fixed-point term
- `[N-1:0] b`: Second fixed-point term
- `[N-1:0] c`: Output in fixed-point form
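A Python reference model of `add` is useful as the expected value in a cocotb test. This is a minimal sketch, assuming N-bit two's-complement words with Q fractional bits and that the module wraps on overflow; the overflow behavior is an assumption since the Verilog body is not shown here.

```python
def to_fixed(x: float, n: int = 32, q: int = 16) -> int:
    """Encode x as an n-bit two's-complement word with q fractional bits."""
    return int(round(x * (1 << q))) & ((1 << n) - 1)

def from_fixed(word: int, n: int = 32, q: int = 16) -> float:
    """Decode an n-bit two's-complement word with q fractional bits."""
    word &= (1 << n) - 1
    if word & (1 << (n - 1)):
        word -= 1 << n           # negative: undo the two's-complement offset
    return word / (1 << q)

def add_model(a: int, b: int, n: int = 32) -> int:
    """c = a + b, keeping only the low n bits (wraps on overflow)."""
    return (a + b) & ((1 << n) - 1)
```

Because the binary point is in the same place for both operands, fixed-point addition is plain integer addition; only the overflow policy needs care, for example `32767.0 + 1.0` wraps to `-32768.0` with the default 16|16 split.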
```verilog
module negator #( parameter N = 32 )( input logic signed [N-1:0] a, output logic signed [N-1:0] out )
```

- `[N-1:0] a`: Input fixed-point term
- `[N-1:0] out`: Output in fixed-point form
```verilog
module mult #( parameter N = 32, parameter F = 16 )( input logic [N-1:0] a, input logic [N-1:0] b, output logic [N-1:0] c )
```

- `[N-1:0] a`: First fixed-point term
- `[N-1:0] b`: Second fixed-point term
- `[N-1:0] c`: Output in fixed-point form
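Multiplying two words with F fractional bits yields a product with 2F fractional bits, so the result must be shifted right by F to realign the binary point. This is a minimal reference-model sketch, assuming signed operands, an arithmetic right shift (which truncates toward negative infinity), and wraparound to N bits; whether the Verilog truncates, rounds, or saturates is not stated here.

```python
def to_signed(word: int, n: int) -> int:
    """Interpret the low n bits of word as a two's-complement integer."""
    word &= (1 << n) - 1
    return word - (1 << n) if word & (1 << (n - 1)) else word

def mult_model(a: int, b: int, n: int = 32, f: int = 16) -> int:
    """c = (a * b) >> f on n-bit two's-complement words, wrapped to n bits."""
    product = to_signed(a, n) * to_signed(b, n)   # 2n-bit product with 2f fractional bits
    return (product >> f) & ((1 << n) - 1)        # drop f fractional bits, keep n bits
```

The truncation direction matters for small negative values: the smallest negative value times 0.5 stays at the smallest negative value instead of rounding to zero.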
```verilog
module reciprocal #( parameter N = 32 )( input [N-1:0] a, output reg [N-1:0] out )
```

(Not implemented for N != 32; needs to be refactored with `casez` and `genvar`.)

- `[N-1:0] a`: Input fixed-point term
- `[N-1:0] out`: Output in fixed-point form
```verilog
module div #( parameter N = 32, parameter F = 16 )( input logic [N-1:0] a, input logic [N-1:0] b, output logic [N-1:0] c )
```

- `[N-1:0] a`: First fixed-point term
- `[N-1:0] b`: Second fixed-point term
- `[N-1:0] c`: Output in fixed-point form
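Fixed-point division pre-shifts the dividend left by F so that the integer quotient keeps F fractional bits. This is a minimal reference-model sketch, assuming signed operands and truncation toward zero, which is how Verilog's integer `/` rounds; Python's `//` floors instead, so the sign is handled separately. Divide-by-zero behavior is not documented for the module, so the model simply raises.

```python
def to_signed(word: int, n: int) -> int:
    """Interpret the low n bits of word as a two's-complement integer."""
    word &= (1 << n) - 1
    return word - (1 << n) if word & (1 << (n - 1)) else word

def div_model(a: int, b: int, n: int = 32, f: int = 16) -> int:
    """c = (a << f) / b on n-bit two's-complement words, truncating toward zero."""
    num, den = to_signed(a, n) << f, to_signed(b, n)
    if den == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    quotient = abs(num) // abs(den)        # magnitude, truncated
    if (num < 0) != (den < 0):
        quotient = -quotient               # restore the sign after truncating
    return quotient & ((1 << n) - 1)
```

For example, 3.0 / -2.0 gives the raw value for -1.5, and the smallest negative value divided by 2.0 truncates to 0 rather than flooring to -1.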
```verilog
module abs #( parameter N = 32 )( input [N-1:0] x, output reg [N-1:0] out )
```

- `[N-1:0] x`: Input fixed-point term
- `[N-1:0] out`: Output in fixed-point form
```verilog
module exp #( parameter N = 32 )( input [N-1:0] x, output reg [N-1:0] out )
```

(Not implemented for N != 32.)
(Needs to be re-implemented for higher precision.)
(relevant link to cordic method, calculate q by taking integer part of x * 1/ln(2), multiply by
- `[N-1:0] x`: Input fixed-point term
- `[N-1:0] out`: Output in fixed-point form
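The range reduction mentioned in the note above rests on the identity $e^x = 2^q e^r$ with $q = \lfloor x / \ln 2 \rfloor$ and $r = x - q \ln 2 \in [0, \ln 2)$: only $e^r$ on a small interval needs approximating, and the $2^q$ factor becomes a shift in fixed point. This floating-point sketch illustrates that identity using a short Taylor series for $e^r$; it is not the CORDIC method the note refers to.

```python
import math

LN2 = math.log(2.0)

def exp_range_reduced(x: float, terms: int = 8) -> float:
    """e^x = 2^q * e^r, with q = floor(x / ln 2) and 0 <= r < ln 2."""
    q = math.floor(x / LN2)
    r = x - q * LN2
    # Taylor series for e^r; converges quickly because r < ln 2
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= r / k
        total += term
    # Scaling by 2^q: ldexp here, a left or right shift by q in fixed point
    return math.ldexp(total, q)
```

Because $r$ never exceeds $\ln 2$, eight terms already give a relative error of a few parts per million over any input range, which is why the reduction helps the limited-range problem.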
```verilog
module linear_piecewise #( parameter N = 32 ) ( input [N-1:0] x, input [N-1:0] m1, input [N-1:0] m2, input [N-1:0] b1, input [N-1:0] b2, input [N-1:0] split, output [N-1:0] out )
```

$$ f(x)= \begin{cases} m_1 x + b_1 & x < q \\ m_2 x + b_2 & x \geq q \end{cases} $$

- (Can be refactored to use fewer LUTs)
- `[N-1:0] x`: Input to function in fixed-point representation
- `[N-1:0] m1`: Fixed-point slope of first half
- `[N-1:0] m2`: Fixed-point slope of second half
- `[N-1:0] b1`: Fixed-point intercept of first half
- `[N-1:0] b2`: Fixed-point intercept of second half
- `[N-1:0] split`: Where to split the piecewise function, in fixed-point form ($q$)
- `[N-1:0] out`: Output in fixed-point form
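A software model of `linear_piecewise`, with one simple way to choose its parameters for $e^x$: draw chords through the curve at the range ends and at the split. The chord fit is an assumption for illustration (a least-squares fit would usually give lower error), and floating point is used for clarity.

```python
import math
from typing import Callable, Tuple

def linear_piecewise(x: float, m1: float, m2: float, b1: float, b2: float,
                     split: float) -> float:
    """f(x) = m1*x + b1 for x < split, m2*x + b2 otherwise."""
    return m1 * x + b1 if x < split else m2 * x + b2

def chord_fit(f: Callable[[float], float], lo: float, split: float,
              hi: float) -> Tuple[float, float, float, float]:
    """Return (m1, m2, b1, b2) for lines through f(lo)-f(split) and f(split)-f(hi)."""
    m1 = (f(split) - f(lo)) / (split - lo)
    m2 = (f(hi) - f(split)) / (hi - split)
    return m1, m2, f(lo) - m1 * lo, f(split) - m2 * split
```

For example, `chord_fit(math.exp, -2.0, 0.0, 1.0)` gives lines that match $e^x$ exactly at -2, 0, and 1; since $e^x$ is convex, the chords overestimate it everywhere in between.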
```verilog
// todo
module exp #( parameter N = 32, parameter P=power )( input [N-1:0] x, output reg [N-1:0] out )
```

- todo
- Should be expanded to a processable form in the equation high-level synthesis
- Expansion process:
- `x^0` should be `1`
- `x^1` should be `x`
- `x^2` should be `(x*x)`
- `x^3` should be `((x*x)*x)`
- `x^4` should be `((x*x)*(x*x))`
- `x^5` should be `(((x*x)*(x*x))*x)`
- `x^6` should be `(((x*x)*(x*x))*(x*x))`
- ... etc.
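The expansion pattern above multiplies the largest power-of-two product by the expansion of the remainder, with each power-of-two product built by squaring. A sketch that reproduces it, assuming the variable is a plain name and `n` is a non-negative integer constant; `expand_power` is a hypothetical helper, not part of `equation_to_module.py`.

```python
def expand_power(var: str, n: int) -> str:
    """Expand var^n into nested, fully parenthesized multiplications."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return "1"
    if n == 1:
        return var
    p = 1 << (n.bit_length() - 1)          # largest power of two <= n
    if p == n:
        half = expand_power(var, n // 2)   # x^(2k) = x^k * x^k
        return f"({half}*{half})"
    return f"({expand_power(var, p)}*{expand_power(var, n - p)})"
```

The output uses the same fully parenthesized syntax as the rest of the equation grammar, so it can be substituted in before the normal nesting step.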
`fixed_point_to_decimal(binary_str: str, integer_bits: int, fractional_bits: int)`

Converts a fixed-point representation of a number into a decimal.

- `binary_str: str`: Fixed-point representation of a number as a string
- `integer_bits: int`: Number of integer bits in the fixed-point representation
- `fractional_bits: int`: Number of fractional bits in the fixed-point representation

`decimal_to_fixed_point(number: float, integer_bits: float, fractional_bits: float)`

Converts a decimal to a fixed-point representation.

- `number: float`: Number to convert to a fixed-point representation
- `integer_bits: int`: Number of integer bits in the fixed-point representation
- `fractional_bits: int`: Number of fractional bits in the fixed-point representation

`check_with_tolerance(expected: float, actual: float, tolerance=1e-5)`

Checks whether an actual value is within a given tolerance of an expected value, accounting for overflow.

- `expected: float`: Expected numeric value
- `actual: float`: Actual numeric value
- `tolerance: float`: Degree of acceptable error
`adder_model(a: int, b: int, n_bits: int = 4) -> int`

Performs fixed-point addition on two integers with overflow.

- `a: int`: First integer term
- `b: int`: Second integer term
- `n_bits: int`: Number of bits in binary representation (must be >= 1)

`multiplier_model(a: int, b: int, n_bits: int = 4) -> int`

Performs fixed-point multiplication on two integers with overflow.

- `a: int`: First integer term
- `b: int`: Second integer term
- `n_bits: int`: Number of bits in binary representation (must be >= 1)

`divider_model(a: int, b: int, n_bits: int = 4) -> int`

Performs fixed-point division on two integers with overflow.

- `a: int`: First integer term
- `b: int`: Second integer term
- `n_bits: int`: Number of bits in binary representation (must be >= 1)
```
python3 equation_to_module.py <filename>.json
```

`<filename>.json` example:

```json
{
    "name": "name",
    "equation": "((x+y)*z)",
    "variables": ["x", "y", "z"],
    "out_variable": "out",
    "integer_bits": 16,
    "fractional_bits": 16,
    "lower_bound": -4,
    "upper_bound": 4,
    "tolerance": 1
}
```
- `name: String`: Name of module
- `equation: String`: Equation to translate to hardware (see operations)
- `variables: Array[String]`: Variables within the equation (`e` is a reserved variable)
- `out_variable: String`: What to name the output register
- `integer_bits: Integer`: Number of integer bits in the fixed-point representation
- `fractional_bits: Integer`: Number of fractional bits in the fixed-point representation
- `lower_bound: Float`: Lower bound of numbers to test in simulation
- `upper_bound: Float`: Upper bound of numbers to test in simulation
- `tolerance: Float`: Maximum error allowed within the test
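A quick sanity check on a config before running the CLI can catch typos early. This is a minimal sketch that inspects a parsed config dict (for example from `json.load`); `validate_config` is a hypothetical helper, not part of `equation_to_module.py`, and it only checks the fields and rules listed above.

```python
import re
from typing import List

REQUIRED = {"name": str, "equation": str, "variables": list, "out_variable": str,
            "integer_bits": int, "fractional_bits": int,
            "lower_bound": (int, float), "upper_bound": (int, float),
            "tolerance": (int, float)}

def validate_config(config: dict) -> List[str]:
    """Return a list of problems found in a config dict (empty if none)."""
    problems = [f"missing or mistyped field: {key}"
                for key, kind in REQUIRED.items() if not isinstance(config.get(key), kind)]
    if problems:
        return problems
    if "e" in config["variables"]:
        problems.append("'e' is a reserved variable")
    # Identifiers in the equation, minus the reserved e and the abs keyword
    used = set(re.findall(r"[A-Za-z_]\w*", config["equation"])) - {"e", "abs"}
    unknown = used - set(config["variables"])
    if unknown:
        problems.append(f"undeclared variables: {sorted(unknown)}")
    if config["lower_bound"] >= config["upper_bound"]:
        problems.append("lower_bound must be below upper_bound")
    return problems
```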
All operations must be enclosed by parentheses. `x` and `y` can be variables specified in the `.json` file, constants within the minimum and maximum of the specified `integer_bits` and `fractional_bits`, or the results of other nested operations.

- Addition/Subtraction: `(x+y)`
- Negation: `(-1*x)` or `(x*-1)`
- Multiplication: `(x*y)`
- Division: `(x/y)`
- Exponentiation: `(e^x)`
- To use a linear piecewise approximation of $e^x$, specify $m_1$, $m_2$, $b_1$, $b_2$, and `split` in the arguments
- Absolute value: `(abs|x)`
- (todo) Power: `(x^n)` where `n` must be a constant
- todo
- Should take in a regular plain-text or LaTeX equation and convert it to one parseable by the CLI tool
```verilog
module calc_dv #( parameter N=32, parameter Q=16 ) ( input [N-1:0] v, input [N-1:0] w, input [N-1:0] i, input [N-1:0] step, output [N-1:0] out )
```

- `[N-1:0] v`: Current voltage
- `[N-1:0] w`: Current adaptive value
- `[N-1:0] i`: Input voltage
- `[N-1:0] step`: Timestep value divided by $\tau_m$
- `[N-1:0] out`: Calculated change in voltage
```verilog
module calc_dw #( parameter N=32, parameter Q=16 )( input [N-1:0] a, input [N-1:0] b, input [N-1:0] v, input [N-1:0] w, input [N-1:0] step, output [N-1:0] out )
```

- `[N-1:0] a`: Alpha value
- `[N-1:0] b`: Beta value
- `[N-1:0] v`: Current voltage
- `[N-1:0] w`: Current adaptive value
- `[N-1:0] i`: Input voltage
- `[N-1:0] step`: Timestep value divided by $\tau_m$
- `[N-1:0] out`: Calculated change in adaptive value
```verilog
module izhikevich_core #( parameter N=32, parameter Q=16 )( input clk, input [N-1:0] i, input [N-1:0] v_init, input [N-1:0] w_init, input [N-1:0] v_th, input [N-1:0] step, input [N-1:0] a, input [N-1:0] b, input [N-1:0] c, input [N-1:0] d, input apply, input rst, output reg [N-1:0] voltage, output reg [N-1:0] w )
```

Given Izhikevich neuron parameters, performs one iteration of the neuron on each clock cycle while the `apply` signal is high.

- `clk`: Clock signal
- `[N-1:0] i`: Input voltage
- `[N-1:0] v_init`: Initial voltage value
- `[N-1:0] w_init`: Initial adaptive value
- `[N-1:0] v_th`: Voltage reset threshold
- `[N-1:0] step`: Timestep value divided by $\tau_m$
- `[N-1:0] a`: Alpha value
- `[N-1:0] b`: Beta value
- `[N-1:0] c`: C value
- `[N-1:0] d`: D value
- `apply`: Whether to change voltage and adaptive value this clock cycle
- `rst`: Active-high reset back to initial voltage and adaptive values
- `[N-1:0] voltage`: Current voltage
- `[N-1:0] w`: Current adaptive value
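A cycle-level software model of `izhikevich_core` can serve as a cocotb reference. This is a minimal floating-point sketch, assuming the standard Izhikevich right-hand sides $0.04v^2 + 5v + 140 - w + i$ and $a(bv - w)$ scaled by `step`, regular-spiking default parameters, and a reset ($v \leftarrow c$, $w \leftarrow w + d$) in the same cycle the updated voltage reaches `v_th`. The exact equations inside `calc_dv` and `calc_dw`, and the reset timing, may differ in the Verilog.

```python
from dataclasses import dataclass, field

@dataclass
class IzhikevichCoreModel:
    a: float = 0.02
    b: float = 0.2
    c: float = -65.0
    d: float = 8.0
    v_th: float = 30.0
    step: float = 0.5
    v_init: float = -65.0
    w_init: float = -13.0
    voltage: float = field(init=False, default=0.0)
    w: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Equivalent of asserting rst: return to the initial values."""
        self.voltage, self.w = self.v_init, self.w_init

    def clock(self, i: float, apply: bool = True, rst: bool = False) -> bool:
        """One rising clock edge; returns True if the neuron spiked."""
        if rst:
            self.reset()
            return False
        if not apply:
            return False                                          # hold state
        v, w = self.voltage, self.w
        dv = self.step * (0.04 * v * v + 5 * v + 140 - w + i)     # role of calc_dv
        dw = self.step * self.a * (self.b * v - w)                # role of calc_dw
        self.voltage, self.w = v + dv, w + dw
        if self.voltage >= self.v_th:
            self.voltage, self.w = self.c, self.w + self.d        # spike reset
            return True
        return False
```

Comparing this model cycle by cycle against the Verilog outputs separates equation errors (both drift apart immediately) from fixed-point precision errors (they drift apart slowly).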
- todo
- Needs to calculate each gate current
- Needs to update each gate state
- Needs to add gate currents together
- Needs to encapsulate ligand gated channel currents
- todo
- Needs getting addresses and data
- Needs editing addresses and data
- todo
- Needs to generate neurons in a grid
- Implementation must allow for Verilog to automatically generate different grid sizes on build
- Calculate neuron inputs based on voltages and connected inputs
- How neurons are connected is stored in RAM
- Apply voltage and adaptive value changes on clock cycle and apply signal
- Needs to be adaptable to allow an input layer and feedforward structure
- Potentially needs to be able to allow certain constants to change depending on neurotransmitter
```verilog
module spi_peripheral ( input rst, input ss, input mosi, output reg miso, input sck, output reg done_rx, output reg done_tx, input [7:0] din, output reg [7:0] dout )
```

Peripheral for an SPI interface that transmits and receives one byte at a time, clocked by the `sck` signal from a controller.

- `rst`: Active-high reset
- `ss`: Select signal
- `mosi`: Bit from controller to peripheral
- `miso`: Bit from peripheral to controller
- `sck`: Controller clock signal
- `done_rx`: Whether a byte has been received
- `done_tx`: Whether a byte has been transmitted
- `din`: Byte to send
- `dout`: Byte received
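A bit-level model of one full-duplex byte exchange shows how `mosi`, `miso`, `din`, and `dout` relate. This is a minimal sketch, assuming SPI mode 0 (both sides drive their line before each rising edge of `sck` and sample on it) and most-significant-bit-first order; the module's actual mode and bit order are not stated here, and `spi_exchange` is a hypothetical name.

```python
from typing import Tuple

def spi_exchange(controller_byte: int, peripheral_din: int) -> Tuple[int, int]:
    """Shift one byte each way; returns (peripheral dout, byte the controller received)."""
    tx_controller = controller_byte & 0xFF
    tx_peripheral = peripheral_din & 0xFF   # din loaded at the start of the transfer
    dout = received = 0
    for bit in range(7, -1, -1):            # MSB first
        mosi = (tx_controller >> bit) & 1   # driven before the rising edge
        miso = (tx_peripheral >> bit) & 1
        # Rising edge of sck: each side shifts in the other's line
        dout = (dout << 1) | mosi
        received = (received << 1) | miso
    return dout, received                   # after 8 edges, done_rx/done_tx would assert
```

Every transfer is full duplex, so a transmit-only or receive-only byte still clocks eight bits in the other direction.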
- todo
- Needs to transmit bytes
- Needs to receive bytes
- Needs to transmit to and receive from multiple peripherals
- todo
- Needs to interface a CPU via AXI (test with Raspberry Pi Pico/Zero)
- Needs to transmit data
- Needs to receive data
- todo
- Either needs to transmit and receive Ethernet data from the CPU through AXI or interface Ethernet directly with the FPGA
- todo
- Needs to display voltages of lattice as iterations progress