Nebula: The Cloud-Native Microarchitecture Simulation Framework!

Nebula is a framework for developing cycle-accurate microprocessor simulators comprised of multiple, independent processes that communicate over a network.

The framework includes Python libraries for decoding and executing statically linked RISC-V binaries, as well as several sample pipeline implementations built using the framework.

Quick Start

The following instructions should help you to get a simulation up and running with minimal effort. For more detailed analysis of Nebula, how it works, its sample pipeline implementations, software architecture, design philosophy, using it for running multicore simulations, running distributed simulations, etc., refer to the Documentation below.

Step 1. Clone the git repo:

mkdir -p ${HOME}/src
cd ${HOME}/src
git clone https://github.com/jhaskinsjr/nebula.git

Step 2. Install pyelftools:

pip3 install pyelftools

Step 3. Set up passwordless SSH access to your host machine. A guide for doing this can be found at

https://www.ibm.com/support/pages/configuring-ssh-login-without-password

Step 4. Create a folder for the execution artifacts:

mkdir -p /tmp/pompia/sieve

Step 5. Enter the directory of one of the sample pipelines; for this example, we will use the Pompia pipeline:

cd ${HOME}/src/nebula/pipelines/pompia

Step 6. Execute:

python3 ../../launcher.py \
    --log /tmp/pompia/sieve \
    --service ../../toolbox/stats.py:localhost:22:-1:-1 ../../components/simplemainmem/mainmem.py:localhost:22:-1:-1 \
    --config stats:output_filename:/tmp/pompia/sieve/stats.json \
    mainmem:filename:/tmp/pompia/sieve/mainmem.raw \
    mainmem:capacity:$(( 2**32 )) \
    --max_instructions $(( 10**5 )) \
    -- \
    12345 \
    localhost.nebula \
    ../../examples/bin/sieve 10

The "python3" command executes the Nebula launcher (launcher.py), which begins by opening a socket and accepting connections on port 12345, executing the script localhost.nebula, and loading the binary (${HOME}/src/nebula/examples/bin/sieve) together with its command-line parameter "10", into the simulated main memory. With this foundation established, the simulator will execute a maximum of 100,000 simulated instructions.

Step 7. Examine the output:

The output emitted to the console from the simulator should be all the prime integers less than or equal to 10; to wit:

2 3 5 7

For a more in-depth analysis of the actions taken by the simulator, each module emits its own log file:

/tmp/pompia/sieve/launcher.py.log: detailed information about the operation of the Pompia pipeline
/tmp/pompia/sieve/mainmem.py.log: loading the binary and placing the command line arguments
/tmp/pompia/sieve/stats.py.log: module configuration and the final JSON object
/tmp/pompia/sieve/0000_regfile.py.log: initial- and final states of the register file on core 0
/tmp/pompia/sieve/0000_alu.py.log: instructions that executed on core 0 (irrespective of whether they ultimately retire)
/tmp/pompia/sieve/0000_commit.py.log: all instructions that retired on core 0
/tmp/pompia/sieve/stats.json: counts of key events that occurred during the simulation for each core

These log files assist with debugging and gaining deeper insights about the operation of the simulator and the pipeline designs that the simulator models. Some modules' log files (e.g., 0000_fetch.py.log, 0000_decode.py.log) will be empty. This does not signify a malfunction; the module just did not report on any events, but could be modified to do so.

Documentation

There is much more to Nebula than what is covered in the Quick Start guide (supra). Consider, for instance, that the Nebula framework uses multiple, loosely coupled, independent processes to effect cycle-accurate simulations; see: Software Architecture. Consider further that these independent processes need not all execute on the same node; see: Distributed Simulation. Consider further still that the Nebula framework also includes tools to facilitate large-scale studies via MongoDB, Pandas, and Jupyter Notebooks ; see: Large-Scale Studies. Information about these topics and much more are covered in the links below.

Software Architecture
Sample Pipelines
JSON Output
Watchdog Tool
Snapshots
Simulator Scripts
Distributed Simulation
Multicore Simulation
Sample Binaries
Large-Scale Studies
Simulation Speed
Instruction Implementation Tests
Future Features

Recent Revision History

Git tag v2.0.1 (20241005):

remove ability of Etrog, Tangerine modules to run standalone
remove superfluous pipelines/*/implementation/stats.py files
make executor.py verify all *.exec scripts are present before processing them

Git tag v2.0.0 (20240930):

tuned per-core microservices architecture (versus per-component); see: Software Architecture
new Tangerine pipeline model: faster (3x) reimplementation of Tangelo
decode:toolchain now used for Bergamot and Etrog
updated Simulation Speed documentation

Git tag v1.5.1 (20240921):

BUG FIX: 128e/localhost.nebula now correctly points to core.py (instead of simplecore.py)
new SimpleCore class
Etrog creates Core class that inherits from SimpleCore

Git tag v1.5.0 (20240919):

first-pass per-core microservices architecture; see: Software Architecture
new Etrog pipeline model: faster reimplementation of Bergamot

Git tag v1.4.0 (20240819):

BUG FIX: Tangelo now executes correctly when alu:forward_results is False
BUG FIX: MMU page reclamation
shared cache module (see: components/sharedcache/init.py) now handles multiple concurrent operations
significant performance improvement by deferred stat reporting

Git tag v1.3.1 (20240625):

Dhrystone benchmark
gzip benchmark

Git tag v1.3.0 (20240608):

new Tangelo pipeline model with V2P, TLB, shared caches
MMU service
cache block marking, fine-grained invalidation (see: components/simplecache/init.py)
shared caches (among multiple cores) (see: components/sharedcache/init.py)
arbitrarily deep cache hierarchy
heterogenous cores simulations (e.g.: chips/4t4b/localhost.nebula; 4 Tangelo cores + 4 Bergamot cores)
massively multicore simulations (e.g.: chips/128b/localhost.nebula; 128 Bergamot cores)

Lessons Learned

Here are some insights I have gained on the path from a blank vi session to a fully operable cycle-accurate simulator framework.

Prioritization is important, but judgement is vital. Make sure the most important tasks get done first, and, more importantly, make sure unimportant tasks don't get done.
Automate repetitive tasks. Failure to do so wastes time and increases the probability of errors.
Think globally. Before drilling down on how a task is to be completed, first list as many tasks as you can think of, then consider interdependencies between tasks, e.g., what must precede what else, what groups of tasks may be performed concurrently (and, if I'd had a team working with me, what tasks I should do, what tasks I should delegate).
Health is crucial. Exercise, rest, good food, and laughter make good engineers great, and great engineers superb.

Contact

I consult on the Nebula framework's use, development and deployment, and microprocessor simulation generally.

Email: john.haskins.jr@gmail.com | LinkedIn: https://www.linkedin.com/in/john-haskins-jr-925235a1/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Nebula: The Cloud-Native Microarchitecture Simulation Framework!

Quick Start

Documentation

Recent Revision History

Lessons Learned

Contact

Files

README.md

Latest commit

History

README.md

File metadata and controls

Nebula: The Cloud-Native Microarchitecture Simulation Framework!

Quick Start

Documentation

Recent Revision History

Lessons Learned

Contact