Run machines directly from backing storage #281

edubart · 2024-10-10T22:55:41Z

This PR is still a WIP and under research; it's my take on #277.

This PR completely reworks how memory mapping is done. It also introduces many routines for handling file operations that depend on the host OS and filesystem.

Progress

Throughput benchmark

benchmark	ext4 (before)	ext4	btrfs	xfs
load config + memory storage + discard	436.18	3407.30	3512.25	3481.82
load config + memory storage + store	4.82	16.05	14.73	17.83
load config + backing storage + discard	N/A	11.44	577.46	1155.42
load config + backing storage + store	N/A	12.49	535.93	1038.09
load snapshot + memory storage + discard	99.10	13668.00	14463.61	14639.84
load snapshot + memory storage + store	4.66	13.76	12.35	14.74
load snapshot + backing storage + discard	N/A	10.48	866.25	1812.44
load snapshot + backing storage + store	N/A	11.39	885.68	1467.20
roll snapshot + memory storage + discard	223289.05	130662.59	211550.67	131551.25
roll snapshot + memory storage + store	5.26	19.44	15.86	20.14
roll snapshot + backing storage + discard	N/A	478350.80	199130.24	328092.06
roll snapshot + backing storage + store	N/A	17.24	4476.19	2311.65

Benchmark notes:

The numbers are operations per second; every operation advances the machine by 1000 cycles.
The machine used has a 512MB root flash drive and 64MB of RAM; its total snapshot size is 583MB.
The machine is just booting Linux and doing nothing else.
Root hash computation was disabled for load/store.
"backing storage" means the machine runs on a shared memory map of a file.
"memory storage" means the machine runs on a private memory map.
"load config" means the machine is created from scratch from a new config.
"load snapshot" means the machine is created from a previously stored snapshot.
"roll snapshot" means the machine state is rolled forward from a snapshot.
"discard" means the machine is discarded after advancing its state.
"store" means a machine snapshot is stored to a new directory after every advance.
"ext4 (before)" is ext4 before this PR.
"ext4" is not a copy-on-write filesystem, so it performs a slow file copy.
"btrfs" and "xfs" are copy-on-write filesystems, so they perform a fast copy with reflink.

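The difference between "backing storage" and "memory storage" can be illustrated with a small Python sketch. It is not the implementation in this PR; `write_via_mapping`, `demo`, and the file name `ram.bin` are hypothetical names chosen for illustration. The same file is mapped once privately and once shared, and only the shared mapping changes what is on disk.

```python
import mmap
import os
import tempfile


def write_via_mapping(path, payload, shared):
    """Write payload at offset 0 through a memory map, then return what the file holds."""
    # ACCESS_WRITE is a shared, write-through mapping of the file.
    # ACCESS_COPY is a private copy-on-write mapping: writes stay in memory.
    access = mmap.ACCESS_WRITE if shared else mmap.ACCESS_COPY
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:len(payload)] = payload
            if shared:
                m.flush()  # push dirty pages back to the file
    with open(path, "rb") as f:
        return f.read(len(payload))


def demo():
    """Return (file content after private write, file content after shared write)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ram.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(bytes(mmap.PAGESIZE))  # one page of zeros
        private_result = write_via_mapping(path, b"boot", shared=False)
        shared_result = write_via_mapping(path, b"boot", shared=True)
        return private_result, shared_result
```

With a private mapping the file stays untouched, so a snapshot has to be written out explicitly; with a shared mapping the file already holds the machine state.
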
Special benchmarks:

"roll snapshot + memory storage + store" is the strategy a rollups advancer node would use if it had to save a machine snapshot to disk after every advance for external use.
"roll snapshot + backing storage + store" is the strategy a rollups advancer node could use to make its entire state available externally (e.g., to hand it to a reader node).
"load snapshot + memory storage + discard" is the strategy a reader node could use to serve inspects from a machine state available on disk.

Disk usage benchmark

benchmark	ext4 (before)	ext4	btrfs	xfs
load config + memory storage + store	583.03 MB	123.64 MB	123.90 MB	123.65 MB
load config + backing storage + store		124.64 MB	1.14 MB	1.38 MB
load snapshot + memory storage + store	583.03 MB	123.64 MB	123.89 MB	123.65 MB
load snapshot + backing storage + store		123.64 MB	0.05 MB	0.16 MB
roll snapshot + memory storage + store	583.03 MB	123.65 MB	123.91 MB	123.65 MB
roll snapshot + backing storage + store		123.65 MB	0.05 MB	0.05 MB

Notes:

The numbers are the MB used in the filesystem per machine snapshot stored.
"btrfs" and "xfs" use less disk space with "backing storage" because they copy with reflink.
"ext4 (before)" uses more disk space because it doesn't take advantage of sparse files.

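A reflink copy asks the filesystem to share the source file's extents with the destination instead of duplicating the data. The following is a minimal sketch of the idea on Linux, assuming the `FICLONE` ioctl and falling back to a regular copy where the filesystem (such as ext4) does not support it. `copy_with_reflink` is a hypothetical helper, not a routine from this PR.

```python
import shutil

try:
    import fcntl  # only available on Unix-like systems
except ImportError:
    fcntl = None

# Linux ioctl request number for cloning a whole file (FICLONE in linux/fs.h).
FICLONE = 0x40049409


def copy_with_reflink(src, dst):
    """Copy src to dst, sharing extents when possible.

    Returns "reflink" if the filesystem cloned the file, "copy" otherwise.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if fcntl is not None:
            try:
                # Ask the filesystem to make dst share all of src's data blocks.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return "reflink"
            except OSError:
                pass  # not supported here; fall through to a byte copy
        shutil.copyfileobj(fsrc, fdst)
    return "copy"
```

On a copy-on-write filesystem the clone uses almost no extra space until one of the two files is modified, which is consistent with the small per-snapshot sizes in the btrfs and xfs columns.
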
Larger machines

The following are benchmarks for machines with 256GB and 1TB of RAM, running on a host filesystem with just 8GB of space.

benchmark (BTRFS - 256GB of RAM)	ops/sec	MB/snapshot
load config + backing storage + discard	517.80
load config + backing storage + store	489.64	1.14 MB
load snapshot + backing storage + discard	790.93
load snapshot + backing storage + store	790.24	0.05 MB
roll snapshot + backing storage + discard	274723.31
roll snapshot + backing storage + store	4799.38	0.05 MB

benchmark (BTRFS - 1TB of RAM)	ops/sec	MB/snapshot
load config + backing storage + discard	204.35
load config + backing storage + store	193.15	1.14 MB
load snapshot + backing storage + discard	234.92
load snapshot + backing storage + store	222.32	0.05 MB
roll snapshot + backing storage + discard	275655.38
roll snapshot + backing storage + store	4737.27	0.05 MB
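
Running a machine with far more RAM than the host filesystem has free space presumably relies on sparse files, where regions that were never written take no disk blocks. The sketch below shows how a file's apparent size and its allocated size can differ. The helper names are hypothetical, and `st_blocks` is only available on Unix-like systems.

```python
import os


def create_sparse_file(path, size):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; unwritten ranges become holes


def sizes(path):
    """Return (apparent size, allocated size) in bytes."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512
```

For example, `create_sparse_file("ram.bin", 64 * 1024 * 1024)` followed by `sizes("ram.bin")` reports the full apparent size, while the allocated size stays far smaller on filesystems that support holes.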

edubart added the enhancement label Oct 10, 2024

edubart added this to the v0.20.0 milestone Oct 10, 2024

edubart self-assigned this Oct 10, 2024

edubart changed the base branch from main to refactor/c-api October 10, 2024 22:55

edubart marked this pull request as draft October 10, 2024 22:59

edubart mentioned this pull request Oct 10, 2024

Add support for snapshot/rollback in local machines #237

Closed

7 tasks

edubart force-pushed the feature/backing-storage branch from a99ed4d to d03411e Compare October 11, 2024 18:22

feat: add backing storage runtime option for running machines from disk

8619295

edubart force-pushed the feature/backing-storage branch from d03411e to 8619295 Compare October 13, 2024 19:52

Base automatically changed from refactor/c-api to main December 10, 2024 16:50
